harriyott.com

Refactoring with actions

Consider the following code, which fetches and parses some data (from a random gist on the web) into properties on an object:

public void Parse(string fileName)
{
    var userAccounts = new Collection();

    var html = new HtmlDocument();
    html.Load(fileName);
    var document = html.DocumentNode;
    var users = document.QuerySelectorAll("user");
    foreach (var userNode in users)
    {
        var userAccount = new UserAccount();

        var node = userNode.ChildNodes["name"];
        if (node != null && !string.IsNullOrEmpty(node.InnerText))
        {
            userAccount.Name = node.InnerText;
        }
        node = userNode.ChildNodes["company"];
        if (node != null && !string.IsNullOrEmpty(node.InnerText))
        {
            userAccount.Company = node.InnerText;
        }
        node = userNode.ChildNodes["blog"];
        if (node != null && !string.IsNullOrEmpty(node.InnerText))
        {
            userAccount.Blog = node.InnerText;
        }
        node = userNode.ChildNodes["email"];
        if (node != null && !string.IsNullOrEmpty(node.InnerText))
        {
            userAccount.Email = node.InnerText;
        }
        node = userNode.ChildNodes["id"];
        if (node != null && !string.IsNullOrEmpty(node.InnerText))
        {
            userAccount.Id = int.Parse(node.InnerText);
        }
        node = userNode.ChildNodes["followers-count"];
        if (node != null && !string.IsNullOrEmpty(node.InnerText))
        {
            userAccount.Followers = int.Parse(node.InnerText);
        }
        userAccounts.Add(userAccount);
    }
}

There's lots of duplicated code, and the first target for refactoring are the longest lines, which can be extracted into a method to check whether a child node is valid:

private static bool IsNodeValid(HtmlNode node)
{
    return node != null && !string.IsNullOrEmpty(node.InnerText);
}

Better, but there's still scope to reduce duplication.

...
node = userNode.ChildNodes["email"];
if (IsNodeValid(node))
{
    userAccount.Email = node.InnerText;
}
node = userNode.ChildNodes["id"];
if (IsNodeValid(node))
{
    userAccount.Id = int.Parse(node.InnerText);
}
...

Ideally, the blocks of five lines would be refactored into a new method. Because we're accessing different properties of the userAccount, we can't pass them in directly, nor can we pass them in as references, as they're not variables. Also, some of the properies are ints, so we'd need two methods.

We can, however, pass in an action to the new method, which performs the assignment (the only unique code in the five lines):

private void CopyNodeData(HtmlNode userNode, string name, Action func)
{
    var node = userNode.ChildNodes[name];
    if (node != null && !string.IsNullOrEmpty(node.InnerText))
    {
        func(node.InnerText);
    }
}

This now means each assignment can now take a single line:

CopyNodeData(userNode, "name", s => userAccount.Name = s);
CopyNodeData(userNode, "company", s => userAccount.Company = s);
CopyNodeData(userNode, "blog", s => userAccount.Blog = s);
CopyNodeData(userNode, "email", s => userAccount.Email = s);
CopyNodeData(userNode, "id", s => userAccount.Id = int.Parse(s));
CopyNodeData(userNode, "followers-count", s => userAccount.Followers = int.Parse(s));

The third parameter is a lambda that will be called only if the node is valid. We've removed lots of duplication, making the code more readable.

26 April 2013