The power of IEnumerable&lt;T&gt;

When working with collections of data to which a set of rules, filters or data transformations has to be applied, I often see implementations which construct one list after another to hold data between different workflow steps. Those solutions can be inelegant, make code hard to read and consume unnecessary memory. These issues can easily be addressed with the help of the IEnumerable&lt;T&gt; interface and extension methods.

First, imagine a scenario in which we load data from an external source, let's say a CSV file provided by a customer. The data can be expressed by the following entity:

    public class Entity
    {
        public int Id { get; set; }
        public int CategoryId { get; set; }
        public int UserId { get; set; }
        public DateTime Date { get; set; }
        public string Name { get; set; }
        public decimal Amount { get; set; }
    }

Now, before we can enter it into the system we need to normalise the value of the Name property. For this task we use an implementation of INameCanonicalisator. We also have to apply tax to the Amount. This calculation is done by an implementation of IAmountTaxCalculator. Below are the definitions of those interfaces:

    public interface INameCanonicalisator
    {
        string ToCanonicalForm(string name);
    }

    public interface IAmountTaxCalculator
    {
        decimal CalculateTax(decimal value);
    }

To make our example more interesting, let's also assume that we are interested only in entries where the normalised name starts with the letter “a” and the amount after taxes is equal to or greater than 50000. One way of implementing the above requirements is as follows:

        private void ProcessData(IEnumerable<Entity> entities)
        {
            var entitiesWithCanonicalName = new List<Entity>();
            foreach (var entity in entities)
            {
                entity.Name = nameCanonicalisator.ToCanonicalForm(entity.Name);
                if (entity.Name.StartsWith("a"))
                    entitiesWithCanonicalName.Add(entity);
            }

            var entitiesWithRecalculatedTax = new List<Entity>();
            foreach (var entity in entitiesWithCanonicalName)
            {
                entity.Amount = taxCalculator.CalculateTax(entity.Amount);
                if (entity.Amount >= 50000)
                    entitiesWithRecalculatedTax.Add(entity);
            }

            foreach (var entity in entitiesWithRecalculatedTax)
            {
                // code for saving an entity
            }
        }

The first problem with the above approach is that the method has multiple responsibilities (normalising names, calculating taxes, filtering and saving data). The other problem is related to memory consumption. Every new list has to create an array to hold the entities, and with a big set of input data we risk an OutOfMemoryException.

The first issue can be solved by moving chunks of code into separate methods, leaving ProcessData responsible only for managing the workflow:

        private void ProcessData(IEnumerable<Entity> entities)
        {
            var entitiesWithCanonicalName = CanonicaliseEntityNames(entities);
            var entitiesWithRecalculatedTax = EntitiesWithRecalculatedTax(entitiesWithCanonicalName);
            SaveEntities(entitiesWithRecalculatedTax);
        }

        private IEnumerable<Entity> CanonicaliseEntityNames(IEnumerable<Entity> entities)
        {
            var entitiesWithCanonicalName = new List<Entity>();
            foreach (var entity in entities)
            {
                entity.Name = nameCanonicalisator.ToCanonicalForm(entity.Name);
                if (entity.Name.StartsWith("a"))
                    entitiesWithCanonicalName.Add(entity);
            }

            return entitiesWithCanonicalName;
        }

        private IEnumerable<Entity> EntitiesWithRecalculatedTax(IEnumerable<Entity> entities)
        {
            var entitiesWithRecalculatedTax = new List<Entity>();
            foreach (var entity in entities)
            {
                entity.Amount = taxCalculator.CalculateTax(entity.Amount);
                if (entity.Amount >= 50000)
                    entitiesWithRecalculatedTax.Add(entity);
            }
            return entitiesWithRecalculatedTax;
        }

        private static void SaveEntities(IEnumerable<Entity> entities)
        {
            foreach (var entity in entities)
            {
                // code for saving an entity
            }
        }

To address the second issue we need to find a way not to create a new collection of items at each step of the workflow. LINQ has a set of methods which can be applied to IEnumerable&lt;T&gt; and which allow us to process data “on the fly”, without a need to create new collections. To solve the memory problem we could rewrite our methods as follows:

        private IEnumerable<Entity> CanonicaliseEntityNames(IEnumerable<Entity> entities)
        {
            return entities
                .Select(i =>
                {
                    i.Name = nameCanonicalisator.ToCanonicalForm(i.Name);
                    return i;
                })
                .Where(i => i.Name.StartsWith("a"));
        }

        private IEnumerable<Entity> EntitiesWithRecalculatedTax(IEnumerable<Entity> entities)
        {
            return entities
                .Select(i =>
                {
                    i.Amount = taxCalculator.CalculateTax(i.Amount);
                    return i;
                })
                .Where(i => i.Amount >= 50000);
        }

By using the Select and Where methods from LINQ we defer execution of the code to the point when the data is requested. We also avoid creating new collections, as the data is returned by an enumerator, one item at a time.
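
To see the deferred execution in action, consider this small sketch; the Console.WriteLine logging is added purely for illustration and is not part of the article's code:

    var query = entities
        .Select(i =>
        {
            Console.WriteLine("processing {0}", i.Id); // nothing is printed yet
            return i;
        })
        .Where(i => i.Amount >= 50000);

    // The pipeline runs only now, when the enumerator is consumed,
    // one entity at a time, with no intermediate list.
    var first = query.FirstOrDefault();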

By using extension methods we can make the code even more readable, chaining calls to the extensions:

        private void ProcessData(IEnumerable<Entity> entities)
        {
            var processedEntities = entities
                .WithCanonicalName(nameCanonicalisator)
                .Where(i => i.Name.StartsWith("a"))
                .WithTaxApplied(taxCalculator)
                .Where(i => i.Amount >= 50000);

            SaveEntities(processedEntities);
        }

And below is a class with extension methods:

    public static class EntitiesEnumerableExtensions
    {
        public static IEnumerable<Entity> WithCanonicalName(this IEnumerable<Entity> entities, INameCanonicalisator nameCanonicalisator)
        {
            foreach (var entity in entities)
            {
                entity.Name = nameCanonicalisator.ToCanonicalForm(entity.Name);
                yield return entity;
            }
        }

        public static IEnumerable<Entity> WithTaxApplied(this IEnumerable<Entity> entities, IAmountTaxCalculator taxCalculator)
        {
            foreach (var entity in entities)
            {
                entity.Amount = taxCalculator.CalculateTax(entity.Amount);
                yield return entity;
            }
        }
    }

Arranging mocks using DSL

One of the biggest problems with unit tests is poor readability. Bad naming conventions, long methods, and hard to understand Arrangement and Assert parts make unit tests some of the hardest code to read and refactor. In the previous article, Unit Tests as code specification, I presented a way to increase the readability of test method names and use them to create a code specification. Now I would like to tackle the problem of unreadable test methods.

Most unit test methods start with the test arrangement. It usually takes the form of setting up mocks and initialising local variables. It's not uncommon to start a test with code similar to the one below:

_testService.Expect(i => i.Foo()).Throw(new WebException("The operation has timed out")).Repeat.Once();
_testService.Expect(i => i.Foo()).Return(9).Repeat.Once();

The example prepares _testService to throw a WebException the first time the method Foo is called and then return 9 when it is called again. There are a few problems with this set-up. It requires careful reading and analysis of the code to understand the arrangement, and if there are more set-ups and you get confused about them, you will need to analyse it all again. In the presented example some information is also concealed: the WebException represents a time out thrown by the Google AdWords service, as opposed to the CommunicationException thrown by web services such as Bing Ads.

The above example is fairly simple, but the situation gets worse when there is a need to set up a few mocks, or when there are parameters to be passed into the mocked method.

Let's see how the above example could look when using a DSL:

_testService
   .SimulateGoogleAdWords()
   .FailWithTimeOut().Once()
   .Then().Return(9);

We are using a Domain Specific Language to say that the test is for Google AdWords, where the service throws a time out when called the first time and then returns 9 after a retry. This version is easy to read and remember, carries the information that we are testing the Google AdWords case, and is easy to reference back to.

The DSL is implemented using extension methods on the mocked object. The sample code associated with this article uses Rhino Mocks, but it can easily be converted to use other mocking frameworks.
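
The accompanying sample code contains the full implementation; below is only a minimal sketch of the idea, assuming a hypothetical ITestService with a single Foo() method. The FirstCallSetup builder and all names here are illustrative, not the actual API from the sample code:

using System.Net;
using Rhino.Mocks;
using Rhino.Mocks.Interfaces;

public interface ITestService
{
    int Foo();
}

public static class TestServiceDslExtensions
{
    public static ITestService SimulateGoogleAdWords(this ITestService mock)
    {
        // Purely declarative: records in the test which external service we simulate.
        return mock;
    }

    public static FirstCallSetup FailWithTimeOut(this ITestService mock)
    {
        // Google AdWords reports a time out as a WebException.
        var options = mock.Expect(i => i.Foo())
                          .Throw(new WebException("The operation has timed out"));
        return new FirstCallSetup(mock, options);
    }

    public static void Return(this ITestService mock, int value)
    {
        mock.Expect(i => i.Foo()).Return(value);
    }
}

public class FirstCallSetup
{
    private readonly ITestService _mock;
    private readonly IMethodOptions<int> _options;

    public FirstCallSetup(ITestService mock, IMethodOptions<int> options)
    {
        _mock = mock;
        _options = options;
    }

    public FirstCallSetup Once()
    {
        _options.Repeat.Once(); // the failure happens only on the first call
        return this;
    }

    public ITestService Then()
    {
        return _mock; // hand the mock back so the next expectation can be chained
    }
}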

Unit Tests as code specification

When asking people what the purpose of writing unit tests is, we usually get the following answer:

“To verify that the code actually does what it is supposed to do.”

Among other responses we will find that unit tests help to validate that changes are not breaking existing functionality (regression), or that practising TDD will guide the design. But are those the only purposes? There is more. Because unit tests execute our code, they can show how it works. We can use them as a specification of the code. Well crafted tests, which have explanatory names and are easy to read, create a live specification of the module which is always up to date.

Whenever we need to analyse a class, whether because we are new to it or because we are coming back to it, we can use the reports from unit tests to understand how the class works and what its contract is.

To build a specification from unit tests, we need to keep them organised and apply proper naming convention.

Test class names

First of all, there is no need to keep all unit tests related to a given class in one unit test class. We can, and actually should, have more than one unit test class per tested class. A good convention is to have a class per functionality.

Let's consider an example of a recoverable policy which retries a service call on communication errors. Whenever the connection drops or times out, we want to wait some time and retry the operation, using exponential back off. Here we have a few functionalities which should be tested:

  • retrying on supported communication error
  • rethrowing unsupported exceptions
  • retrying logic

To organise our test suite, let's create a folder for all unit tests related to the tested class. Its name will be the name of the tested class postfixed with the word “Tests”. The postfix is important to avoid conflicts between the name of the class we are going to test and the name of the assembly containing the tests. In our example it will be: CommunicationErrorRecoveryPolicyTests.

Unit test classes are named after the functionality, such as WhenOperationFailsWithCommunicationException, WhenFailsWithUnsupportedError or WhenHandlesError. The class name partially describes the test Arrangement, i.e. which cases we are testing.

Test method names

The method name may optionally describe further test Arrangement. It also describes the Assertion. Example method names for the WhenOperationFailsWithCommunicationException class can be:

  • BecauseOfTimeoutThenRetries – which tests the case when the operation failed due to a server time out
  • BecauseOfConnectionDropsThenRetries – for the case when the remote server dropped the connection

The naming convention is to elaborate more on the Arrangement (BecauseOfTimeout) and then explain the result (ThenRetries). The first part can be skipped if all the Arrangement is contained in the class name. The second part is required and should not be omitted, just as the method body should always have assertions.

Note that with test method names there is no need to keep them concise, as is the case with normal methods. They are not referenced anywhere and serve their purpose better when they are expressive.
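
Put together, the convention could look like the minimal skeleton below. The NUnit attributes and the empty test bodies are illustrative assumptions; the real tests live in the accompanying source code:

using NUnit.Framework;

namespace CommunicationErrorRecoveryPolicyTests
{
    [TestFixture]
    public class WhenOperationFailsWithCommunicationException
    {
        [Test]
        public void BecauseOfTimeoutThenRetries()
        {
            // Arrange: an operation which times out once, then succeeds.
            // Act: execute it through the recovery policy.
            // Assert: the call was retried and the result returned.
        }

        [Test]
        public void BecauseOfConnectionDropsThenRetries()
        {
            // Analogous arrangement for a dropped connection.
        }
    }
}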

Generating specification

Once the naming convention is applied, we can use the report from the unit test runner to generate the specification. Below is a snapshot from the ReSharper test runner:

Report from running unit tests can be used to generate a code specification (example using the ReSharper test runner).

I am grouping tests by namespaces to create a hierarchy which can be read like a sentence:

CommunicationErrorRecoveryPolicyTests ->
   When operation fails with communication exception because of 502 then retries.
   When operation fails with communication exception because of time out then retries it.

To find out which exceptions are covered by the policy, I can run the associated tests (all contained in the CommunicationErrorRecoveryPolicyTests folder) and look at the generated report.

We can even go one step further and autogenerate HTML documentation from the unit test reports. This could be run by the CI server after each build and saved among the build artefacts.

Accompanying source code

The accompanying source code can be found on github at: https://github.com/mariuszwojcik/SpecificationByExample. For examples related to this article check SpecificationByExample.Domain.UnitTests/SpecByExample.

Summary

Properly organising test suites adds another dimension to unit testing. A good, expressive naming convention allows for building a specification of the code. It also helps with future refactorings and unit test maintenance. Using this simple method increases a team's agility and productivity by helping developers quickly understand the purpose of tests and the workings of the tested classes.

Using policies to handle exceptions while calling external services

Exception handling very easily gets ugly. A typical try...catch block clutters the method and grows with every new exception discovered. Then bits of code get copied between methods which require the same error handling. Adding any new logic to the error handling is a nightmare, and with each new release it seems like the same errors keep coming back.

Policies for handling exceptions

To overcome those problems we can extract the logic related to exception handling into separate objects – policies. This keeps the main business logic clear, allows reuse and makes testing easy.

Here's the definition of a recoverable policy:

public interface IRecoverablePolicy<TResult>
{
   TResult Execute(Func<TResult> operation);
}

One example of a recoverable policy is handling transient exceptions. Usually they require retrying the method call after a small pause.

public class TransientExceptionRecoveryPolicy<TResult> : IRecoverablePolicy<TResult>
{
    public TResult Execute(Func<TResult> operation)
    {
        try
        {
            return operation.Invoke();
        }
        catch (TransientException)
        {
            Thread.Sleep(TimeSpan.FromSeconds(1));
            return Execute(operation);
        }
    }
}

Now, extending this code with the functionality to retry N times with an exponentially increasing back off between attempts is fairly easy, and unit testing it is straightforward. The policy can be reused to protect any method call which may throw a TransientException.
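
For illustration, such an extension could look like the sketch below. The class name and the maxRetries/baseDelay parameters are assumptions for this sketch, not part of the article's sample code:

using System;
using System.Threading;

public class ExponentialBackOffRecoveryPolicy<TResult> : IRecoverablePolicy<TResult>
{
    private readonly int _maxRetries;
    private readonly TimeSpan _baseDelay;

    public ExponentialBackOffRecoveryPolicy(int maxRetries, TimeSpan baseDelay)
    {
        _maxRetries = maxRetries;
        _baseDelay = baseDelay;
    }

    public TResult Execute(Func<TResult> operation)
    {
        var delay = _baseDelay;
        for (var attempt = 0; ; attempt++)
        {
            try
            {
                return operation.Invoke();
            }
            catch (TransientException)
            {
                if (attempt >= _maxRetries)
                    throw; // give up once the maximum number of retries is exhausted

                Thread.Sleep(delay);
                delay = TimeSpan.FromTicks(delay.Ticks * 2); // double the pause each time
            }
        }
    }
}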

The policy for handling communication exceptions is very similar to the above.

To execute code with the policy we call it like this:

policy.Execute(()=> { // call a method on external service });

Handling functional cases

The above examples covered non-functional cases where an external service may give a transient error, but we can cover functional cases as well. Let's imagine a scenario where we call a method to create an article and get back the article ID. Once the article is created we want to use its ID further, let's say to store it in the system. The article title has to be unique, and when uniqueness is not satisfied a DuplicateTitleException is thrown. Assuming we are not using any transactions, there is a potential risk of the operation failing between creating an article and storing its ID. To make the operation idempotent we need to detect the DuplicateTitleException, fetch the article's ID and carry on. This can easily be achieved by wrapping the call to create an article in a policy which, on failure, will get the ID and return it. Below is an example implementation:

public class DuplicateTitleRecoveryPolicy : IRecoverablePolicy<long>
{
    private readonly IExternalService _service;

    public DuplicateTitleRecoveryPolicy(IExternalService service)
    {
        _service = service;
    }

    public long Execute(Func<long> operation)
    {
        try
        {
            return operation.Invoke();
        }
        catch (DuplicateTitleException e)
        {
            var article = _service.GetArticleByTitle(e.Title);
            return article.Id;
        }
    }
}

Composite policy

To cover the code with a few policies, we can use a CompositeRecoverablePolicy which will execute a set of policies one after another:

public class CompositeRecoverablePolicy<TResult> : IRecoverablePolicy<TResult>
{
    private readonly List<IRecoverablePolicy<TResult>> _policies;

    public CompositeRecoverablePolicy(IEnumerable<IRecoverablePolicy<TResult>> policies)
    {
        _policies = new List<IRecoverablePolicy<TResult>>(policies);
    }

    public TResult Execute(Func<TResult> operation)
    {
        var chainedPolicies = operation;

        foreach (var policy in _policies)
        {
            var localOperation = chainedPolicies;
            var currentPolicy = policy;
            chainedPolicies = () => currentPolicy.Execute(localOperation);
        }

        return chainedPolicies.Invoke();
    }
}

We pass all the policies required to cover the code into the constructor, and use the composite policy to execute the code.
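
For example, composing the two policies defined earlier could look like the usage sketch below, assuming a service instance of IExternalService as in the earlier samples:

var policy = new CompositeRecoverablePolicy<long>(new IRecoverablePolicy<long>[]
{
    new TransientExceptionRecoveryPolicy<long>(),
    new DuplicateTitleRecoveryPolicy(service)
});

// Both policies now protect the call: transient errors are retried and a
// duplicate title is resolved to the existing article's ID.
var articleId = policy.Execute(() => service.CreateArticle("post", "me", "empty"));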

Extension methods

To further increase the readability of the code we can use extension methods and create overloads which take a policy as an extra parameter:

public interface IExternalService
{
    long CreateArticle(string title, string author, string body);
}

public static class ExternalServicePolicyExtensions
{
    public static long CreateArticle(this IExternalService service, string title, string author, string body, IRecoverablePolicy<long> policy)
    {
        return policy.Execute(() => service.CreateArticle(title, author, body));
    }
}

Now, the call to create an article will look more familiar:

var articleId = service.CreateArticle("post", "me", "empty", policy);

Sample code

The sample code can be found on github.

Summary

Policies are a very powerful tool. They allow us to decouple exception handling from the main code. This leads to smaller and simpler classes which adhere to the Single Responsibility Principle. By using the strategy pattern we also follow the Open/Closed Principle and make the application easy to extend. Policies are easy to reuse, hence it is easier to prevent code duplication (the DRY principle). Each policy treats a single case and is completely independent of other business code, which makes policies extremely easy to unit test.

The best places to use policies are when making network calls, using external services, or when you need to respond to and recover from business errors.