Failure to Yield Right-of-Way

By Michael Flanakin @ 9:52 PM :: .NET, Development


I'm one of the many .NET developers out there who neglect the enhancements in the framework. Not that I mean to; I just keep a running tally of things I need to catch up on, but rarely make the time to actually do any of them. In an effort to shame myself into taking care of a few of these things, I decided to dig into something I haven't spent any time trying to understand: the yield keyword, introduced in C# 2.0. I have to say, I was surprised at how simple it was... well, almost.

To attempt the obligatory textual description: yield, in conjunction with a return or break statement, tells the compiler that the code block (usually a method) should be treated as an iterator. This means the code block must declare a return type of IEnumerable or IEnumerator (from System.Collections, or the generic versions in System.Collections.Generic); but the class that actually implements it will be almost completely hidden from you. All you need to do is "yield return" each value, typically within a loop, and "yield break" if you want to stop early. The compiler will wrap your code block in a generated class and return each value as the enumerator is traversed.

There. Plain as day, right? Doubt it.
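For what it's worth, here is a minimal sketch of both halves of that description, yield return and yield break, in one place. The class and method names (YieldBasics, TakeUntilNegative) are made up for illustration; nothing here comes from the framework beyond IEnumerable<T> itself.

```csharp
using System;
using System.Collections.Generic;

public static class YieldBasics
{
    // Hypothetical helper: passes values through until it sees a negative one.
    public static IEnumerable<int> TakeUntilNegative(IEnumerable<int> numbers)
    {
        foreach (int n in numbers)
        {
            if (n < 0)
            {
                // Ends the sequence; the caller's foreach simply finishes.
                yield break;
            }

            // Hands one value to the caller, then pauses here until
            // the caller asks for the next one.
            yield return n;
        }
    }
}
```

Looping over YieldBasics.TakeUntilNegative(new int[] { 1, 2, -1, 3 }) with a foreach gives you 1 and 2, and the 3 is never reached.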

While reading about the feature, I was reminded about how crappy some help can be. I just wanted a code snippet to show me what I might do without the yield keyword and then what I'd do with it. Here's what you are probably writing today...

public List<User> GetUsers(IDataReader reader)
{
    // convert to list of users
    List<User> users = new List<User>();
    while (reader.Read())
    {
        User user = new User();
        // set user properties from reader
        users.Add(user);
    }

    // return
    return users;
}

This is pretty basic stuff. Now, let's look at how you'd do it with the yield keyword...

public IEnumerable<User> GetUsers(IDataReader reader)
{
    // yield one user per row, as the caller asks for it
    while (reader.Read())
    {
        User user = new User();
        // set user properties from reader
        yield return user;
    }
}

If you didn't catch it, we were able to get rid of the code that uses the List<User> instance. Sure, only 3 lines, but less code is typically better -- assuming we're not sacrificing readability. Those who're paying a little more attention probably noticed the fourth line that changed (well, technically, it was the first): the return type. Since yield only works when the return type is IEnumerable or IEnumerator (or the generic IEnumerable<T> and IEnumerator<T>), we have to change the return type to match that. I have to admit, I didn't like this. Using IEnumerable basically means I'm stuck with foreach blocks, which I hate using, unless I copy the results into a list first. This led me to investigating performance.
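If you're in the same boat and really want your indexer and for loops back, one option is to copy the iterator's output into a list. This is only a sketch: the User class here is a bare-bones stand-in for whatever yours looks like, GetUsers takes a list of names instead of an IDataReader so the snippet stands on its own, and UserLists and GetUserList are names I made up.

```csharp
using System;
using System.Collections.Generic;

// Bare-bones stand-in for the User class (assumed, not from the post).
public class User
{
    public string Name;
}

public static class UserLists
{
    // Hypothetical stand-in for GetUsers: yields one User per name
    // rather than one per row of a data reader.
    public static IEnumerable<User> GetUsers(IEnumerable<string> names)
    {
        foreach (string name in names)
        {
            User user = new User();
            user.Name = name;
            yield return user;
        }
    }

    public static List<User> GetUserList(IEnumerable<string> names)
    {
        // List<T> has a constructor that takes any IEnumerable<T>.
        // It runs the iterator to completion right here.
        return new List<User>(GetUsers(names));
    }
}
```

Once you have the list, users[i] and users.Count work as usual. The catch is that every item gets created up front, so you're back to paying for the whole list whether you need it or not.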

If you really want to know about the performance benefits of for vs. foreach, check out Joe Duffy's blog. Joe works on the PLINQ team and has a very nice post about perf considerations. From the limited tests I ran, I started to see horrid performance when using yield. Then, I realized I probably needed to bump up my iterations to make it a bit more meaningful. Once I got into the 10,000-50,000 iteration range, I started seeing yield come out on top -- or, at least making it a better race. This goes along with what Joe talks about: you pay the cost of having the enumerator, which costs a lot, but will make up for it over the long haul, assuming you have a lot of iterations.

This isn't the whole story, tho. I ran this on a single core machine, but that shouldn't matter much: yield is not multi-threaded. The iterator runs on the same thread as the code that enumerates it, so extra cores won't speed it up on their own. What yield actually gives you is deferred execution. When you call a method that uses yield, none of your code runs; you just get back a compiler-generated object that knows how to step thru your method. Then, your code will get an enumerator for it, typically via a foreach block. All this happens without actually touching your method. Each time the foreach block asks for the next instance (the equivalent of users[i] in a for block), that's when .NET actually digs into your method, runs it up to the next yield return, and hands that instance back. The benefit of this is that you only process what you need to process. If you only need to loop thru 10 of the 1000 records, you only process 10, whereas all 1000 would be loaded into memory with the typical approach.
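Here's a rough way to see the "only process what you need" part for yourself. It's a sketch with made-up names (LazyDemo, ReadRecords, ConsumeFirst), and it uses plain integers in place of real records; the counter is only there so you can check how far the iterator actually got.

```csharp
using System;
using System.Collections.Generic;

public static class LazyDemo
{
    // Counts how many records the iterator has actually produced.
    public static int Produced = 0;

    // Hypothetical stand-in for reading records from a data reader.
    public static IEnumerable<int> ReadRecords(int total)
    {
        for (int i = 0; i < total; i++)
        {
            Produced++;
            yield return i;
        }
    }

    // Loops over the first `wanted` records (wanted >= 1), then bails out.
    // Returns how many records the iterator produced along the way.
    public static int ConsumeFirst(int total, int wanted)
    {
        Produced = 0;

        // Calling the method runs none of its body; Produced is still 0 here.
        IEnumerable<int> records = ReadRecords(total);

        int seen = 0;
        foreach (int record in records)
        {
            seen++;
            if (seen == wanted)
            {
                break; // the remaining records are never produced
            }
        }

        return Produced;
    }
}
```

LazyDemo.ConsumeFirst(1000, 10) returns 10: the loop inside ReadRecords only ever ran ten times. Swap ReadRecords for a version that builds and returns a List<int> and the same call would have produced all 1000 before the foreach even started.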

It's all a bit fuzzy until you play with it. I'd recommend creating a simple test to walk thru it yourself, if you really want to get a feel for it. It's as simple as debugging a test. As a matter of fact, here's a simple MSTest project that walks thru it. Hopefully, this helps you understand what's going on.
