loops and anonymous methods in C#
Time for another C# post. I won't call this one a WTF, but it doesn't really sit well with me, either.
    List<SomeHandler> handlers = new List<SomeHandler>();
    for (int i = 0; i < 10; i++)
    {
        handlers.Add(new SomeHandler(delegate() { Console.WriteLine(i); }));
    }
    foreach (SomeHandler h in handlers)
    {
        h();
    }
What would you expect the output of the above code to be? Something like the following, right?
0 1 2 3 4 5 6 7 8 9
WRONG! What you actually get is this:
10 10 10 10 10 10 10 10 10 10
Huh? Why is that? To understand why this happens, you need to understand how anonymous methods work in C#. You see, these "anonymous methods" in C# are actually more than that; they are closures. The body of an anonymous method can reference any variable in the scope in which it is defined, even if that variable is not defined in the scope in which the anonymous method is executed. That's why the anonymous methods we created above can reference the variable i. See that word? "Reference"? The anonymous methods contain references to i. Thus, each anonymous method has a reference to the same memory location, which the termination of the loop left at 10.
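To see the reference semantics in isolation, here is a minimal sketch with no loop involved. The names CaptureDemo, Reader, and counter are made up for illustration and do not come from the code above.

```csharp
using System;

static class CaptureDemo
{
    // Hypothetical delegate type, standing in for SomeHandler.
    delegate int Reader();

    // Returns what the delegate observes after the captured variable changes.
    public static int ReadAfterChange()
    {
        int counter = 1;

        // The anonymous method captures the variable counter itself,
        // not the value it holds at this moment.
        Reader read = delegate() { return counter; };

        counter = 42; // modified after the delegate was created

        return read(); // the delegate sees the updated value
    }

    static void Main()
    {
        Console.WriteLine(ReadAfterChange()); // prints 42
    }
}
```

The loop at the top is the same situation, only with the loop's own increment doing the modifying.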
Why references? In short, it's the difference between a simple anonymous method and a closure. We need references in order for the following code to work.
    class SomeForm : Form
    {
        private int numberOfClicks = 0;
        private Button button;
        private Button otherButton;

        public SomeForm()
        {
            //Blah blah setup controls blah blah
            button.Click += delegate(object sender, EventArgs e) { numberOfClicks++; };
        }

        public int NumberOfClicks
        {
            get { return numberOfClicks; }
        }
    }
The intent is that the numberOfClicks variable be incremented whenever the button is clicked so that we can keep track of how many times the button has been clicked over the course of the program's execution. If the anonymous method didn't have a reference to the same memory location referenced by numberOfClicks everywhere else in the class, then the NumberOfClicks property would always return 0. Clearly, though, there are times when you don't really want that. How would you change the code at the top so that it outputs 0 through 9? Like this:
    List<SomeHandler> handlers = new List<SomeHandler>();
    for (int i = 0; i < 10; i++)
    {
        int value = i;
        handlers.Add(new SomeHandler(delegate() { Console.WriteLine(value); }));
    }
    foreach (SomeHandler h in handlers)
    {
        h();
    }
Why does this work when the original version didn't? The variable value is defined inside the loop. That means that it is effectively a new variable in each loop iteration, so each anonymous method has a reference to a different memory location.
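One way to picture the difference is to write out by hand roughly what happens to a captured variable: it gets moved into a field of a hidden helper object, and the anonymous method becomes a method on that object. The sketch below is only an approximation for illustration. The names ClosureSketch, Handler, Box, Shared, and PerIteration are invented here, the handlers return the value instead of printing it, and the code the compiler really generates differs in its details.

```csharp
using System;
using System.Collections.Generic;

static class ClosureSketch
{
    // Hypothetical stand-in for SomeHandler; returns the value instead of printing it.
    delegate int Handler();

    // Hand-written stand-in for the hidden object that holds a captured variable.
    class Box
    {
        public int Value;
        public int Read() { return Value; }
    }

    // Models the original loop: one Box for the whole loop, shared by every handler.
    public static List<int> Shared()
    {
        List<Handler> handlers = new List<Handler>();
        Box box = new Box();
        for (box.Value = 0; box.Value < 10; box.Value++)
        {
            handlers.Add(new Handler(box.Read));
        }
        return Invoke(handlers);
    }

    // Models the fixed loop: a new Box is created on every iteration.
    public static List<int> PerIteration()
    {
        List<Handler> handlers = new List<Handler>();
        for (int i = 0; i < 10; i++)
        {
            Box box = new Box();
            box.Value = i;
            handlers.Add(new Handler(box.Read));
        }
        return Invoke(handlers);
    }

    // Calls every handler and collects the results in order.
    static List<int> Invoke(List<Handler> handlers)
    {
        List<int> results = new List<int>();
        foreach (Handler h in handlers) { results.Add(h()); }
        return results;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(" ", Shared()));       // 10 repeated ten times
        Console.WriteLine(string.Join(" ", PerIteration())); // 0 1 2 3 4 5 6 7 8 9
    }
}
```

Seen this way, where the variable is declared decides how many boxes exist, and that is the whole difference between the two versions.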
This is not a WTF because it makes sense if you think about it, and the argument against it isn't obvious (at least an informed argument; an uninformed argument is always easy to make). I still don't like it, though, since it violates the principle of least surprise. In the case of a for loop, it probably doesn't make sense to have the compiler automatically slip that int value = i in there, since in some cases, that's not what the user wants. However, I would argue that it does make sense in a foreach loop. In a for loop, you can define the variable being operated on completely outside of the loop construct. Heck, you don't even need to operate on a variable! for(;;) is a perfectly valid for loop. foreach is different, though. You must declare the variable in the foreach loop construct. There is no way to reference that variable outside of the loop except through anonymous methods. In fact, one can simulate the following foreach loop
    foreach (double value in list)
    {
        DoSomething(value);
    }
like this:
    for (int i = 0; i < list.Count; i++)
    {
        double value = list[i];
        DoSomething(value);
    }
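If the loop body creates anonymous methods that capture value, the first construct will exhibit the problem; the second will not. (This applies to compilers before C# 5, which changed foreach so that each iteration gets a fresh loop variable.) Thus, it can come as a surprise when all of the anonymous methods have a reference to the same memory location. Yet another case where C#, like Java, neglects to properly distinguish between values and references and ends up confusing people.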
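The local-copy workaround from the for loop applies inside a foreach body as well. Here is a minimal sketch with made-up names (ForeachCopy, Handler, Capture, copy). With the copy in place, the result does not depend on how the compiler treats the loop variable.

```csharp
using System;
using System.Collections.Generic;

static class ForeachCopy
{
    // Hypothetical stand-in for SomeHandler; returns the captured value.
    delegate double Handler();

    // Builds one handler per element, then calls each and collects the results.
    public static List<double> Capture(List<double> list)
    {
        List<Handler> handlers = new List<Handler>();
        foreach (double value in list)
        {
            double copy = value; // declared in the body, so it is new on every iteration
            handlers.Add(delegate() { return copy; });
        }

        List<double> results = new List<double>();
        foreach (Handler h in handlers) { results.Add(h()); }
        return results;
    }

    static void Main()
    {
        List<double> input = new List<double> { 1.5, 2.5, 3.5 };
        // Each handler returns its own element rather than the last one.
        Console.WriteLine(string.Join(" ", Capture(input)));
    }
}
```

A comment next to the copy explaining why it exists is worth the extra line, since the variable otherwise looks useless.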
That said, I only got burned by this once (arguably twice, but the real problem the second time was lack of comments to explain the seemingly useless variable creation), and I only ended up using anonymous methods because I was abusing an API. In the new version of the code now under development, I avoid using anonymous methods at all because I don't abuse the API.
TL;DR: don't use an anonymous method if it doesn't make the code significantly cleaner, and if you do use one, be careful with it.