For some code I'm working on, I've been teaching myself how to build and modify expression trees. I can say initially I found myself a little intimidated.
Whether your code modifies elements within the tree, translates the expressions into another form, or simply visualizes the nodes, it will need to recursively visit each node in the tree. There's quite a variety of node types in System.Linq.Expressions. Writing code to support visiting all those node types just isn't my idea of fun.
If you consider for a moment that Microsoft's own libraries had to do just that to support LINQ to SQL, one would imagine that there must be a class in the framework that simplifies this. And there is! The class is ExpressionVisitor.
Of course, it's conveniently marked internal and you can't use it. :-)
Fortunately, code for an ExpressionVisitor is available as part of a how-to article on MSDN:
How to: Implement an Expression Tree Visitor
Expression trees are immutable so you cannot modify the expression nodes in place unlike some other common tree models. Much like how a string operation returns a copy of the string with the modifications in place, the same has to be done with expression trees. Expression tree visitors come to the rescue here to make it easy to visit all the nodes, return copies of the ones you want to keep, and build new nodes for the changes you want to make.
MSDN has a follow up how-to that covers how to do this:
How to: Modify Expression Trees
For the scenario I'm investigating, serializing expression trees, I realized that if I have any hope in getting it to work, I need to evaluate part of the tree to remove references to locally accessed variables (closures) in the lambdas. The reason for this is two-fold:
- C# and Visual Basic implement closures by generating non-serializable classes. Let's say I come up with a solution for creating a serializable set of expression nodes. I would still have the dilemma that these classes were not serializable even though the fields they contain are serializable. We don't have control over how the compiler generates the closures, so it would be best just to get them out of the picture.
- If the idea is to serialize the tree, for persistence or sending over a network, why serialize information that can be calculated once?
I worked through it in my head that to achieve this I would have to walk the tree and find which expression nodes depended on the fields in the closures. Better yet, find the nodes that are not dependent on the parameters passed into the expression. The idea is that if I find nodes that are not dependent on the parameters, I can evaluate them and replace them with constant expressions. This requires finding the node, wrapping the node in a lambda expression, compiling the lambda into a delegate, invoking the delegate, and finally creating a new constant expression with the value returned from the delegate invocation.
The real trick is finding at what level can you replace the nodes with a constant expression. Given enough time, I'm sure I could've figured out a slick way of doing this myself. But, again, I am not a glutton for punishment. LINQ to SQL has to take a similar course of action before it attempts to translate the LINQ expression into a SQL statement. There's got to be code out there that does some of this dirty work for me.
If you look deep enough into System.Data.Linq in Reflector, you'll eventually find something called Funcletize that does this dirty work.
I don't feel comfortable taking code out of Reflector and plugging it into a solution. I have to really "get it" so I can be sure to customize it to my own needs. Finding this information was useful in that it did point me in the right direction. Most LINQ providers need to do this work. Although, I'm not writing a LINQ provider the problem space overlaps a great deal with what I'm trying to do. It was high time I got to reading some articles and blog postings I had bookmarked about implementing custom LINQ providers.
I found just what I was looking for in Matt Warren's series on building an IQueryable provider. Specifically, part 3 of his series he discusses performing partial evaluation to replace expression nodes with constant expressions. The write-up is good and the code even better!
With Matt's code, I can get the expression down to the bare minimum for serialization. The next step will be to transform the tree into a serializable form. I'm not sure if that's necessarily going to be as involved as these steps or if it'll just require a lot of busy work. I'm suspecting more the latter case as my current thinking suggests that I'll be building for all intents and purposes a hierarchy of types that are equivalent to the expression types in System.Linq.Expressions. The difference would be that mine are serializable.