Linq provider : an attempt… part 4

Using the Visitor pattern to manipulate expression trees

In the previous posts of this series, I started to describe a partial implementation of a Linq provider. The aim of this provider is to get data from a service, and in order to actually return that data to the caller, it implies some expression tree manipulations. After we obtained the data from the service, we want to inject an access to that actual data in the query expression tree, instead of the QueryableDummyData<Person> instance, which is in fact only a placeholder.

In order to replace that QueryableDummyData<Person> by a Person[] array, we use a class called ExpressionTreeConstantReplacer, which derives from the .NET 4.0 built-in ExpressionVisitor class, which in turn is based upon the Visitor pattern.

Applied to expression trees, the Visitor pattern can be used to generate an expression tree from the visited one, to accumulate some data while visiting the tree, or even both at the same time. Each visit method overload returns an expression, which is by default the unchanged expression it has just visited. It can return a new tree, but is should not modify the visited expression. This approach has the advantage to have no side-effect on the visited expression tree.

As a sample, the visitor pattern can be used to transform the following expression tree :

to this expression tree, which is the same but without the p => p.Age >= 21 where clause :

The ExpressionVisitor class is meant to be derived from, in order to create tree-manipulation classes. When the ExpressionVisitor’s Visit method is called, it determines the type of the node given as an entry parameter, and recursively visits the node and all his children, using each time the method corresponding to the node’s type.

This allows the developer of a derived class to override only the methods that needs to be. For instance, as we want to replace a Constant node with an other one, the class ExpressionTreeConstantReplacer derives from ExpressionVisitor and overrides the method VisitConstant.

internal class ExpressionTreeConstantReplacer<TReplacement>
    : ExpressionVisitor
{
    private Type originalType;
    private TReplacement replacementConstant;

    internal ExpressionTreeConstantReplacer(
        Type originalType, TReplacement replacementConstant)
    {
        this.originalType = originalType;
        this.replacementConstant = replacementConstant;
    }

    protected override Expression VisitConstant(
        ConstantExpression c)
    {
        if (c.Type == this.originalType)
            return Expression.Constant(this.replacementConstant);
        else
            return c;
    }
}

The constructor of this class takes two arguments : the type of the original constant to replace, and the new value. Then during the expression tree visiting process, when a node of type Constant is detected, the VisitConstant method is called, and if its type matches the type to replace, a new Constant node encapsulating the replacement value is returned.

In order to make the replacement call more expressive, a helper class is also introduced, in order to take advantage of C#’s type inference, which is available on static methods but not on constructors :

internal class ExpressionTreeConstantReplacer
{
    internal static Expression CopyAndReplace<TReplacement>(
        Expression expression,
        Type originalType,
        TReplacement replacementConstant)
    {
        var modifier =
            new ExpressionTreeConstantReplacer<TReplacement>(
                originalType,
                replacementConstant);
        return modifier.Visit(expression);
    }
}

This is what allows us to write the final replacement call this way :

Expression finalExpressionTree =
    ExpressionTreeConstantReplacer
        .CopyAndReplace(
            filteredExpressionTree,
            typeof(QueryableDummyData<Person>),
            queryablePersons);

Next time, I’ll describe the TreeVisualizer class I wrote to generate the expression trees visualizations in this post.

Posted in LINQ | Tagged , , , | Comments Off

Linq provider : an attempt… part 3

Finally doing something

In the previous post of this series, I introduced the first classes implied in setting-up the Linq provider : QueryableDummyData and DummyQueryProvider, but these classes weren’t doing much of the real work. We were finally getting to the DummyQueryContext class… which I said was going to actually do something.

DummyQueryContext is an internal class, which exposes a single method :

internal static object Execute(Expression expression,
                               bool isEnumerable)

This method’s goal is to take an expression tree as its input, and to return the expected results for the query expression. Up to this point, the calling chain looks like this :

The first implementation on the Execute method is then :

internal static object Execute(Expression expression,
                                bool isEnumerable)
{
    // First, we call the web service and get an array of persons
    IEnumerable<Person> persons;

    using (DummyService.PeopleFinderClient service =
        new DummyService.PeopleFinderClient())
    {
        persons = service.FindPeople(new DummyService.SearchCriteria());
    }

    // We use Ling to Objects to get an IQueryable instance of persons
    var queryablePersons = persons.AsQueryable();

    // We transform the expression tree
    Expression finalExpressionTree =
        ExpressionTreeConstantReplacer
            .CopyAndReplace(
                expression,
                typeof(QueryableDummyData<Person>),
                queryablePersons);

    // Finally, based on the new tree, we either create a query or
    // execute with the Linq to Objects provider
    IQueryProvider provider = queryablePersons.Provider;
    if (isEnumerable)
        return provider.CreateQuery(finalExpressionTree);
    else
        return provider.Execute(finalExpressionTree);
}

Remember : my test case is only to get an enumerator and check that we can MoveNext. In order to make the test pass, we don’t have to worry about filtering or sorting in a any way, so the previous implementations calls the service with no particular parameter, and the service returns all its data.

For now, the trickiest part is the expression tree transformation. Its goal is to build a new expression tree where the QueryableDummyData<Person> instance is replaced with the Person[] array returned by the service. Once this replacement done, the Execute method will let the Linq to Objects provider run the query against the array. The isEnumerable is used to know whether the method must return a scalar value (via the queryablePersons.Provider.Execute method) or an IEnumerable (via the queryablePersons.Provider.CreateQuery method). For our first test, we want to get an IEnumerable<Person>.

The manipulated expression tree is built upon the following code snippet :

var queryablePersons =
    new QueryableDummyData<Person>();

Here is a representation of the generated expression tree :

And a representation of the final expression tree, once the QueryableDummyData<Person> has been replaced by the Person[] array retrieved from the service :

Next time, we’ll talk about the visitor pattern, and see how the ExpressionTreeModifier actually makes the replacement.

Posted in LINQ | Tagged , , , | Comments Off

Linq provider : an attempt… part 2

Setting up the scene

To build my sample provider, I’ll be querying a sample Web Service. First, Let’s see the big picture of what I’m trying to do :

  • DummyWs : my sample provider will be built against a web service. That service will return objects from its model (Person objects first, maybe some other types later), according to search criteria.
  • DummyWsModel : the model classes and entities will reside in an service-agnostic assembly that will be shared across the projects.
  • DummyWsLinqer : the Linq provider will be based on the service and the model.
  • Tests : this project will use the Linq provider to consume data from the service, but will not hold any direct reference to the service. The idea here is to check both :
    • that the data retrieved data is correct,
    • that the parts of the query that can be handled by the service, well, have been handled by the service and not by the client.

Here is a graph representing the projects dependencies :

I will start with a very simple Model, with a single entity called Person. I’ll probably make the model richer in the next steps, when I have already something working. For now, a Person has an ID, which is a Guid, a first name, a last name and an age.

My first goal will be to make the provider effectively call the service, and return the results. To do so, I will have to capture the Expression tree representing the call, and replace the expression representing the persons by the results of the service.

I’ve based my work upon the Visual Studio samples, and in particular the walkthrough: “Creating an IQueryable LINQ Provider”. Most of the basic code is taken from this sample. But although this sample is not very complicated, it mixes several topics and goals and I’ve found it uneasy to understand. This is why I wanted to make my own implementation and make baby steps.

As I know that my service will return data, the simplest possible test against my provider is to get an enumerator and check that I can at least iterate once :

public void GivenAProviderWhenIGetAnEnumeratorThenICanMoveNext()
{
    QueryableDummyData<Person> queryablePersons =
        new QueryableDummyData<Person>();

    bool exists = false;
    using (IEnumerator<Person> enumerator =
        queryablePersons.GetEnumerator())
    {
        exists = enumerator.MoveNext();
    }

    Assert.IsTrue(exists);
}

The previous code is written in the “Tests” project, and from the first schema we know that only the DummyWsModel and DummyWsLinqer projects are referenced. Person is a class from the Model, and QueryableDummyData<T> is the entry point to express queries against the model. Please note that there is nothing here which refers directly to the web service.

From the MSDN sample, I’ve taken only the core components :

  • a class to be used as the entry point : QueryableDummyData<T>, an implementation of IOrderedQueryable<T>
  • the provider : DummyQueryProvider, an implementation of IQueryProvider,
  • DummyQueryContext, the class which really handles the queries, and transforms them so that they can really be executed.
  • TypeSystem, an internal helper class.

QueryableDummyData, DummyQueryProvider and TypeSystem are basically copy-pasted (shame on me) from the MSDN sample, and quite not modified apart from the renaming. DummyQueryContext is the place where the work gets really done.

QueryableDummyData implements IOrderedQueryable<T>, which means IQueryable<T>, IEnumerable<T>, IOrderedQueryable, IQueryable and IEnumerable. Out of these, the real implementations needed are 3 properties from IQueryable and the generic and non-generic GetEnumerator overloads from IEnumerable<T> and IEnumerable. This class does no real work and the GetEnumerator implementations delegate the work to the Provider’s Execute method.

DummyQueryProvider implements IQueryProvider, but for now the only relevant methods are the two Execute overloads (generic and non-generic). Again, these methods do nothing more that to delegate the work to the DummyQueryContext class.

As I said before, DummyQueryContext is the place where something finally happens. To know how, simply continue to the next post !

Posted in LINQ | Tagged , , , | Comments Off

Linq provider : an attempt… part 1

The call of the AST

I have this will, for quite a long time, to try to implement a basic LINQ provider. The general idea is to play with AST (Abstract Syntax Tree) manipulations, using my language of choice : C#. I also have the temptation to make that attempt in F#, which is a particularly well adapted language to manipulate abstract syntax trees, but I don’t feel ready for that yet… maybe later !

In order to prepare that implementation, which will be based upon the IQueryable interface, I’m first going to refer to an article written by Jon Skeet in his EDULINQ series a year ago : Out-of-process queries with IQueryable.

If you don’t know Jon Skeet, he’s a bit like the Chuck Norris of programming. Maybe some facts about him will make it clearer…

My favourite one is this one :

Jon Skeet has already written a book about C# 5.0.

It’s currently sealed up.

In three years, Anders Hejlsberg is going to open the book to see if the language design team got it right.

My first step towards preparing the implementation has been to read and re-read Jon’s post in order to understand it better. I’ve also translated it in French, it’s available here.

If you want to follow me implementing the LINQ provider… come back soon !

Posted in LINQ | Tagged , , , | Comments Off

Extension Methods and inheritance – part 2

Things to know before using LINQ Providers (LINQ to something) or PLINQ (Parallel LINQ to objects)

Today I wanted to continue my explanations on extensions methods and why one should know how they work before trying to use LINQ providers or PLINQ. The thing to remind from my previous post is that when it comes to extension methods, method resolution occurs at compile time and not at runtime. We will see today why you should care !

Imagine that you want to execute several CPU-bound time-consuming actions, and you’ve spotted a parallelization opportunity… I’ve tried to keep the sample as simple as it could, so here is a delegate that spin-waits and returns the id of the thread on which it runs :

Func<int, int> spinWaitAndGetThreadId = 
    i =>
    {
        Thread.SpinWait(100000000);
        return Thread.CurrentThread.ManagedThreadId;
    };

If we call that delegate using either a “normal” select or a “PLINQ” select, we can see the difference, in that several managed threads are used in the parallel case (which is obviously why PLINQ has been introduced !).

int[] sequential = Enumerable.Range(0, 16)
    .Select(spinWaitAndGetThreadId)
    .ToArray();

int[] parallel = Enumerable.Range(0, 16)
    .AsParallel()
    .Select(spinWaitAndGetThreadId)
    .ToArray();

Assert.IsTrue(sequential.Distinct().Count() == 1);
Assert.IsTrue(parallel.Distinct().Count() > 1);

The AsParallel extension method return an instance of ParallelQuery, which also implements IEnumerable. But the next method calls are no longer IEnumerable’s extension methods, but the ParallelQuery’s extension methods.

If any call in the sequence after AsParallel (either instance method, static method or extension method call) returns an IEnumerable, then the extension method calls will no longer be those of ParallelQuery and the computation will become sequential again.

For instance, if we have a method such as this one :

public static IEnumerable<T> NaiveCustomReverse<T>(
    this IEnumerable<T> source)
{
    Stack<T> stack = new Stack<T>(source);
    foreach (T item in stack)
    {
        yield return item;
    }
}

And we change the previous parallel query to :

int[] parallel = Enumerable.Range(0, 16)
    .AsParallel()
    .NaiveCustomReverse()
    .Select(spinWaitAndGetThreadId)
    .ToArray();

Then the assertion is false, because everything has run on the same thread.

Post-Scriptum :
Besides PLINQ, I also wanted to give a sample using LINQ to SQL. But while working on the code samples, I’ve also finished reading the e-book built out of the EDULINQ blog series by Jon Skeet. I recommend it to anyone, this has really been a pleasure to read it !

The very last operator implemented in that series is AsEnumerable. It gives exactly the kind of explanation I was hoping to provide ! So I’m not going to compete and I’ll finish with a quote from his blog :

“it’s all about changing the compile-time type of the expression”

Posted in LINQ | Tagged , , | Comments Off