Friday, 5 October 2012

The ten most common misconceptions in LINQ, and how they lead people astray.

Here are ten root causes of the most common misunderstandings, distilled from many hundreds of questions on the LINQ forums.


Myth #1

All LINQ queries must start with the ‘var’ keyword. In fact, the very purpose of the ‘var’ keyword is to start a LINQ query!

The var keyword and LINQ queries are separate concepts. The purpose of var is to let the compiler infer the type of a local variable from its initializing expression (implicit typing).

 For example, the following:
 
var s = "Hello"; 

is precisely equivalent to:
string s = "Hello"; 

because the compiler infers that s is a string.

Similarly, the following query:
 
string[] people = new [] { "Tom", "Dick", "Harry" };
var filteredPeople = people.Where (p => p.Length > 3); 

is precisely equivalent to:
 
string[] people = new [] { "Tom", "Dick", "Harry" };
IEnumerable<string> filteredPeople = people.Where (p => p.Length > 3); 

You can see here that all that we're achieving with var is to abbreviate IEnumerable<string>. Some people like this because it cuts clutter; others argue that implicit typing can make it less clear what's going on.

Now, there are times when a LINQ query necessitates the use of var. This is when projecting an anonymous type, because an anonymous type has no name that you could write in the declaration:


string[] people = new [] { "Tom", "Dick", "Harry" };
var filteredPeople = people.Select (p => new { Name = p, p.Length }); 

Here is an example of using an anonymous type outside the context of a LINQ query:
 
var person = new { Name="Foo", Length=3 };
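
The equivalence between var and the explicit type can be checked in code. The following is a minimal sketch using LINQ to Objects; VarDemo, StaticTypeOf and LongestName are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class VarDemo
{
    // Hypothetical helper: returns the compile-time type inferred for T.
    static Type StaticTypeOf<T> (T value) { return typeof (T); }

    public static bool VarMatchesExplicitType()
    {
        string[] people = { "Tom", "Dick", "Harry" };

        var implicitQuery = people.Where (p => p.Length > 3);
        IEnumerable<string> explicitQuery = people.Where (p => p.Length > 3);

        // Both variables have the same static type, IEnumerable<string>.
        return StaticTypeOf (implicitQuery) == typeof (IEnumerable<string>)
            && StaticTypeOf (explicitQuery) == typeof (IEnumerable<string>);
    }

    public static string LongestName()
    {
        string[] people = { "Tom", "Dick", "Harry" };

        // The element type is anonymous, so the query variable must be declared with var.
        var projected = people.Select (p => new { Name = p, p.Length });
        return projected.OrderByDescending (x => x.Length).First().Name;
    }
}
```

Calling VarDemo.VarMatchesExplicitType() from any method should return true, which is another way of saying that var changed nothing about the query itself.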

Myth #2 

All LINQ queries must use query syntax.

There are two kinds of syntax for queries: lambda syntax (also called fluent syntax) and query syntax (or query comprehension syntax). Here's an example of lambda syntax:
 
string[] people = new [] { "Tom", "Dick", "Harry" };
var filteredPeople = people.Where (p => p.Length > 3); 

Here's the same thing expressed in query syntax:
 
string[] people = new [] { "Tom", "Dick", "Harry" };
var filteredPeople = from p in people where p.Length > 3 select p; 

Logically, the compiler translates query syntax into lambda syntax. This means that everything that can be expressed in query syntax can also be expressed in lambda syntax. Query syntax can be a lot simpler, though, with queries that involve more than one range variable. (In this example, we used just a single range variable, p, so the two syntaxes were similarly simple).

Not all operators are supported in query syntax, so the two syntax styles are complementary. For the best of both worlds, you can mix query styles in a single statement (see Myth #5 for an example).
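
To make the translation concrete, here is a small sketch with two range variables, written both ways, plus a mixed-syntax query that uses Count (an operator with no query-syntax keyword). SyntaxDemo and its method names are hypothetical, and the lambda version only approximates what the compiler generates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SyntaxDemo
{
    static readonly string[] People = { "Tom", "Dick", "Harry" };

    // Query syntax: the two range variables (a and b) read naturally.
    public static List<string> PairsQuerySyntax()
    {
        var query =
            from a in People
            from b in People
            where a.Length < b.Length
            select a + " < " + b;
        return query.ToList();
    }

    // Lambda syntax: roughly the shape the query above is translated into.
    // Both range variables have to be carried along in a temporary anonymous type.
    public static List<string> PairsLambdaSyntax()
    {
        return People
            .SelectMany (a => People, (a, b) => new { a, b })
            .Where (x => x.a.Length < x.b.Length)
            .Select (x => x.a + " < " + x.b)
            .ToList();
    }

    // Mixed syntax: a query expression wrapped in parentheses, followed by a method call.
    public static int CountLongNames()
    {
        return (from p in People where p.Length > 3 select p).Count();
    }
}
```

The two Pairs methods return the same sequence; the query-syntax version simply hides the bookkeeping for the second range variable.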

Myth #3

To retrieve all customers from the customer table, you must perform a query similar to the following:
       
var query = from c in db.Customers select c;

The expression:
from c in db.Customers select c 

is redundant! You can simply write:
 
db.Customers

Similarly, the following LINQ to XML query:
 
var xe = from e in myXDocument.Descendants ("phone") select e;

can be simplified to:
 
var xe = myXDocument.Descendants ("phone");

And this:
 
Customer customer = (from c in db.Customers where c.ID == 123 select c)
                    .Single();

can be simplified to:
 
Customer customer = db.Customers.Single (c => c.ID == 123);


Myth #4

To reproduce a SQL query in LINQ, you must make the LINQ query look as similar as possible to the SQL query.

LINQ and SQL are different languages that employ very different concepts.
Possibly the biggest barrier to becoming productive with LINQ is the "thinking in SQL" syndrome: mentally formulating your queries in SQL and then transliterating them into LINQ. The result is that you're constantly fighting the API!

Once you start thinking directly in LINQ, your queries will often bear little resemblance to their SQL counterparts. In many cases, they'll be radically simpler, too.

Myth #5

To do joins efficiently in LINQ, you must use the join keyword.

This is true, but only when querying local collections. When querying a database, the join keyword is completely unnecessary: all ad-hoc joins can be accomplished using multiple from clauses and subqueries. Multiple from clauses and subqueries are more versatile too: you can also perform non-equi-joins.


Better still, in LINQ to SQL and Entity Framework, you can query association properties, eliminating the need to join altogether! For instance, here's how to retrieve the names and IDs of all customers who have made no purchases:

from c in db.Customers
where !c.Purchases.Any()
select new { c.ID, c.Name } 

Or, to retrieve customers who have made no purchases over $1000:

from c in db.Customers
where !c.Purchases.Any (p => p.Price > 1000)
select new { c.ID, c.Name } 
 
Notice that we're mixing fluent and query syntax. See LINQPad for more examples on association properties, manual joins, and mixed-syntax queries.
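
For comparison, here is a sketch of the same ideas against local collections, where the join keyword does matter for efficiency. The Customer and Purchase classes and their sample data are hypothetical in-memory stand-ins, not the database entities used above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class JoinDemo
{
    // Hypothetical in-memory stand-ins for the Customers and Purchases tables.
    class Customer { public int ID; public string Name; }
    class Purchase { public int CustomerID; public string Item; public decimal Price; }

    static readonly Customer[] Customers =
    {
        new Customer { ID = 1, Name = "Tom" },
        new Customer { ID = 2, Name = "Dick" },
        new Customer { ID = 3, Name = "Harry" }
    };

    static readonly Purchase[] Purchases =
    {
        new Purchase { CustomerID = 1, Item = "Bike",   Price = 1200 },
        new Purchase { CustomerID = 1, Item = "Helmet", Price = 50 },
        new Purchase { CustomerID = 2, Item = "Book",   Price = 300 }
    };

    // join keyword: on local collections the inner sequence is indexed once by key.
    public static List<string> WithJoinKeyword()
    {
        return (from c in Customers
                join p in Purchases on c.ID equals p.CustomerID
                select c.Name + " bought " + p.Item).ToList();
    }

    // Multiple from clauses: same result, but locally this scans Purchases once per customer.
    public static List<string> WithMultipleFrom()
    {
        return (from c in Customers
                from p in Purchases
                where p.CustomerID == c.ID
                select c.Name + " bought " + p.Item).ToList();
    }

    // A subquery standing in for the association property c.Purchases.
    public static List<string> NoPurchasesOver1000()
    {
        return (from c in Customers
                where !Purchases.Any (p => p.CustomerID == c.ID && p.Price > 1000)
                select c.Name).ToList();
    }
}
```

On this sample data the first two methods return the same three rows, and NoPurchasesOver1000 returns Dick and Harry. The performance difference only shows up with larger local collections; against a database, the query provider decides how the join is executed.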

Myth #6

Because SQL emits flat result sets, LINQ queries must be structured to emit flat result sets, too.

This is a consequence of Myth #4. One of LINQ's big benefits is that you can:
  1. Query a structured object graph through association properties (rather than having to manually join)
  2. Project directly into object hierarchies

The two are independent, although the first helps the second. For example, if you want to retrieve the names of customers in the state of WA along with all their purchases, you can simply do the following:

from c in db.Customers
where c.State == "WA"
select new
{
   c.Name,
   c.Purchases    // An EntitySet (collection)
}

The hierarchical result from this query is much easier to work with than a flat result set!

We can achieve the same result without association properties as follows:

from c in db.Customers
where c.State == "WA"
select new
{
   c.Name,
   Purchases = db.Purchases.Where (p => p.CustomerID == c.ID)
}

Myth #7 

To do outer joins in LINQ to SQL, you must always use DefaultIfEmpty().

This is true only if you want a flat result set. The examples in the preceding myth, for instance, translate to a left outer join in SQL, and require no DefaultIfEmpty operator.
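
The difference between the two result shapes can be sketched with local collections. This is an in-memory analogue only: it does not show the SQL that LINQ to SQL would generate, and the Customer and Purchase classes and their data are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class OuterJoinDemo
{
    // Hypothetical in-memory stand-ins for the Customers and Purchases tables.
    class Customer { public int ID; public string Name; }
    class Purchase { public int CustomerID; public string Item; }

    static readonly Customer[] Customers =
    {
        new Customer { ID = 1, Name = "Tom" },
        new Customer { ID = 2, Name = "Dick" },
        new Customer { ID = 3, Name = "Harry" }
    };

    static readonly Purchase[] Purchases =
    {
        new Purchase { CustomerID = 1, Item = "Bike" },
        new Purchase { CustomerID = 1, Item = "Helmet" },
        new Purchase { CustomerID = 2, Item = "Book" }
    };

    // Hierarchical shape: every customer appears once with a (possibly empty)
    // collection of purchases. No DefaultIfEmpty is needed.
    public static Dictionary<string, int> PurchaseCountsPerCustomer()
    {
        var query =
            from c in Customers
            select new
            {
                c.Name,
                Purchases = Purchases.Where (p => p.CustomerID == c.ID)
            };
        return query.ToDictionary (x => x.Name, x => x.Purchases.Count());
    }

    // Flat shape: one row per customer/purchase pair. DefaultIfEmpty keeps
    // customers who have no purchases by supplying a single null purchase.
    public static List<string> FlatLeftOuterJoin()
    {
        return (from c in Customers
                from p in Purchases.Where (x => x.CustomerID == c.ID).DefaultIfEmpty()
                select c.Name + ": " + (p == null ? "(none)" : p.Item)).ToList();
    }
}
```

In both shapes Harry, who has no purchases, is retained. The hierarchical version gives him an empty collection, while the flat version needs DefaultIfEmpty to produce a row for him.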

Myth #8 

A LINQ to SQL or EF query will be executed in one round-trip only if the query was built in a single step.

LINQ follows a lazy evaluation model, which means queries execute not when constructed, but when enumerated. This means you can build up a query in as many steps as you like, and it won't actually hit the server until you eventually start consuming the results.

For instance, the following query retrieves the names of all customers whose name starts with the letter 'A', and who have made at least two purchases. We build this query in three steps:

var query = db.Customers.Where (c => c.Name.StartsWith ("A"));
query = query.Where (c => c.Purchases.Count() >= 2);
var result = query.Select (c => c.Name);

foreach (string name in result)   // Only now is the query executed!
   Console.WriteLine (name);
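
Deferred execution can be observed directly with a local collection. The sketch below counts how often a filter predicate runs; DeferredDemo, PredicateCalls and BuildQuery are hypothetical names. It demonstrates the laziness only in LINQ to Objects; with LINQ to SQL or EF the corresponding effect is that no SQL is sent until enumeration, which this sketch does not show.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredDemo
{
    // Counts how many times the first filter's predicate has run.
    public static int PredicateCalls;

    public static IEnumerable<string> BuildQuery()
    {
        string[] names = { "Alice", "Bob", "Anna" };

        // Step 1: first filter. The lambda is stored, not executed.
        IEnumerable<string> query = names.Where (n =>
        {
            PredicateCalls++;
            return n.StartsWith ("A", StringComparison.Ordinal);
        });

        // Step 2: second filter, composed onto the first.
        query = query.Where (n => n.Length > 4);

        // Step 3: projection. Still nothing has been enumerated.
        return query.Select (n => n.ToUpperInvariant());
    }
}
```

After calling BuildQuery(), PredicateCalls is unchanged; it only increases (once per source element) when the returned sequence is enumerated, for example with foreach or ToList().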

Myth #9 

A method cannot return a query if the query ends in the 'new' operator.

The difficulty is that an anonymous type has no name to write in the method's return type. The trick is to project into an ordinary named type with an object initializer:

public IQueryable<NameDetails> GetCustomerNamesInState (string state)
{
   return
      from c in db.Customers
      where c.State == state
      select new NameDetails
      {
         FirstName = c.FirstName,
         LastName = c.LastName
      };
}

NameDetails is a class that you'd define as follows:
 
public class NameDetails
{
   public string FirstName, LastName;
}

Myth #10 

The best way to use LINQ to SQL is to instantiate a single DataContext to a static property, and use that shared instance for the life of the application.

This strategy will result in stale data, because objects tracked by a DataContext instance are not refreshed simply by requerying.

Using a single static DataContext instance in the middle tier of a distributed application will cause further trouble, because DataContext instances are not thread-safe.

The correct approach is to instantiate fresh DataContext objects as required, keeping DataContext instances fairly short-lived. The same applies with Entity Framework.
