Many people use entity framework but are not aware of what deferred execution is and how it may affect their projects. In this article, we are going to have a little talk about it and see how things work.
LINQ to entities and deferred execution
We take into account that we all have some understanding of entity and LINQ or lambda expressions.
To go through this post we created a table called Videogames containing four columns (ID, Name, MainCharacterName and Publisher) and filled it with some popular vintage videogames. Let's start with the following piece of code.
string videogameNames = "";
var MarioVideogames = db.Videogames.Where(v => v.MainCharacterName == "Mario");
foreach (var videogame in MarioVideogames)
videogameNames += videogame.Name + "<br/>";
Let's say our table contained only two Videogames having Mario MainCharacterName: Super Mario Bros and Super Mario Bros 2. The resulting string would be
Super Mario Bros<br/>Super Mario Bros 2<br/>
(There's nothing wrong in using lambda expressions when talking about LINQ to entities. It's all the same in the end and I personally think that it is much easier to read. If you preferred LINQ queries you could have written
var MarioVideogames = from videogame in db.Videogames where videogame.MainCharacterName == "Mario" select videogame
instead )
Nothing strange so far. What is interesting about it is the deferred execution we talked about. In other words, the MarioVideogames variable will not be filled with data the moment it is declared but rather the moment it is used. To be exact, the moment it is iterated.
The IEnumerable interface requires the implementation of the GetEnumerator() method, which returns an enumerator to iterate through a collection, regardless of whether that collection is an IEnumerable, an IQuerable, or even a List.
So whenever we have a variable based on IEnumerable and we want to do something with that (that could be using foreach, First, Average etc) we need to iterate through that variable and as a result ask the query to be executed.
Why do we need deferred execution?
All right, since we know the query is not executed instantly, what would happen in the following case?
var MarioVideogames = db.Videogames.Where(v => v.MainCharacterName == "Mario");
string firstVideogameName = MarioVideogames.First().Name;
string lastVideogameName = MarioVideogames.Reverse().First().Name;
Sure, firstVideogameName would be "Super Mario Bros" and lastVideogameName would be "Super Mario Bros 2". Yet, what's interesting is that we requested the database twice. First, is when we executed MarioVideogames' query to get firstVideogameName value, second in the case of lastVideogameName.
Was that a good thing? Typically, yes it is. There's no reason in requesting data before we need it. Take a look at this.
var MarioVideogames = db.Videogames.Where(v => v.Name.Contains("Mario"));
if(type == 1)
MarioVideogames = MarioVideogames.Where(v => v.MainCharacterName == "Mario");
else if(type == 2)
MarioVideogames = MarioVideogames.Where(v => v.MainCharacterName == "Wario");
In this case, MarioVideogames contains different data, depending on the type. Let's say, by default MarioVideogames would return 50 rows of videogames, where in case of type == 1 or type == 2, would return way less videogames. If things did not work that way, we would ask the database for these 50 rows, and after we got all of them, then we would filter out the ones who got the MainCharacterName we want. On the contrary, what happens now is we ask the database for videogames containing Mario in their Name and having specific MainCharacterName at the same time. The records we get will be the ones we actually want.
If for any reason we did want to execute the query at a given moment and get the data into memory we may use something like ToList or ToArray method. This way, instead of a simple IEnumerable, we can create a list that is stored in memory and has no longer anything to do with the database.
var MarioVideogames = db.Videogames.Where(v => v.Name.Contains("Mario")).ToList();
if(type == 1)
MarioVideogames = MarioVideogames.Where(v => v.MainCharacterName == "Mario").ToList();
else if(type == 2)
MarioVideogames = MarioVideogames.Where(v => v.MainCharacterName == "Wario").ToList();
This time MarioVideogames will return the 50 videogames at once, and then filter out the ones we don't need.
Does it always work?
Now, you may wonder, why would we want to "deactivate" deferred execution? It seems to work just fine even in the following case.
var videogames = db.Videogames.Where(v => v.MainCharacterName == "Mario");
//Someone changes Super Mario Bros into Super Mario Bros 3 at this point
string firstVideogameName = videogames.First().Name;
string lastVideogameName = videogames.Reverse().First().Name;
If the query had already been executed before we needed it to, then firstVideogameName would be "Super Mario Bros", which would by now be outdated. So, as you can see deferred execution works and by default we get no such problems. Unless of course you do not wish to have updated data from the moment your program starts executing.
Anyway, now here's something interesting.
Videogame firstMarioVideoGame = null;
var MarioVideogames = db.Videogames.Where(v => v.MainCharacterName == "Mario");
if(MarioVideogames.Any())
{
//Someone removed all Mario games from the database at this point
firstMarioVideoGame = MarioVideogames.First();
}
This one would end up in a NullReferenceException and what's really bad about it, is there is no easy way of having both deferred execution and ensure that such a case will not occur at the same time.
Sure, you can handle such exceptions if you like, but the thing is you can’t always avoid inconsistency issues. For example
var MarioVideogames = db.Videogames.Where(v => v.MainCharacterName == "Mario");
int MarioVideogamesCounter = MarioVideogames.Count();
//Someone removes Super Mario Bros from the database at this point
foreach (var videogame in MarioVideogames)
videogameNames += videogame.Name + "<br/>";
Now, the counter says we have two videogames but we see only one videogame name.
Here's another example where deferred execution is no good.
List<string> videogameNamesList = GetVideogameNamesList();
//videogameNamesList is a list containing videogame names
var NintendoVideogames = db.Videogames.Where(v => v.Publisher == "Nintendo");
foreach(string name in videogameNamesList)
{
var selectedNintendoVideogames = NintendoVideogames.Where(nv => nv.Name == name);
//.........
}
This time we request data as many times as videogameNamesList's contents are. If it consisted of one thousand strings, we would request data one thousand times.
Enumerating the query using ToList would do some good in such cases. Still, handling 50 Nintendo videogames is fine. What would you do if your database had tens of thousands of Nintendo videogames? Things would go really slow if you had to fetch fifty thousand records in order to get no more than five of them in the end.
Caching could be used to get us out of such trouble, even though there is no easy way out of this. It all depends on what you want to do, what problems may arise and how you expect your project to work.
Summary
Creating a LINQ query on entity framework does not instantly get us data from the database. Instead, that will happen the moment we need them. This is called deferred execution and it helps us avoid unnecessary database requests. However this may also turn out to be more of a trouble than help so we should always keep an eye on how our queries are executed.