Refactoring nightmare Entity Framework query with performance in mind – Part I

I think every dev has a code he’s not particularly proud of. Sometimes you start with relatively simple code, but as the time passes you keep adding functions and after some time you realize you’ve created a monster! You feed it as it still works and does its job, but an idea to refactor it just gives you a headache.. You learn to live with it and as you stay out of each other’s way, all is fine..

Creating the monster

My monster is a Web Api method to fetch the processes handled by maintenance department. It was trivial at the beginning, just getting some processes from processes table, joining it with some other tables to replace foreign keys with actual data. Everything done with Entity Framework.

var items = (from p in db.JDE_Processes
                                 join uuu in db.JDE_Users on p.FinishedBy equals uuu.UserId into finished
                                 from fin in finished.DefaultIfEmpty()
                                 join t in db.JDE_Tenants on p.TenantId equals t.TenantId
                                 join u in db.JDE_Users on p.CreatedBy equals u.UserId
                                 join at in db.JDE_ActionTypes on p.ActionTypeId equals at.ActionTypeId
                                 join uu in db.JDE_Users on p.StartedBy equals uu.UserId into started
                                 from star in started.DefaultIfEmpty()
                                 join pl in db.JDE_Places on p.PlaceId equals pl.PlaceId
                                 where p.TenantId == tenants.FirstOrDefault().TenantId && p.StartedOn >= dFrom && p.StartedOn <= dTo
                                 orderby p.CreatedOn descending
                                 select new Process
                                 {
                                     ProcessId = p.ProcessId,
                                     Description = p.Description,
                                     StartedOn = p.StartedOn,
                                     StartedBy = p.StartedBy,
                                     StartedByName = star.Name + " " + star.Surname,
                                     FinishedOn = p.FinishedOn,
                                     FinishedBy = p.FinishedBy,
                                     FinishedByName = fin.Name + " " + fin.Surname,
                                     ActionTypeId = p.ActionTypeId,
                                     ActionTypeName = at.Name,
                                     IsActive = p.IsActive,
                                     IsFrozen = p.IsFrozen,
                                     IsCompleted = p.IsCompleted,
                                     IsSuccessfull = p.IsSuccessfull,
                                     PlaceId = p.PlaceId,
                                     PlaceName = pl.Name,
                                     Output = p.Output,
                                     TenantId = p.TenantId,
                                     TenantName = t.TenantName,
                                     CreatedOn = p.CreatedOn,
                                     CreatedBy = p.CreatedBy,
                                     CreatedByName = u.Name + " " + u.Surname
                                 });

As this method is a pivotal part of my application, there was constant need to extend the response with extra table fields. Half of the problem if the fields to be added were from table that is in 1-to-1 relation with processes table, then simple join statement to desired table was getting the job done. It was far worse when it had to contain a kind of aggregation field from a table that is in many-to-1 relation with processes e.g. number of handlings each process had (1 process = many handlings).

Process to handlings relationship

Being Database-first kind of dev, I knew 2 ways to tackle the problem: either making the query aggregation query with lots of grouping or putting extra field as a kind of subquery. As I thought aggregation queries are better performance-wise, I decided to use those. Btw, EF’s aggregation queries’ syntax is rather uggly comparing to clean & simple syntax of plain SQL.

var items = (from p in db.JDE_Processes
                                 join uuu in db.JDE_Users on p.FinishedBy equals uuu.UserId into finished
                                 from fin in finished.DefaultIfEmpty()
                                 join t in db.JDE_Tenants on p.TenantId equals t.TenantId
                                 join u in db.JDE_Users on p.CreatedBy equals u.UserId
                                 join at in db.JDE_ActionTypes on p.ActionTypeId equals at.ActionTypeId
                                 join uu in db.JDE_Users on p.StartedBy equals uu.UserId into started
                                 from star in started.DefaultIfEmpty()
                                 join lsu in db.JDE_Users on p.LastStatusBy equals lsu.UserId into lastStatus
                                 from lStat in lastStatus.DefaultIfEmpty()
                                 join pl in db.JDE_Places on p.PlaceId equals pl.PlaceId
                                 join s in db.JDE_Sets on pl.SetId equals s.SetId
                                 join a in db.JDE_Areas on pl.AreaId equals a.AreaId
                                 join h in db.JDE_Handlings on p.ProcessId equals h.ProcessId into hans
                                 from ha in hans.DefaultIfEmpty()
                                 where p.TenantId == tenants.FirstOrDefault().TenantId && p.CreatedOn >= dFrom && p.CreatedOn <= dTo
                                 group new { p, fin, t, u, at, started, lastStatus, lStat, pl, s, a, ha }
                                 by new
                                 {
                                     p.ProcessId,
                                     p.Description,
                                     p.StartedOn,
                                     p.StartedBy,
                                     p.FinishedOn,
                                     p.FinishedBy,
                                     p.PlannedFinish,
                                     p.PlannedStart,
                                     p.PlaceId,
                                     pl.SetId,
                                     SetName = s.Name,
                                     pl.AreaId,
                                     AreaName = a.Name,
                                     p.Reason,
                                     p.CreatedBy,
                                     CreatedByName = u.Name + " " + u.Surname,
                                     p.CreatedOn,
                                     p.ActionTypeId,
                                     p.Output,
                                     p.InitialDiagnosis,
                                     p.RepairActions,
                                     p.TenantId,
                                     p.MesId,
                                     p.MesDate,
                                     TenantName = t.TenantName,
                                     p.IsActive,
                                     p.IsCompleted,
                                     p.IsFrozen,
                                     p.IsSuccessfull,
                                     ActionTypeName = at.Name,
                                     FinishedByName = fin.Name + " " + fin.Surname,
                                     StartedByName = star.Name + " " + star.Surname,
                                     PlaceName = pl.Name,
                                     LastStatus = p.LastStatus == null ? (ProcessStatus?)null : (ProcessStatus)p.LastStatus, // Nullable enums handled
                                     p.LastStatusBy,
                                     LastStatusByName = lStat.Name + " " + lStat.Surname,
                                     p.LastStatusOn
                                 } into grp
                                 orderby grp.Key.CreatedOn descending
                                 select new Process
                                 {
                                     ProcessId = grp.Key.ProcessId,
                                     Description = grp.Key.Description,
                                     StartedOn = grp.Key.StartedOn,
                                     StartedBy = grp.Key.StartedBy,
                                     StartedByName = grp.Key.StartedByName,
                                     FinishedOn = grp.Key.FinishedOn,
                                     FinishedBy = grp.Key.FinishedBy,
                                     FinishedByName = grp.Key.FinishedByName,
                                     ActionTypeId = grp.Key.ActionTypeId,
                                     ActionTypeName = grp.Key.ActionTypeName,
                                     IsActive = grp.Key.IsActive,
                                     IsFrozen = grp.Key.IsFrozen,
                                     IsCompleted = grp.Key.IsCompleted,
                                     IsSuccessfull = grp.Key.IsSuccessfull,
                                     PlaceId = grp.Key.PlaceId,
                                     PlaceName = grp.Key.PlaceName,
                                     SetId = grp.Key.SetId,
                                     SetName = grp.Key.SetName,
                                     AreaId = grp.Key.AreaId,
                                     AreaName = grp.Key.AreaName,
                                     Output = grp.Key.Output,
                                     TenantId = grp.Key.TenantId,
                                     TenantName = grp.Key.TenantName,
                                     CreatedOn = grp.Key.CreatedOn,
                                     CreatedBy = grp.Key.CreatedBy,
                                     CreatedByName = grp.Key.CreatedByName,
                                     MesId = grp.Key.MesId,
                                     InitialDiagnosis = grp.Key.InitialDiagnosis,
                                     RepairActions = grp.Key.RepairActions,
                                     Reason = grp.Key.Reason,
                                     MesDate = grp.Key.MesDate,
                                     PlannedStart = grp.Key.PlannedStart,
                                     PlannedFinish = grp.Key.PlannedFinish,
                                     LastStatus = grp.Key.LastStatus,
                                     LastStatusBy = grp.Key.LastStatusBy,
                                     LastStatusByName = grp.Key.LastStatusByName,
                                     LastStatusOn = grp.Key.LastStatusOn,
                                     OpenHandlings = grp.Where(ph => ph.ha.HandlingId > 0 && (ph.ha.IsCompleted == null || ph.ha.IsCompleted==false)).Count(),
                                     AllHandlings = grp.Where(ph=>ph.ha.HandlingId > 0).Count()
                                 });

All this mess just to get OpenHandlings and AllHandlings numbers from handlings table.. But it was only the very first act of this tragedy. Then I needed to add also aggregated field from JDE_ProcessAssigns table (many-to-1 to Process), not to complicate existent groupings even more, I decided to add it the subquery way this time:

AssignedUsers = (from pras in db.JDE_ProcessAssigns
                 join uu in db.JDE_Users on pras.UserId equals uu.UserId
                 where pras.ProcessId == grp.Key.ProcessId
                 select uu.Name + " " + uu.Surname)

As it worked and didn’t introduce additional complexity to my code, I thought that’s the way to go and implemented few more properties like this. For example, I needed to a field that summed up GivenTime parameter from all associated ProcessActions (many-to-1 with Process):

GivenTime = (from prac in db.JDE_ProcessActions
             join a in db.JDE_Actions on prac.ActionId equals a.ActionId
             where prac.ProcessId == grp.Key.ProcessId
             select a.GivenTime).Sum(),

I won’t put all the additional properties here as you’ve probably already got the idea..

Hitting the wall

Finally, after implementing few more properties like presented above, I hit the wall. I needed to implement yet another property that fetched all AbandonReasons assigned to all ProcessActions associated with this process. I decided to use the strategy that has worked so far and did it the subquery way:

AbandonReasons = (from pas in db.JDE_ProcessActions
                  join ars in db.JDE_AbandonReasons on pas.AbandonReasonId equals ars.AbandonReasonId into ar
                  from reasons in ar.DefaultIfEmpty()
                  where pas.ProcessId == grp.Key.ProcessId
                  select reasons.Name
                  ).Distinct()

Of course, it worked but at the same time it made my query take about 2 times longer to complete.. It hasn’t been all roses even before – the query was taking about 4 seconds to complete, and with AbondonReasons, it was taking almost 10 seconds. Quite unexpected, that introduction rather small chunk of code has so significant impact on processing time. I think that maybe my query reached a point where it was too complex/messy for EF algorithms to optimize it well behind the scene.. Nevertheless, it was the last straw, the long overdue decision to refactor the query was finally taken. How? Jump to part II to find out:)

Dodaj komentarz