I think every dev has a code he’s not particularly proud of. Sometimes you start with relatively simple code, but as the time passes you keep adding functions and after some time you realize you’ve created a monster! You feed it as it still works and does its job, but an idea to refactor it just gives you a headache.. You learn to live with it and as you stay out of each other’s way, all is fine..
Creating the monster
My monster is a Web Api method to fetch the processes handled by maintenance department. It was trivial at the beginning, just getting some processes from processes table, joining it with some other tables to replace foreign keys with actual data. Everything done with Entity Framework.
var items = (from p in db.JDE_Processes
join uuu in db.JDE_Users on p.FinishedBy equals uuu.UserId into finished
from fin in finished.DefaultIfEmpty()
join t in db.JDE_Tenants on p.TenantId equals t.TenantId
join u in db.JDE_Users on p.CreatedBy equals u.UserId
join at in db.JDE_ActionTypes on p.ActionTypeId equals at.ActionTypeId
join uu in db.JDE_Users on p.StartedBy equals uu.UserId into started
from star in started.DefaultIfEmpty()
join pl in db.JDE_Places on p.PlaceId equals pl.PlaceId
where p.TenantId == tenants.FirstOrDefault().TenantId && p.StartedOn >= dFrom && p.StartedOn <= dTo
orderby p.CreatedOn descending
select new Process
{
ProcessId = p.ProcessId,
Description = p.Description,
StartedOn = p.StartedOn,
StartedBy = p.StartedBy,
StartedByName = star.Name + " " + star.Surname,
FinishedOn = p.FinishedOn,
FinishedBy = p.FinishedBy,
FinishedByName = fin.Name + " " + fin.Surname,
ActionTypeId = p.ActionTypeId,
ActionTypeName = at.Name,
IsActive = p.IsActive,
IsFrozen = p.IsFrozen,
IsCompleted = p.IsCompleted,
IsSuccessfull = p.IsSuccessfull,
PlaceId = p.PlaceId,
PlaceName = pl.Name,
Output = p.Output,
TenantId = p.TenantId,
TenantName = t.TenantName,
CreatedOn = p.CreatedOn,
CreatedBy = p.CreatedBy,
CreatedByName = u.Name + " " + u.Surname
});
As this method is a pivotal part of my application, there was constant need to extend the response with extra table fields. Half of the problem if the fields to be added were from table that is in 1-to-1 relation with processes table, then simple join statement to desired table was getting the job done. It was far worse when it had to contain a kind of aggregation field from a table that is in many-to-1 relation with processes e.g. number of handlings each process had (1 process = many handlings).
Being Database-first kind of dev, I knew 2 ways to tackle the problem: either making the query aggregation query with lots of grouping or putting extra field as a kind of subquery. As I thought aggregation queries are better performance-wise, I decided to use those. Btw, EF’s aggregation queries’ syntax is rather uggly comparing to clean & simple syntax of plain SQL.
var items = (from p in db.JDE_Processes
join uuu in db.JDE_Users on p.FinishedBy equals uuu.UserId into finished
from fin in finished.DefaultIfEmpty()
join t in db.JDE_Tenants on p.TenantId equals t.TenantId
join u in db.JDE_Users on p.CreatedBy equals u.UserId
join at in db.JDE_ActionTypes on p.ActionTypeId equals at.ActionTypeId
join uu in db.JDE_Users on p.StartedBy equals uu.UserId into started
from star in started.DefaultIfEmpty()
join lsu in db.JDE_Users on p.LastStatusBy equals lsu.UserId into lastStatus
from lStat in lastStatus.DefaultIfEmpty()
join pl in db.JDE_Places on p.PlaceId equals pl.PlaceId
join s in db.JDE_Sets on pl.SetId equals s.SetId
join a in db.JDE_Areas on pl.AreaId equals a.AreaId
join h in db.JDE_Handlings on p.ProcessId equals h.ProcessId into hans
from ha in hans.DefaultIfEmpty()
where p.TenantId == tenants.FirstOrDefault().TenantId && p.CreatedOn >= dFrom && p.CreatedOn <= dTo
group new { p, fin, t, u, at, started, lastStatus, lStat, pl, s, a, ha }
by new
{
p.ProcessId,
p.Description,
p.StartedOn,
p.StartedBy,
p.FinishedOn,
p.FinishedBy,
p.PlannedFinish,
p.PlannedStart,
p.PlaceId,
pl.SetId,
SetName = s.Name,
pl.AreaId,
AreaName = a.Name,
p.Reason,
p.CreatedBy,
CreatedByName = u.Name + " " + u.Surname,
p.CreatedOn,
p.ActionTypeId,
p.Output,
p.InitialDiagnosis,
p.RepairActions,
p.TenantId,
p.MesId,
p.MesDate,
TenantName = t.TenantName,
p.IsActive,
p.IsCompleted,
p.IsFrozen,
p.IsSuccessfull,
ActionTypeName = at.Name,
FinishedByName = fin.Name + " " + fin.Surname,
StartedByName = star.Name + " " + star.Surname,
PlaceName = pl.Name,
LastStatus = p.LastStatus == null ? (ProcessStatus?)null : (ProcessStatus)p.LastStatus, // Nullable enums handled
p.LastStatusBy,
LastStatusByName = lStat.Name + " " + lStat.Surname,
p.LastStatusOn
} into grp
orderby grp.Key.CreatedOn descending
select new Process
{
ProcessId = grp.Key.ProcessId,
Description = grp.Key.Description,
StartedOn = grp.Key.StartedOn,
StartedBy = grp.Key.StartedBy,
StartedByName = grp.Key.StartedByName,
FinishedOn = grp.Key.FinishedOn,
FinishedBy = grp.Key.FinishedBy,
FinishedByName = grp.Key.FinishedByName,
ActionTypeId = grp.Key.ActionTypeId,
ActionTypeName = grp.Key.ActionTypeName,
IsActive = grp.Key.IsActive,
IsFrozen = grp.Key.IsFrozen,
IsCompleted = grp.Key.IsCompleted,
IsSuccessfull = grp.Key.IsSuccessfull,
PlaceId = grp.Key.PlaceId,
PlaceName = grp.Key.PlaceName,
SetId = grp.Key.SetId,
SetName = grp.Key.SetName,
AreaId = grp.Key.AreaId,
AreaName = grp.Key.AreaName,
Output = grp.Key.Output,
TenantId = grp.Key.TenantId,
TenantName = grp.Key.TenantName,
CreatedOn = grp.Key.CreatedOn,
CreatedBy = grp.Key.CreatedBy,
CreatedByName = grp.Key.CreatedByName,
MesId = grp.Key.MesId,
InitialDiagnosis = grp.Key.InitialDiagnosis,
RepairActions = grp.Key.RepairActions,
Reason = grp.Key.Reason,
MesDate = grp.Key.MesDate,
PlannedStart = grp.Key.PlannedStart,
PlannedFinish = grp.Key.PlannedFinish,
LastStatus = grp.Key.LastStatus,
LastStatusBy = grp.Key.LastStatusBy,
LastStatusByName = grp.Key.LastStatusByName,
LastStatusOn = grp.Key.LastStatusOn,
OpenHandlings = grp.Where(ph => ph.ha.HandlingId > 0 && (ph.ha.IsCompleted == null || ph.ha.IsCompleted==false)).Count(),
AllHandlings = grp.Where(ph=>ph.ha.HandlingId > 0).Count()
});
All this mess just to get OpenHandlings and AllHandlings numbers from handlings table.. But it was only the very first act of this tragedy. Then I needed to add also aggregated field from JDE_ProcessAssigns table (many-to-1 to Process), not to complicate existent groupings even more, I decided to add it the subquery way this time:
AssignedUsers = (from pras in db.JDE_ProcessAssigns
join uu in db.JDE_Users on pras.UserId equals uu.UserId
where pras.ProcessId == grp.Key.ProcessId
select uu.Name + " " + uu.Surname)
As it worked and didn’t introduce additional complexity to my code, I thought that’s the way to go and implemented few more properties like this. For example, I needed to a field that summed up GivenTime parameter from all associated ProcessActions (many-to-1 with Process):
GivenTime = (from prac in db.JDE_ProcessActions
join a in db.JDE_Actions on prac.ActionId equals a.ActionId
where prac.ProcessId == grp.Key.ProcessId
select a.GivenTime).Sum(),
I won’t put all the additional properties here as you’ve probably already got the idea..
Hitting the wall
Finally, after implementing few more properties like presented above, I hit the wall. I needed to implement yet another property that fetched all AbandonReasons assigned to all ProcessActions associated with this process. I decided to use the strategy that has worked so far and did it the subquery way:
AbandonReasons = (from pas in db.JDE_ProcessActions
join ars in db.JDE_AbandonReasons on pas.AbandonReasonId equals ars.AbandonReasonId into ar
from reasons in ar.DefaultIfEmpty()
where pas.ProcessId == grp.Key.ProcessId
select reasons.Name
).Distinct()
Of course, it worked but at the same time it made my query take about 2 times longer to complete.. It hasn’t been all roses even before – the query was taking about 4 seconds to complete, and with AbondonReasons, it was taking almost 10 seconds. Quite unexpected, that introduction rather small chunk of code has so significant impact on processing time. I think that maybe my query reached a point where it was too complex/messy for EF algorithms to optimize it well behind the scene.. Nevertheless, it was the last straw, the long overdue decision to refactor the query was finally taken. How? Jump to part II to find out:)