The owls are not what they seem: analyzing multi-modal distributions

Fernando Cuenca
Nov 19, 2024

When we start collecting Lead Time data and visualizing it in a histogram, we'd love to see a smooth shape that fits neatly under the prototypical right-skewed curve (a peak on the left and a long tail to the right) we saw in training classes. Something like this:

However, more often than not the shapes we see are a little more "messy". Perhaps something like this:

"Not to worry!", you say. "I learnt in training that this is a multi-modal distribution, where each hump is the sign of a different work item type."

Indeed, after analyzing the items represented in the chart, you discover that the items under each "hump" are mostly (although not exclusively) of a different nature:

Case closed, right? Well, maybe not...

First, you will probably notice that the composition of the "humps" is not perfectly homogeneous. For example, some "New Features" will show up under the hump where most of the "Tech Debt" items are, or perhaps were delivered as fast as the Defects typically were.

Moreover, what if you find that all the items in the data set are of the same type? What can then explain the multiple humps?

We need to remember that the chart we're looking at is reflecting time, and more specifically, the impact of delay. Different kinds of work likely have exposure to different kinds of delay, so if we have (accidentally or unwittingly) mixed work of different kinds in the same data set, that will result in items clumping under the different humps. But really what's "clumping" items together is the fact that they ended up experiencing the same delay, and maybe that happened because of the same (or similar) reason.

So, let's say that we don't just consider the kind of work for the items, but also investigate what happened to them: what were the factors that contributed to each item taking as long as it did to be delivered? Perhaps we'll find that, for example, most items under the same hump were exposed to the same external dependency, or affected by the same environment outage, etc.

Classifying work based on its exposure to delay risk may prove to be more useful and actionable than just knowing that the items are of different work item types. A "multi-modal" distribution could be a signal of something much more interesting than just different work types.
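To make the idea concrete, here is a minimal sketch in Python of grouping the same lead times two ways: by work item type and by main delay cause. The data, the cause labels and the `summarize` helper are all made up for illustration; real data would come from your own tracking tool.

```python
from collections import defaultdict
from statistics import median

# Hypothetical data: (lead time in days, work item type, main delay cause).
items = [
    (3, "Defect", "none"),
    (4, "New Feature", "none"),
    (5, "Defect", "none"),
    (12, "New Feature", "external dependency"),
    (14, "Tech Debt", "external dependency"),
    (15, "New Feature", "external dependency"),
    (27, "Tech Debt", "environment outage"),
    (29, "Defect", "environment outage"),
]


def summarize(items, key_index):
    """Group lead times by the chosen column; return {key: (min, median, max)}."""
    groups = defaultdict(list)
    for item in items:
        groups[item[key_index]].append(item[0])
    return {key: (min(v), median(v), max(v)) for key, v in groups.items()}


by_type = summarize(items, 1)   # grouped by work item type
by_cause = summarize(items, 2)  # grouped by main delay cause

for label, summary in (("By type", by_type), ("By delay cause", by_cause)):
    print(label)
    for key, (low, mid, high) in summary.items():
        print(f"  {key}: min={low}, median={mid}, max={high}")
```

In this invented data set, the ranges per work item type overlap heavily, while the ranges per delay cause are tight and separate. That is the pattern to look for when deciding which classification explains the humps better.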

By the way, this kind of analysis can be useful even if we don't see multi-modal distributions. Consider the following example:

At first glance, it might fit the usual shape of an (extremely) right-skewed curve. But let's say you take a closer look at the item that fell to the far right (taking 70 days), and you discover that the main reason it took that long was its dependency on an external specialist (a DBA, for example). You decide, then, to analyze all the other items to find which ones also had a DBA dependency, and this is what you find:

This version of the diagram splits each bar in the previous chart, showing the number of items that required the DBA (in red) and those that didn't (in green).

By looking at this picture, you can conclude that items that don't have a dependency on the external DBA (green) are usually delivered pretty fast (7 days in most cases), but those that have the external dependency (red) normally take twice as long, and in extreme cases, a lot longer.
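A split like this can be produced with very little code. The sketch below is one possible way to do it in Python, not the tooling used for the chart: the lead times, the `needs_dba` flag and the 7-day bin width are all assumptions chosen to resemble the scenario described here.

```python
from collections import Counter

# Hypothetical data: (lead time in days, needed the external DBA?).
items = [
    (5, False), (7, False), (7, False), (7, False), (8, False), (9, False),
    (7, True), (13, True), (14, True), (14, True), (16, True), (21, True),
    (70, True),
]

BIN_WIDTH = 7  # assumed bin size, in days


def split_histogram(items, bin_width):
    """Count items per lead-time bin, separately for each dependency flag."""
    counts = Counter()
    for days, needs_dba in items:
        bin_start = (days // bin_width) * bin_width
        counts[(bin_start, needs_dba)] += 1
    return counts


counts = split_histogram(items, BIN_WIDTH)

# Text version of the stacked bars: "g" = no DBA needed, "R" = DBA needed.
for bin_start in sorted({start for start, _ in counts}):
    without_dba = counts[(bin_start, False)]
    with_dba = counts[(bin_start, True)]
    bin_end = bin_start + BIN_WIDTH - 1
    print(f"{bin_start:>3}-{bin_end:<3} " + "g" * without_dba + "R" * with_dba)
```

The only extra data this needs is one flag per item, which is usually cheap to collect after the fact by asking the team what held each item up.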

With these insights, you can now decide what to do. Perhaps they help you make the case for breaking the external DBA dependency; or maybe they serve as a way of setting delivery expectations, or of monitoring work in progress so you can expedite it if needed.

Much more interesting, indeed... 😉
