Recently I have been looking into how to track overall improvement in a code-base due to the introduction and continued use of testing. There are a number of metrics that are already defined, ranging from fairly complex ones that can measure quality across multiple projects in a large company, to metrics suited for a single-developer product. That being said, I haven’t run across any that I liked for my situation. To overcome this, I came up with the How Stupid Did It Make Us Look (HSDIMUL) metric.

The Requirements

First let’s look at the requirements I have for a test metric:

  1. Reflects potential damage to reputation - the metric needs to reflect the reality that a bug that gets caught by a customer is more damaging to the company than one caught internally (in more than just cost).
  2. Addresses the project as a whole - metrics such as code coverage can be very useful in the right context, but this metric needs to indicate how the project as a whole is doing, not just some particular portion.
  3. Simple - it should be easy for anyone to look at how the metric is calculated and understand why they scored the way they did.

Terms

  • Project Health - metrics related to how a project as a whole is doing, covering everything from the code itself to the development process. Includes metrics such as HSDIMUL.
  • Code Health - metrics that indicate the quality of the code (bug free, maintainable, etc). Includes metrics such as code coverage.
  • HSDIMUL Value - a value related to an issue that indicates what phase of development life-cycle the issue was discovered in.
  • HSDIMUL Score - a calculated value, averaging a set of HSDIMUL Values to provide a metric.

The HSDIMUL Metric

To meet the requirements above, I propose the HSDIMUL Metric - or the How Stupid Did It Make Us Look Metric. This metric assigns a number to each stage in the life-cycle where a bug can be discovered. The lowest value (0) is a bug discovered by the developer before committing/merging to a location where the rest of the team can see things, and the highest is a bug hit by a customer using the product in a production environment (this varies depending on the project’s preferences). As you will see below, these values are not assigned linearly; instead they add a large amount of weight to bugs found by or in front of customers. With these numbers assigned, collect them at the end of each development cycle, calculate the average, and that is your overall HSDIMUL score - the lower the score, the less stupid you looked. By tracking the HSDIMUL score over time, you can measure how effective your tests have been at improving the quality of your product as delivered to customers.

Default HSDIMUL Values

We’ll get into how to come up with HSDIMUL values customized to your project later on, but the following chart shows the default values.

| Life-Cycle Stage | HSDIMUL Value | Expected Developer Reaction | Emoticon Representation |
|---|---|---|---|
| Uncommitted/In Single-Dev Branch | 0 | Happy - after all, this is a bug that doesn’t have to be reported! | 😁 |
| Discovered By Code Review Process | 1 | Pleased - it’s a little embarrassing, but no big deal | 😌 |
| Discovered During Acceptance or Functional Testing | 2 | Mildly Concerned - now more unit tests need to be written, or the code needs to be rewritten to properly conform to requirements | 😑 |
| Discovered by QA | 10 | Concerned - that’s what they’re there for but still… | 😓 |
| Discovered in Demo/Support with Customer | 15 | Very Concerned - how did we miss this! | 😟 |
| Reported by Customer from a Production System (Non-Critical) | 30 | Getting Nervous - now things are getting serious | 😨 |
| Critical Outage | 50 | PANIC! | 😱 |
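To make the averaging concrete, here is a minimal sketch in Python that uses the default values from the chart. The dictionary keys, the function name `hsdimul_score`, and the sample cycle are made up for illustration; they are not part of the metric itself.

```python
from statistics import mean

# Default HSDIMUL Values from the chart above. The key names are
# illustrative assumptions, not official stage identifiers.
DEFAULT_VALUES = {
    "uncommitted": 0,
    "code_review": 1,
    "acceptance_testing": 2,
    "qa": 10,
    "customer_demo": 15,
    "production_non_critical": 30,
    "critical_outage": 50,
}


def hsdimul_score(stages):
    """Average the HSDIMUL Values of the stages where bugs were found."""
    return mean(DEFAULT_VALUES[stage] for stage in stages)


# A hypothetical cycle: four bugs caught in-house, one reported from production.
cycle = [
    "code_review",
    "code_review",
    "acceptance_testing",
    "qa",
    "production_non_critical",
]
print(hsdimul_score(cycle))  # 8.8
```

A single production bug pulls the average for this cycle up to 8.8, even though the other four bugs were all caught in-house. That is the non-linear weighting at work.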

Customizing HSDIMUL Values

Customizing HSDIMUL values, or even adding life-cycle phases, is encouraged within reason so that HSDIMUL reflects the realities of your project. There are only two hard-and-fast rules related to customizing values:

  1. There must always be a significant increase in the step-size of values for bugs seen by a customer. This is core to the definition of HSDIMUL - a bug seen in-house is far less of a problem than one seen outside.
  2. Values must always increase with the life-cycle stage. The later it is, the harder it is to fix, thus the stupider it makes you look.

Let’s look at two examples that might require adjustment.

  1. Your software is safety critical. Your reputation is very important, and any issue seen by a customer could cost lives (as well as money), so you need to increase the values for anything facing a customer. You might set your top (Critical Outage) value to 100 here.
  2. Your software is used in-house. Since you don’t directly face any paying customers, reputation is less of a problem, so you can adjust the values downwards to reflect this. You might set your Critical Outage value to 30 here, for example.
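A customized set of values can be sanity-checked against the rules above with a few lines of Python. The sketch below is only an assumption about how one might do it: it enforces rule 2 (values strictly increase) and simply lists the step sizes so a person can judge rule 1, since "significant" is a judgment call. The names `check_custom_values` and `with_critical_outage` are hypothetical.

```python
def check_custom_values(values):
    """Check rule 2 and list step sizes so a person can judge rule 1.

    `values` is a list of (stage_name, value) pairs ordered from the
    earliest life-cycle stage to the latest.
    """
    problems = []
    steps = []
    for (prev_name, prev), (name, cur) in zip(values, values[1:]):
        steps.append((name, cur - prev))
        if cur <= prev:
            problems.append(
                f"{name} ({cur}) must be greater than {prev_name} ({prev})"
            )
    return problems, steps


# The default chart, in life-cycle order.
DEFAULTS = [
    ("Uncommitted/In Single-Dev Branch", 0),
    ("Discovered By Code Review Process", 1),
    ("Discovered During Acceptance or Functional Testing", 2),
    ("Discovered by QA", 10),
    ("Discovered in Demo/Support with Customer", 15),
    ("Reported by Customer from a Production System (Non-Critical)", 30),
    ("Critical Outage", 50),
]


def with_critical_outage(values, top):
    """Return a copy of `values` with the last stage's value replaced."""
    return values[:-1] + [(values[-1][0], top)]


safety_critical = with_critical_outage(DEFAULTS, 100)
in_house = with_critical_outage(DEFAULTS, 30)

safety_problems, safety_steps = check_custom_values(safety_critical)
in_house_problems, _ = check_custom_values(in_house)
print(safety_problems)    # no rule 2 violations
print(in_house_problems)  # one violation: Critical Outage ties with the stage before it
```

Starting from the default chart, lowering only Critical Outage to 30 ties it with the non-critical production value, so the check flags it. For the in-house case the other customer-facing values would need to come down as well.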

Applying HSDIMUL

Calculating HSDIMUL Score

There are two ways of calculating HSDIMUL scores, though some teams may wish to add other calculations to deepen their understanding of their project health.

  1. Dev Cycle/Sprint HSDIMUL Score - the average HSDIMUL Value of all tickets reported within a given time-frame. Every development cycle, calculate the HSDIMUL score of all tickets reported in that cycle, and track the score over time to monitor for improvement or problems.
  2. Cumulative Open Issue HSDIMUL Score - this gives you insight into the current state of your project, since it takes into account all known bugs (whereas the Sprint Score only captures the bugs seen in a given period of time). A high score means that customers are probably seeing a lot of issues, even if the Sprint Score is low.
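The two scores can be sketched as follows. This assumes a hypothetical `Ticket` record with an HSDIMUL Value, a report date, and an open/closed flag, and it assumes the cumulative score is the average over all currently open tickets; a real issue tracker would supply its own fields.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass
class Ticket:
    """Hypothetical ticket record; the field names are assumptions."""
    value: int       # HSDIMUL Value assigned at ticket creation
    reported: date   # day the issue was reported
    is_open: bool    # whether the issue is still unresolved


def sprint_score(tickets, start, end):
    """Average value of tickets reported between start and end, inclusive."""
    values = [t.value for t in tickets if start <= t.reported <= end]
    return mean(values) if values else None


def open_issue_score(tickets):
    """Average value of all tickets that are still open."""
    values = [t.value for t in tickets if t.is_open]
    return mean(values) if values else None


tickets = [
    Ticket(30, date(2023, 11, 20), True),   # old customer-reported bug, still open
    Ticket(1, date(2024, 1, 3), False),
    Ticket(1, date(2024, 1, 5), False),
    Ticket(10, date(2024, 1, 9), True),
]

print(sprint_score(tickets, date(2024, 1, 1), date(2024, 1, 14)))  # 4
print(open_issue_score(tickets))                                   # 20
```

In this made-up data the sprint looks healthy at 4, but the open-issue score of 20 shows that a customer-reported bug from an earlier cycle is still unresolved.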

Setting HSDIMUL Values

HSDIMUL values should be set as part of ticket creation. If an issue is discovered by a developer as part of the typical development process, the value should be self-reported. For issues that are reported outside of the typical development process (by a customer, by QA, or while working on the code at a later date), leadership or QA should be responsible for setting the value, to avoid conflicts of interest on the part of the developers.

Note: Bugs with a value of 0 should never be reported as issues.

Interpreting HSDIMUL Score

Once you have calculated the HSDIMUL Score, the resulting value provides a feel for the overall project health, and can be judged on a scale similar to the base HSDIMUL value. A high score means that you are frequently exposing customers to bugs and indicates issues in your development and QA processes, while a lower score means that most bugs are being caught before shipping and suggests that your processes are functioning well. A medium score is most likely an indication that you are exposing some bugs already, and are at high risk of many more occurring if processes are not improved.

Any score of 5 or above is medium, and 10 and up is high.
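As a small sketch, the thresholds above translate directly into a lookup. The function name `interpret_score` and the label "low" for anything under 5 are assumptions made for illustration.

```python
MEDIUM_THRESHOLD = 5   # "Any score of 5 or above is medium"
HIGH_THRESHOLD = 10    # "10 and up is high"


def interpret_score(score):
    """Map an HSDIMUL Score to a rough health label."""
    if score >= HIGH_THRESHOLD:
        return "high"
    if score >= MEDIUM_THRESHOLD:
        return "medium"
    return "low"


print(interpret_score(2))     # low
print(interpret_score(8.8))   # medium
print(interpret_score(20))    # high
```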

It is recommended that you adjust the values, rather than changing score interpretation, since that provides you with more ability to emphasize the lifecycle phases you are concerned about.

Taking Action

Since HSDIMUL is an overall project health metric, exact actions cannot be defined for how to respond to a high score. An investigation is required to understand what is failing in the development process.

Typically, however, the following areas are good starting points for improving the HSDIMUL Score.

  • Improving functional and integration testing - this is especially useful if you have code-health metrics such as code coverage that show the project code to be largely bug free
  • Improving unit testing - if most of the errors occur because of basic programming mistakes, you probably need to step up unit testing
  • Code review - helps improve code health to avoid programming mistakes
  • Improving requirement communications - it is not uncommon for leadership and developers to speak the same words, and mean very different things. Take the time required to be sure the communication is effective, with diagrams, meetings, sketches - whatever works best.

Conclusion

Hopefully the HSDIMUL Metric will prove useful for measuring the effectiveness of tests over time. I have no immediate plans to implement it in any particular issue tracking system, though I may look at doing so in the future.