Miguel Barba

Data Quality: Technical Debt From Hell

This post was originally published here.

As most of you probably know, technical debt can take many forms. Data quality is just one of them.

Data quality is a major issue in large organisations, where multiple systems, applications and services interact with one another, exchanging and altering data. Inconsistencies will always occur: because someone made a mistake, because there’s an unidentified bug somewhere, because the system’s architecture isn’t as robust as it should be, or simply because people prefer to ignore the problem (this last one happens far more often than it should, trust me!). All of this contributes to a consistent, sometimes quiet but steady increase of your technical debt.

Don’t let yourselves be fooled. It’s easy to become reckless about data quality, especially when you’ve been working on a data migration project for several months and reach a point where, even though you don’t want to, you start getting tired and making mistakes. Hell, sometimes you’re just having a bad day… It can happen to any of us!

There are several ways to address this issue:

  1. Let’s start by stating the obvious: make sure your services are as error-proof as possible. This is where all technical debt should start being addressed: before it even has a chance to exist. I know I’m being a dreamer, unrealistic or whatever, but in a perfect world this is how it would work.
  2. It helps a lot to have a team dedicated to performing data analysis and comparisons between the different systems, and to enforcing data repairs that correct the data; this should be done regularly and as often as possible (see the sketch after this list).
  3. Understand what’s causing the inconsistency. This may be very difficult to achieve, but once you manage it, the probability of eradicating the problem is extremely high. It all looks great and relatively simple, but then another common problem in any organisation comes into play: people. More often than not we are our own worst enemies, pulling in opposite directions while trying to solve a problem we have in common. Fortunately, this doesn’t always happen, and as time goes by the tendency is becoming quite the opposite, although there’s still a large margin for improvement (there always is!).
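
As a rough sketch of what such a cross-system comparison might look like, here is a minimal example in Python; the systems, record layout and field names are all illustrative assumptions, not from the post:

```python
# Minimal reconciliation sketch: compare the same records as exported by
# two systems and report every inconsistency. All names are illustrative.

def reconcile(source_a: dict[str, dict], source_b: dict[str, dict]) -> list[str]:
    """Compare two snapshots keyed by record ID and list every mismatch."""
    issues = []

    # Records present in one system but missing from the other.
    for record_id in source_a.keys() - source_b.keys():
        issues.append(f"{record_id}: present in A, missing in B")
    for record_id in source_b.keys() - source_a.keys():
        issues.append(f"{record_id}: present in B, missing in A")

    # Records present in both, but with diverging field values.
    for record_id in source_a.keys() & source_b.keys():
        a, b = source_a[record_id], source_b[record_id]
        for field in a.keys() | b.keys():
            if a.get(field) != b.get(field):
                issues.append(
                    f"{record_id}.{field}: A={a.get(field)!r} B={b.get(field)!r}"
                )
    return issues


if __name__ == "__main__":
    # Hypothetical snapshots of the same customer in two systems.
    billing = {"c-001": {"name": "Ada", "status": "active"}}
    crm = {"c-001": {"name": "Ada", "status": "inactive"},
           "c-002": {"name": "Grace", "status": "active"}}
    for issue in reconcile(billing, crm):
        print(issue)
```

In practice the snapshots would come from database exports or APIs rather than inline dictionaries, and the resulting report would feed whatever repair process the team uses.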

The idea of writing this post was a direct result of identifying a data inconsistency that was carried through 4 system upgrades and migrations and finally came back to haunt someone (me, in this particular case) more than 16 years after the flawed data was first created. Nightmarish, don’t you think? What’s yours?

Photo credit: T a k (https://www.flickr.com/photos/takashi/18862634/) via VisualHunt / CC BY-NC

Top comments (4)

Tryggvi Björgvinsson

Nice post. I'm the author of an upcoming book (in early access program now) called The Art of Data Usability, which is at its core about data quality. I've never thought of data quality as technical debt. That's a really nice way to frame it. I really like it :)

One thing I'd recommend is setting up monitoring of your quality attributes (like the inconsistency you talk about). You monitor the attributes to make sure quality stays at the level you want (so you don't start collecting technical debt again), but you set it up from the start (when you begin working on lowering the debt) so you know when you've reached that level of quality. As you said, we start making mistakes, have a bad day or something. Monitoring quality helps us stay focused.

You can think of it as data quality tests: you develop the metrics and monitoring before you start, as a sort of TDD approach, and you keep monitoring afterwards as regression testing.
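
As a rough sketch of that idea (the metric, the systems and the 99% threshold below are illustrative assumptions, not from the comment):

```python
# Sketch of a data quality test in the TDD spirit described above: the
# quality attribute becomes an executable check. The metric (share of
# matching records across two systems) and the threshold are assumptions.

def consistency_ratio(source_a: dict[str, dict], source_b: dict[str, dict]) -> float:
    """Fraction of record IDs shared by both systems whose fields match."""
    shared = source_a.keys() & source_b.keys()
    if not shared:
        return 1.0
    matching = sum(1 for rid in shared if source_a[rid] == source_b[rid])
    return matching / len(shared)


def load_snapshot(system: str) -> dict[str, dict]:
    """Stand-in for pulling a fresh export from a live system."""
    snapshots = {
        "billing": {"c-001": {"status": "active"}},
        "crm": {"c-001": {"status": "active"}},
    }
    return snapshots[system]


def test_billing_and_crm_stay_consistent():
    # Written before the cleanup starts (the TDD part), then run on a
    # schedule afterwards as a regression test on data quality.
    billing = load_snapshot("billing")
    crm = load_snapshot("crm")
    assert consistency_ratio(billing, crm) >= 0.99
```

Written up front, the test defines the target level of quality; kept running on a schedule afterwards, it becomes the regression check that the debt isn't creeping back.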

Again, a really good post and a fresh perspective on data quality.

Miguel Barba

Thanks for the feedback.

And congrats on your book, by the way!

"One thing I'd recommend is setting up monitoring of your quality attributes" - Yes, that would be the ideal scenario and it used to happen here but unfortunately the team responsible for doing it is from another department and our priorities and approaches to problem solving aren't always as aligned as they should be, so this ends up having a negative impact when it comes to detect and correct data issues on a regular basis.

Alan Barr

Painful stuff. One area I've seen recently is not owning, or not having a solid handle on, the full domain model of one's data, or even just being at the whim of a third party's representation of it. Fixing that takes a huge effort if it isn't considered from the beginning or early on.

Miguel Barba

I was reading the latest post by John Allspaw when I realized that it sums up perfectly the concept I was referring to when I wrote this post:

"My main argument isn’t that technical debt’s definition has morphed over time; many people have already made that observation. Instead, I believe that engineers have used the term to represent a different (and perhaps even more unsettling) phenomenon: a type of debt that can’t be recognized at the time of the code’s creation. They’ve used the term “technical debt” simply be- cause it’s the closest descriptive label they’ve had, not because it’s the same as what Cunningham meant. This phenomenon has no countermeasure like refactoring that can be applied in anticipation, because it’s invisible until an anomaly reveals its presence."

Feel free to read the complete post here; it's well worth it!