Understanding data quality dimensions and metrics.
In this last part of our data quality blog series, we look at the 7 dimensions of data quality and the ways you can assess and fix data quality issues. If you haven’t read our first two articles on the main causes of poor data quality yet, check out:
Top causes of poor data quality and how to fix them – Technology
Top causes of poor data quality and how to fix them – People and processes
Data Quality Is Not An IT Problem
It’s a business problem.
Poor data quality can affect an organisation’s ability to make timely and accurate decisions.
Every organisation needs to realise that managing data quality is not an IT problem. Whilst IT staff may help fix poor data quality, the responsibility for data lies with the business. For example, you wouldn’t expect IT to be responsible for your organisation’s financial data. It should be the responsibility of your finance team.
Data Quality Dimensions
Data quality dimensions allow you to capture specific attributes so you can measure your data correctly and manage its quality effectively. When assessing these quality dimensions, you need to consider which ones are more important than others within your organisation. For example, is the timeliness of the data more important than its accuracy?
Let’s look at the core dimensions of data quality.
Is the data accurate and valid? If yes, to what level?
Accuracy refers to the degree to which the data matches the reality of the event or object it describes. A simple example is having a correct physical address, email address or phone number.
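As a rough illustration, here is a minimal Python sketch of a check you could run on email addresses. Note that a format check like this only measures validity (does the value look right?); confirming accuracy still means comparing the value with a trusted source, such as the customer themselves. The pattern and the sample values are assumptions made for the example, not a recommended standard.

```python
import re

# Deliberately simple, hypothetical pattern; real validation rules should
# come from your own agreed data standards.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validity_rate(values, pattern):
    """Return the share of non-empty values that match the pattern."""
    present = [v for v in values if v]
    if not present:
        return 0.0
    valid = sum(1 for v in present if pattern.fullmatch(v))
    return valid / len(present)


# Made-up sample data: one address is missing its domain suffix, one is blank.
emails = ["jo@example.com", "sam@example", "", "lee@example.org"]
print(f"Email validity: {validity_rate(emails, EMAIL_PATTERN):.0%}")
```

Blank values are left out of the calculation on purpose, since missing data is a completeness question rather than a validity one.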
How complete is the data? Are there known gaps?
Data is considered complete when all necessary information is available for use.
Remember that data can be complete even if some optional details are missing. What matters is that everything that was supposed to be captured has been captured.
For example, a customer record is complete if you have captured all of the mandatory data that you require to know about your customer.
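To make this concrete, the sketch below measures completeness as the share of customer records with every mandatory field populated. The field names and sample records are hypothetical; your own list of mandatory fields should come from the business owners of the data.

```python
# Hypothetical list of mandatory fields for a customer record.
MANDATORY_FIELDS = ("name", "email", "postcode")


def is_complete(record):
    """A record is complete when every mandatory field has a non-blank value."""
    return all(str(record.get(field) or "").strip() for field in MANDATORY_FIELDS)


def completeness_rate(records):
    """Return the share of records with all mandatory fields populated."""
    if not records:
        return 0.0
    return sum(is_complete(r) for r in records) / len(records)


# Made-up sample data. "nickname" is optional, so leaving it blank is fine.
customers = [
    {"name": "Ana", "email": "ana@example.com", "postcode": "3000", "nickname": ""},
    {"name": "Ben", "email": "", "postcode": "2000"},
    {"name": "Cy", "email": "cy@example.com", "postcode": None},
    {"name": "Di", "email": "di@example.com", "postcode": "4000"},
]
print(f"Complete records: {completeness_rate(customers):.0%}")
```

The first record counts as complete even though an optional field is blank, which mirrors the point above about optional details.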
Is the data consistent with related datasets, agreed standards and formats?
Consistency of data means that the data is collected, grouped, structured, and stored in a uniform and standardised way regardless of where it is captured or collected. This requires standard concepts, business and data definitions, and classifications to be implemented and agreed upon in terms of their meanings and interpretation.
A key thing to watch out for is data that appears to be similar in nature but may have been captured for different reasons and uses. This can result in confusion or misinterpretation.
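One simple way to test consistency is to compare the values each system holds against the agreed standard. The sketch below assumes a hypothetical standard for a "state" field and two made-up source systems; the system names and records are illustrative only.

```python
# Hypothetical agreed standard: the only permitted values for a "state" field.
AGREED_STATE_CODES = {"NSW", "VIC", "QLD", "SA", "WA", "TAS", "NT", "ACT"}


def nonconforming_values(records, field, allowed):
    """Return the distinct values of `field` that fall outside the agreed standard."""
    return sorted({r[field] for r in records if r[field] not in allowed})


# Made-up extracts from two systems that capture the same concept differently.
crm_records = [{"state": "VIC"}, {"state": "Victoria"}, {"state": "NSW"}]
billing_records = [{"state": "vic"}, {"state": "QLD"}]

for system, records in (("CRM", crm_records), ("Billing", billing_records)):
    print(system, nonconforming_values(records, "state", AGREED_STATE_CODES))
```

A report like this shows where each system departs from the standard, which is usually the starting point for agreeing how the values should be mapped.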
Is the data relevant to the end-user? How well does it meet their needs?
Relevance refers to the extent to which the data meets the defined purpose which initiated its collection or creation. For the data to be relevant it must be indicative of the environment in which it was collected or created, as well as reflective of the situation it is attempting to describe.
Think about what should and shouldn’t be included, as well as the reference period. Check if there are known gaps and the granularity of the required data.
Is your data available when you need it?
Timeliness refers to the degree to which data is available and accessible when needed. Remember that not all data is needed in real-time, so part of this is to define who needs the data and when.
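Once you have defined who needs the data and when, timeliness can be checked by comparing the age of each dataset with its agreed freshness window. The dataset names, windows and timestamps below are assumptions made for the sake of the example.

```python
from datetime import datetime, timedelta

# Hypothetical freshness requirements: how old each dataset may be
# before it stops being useful to the people who rely on it.
MAX_AGE = {
    "stock_levels": timedelta(hours=1),
    "monthly_sales": timedelta(days=31),
}


def is_timely(dataset, last_updated, now):
    """Return True when the dataset was refreshed within its agreed window."""
    return now - last_updated <= MAX_AGE[dataset]


now = datetime(2024, 3, 1, 9, 0)
# Stock levels last refreshed three hours ago, against a one-hour window.
print(is_timely("stock_levels", datetime(2024, 3, 1, 6, 0), now))
# Monthly sales last refreshed twenty days ago, against a 31-day window.
print(is_timely("monthly_sales", datetime(2024, 2, 10, 9, 0), now))
```

The same data age can be acceptable for one dataset and a problem for another, which is why the window is defined per dataset rather than globally.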
What was the collection method? Was it consistent?
Data collection methods must be appropriate for the type of data being collected. For example, automated extraction and submission techniques may be more appropriate than manual data entry where the data already exists in other systems, whereas a survey may be better where the data is to be collected from many individuals.
Collection methods must be consistent, especially if the same data is collected multiple times or is to be compared to other data assets.
Is your data serving the purpose you are collecting or capturing it for?
Data is considered fit for purpose when it is appropriate for its intended use. The purpose could include decision making, reporting, and regulatory or administrative requirements.
In this context, the purpose against which the data is measured is the original intended purpose. Data may not be fit for purpose when the future use of the data is not known at the time the original data was collected.
An understanding of who the potential users might be and their expectations of data quality must also be balanced against the original business intent of the data asset. Fitness for purpose can be subjective and therefore difficult to measure, but is a key component of data quality across all dimensions.
Assessing The Quality Of Your Data
Use data quality metrics to assess the quality of your business data.
Data metrics give you a benchmark of the quality of your data. They also allow you to determine how widespread your data quality issues are.
Considerations When Defining Your Data Metrics
Not all your data needs to be 100% perfect all the time.
You may find that critical operational data needs to be perfect, but data used for secondary purposes such as research doesn’t.
- Make sure you have a baseline measurement and a target to start with. Knowing where you are and what you need to accomplish with your data will help you establish your data metrics.
- Have as many metrics as you can that can be measured via data profiling. Subjective metrics are often too heavily based on opinion, so base as many as possible on facts from profiling the data.
- Apply metrics to the data that will ultimately deliver business value. Consider which data quality dimensions you should prioritise.
- Profile your data against your defined metrics so you can fully understand your data and identify errors or areas where the data needs to be improved. This will allow you to discover any issues before they turn into major business problems.
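The considerations above can be sketched as a small scorecard that tracks each metric against its baseline and target. The metric names and figures are illustrative only, not real measurements, and the `Metric` class is a hypothetical structure rather than part of any profiling tool.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    name: str
    baseline: float  # measured when profiling started
    current: float   # latest profiling result
    target: float    # level the business has agreed it needs

    def gap_to_target(self):
        """Remaining shortfall against the target (zero once the target is met)."""
        return max(0.0, self.target - self.current)

    def improved(self):
        """True when the latest result is better than the baseline."""
        return self.current > self.baseline


# Illustrative figures only.
metrics = [
    Metric("email completeness", baseline=0.82, current=0.91, target=0.98),
    Metric("postcode validity", baseline=0.95, current=0.99, target=0.97),
]

# List the metrics furthest from their target first.
for m in sorted(metrics, key=Metric.gap_to_target, reverse=True):
    status = "met" if m.gap_to_target() == 0 else f"{m.gap_to_target():.0%} short"
    print(f"{m.name}: {status}")
```

Sorting by the gap to target is one simple way to decide where to focus first; weighting each metric by its business value would be a natural refinement.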
Assess whether it’s a system, a process or a human-error issue.
System issues arise when the system itself allows users to enter incorrect data. For example, the system may allow free text fields or lack validation upon entry. Process issues, on the other hand, can result from not enforcing effective business processes or from processes that are not aligned with the business objectives.
Human errors are caused by someone’s behaviour, or can simply be a mistake. For example, an employee creates a new customer record because they did not search first, or chooses the first item in a drop-down because it’s easier. It could also be that the people within your organisation do not understand the value of the data and its importance to the organisation, and therefore don’t enter quality data.
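Duplicate customer records, like the ones in the example above, are one of the easier human-error issues to detect by profiling. Here is a minimal sketch that groups records sharing the same name and email once case and spacing are ignored. The matching rule and the sample records are assumptions; real matching often needs fuzzier logic.

```python
from collections import defaultdict


def match_key(record):
    """Build a crude matching key from name and email (an assumed rule)."""
    name = " ".join(record["name"].lower().split())
    email = record["email"].strip().lower()
    return (name, email)


def find_duplicates(records):
    """Group record ids that share the same matching key."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Made-up sample data: records 1 and 2 differ only in case and spacing.
customers = [
    {"id": 1, "name": "Jo Smith", "email": "jo@example.com"},
    {"id": 2, "name": "jo  smith", "email": "JO@example.com "},
    {"id": 3, "name": "Sam Lee", "email": "sam@example.com"},
]
print(find_duplicates(customers))
```

A check like this finds the symptom. Whether the cause is a missing search step, a system that does not warn about likely duplicates, or both, still needs to be assessed.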
Make Recommendations And Fix Issues
Determine the impact and causes of your data quality issues so you can prioritise what needs fixing. Once you’ve done this, you can then identify what kind of solution you need to apply.
A tactical fix is where you focus on the data that needs to be fixed instead of addressing the cause.
When taking this approach, keep in mind that you will probably need to repeat it in the future, because the underlying cause remains. This is a completely valid measure, as addressing the cause might not always be possible for an organisation. It is the right choice when the effort to fix the cause of the issue outweighs the benefits.
For example, the cause may be an old system that cannot be improved and the ultimate solution would be to buy and implement a new system. However, you may find that the cost and effort of this solution far outweigh the cost of applying regular tactical fixes. You may not have the budget for a large new implementation. In this case, a tactical fix would be the better option.
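The trade-off can be expressed as simple arithmetic. The sketch below compares the total cost of repeated tactical fixes with a one-off strategic fix over a planning horizon. All figures are made up for illustration, and the comparison deliberately ignores factors such as the running costs of a new system or the wider benefits it may bring.

```python
def cheaper_option(tactical_cost_per_fix, fixes_per_year, strategic_cost, years):
    """Compare repeated tactical fixes with a one-off strategic fix over a horizon."""
    tactical_total = tactical_cost_per_fix * fixes_per_year * years
    if tactical_total < strategic_cost:
        return ("tactical", tactical_total)
    return ("strategic", strategic_cost)


# Made-up figures: four clean-ups a year versus a new system, over five years.
print(cheaper_option(tactical_cost_per_fix=5_000, fixes_per_year=4,
                     strategic_cost=250_000, years=5))
```

Lengthening the horizon or increasing the frequency of fixes shifts the balance towards the strategic option, so it is worth rerunning the comparison as circumstances change.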
Strategic + Tactical Fix
Ideally, you want to address both the cause of the poor data and the problematic data itself.
Applying both strategic and tactical fixes can significantly improve the quality of your data and stop issues from resurfacing. If this is possible for your organisation, then this is the best recourse. This means you may need to make changes to your systems, your processes and/or the behaviours of the staff who collect, create and manage the data. Our previous articles in this series, listed at the start of this post, offer tips on how to address the causes of poor data quality.
Data Agility is driven to help organisations make better business decisions through the application of their data. If you want help in measuring and managing your data quality contact us today.