Data Quality: The Key to Reliable Decision-Making

Many executives in data-driven organisations believe that the data they collect does not have the necessary precision and reliability. This leads to concerns about garbage in, garbage out. Discover the key characteristics of data quality and why it is paramount for solid decision-making. Explore how you can improve this within your organisation today!


7-minute read

What is Data Quality?

The quality of data is the extent to which the data is suitable for the intended purpose. This is determined by a number of characteristics, the required level of which will vary from situation to situation.

Why is Good Data Important?

The importance of data itself is not in question in many organisations, but low data quality can have serious consequences: it is often the source of inaccurate analyses, extra costs, unreliable plans, and operational blunders. Good data has several advantages:

  • Competitive advantage: Organisations that have higher data quality than competitors or that understand and use data better than competitors have a competitive advantage. Better quality means identifying opportunities before the competition.
  • More trust: One of the main barriers to becoming a data-driven organisation is a lack of trust in data quality. The higher the data quality, the more confidence users will have in the results of data analyses. When outcomes are reliable, the amount of guesswork and risk involved in decision-making is reduced.
  • Better decision-making: As an organisation makes increasingly data-based decisions, it must be able to base those judgements on reliable, accurate, and complete data.
  • Higher productivity: Data users can be more productive when they have access to high-quality data. They can concentrate their time on analysing data rather than validating and correcting faults.
  • Better efficiency: Good data helps reduce unnecessary costs. For example, an accurate customer dataset minimises the number of incorrect deliveries of products or e-mails.
  • Prevents reputational damage: This can range from minor, everyday damage that businesses may be unaware of (for example, wrong spelling of names, sending e-mails to deceased persons) to major public relations disasters.
  • Avoids fines: Good quality is also crucial for data compliance, especially in highly regulated industries. Failure to properly implement the applicable rules can lead to fines from, among others, the privacy watchdog. When data in an organisation is disorganised or poorly managed, it is more difficult to demonstrate compliance.

11 Characteristics of Data Quality

The degree of data quality is expressed in a number of characteristics or dimensions. These can be objective (number of errors or missing values) or subjective (fitness for purpose). Because the goal defines the relevance and required quality level of data, naming generic characteristics is difficult; features can also overlap. The most commonly used characteristics of data quality are: accuracy, reliability, completeness, consistency, timeliness, and uniqueness.

  • Relevance: This is a more subjective and comprehensive assessment of data quality. Data is useless if it is not relevant to the intended purpose. That's why it's crucial to define goals, so you know what kind of data you need to collect and what level of quality you need.
  • Completeness: The extent to which a dataset contains all the values necessary to complete the task at hand. Identifying an incomplete dataset is different from looking for empty cells. The lack of first names is not a problem for an e-mail campaign, but it is if you want to sort this dataset by name. Another example is that having a complete customer base makes it possible to personalise communication with customers.

    The percentage of missing relevant values in a dataset can be calculated vertically (attribute level) or horizontally (record level).

  • Reliability: The degree to which data is true and factual.
  • Validity: Data is considered valid if it has the correct format, type, and range. This may differ based on the country, sector, or standards used. Here are several examples:
    • Data type: numeric, boolean, labels.
    • Range: values must be within a certain interval; for example, a birth year of 201 is invalid because it is outside the date range.
    • Patterns: Values must follow an agreed pattern, for example, MM-DD-YYYY for a date of birth. Dates that do not meet the established standard are considered invalid.
    • Format rules: A strict requirement that a telephone number must contain only digits makes validation easier and prevents errors. For Dutch telephone numbers, this means 13 digits, 0031 instead of +31, and no spaces or hyphens.
    • Identification numbers instead of names that can be spelled in many ways.
  • Accuracy: How effectively does the data describe the real-world conditions it is trying to describe? This is one of the most important properties of high-quality data. Accuracy can be checked by comparing data with a reliable source.
  • Identifiability: The extent to which data records are uniquely identifiable and the dataset is free of duplicate records.
  • Consistency: Similar data recorded in different sources should have the same meaning, structure, and format. This determines reliability. The chance of inconsistencies increases as the number of sources increases, for example, when data is updated in one location but not in another.

    For example, data must all have the same structure (+31 versus 0031 for telephone numbers or 22:00 versus 10:00 PM) or the same unit (kilogram versus gram).

  • Timeliness: How current is the data? As time passes, the data becomes less useful and less accurate. More current data is more likely to reflect contemporary reality.
  • Metadata: Data about data; the quality of the description of the dataset (definitions, abbreviations, units, calculation methods, structure, sources).
  • Open data: Open data facilitates transparency, accountability, and public participation, for example, by quickly identifying data inaccuracies. Important obstacles are the commercial value of data and the sensitivity of data.
  • Accessibility: How easily and quickly is the required data available? A user who needs data that is locked away in silos must overcome numerous difficulties to obtain it. This is not only a waste of time but also increases the chance that the data will be out of date by the time it becomes available. Sensitive data is often not made public or is shared only under strict restrictions.
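
Several of the objective characteristics above can be measured with a few lines of code. The sketch below is only an illustration, not a prescribed method: the customer records, the field names, the helper functions, and the birth-year bounds are all hypothetical assumptions. It shows completeness calculated vertically and horizontally, two validity checks (the 13-digit Dutch telephone format and a range check), and a simple duplicate check for identifiability.

```python
import re

# Hypothetical customer records; None marks a missing value.
customers = [
    {"id": 1, "first_name": "Anna", "email": "anna@example.com",
     "phone": "0031612345678", "birth_year": 1985},
    {"id": 2, "first_name": None, "email": "bram@example.com",
     "phone": "+31 6 12345678", "birth_year": 201},
    {"id": 3, "first_name": "Chris", "email": None,
     "phone": "0031201234567", "birth_year": 1990},
    {"id": 3, "first_name": "Chris", "email": None,
     "phone": "0031201234567", "birth_year": 1990},
]

# The attributes considered relevant for the task at hand (an assumption).
FIELDS = ["first_name", "email", "phone", "birth_year"]


def missing_by_attribute(records, fields):
    """Vertical completeness: percentage of missing values per attribute."""
    return {
        field: 100 * sum(r[field] is None for r in records) / len(records)
        for field in fields
    }


def missing_by_record(records, fields):
    """Horizontal completeness: percentage of missing values per record."""
    return [
        100 * sum(r[field] is None for field in fields) / len(fields)
        for r in records
    ]


# Dutch number written as 0031 followed by 9 digits: 13 digits, no spaces.
PHONE_PATTERN = re.compile(r"0031\d{9}")


def is_valid_phone(value):
    """Validity (pattern): digits only, 0031 prefix, 13 digits in total."""
    return value is not None and PHONE_PATTERN.fullmatch(value) is not None


def is_valid_birth_year(value, lowest=1900, highest=2025):
    """Validity (range): the bounds are illustrative assumptions."""
    return value is not None and lowest <= value <= highest


def duplicate_ids(records):
    """Identifiability: return the ids that occur more than once."""
    seen, duplicates = set(), set()
    for record in records:
        if record["id"] in seen:
            duplicates.add(record["id"])
        seen.add(record["id"])
    return duplicates


print(missing_by_attribute(customers, FIELDS))
print(missing_by_record(customers, FIELDS))
print([c["id"] for c in customers if not is_valid_phone(c["phone"])])
print([c["id"] for c in customers if not is_valid_birth_year(c["birth_year"])])
print(duplicate_ids(customers))
```

Which attributes count as relevant, and which thresholds are acceptable, still depends on the intended purpose; the code only makes the measurement repeatable.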

Is There a Standard Approach to Improving Data Quality?

No, because the definition of high data quality will vary depending on the intended purpose. The quality of data is more about ensuring that it is fit for purpose than meeting strict standards. Although there is no standard approach, there are general conditions for good data quality:

  • Data collection: Determine what type of data is needed to achieve your goals, as well as the techniques for collecting and managing this data. And, very importantly, investigate the impact of data quality on the task at hand.
  • Data quality standards: Establish standards per goal by identifying the relevant characteristics of the data and the desired level for each. Which data do you store, delete, and correct?
  • Data correction: Create rules for correcting data. What should be done about missing values, errors, and outliers? Data cleaning is the process of improving the quality of data.
  • Data integration and distribution: This is about how data is exchanged between departments. Data quality issues often arise at this stage, as data can be changed during this exchange.
  • Knowledge management: Record the results of quality measurements and user experiences with datasets in a data catalogue, especially if the datasets are regularly reused. This prevents users from having to constantly reinvent the wheel. Improving data quality is an ongoing effort to identify potential issues affecting the quality of the organisation's data capital.
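
Correction rules of the kind mentioned under data correction can be written down as small, testable functions. The sketch below is an illustration under stated assumptions: the function names and the supported units are hypothetical, and the target formats (0031 notation for Dutch telephone numbers, kilograms for weights) simply follow the examples used earlier in this article. Values that cannot be repaired safely are returned as None so that they can be reviewed rather than guessed.

```python
import re


def normalise_dutch_phone(raw):
    """Rewrite a Dutch number as 0031 plus 9 digits, or return None."""
    # Drop spaces, hyphens, and brackets; keep digits and a leading plus.
    cleaned = re.sub(r"[^\d+]", "", raw)
    if cleaned.startswith("+31"):
        cleaned = "0031" + cleaned[3:]
    elif cleaned.startswith("0") and not cleaned.startswith("00"):
        # National notation such as 06 12345678.
        cleaned = "0031" + cleaned[1:]
    # Accept the result only if it matches the agreed 13-digit format.
    return cleaned if re.fullmatch(r"0031\d{9}", cleaned) else None


# Supported units and their size in grams (an illustrative assumption).
GRAMS_PER_UNIT = {"kg": 1000, "g": 1}


def to_kilograms(value, unit):
    """Convert a weight to kilograms so that all records share one unit."""
    return value * GRAMS_PER_UNIT[unit] / 1000


for raw in ["+31 6 1234-5678", "06 12345678", "0031612345678", "12345"]:
    print(raw, "->", normalise_dutch_phone(raw))

print(to_kilograms(500, "g"), to_kilograms(2, "kg"))
```

Applying such rules at the point where data is exchanged between departments keeps the different sources consistent with one another.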

Symbio6 & Data Quality

In short, good data is the backbone of effective decision-making, operational efficiency, and confidence in the information we use. It therefore deserves the necessary attention and investment to ensure that the data is reliable and useful. Symbio6 helps customers improve this success factor and thus achieve better automated decision-making.

Start turning "garbage in, garbage out" into "high value in, high value out" today.