Main challenges in B2B data management and what to do about them


Data quality: the stumbling block to data-driven marketing

Bad data. It seems to be the modern villain standing between us and a truly data-driven business environment. The concept keeps coming up in conversations, reports, planning and forecasting, and it feels easy and logical to blame inaccurate data when we fail to reach our goals or conclusions.

According to a Harvard Business Review study, half of the people who work with data waste their time searching for data, finding and correcting errors, and seeking confirmatory sources for data they don't trust.

To single out inaccurate data as the reason or the main obstacle to achieving our goals is just scratching the surface of a symptom that can have many causes. Let's take a closer look.


What is bad data?

First, what do we consider "bad data", often described as "dirty" or "inaccurate" data? In simple terms, it is data that contains errors such as misspellings or punctuation mistakes, incomplete or outdated data, duplicate records in the database, and incorrect data associations. Bad data is data that our teams don't trust, or worse, data that we trust but shouldn't.

So what causes bad data? Many things. Inaccurate data is the result or manifestation of a series of events.

Below, each root cause is presented one by one, along with possible steps you can take to address it quickly.

Incomplete data

Cause: Incompleteness can manifest itself in a variety of ways: data may be completely missing or only partially entered. It not only limits the information we can get from the data (such as reports and analysis), but also restricts any data-driven operations, such as AI/ML.

Solution: Implement data creation "gatekeepers" that prevent incomplete data from being created in the first place. For example, forms can auto-complete or suggest values drawn from a robust set of external reference data, and data quality checks can ensure that required fields are filled in intelligently, as in the sketch below.
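
To make the idea concrete, here is a minimal Python sketch of such a gatekeeper. The required fields, the reference company list and the validation rules are illustrative assumptions, not a prescribed schema.

```python
# A minimal sketch of a data-entry "gatekeeper". The required fields and the
# reference list below are illustrative assumptions, not a prescribed schema.
REQUIRED_FIELDS = {"company_name", "country", "email"}

# A small, hypothetical set of external reference data used for suggestions.
KNOWN_COMPANIES = ["Acme Corporation", "Acme Logistics", "Globex Inc."]

def suggest_completions(partial_name: str, reference=KNOWN_COMPANIES):
    """Return reference entries that could complete a partially typed name."""
    partial = partial_name.strip().lower()
    return [name for name in reference if name.lower().startswith(partial)]

def validate_record(record: dict) -> list:
    """Reject incomplete records before they enter the database."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field, "").strip():
            problems.append(f"missing required field: {field}")
    if "@" not in record.get("email", ""):
        problems.append("email does not look valid")
    return problems

if __name__ == "__main__":
    draft = {"company_name": "Acme", "country": "", "email": "jane.doe"}
    print(suggest_completions(draft["company_name"]))  # ['Acme Corporation', 'Acme Logistics']
    print(validate_record(draft))                      # lists the missing/invalid fields
```

In practice, the suggestion list would come from an external reference data provider rather than a hard-coded list, and the validation would run before the record is ever written to the CRM.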

Duplicate data

Cause: This occurs when records inadvertently share attributes with other records in the database. When duplicate data exists in the data ecosystem, the consequences can include over-counting when aggregating data, leading to incorrect values in reporting and analysis, wasted outreach efforts and confusion. Business management becomes increasingly challenging as the effects of duplicate data accumulate.

Solution: Deciding which "duplicates" to keep, delete or archive requires understanding the business needs. Manage the data through grouping (i.e., combining) techniques: cluster similar versions of a record into a group, choose the best version as the main entity, and treat the rest as members of that group. This is a systematic way to eliminate duplicates in the data. Since not all duplicates are the same, you may want to keep some (for business or regulatory reasons) within a manageable group. The chosen main entity is what is known as a master or golden record. A simplified sketch of this grouping follows.
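
As an illustration, the following Python sketch groups near-duplicate records with a naive blocking key and elects the most complete record in each group as the golden record. The match key and the completeness-based survivorship rule are simplifying assumptions; real matching engines use far fuzzier logic.

```python
# A minimal sketch of grouping near-duplicate records and electing a "golden"
# (master) record per group. The match key and the survivorship rule are
# simplifying assumptions for illustration only.
from collections import defaultdict

records = [
    {"id": 1, "name": "Acme Corp",        "city": "Berlin", "phone": None},
    {"id": 2, "name": "ACME Corporation", "city": "Berlin", "phone": "+49 30 123"},
    {"id": 3, "name": "Globex Inc",       "city": "Madrid", "phone": "+34 91 555"},
]

def match_key(record: dict) -> tuple:
    """Naive blocking key: normalised first word of the name plus the city."""
    first_word = record["name"].split()[0].lower().rstrip(".")
    return (first_word, record["city"].lower())

def build_groups(records):
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return groups

def elect_golden(group):
    """Keep the most complete record as the master; the rest stay as members."""
    return max(group, key=lambda r: sum(1 for v in r.values() if v))

for key, group in build_groups(records).items():
    golden = elect_golden(group)
    members = [r["id"] for r in group if r is not golden]
    print(f"group {key}: golden record id={golden['id']}, members={members}")
```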

The machine learning model in D&B Connect Manage, Dun & Bradstreet's latest data management offering, can drive group-centric resolution of nearly 100% of duplicates to create reliable master records in data sets. What used to be (and still is) a complicated task for most companies is now achievable.

Disparate source systems (data silos)

Cause: It is almost inevitable to have many different source systems. In fact, a 2021 Dun & Bradstreet study found that the average sales and marketing technology stack includes at least 10 tools. Today's complex business environment practically mandates this. Managing them as parts of one system can be a daunting task: while they may not share the same processes, their data may need to relate to other data sets. The concepts of data warehouses, data lakes and now data meshes were conceived to make managing data from different systems possible and scalable.

Solution: The knee-jerk reaction is to set up a data lake, but simply bringing all the data together in one place is not sufficient. Without curating, qualifying and governing the data entering the lake, it could easily become a data swamp. In addition to technically securing the flow of data through connections such as APIs, thought must be given to mastering the data in the lake, using clustering methodologies to relate data from disparate sources within a common environment. Being able to create a master record by clustering similar entities provides a more robust understanding of data overlap and net newness, as the sketch below illustrates. Having a matching/combining engine will help manage both existing and new data sources in the data lake.
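
The following Python sketch illustrates the idea of relating records from two hypothetical source systems (a CRM export and a marketing tool) in one environment and separating overlap from net-new entities. The field names and the exact-match key on a normalised domain are assumptions chosen for brevity.

```python
# A minimal sketch of relating records from two hypothetical source systems
# and separating overlap from net-new entities. Field names and the
# exact-match key are assumptions, not a real schema.
crm_accounts = [
    {"account": "Acme Corporation", "domain": "acme.example"},
    {"account": "Globex Inc",       "domain": "globex.example"},
]
marketing_contacts = [
    {"company": "ACME Corporation", "website": "https://acme.example"},
    {"company": "Initech",          "website": "https://initech.example"},
]

def normalise_domain(url_or_domain: str) -> str:
    """Reduce a URL or bare domain to a comparable key."""
    return url_or_domain.replace("https://", "").replace("http://", "").strip("/").lower()

# Bring both sources into a common shape keyed on the normalised domain.
crm_by_domain = {normalise_domain(r["domain"]): r for r in crm_accounts}
mkt_by_domain = {normalise_domain(r["website"]): r for r in marketing_contacts}

overlap = crm_by_domain.keys() & mkt_by_domain.keys()
net_new = mkt_by_domain.keys() - crm_by_domain.keys()

print("already mastered:", sorted(overlap))   # ['acme.example']
print("net-new entities:", sorted(net_new))   # ['initech.example']
```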

Data decay

Cause: Of all enterprise master data, contact data appears to degrade the fastest; in some areas it decays at a rate of 34% annually. That statistic can be alarming, and quite discouraging, for data-driven organizations that derive decision-making information from this data and are becoming ever more reliant on it. The current economic situation makes paying attention to data decay even more important: companies going out of business and supply chain issues are just a few examples that add complexity on top of the mergers, acquisitions and divestitures the market is already experiencing. How can we ensure that data remains relevant?

Solution: Data enrichment. You must be able to periodically cross-check data against a trusted source of external reference data. As the saying goes, don't throw out the baby with the bathwater: it is all too easy to label current data assets as deficient because of poor performance or anecdotes from those who rely on them. Instead, work with external or third-party sources to obtain updated attributes for existing contact data. As mentioned earlier, data decays at a rate of 34% or more per year, so an effective enrichment program is needed whatever your organization's data accuracy threshold. Performing enrichment in an ad hoc manner could be detrimental to your users, because it will not scale. Define an enrichment strategy and timeline and communicate them to stakeholders; the sketch below shows what one periodic pass might look like.
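
As a rough illustration, the Python sketch below shows one periodic enrichment pass: records whose last verification is older than a chosen refresh window are cross-checked against an external reference source. The refresh window, the contact fields and the in-memory lookup that stands in for a real third-party API are all assumptions.

```python
# A minimal sketch of a scheduled enrichment pass. The refresh window and the
# lookup dictionary standing in for a real third-party API are assumptions.
from datetime import date, timedelta

REFRESH_WINDOW = timedelta(days=180)  # assumption: re-verify twice a year

# Hypothetical trusted reference data keyed by email.
REFERENCE_SOURCE = {
    "jane.doe@acme.example": {"title": "Head of Procurement", "active": True},
    "old.contact@globex.example": {"active": False},
}

contacts = [
    {"email": "jane.doe@acme.example",      "title": "Buyer", "last_verified": date(2022, 1, 10)},
    {"email": "old.contact@globex.example", "title": "CTO",   "last_verified": date(2021, 6, 1)},
    {"email": "fresh@initech.example",      "title": "CFO",   "last_verified": date.today()},
]

def enrich_stale_contacts(contacts, today=None):
    today = today or date.today()
    for contact in contacts:
        if today - contact["last_verified"] < REFRESH_WINDOW:
            continue  # still within the refresh window, skip this cycle
        reference = REFERENCE_SOURCE.get(contact["email"])
        if reference is None or not reference.get("active", False):
            contact["status"] = "decayed"  # candidate for review, not automatic deletion
        else:
            contact.update({k: v for k, v in reference.items() if k != "active"})
            contact["last_verified"] = today
            contact["status"] = "enriched"
    return contacts

for c in enrich_stale_contacts(contacts):
    print(c["email"], "->", c.get("status", "up to date"))
```

Running such a pass on a fixed schedule, rather than ad hoc, is what keeps the enrichment effort predictable and scalable for the teams that depend on the data.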

Conclusion: A Case for Data Governance

These recommendations and best practices are just pieces of a larger puzzle. There is a strong need for data governance to establish policies and meet data quality standards in order to stop the flow of poor data into our data assets. The good news is that many of the proposed solutions are achievable, and can be automated at large scale with AI and ML.

The recommendations above, together with an understanding of where, when and how to implement them, are crucial to your data strategy. Both the root of the problem and its solution point to the same thing: data governance. It is a function we can no longer do without, and our increasing dependence on data proves it.
