How to measure data quality

November 02, 2017

We all want perfect, real-time data on our lead and account records. You can set up a proper data management system, hold a high standard for data governance, and get the right data integrations, but the biggest roadblock will always be finding the right vendor. There are myriad data vendors out there, so how do you evaluate which one is right for you?

The initial challenge is determining what data points matter to you. Your team may rely heavily on location and employee count for qualifying leads, but may care less about social media presence. Hence, the latter probably shouldn’t be given as much weight when deciding on a vendor.

The bigger challenge is figuring out whether a prospective data vendor meets those requirements. Many vendors will claim to understand your industry and boast higher quality data than other vendors. Some might throw around terms like 'big data' and 'business intelligence' to impress you. But what constitutes good data and how do you sift through the noise?

This guide outlines how to evaluate data quality/coverage and pick the partner that will best serve your needs.

What is Good Data?

Every company has its own data requirements, shaped by its data profiling needs. However, there are three key considerations to keep in mind during your evaluation:

  1. Coverage — what is the overall match rate on your data?

  2. Depth — for records returned, what is the match rate on the specific data points that matter to you?

  3. Accuracy — is the data returned correct?

Identify the contenders

The first step in preparing for your evaluation is to determine which data points are important to you. There will always be standard data points that most vendors share, such as email address, location, and industry, but you may also need less common information, like a company’s tech stack, to help your team make better decisions. No vendor will have everything, so start by deciding which fields are absolutely crucial and which are merely nice to have; that way, gaps in the fields you depend on surface during testing instead of turning into data quality issues later.

Pro tip

Ranking data points in order of importance will help you narrow down your list of potential data partners.

Assemble your test data

Any vendor worth their salt will be willing to run a data test for you during the sales process. Most data companies talk a big game, so it’s important for you to take control of the test by providing a sample list with which you’ll be able to verify correct results. One example could be customer data from companies that you’ve done business with, as you should have a good grasp of their details.

Ideally, you’ll already have a verified source for the data points you’ll be testing for, even if you have to manually source them for your sample. Keep in mind that this should be an accurate representation of the data you’ll need in production. If your customers are global, or in multiple industries, you’ll need to make sure your dataset reflects that.

You’ll also want to make sure you have enough records in your test for the results to be statistically meaningful. We recommend 500 to 1,000 records for any data test.
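To see why a few hundred records is a sensible floor, here is a rough sketch of the margin of error on a measured match rate. It assumes the standard normal approximation for a proportion at 95% confidence (z = 1.96) and uses the worst case of a 50% match rate; the function name is illustrative and not part of any tool.

```python
import math


def match_rate_margin_of_error(n_records: int, match_rate: float = 0.5, z: float = 1.96) -> float:
    """Approximate margin of error for a match rate measured on n_records.

    Uses the normal approximation to the binomial. match_rate=0.5 is the
    worst case (widest interval) when you don't know the rate in advance.
    """
    if n_records <= 0:
        raise ValueError("n_records must be positive")
    return z * math.sqrt(match_rate * (1 - match_rate) / n_records)


for n in (100, 500, 1000):
    moe = match_rate_margin_of_error(n)
    print(f"{n:>5} records: +/- {moe * 100:.1f} percentage points")
```

Under those assumptions, 500 records gives a margin of roughly ±4.4 percentage points and 1,000 records about ±3.1, so if two vendors' match rates differ by less than that, the gap may just be noise.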

Before sending your data test, you’ll want to remove any field that the vendor isn’t using as an input, but keep the original list handy as you’ll be using it to compare with the data that each vendor sends back.
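As a minimal sketch of that step, the Python below keeps only the input columns from your verified master list and leaves the master untouched as your answer key. The column names (`record_id`, `email`, and so on) are hypothetical; including a stable ID column is an assumption that makes it easier to join vendor results back to your originals.

```python
import csv
import io


def build_vendor_input(master_csv: str, input_fields: list[str]) -> str:
    """Return CSV text containing only the columns the vendor uses as inputs.

    master_csv stays with you as the answer key; only the returned text
    is sent to the vendor.
    """
    reader = csv.DictReader(io.StringIO(master_csv))
    out = io.StringIO()
    # extrasaction="ignore" silently drops every column not in input_fields.
    writer = csv.DictWriter(out, fieldnames=input_fields, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


master = (
    "record_id,email,company,employees\n"
    "1,jane@acme.com,Acme,250\n"
    "2,li@globex.com,Globex,40\n"
)
print(build_vendor_input(master, ["record_id", "email"]))
```

When working with real files instead of strings, open them with `newline=""`, as the `csv` module documentation recommends.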

If you’re unsure about what to include, you can download our data quality tool below as a template:

Download the Data Test Template

Once the results from the vendors have been returned, you can use the steps below to evaluate their data.

1) Evaluating Coverage

The all-important match rate. Luckily, this is the easiest part of your evaluation: simply tally up what percentage of the records you submitted were returned with data. There are, however, a couple of things to keep in mind.
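A minimal sketch of the tally, assuming vendor results have been loaded into a dictionary keyed by a record ID you supplied, with `None` (or an empty result) for records the vendor couldn't match. The data shape and names here are assumptions for illustration.

```python
from collections.abc import Mapping
from typing import Optional


def coverage(submitted_ids: list[str], returned: Mapping[str, Optional[dict]]) -> float:
    """Share of submitted records that came back with a non-empty result."""
    if not submitted_ids:
        return 0.0
    # Missing IDs, None, and empty dicts all count as "no match".
    matched = sum(1 for rid in submitted_ids if returned.get(rid))
    return matched / len(submitted_ids)


ids = ["a", "b", "c", "d"]
results = {"a": {"domain": "acme.com"}, "b": None, "c": {}}
print(coverage(ids, results))  # 0.25
```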

The first is that not all vendors treat matches equally. Some will return data they are unsure of with a corresponding “likelihood” score (usually as a percent), while others (like Clearbit) only return data that has hit a certain threshold of verification. You may need to determine what level of certainty you are comfortable with, and filter out records that don’t meet your standards for a true apples-to-apples comparison.

Some vendors return a low likelihood score on freemail, whereas Clearbit marks it as False.
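One way to get that apples-to-apples comparison is to drop low-confidence matches before computing the match rate. This sketch assumes a hypothetical `likelihood` field on a 0 to 1 scale (vendors vary, and some report a percentage) and keeps records that carry no score, as with vendors that only return verified data; the threshold is yours to choose.

```python
from collections.abc import Mapping
from typing import Optional


def apply_confidence_threshold(
    returned: Mapping[str, Optional[dict]], min_likelihood: float
) -> dict[str, Optional[dict]]:
    """Turn matches below min_likelihood into non-matches (None).

    Assumes a hypothetical 'likelihood' key on a 0-1 scale. Records with
    no score are kept, as with vendors that only return verified matches.
    """
    filtered: dict[str, Optional[dict]] = {}
    for rid, record in returned.items():
        if record is None:
            filtered[rid] = None
            continue
        score = record.get("likelihood")
        filtered[rid] = record if score is None or score >= min_likelihood else None
    return filtered


vendor_results = {
    "1": {"domain": "acme.com", "likelihood": 0.95},
    "2": {"domain": "gmail.com", "likelihood": 0.2},  # low-confidence freemail match
    "3": None,
}
print(apply_confidence_threshold(vendor_results, 0.8))
```

Recompute the match rate on the filtered results for every vendor, using the same threshold, before comparing them.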

Each vendor will have its own strong suits - some may have excellent coverage in the United States but weak coverage globally, others may be better at enterprise than mid-market. You may need to split your test data into multiple cohorts when evaluating. For example, if you have a focus on companies in the Asia-Pacific region, the match rates for that cohort should be evaluated separately from the rest of the set.
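Splitting the match rate by cohort is a small extension of the same tally. This sketch assumes you've tagged each submitted record with a cohort label such as region; the labels and data are made up for illustration.

```python
from collections import defaultdict
from collections.abc import Mapping
from typing import Optional


def coverage_by_cohort(
    cohorts: Mapping[str, str], returned: Mapping[str, Optional[dict]]
) -> dict[str, float]:
    """Match rate per cohort; cohorts maps record ID -> cohort label."""
    totals: dict[str, int] = defaultdict(int)
    matches: dict[str, int] = defaultdict(int)
    for rid, cohort in cohorts.items():
        totals[cohort] += 1
        if returned.get(rid):  # non-empty result counts as a match
            matches[cohort] += 1
    return {cohort: matches[cohort] / totals[cohort] for cohort in totals}


labels = {"1": "APAC", "2": "APAC", "3": "NA", "4": "NA"}
results = {"1": {"domain": "acme.com"}, "2": None, "3": {"domain": "globex.com"}, "4": {"domain": "initech.com"}}
print(coverage_by_cohort(labels, results))  # {'APAC': 0.5, 'NA': 1.0}
```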

2) Evaluating Depth

Once you’re confident in a vendor’s overall match rate, you’ll want to make sure that each individual data point has a solid match rate. Again, each vendor has its strengths and this is where the field prioritization you did earlier comes in handy. Make sure that the fields that are absolutely necessary are covered.

If you want to get fancy, you can even assign a weight to each field and multiply it by the corresponding match rate. Summing the resulting scores gives you a single rating to compare vendors with. This calculation is already included in our data test template.
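The per-field match rates and the weighted score described above can be sketched as follows. It assumes the same results dictionary as a plain tally, counts a field as matched when its value is neither `None` nor an empty string, and uses hypothetical field names and weights; it mirrors the idea behind the template rather than reproducing it.

```python
from collections.abc import Mapping
from typing import Optional


def field_match_rates(returned: Mapping[str, Optional[dict]], fields: list[str]) -> dict[str, float]:
    """For records that matched, the share with a populated value in each field."""
    matched = [r for r in returned.values() if r]
    if not matched:
        return {f: 0.0 for f in fields}
    return {
        f: sum(1 for r in matched if r.get(f) not in (None, "")) / len(matched)
        for f in fields
    }


def weighted_depth_score(rates: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Sum of weight * match rate over every weighted field."""
    return sum(w * rates.get(f, 0.0) for f, w in weights.items())


results = {
    "1": {"employees": 250, "city": "Paris", "tech": None},
    "2": {"employees": 40, "city": "", "tech": "Salesforce"},
    "3": None,
}
rates = field_match_rates(results, ["employees", "city", "tech"])
weights = {"employees": 0.5, "city": 0.3, "tech": 0.2}  # hypothetical priorities
print(rates, weighted_depth_score(rates, weights))
```

Computing rates only over matched records keeps depth separate from coverage, so a vendor with a low overall match rate isn't penalized twice.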

As before, keep in mind that some vendors will use filler values where data isn’t found, so you may need to do some filtering for proper comparison.
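A simple way to handle filler values is to map them to `None` before computing per-field rates. The placeholder strings below are assumptions; check what each vendor actually returns in your test results and adjust the set accordingly.

```python
from collections.abc import Mapping
from typing import Optional

# Hypothetical placeholder values; replace with what your vendors actually use.
FILLER_VALUES = {"", "n/a", "na", "unknown", "none", "-", "null"}


def strip_fillers(record: Optional[Mapping[str, object]]) -> Optional[dict]:
    """Replace filler strings with None so they don't count as matched fields."""
    if record is None:
        return None
    cleaned: dict = {}
    for field, value in record.items():
        if isinstance(value, str) and value.strip().lower() in FILLER_VALUES:
            cleaned[field] = None
        else:
            cleaned[field] = value
    return cleaned


print(strip_fillers({"city": "Unknown", "industry": "Software", "title": " N/A "}))
```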

3) Evaluating Accuracy

This is the trickiest of the three to assess properly, and you’d be surprised how many customers skip this step, or simply test a couple of records by hand, before declaring a winner. Keep in mind that inaccurate data is often worse than no data at all. If you use it in customer communications, for example, getting someone’s name or title wrong can quickly sour a relationship. If the data is in a CRM, incorrect company info can automatically disqualify a potentially lucrative lead.

Preparing your list properly will make your life a lot easier. If you already have verified data, you can simply compare it against your original dataset. If not, it’s still worth having someone go through a few hundred rows manually to make sure everything is above board. Here are a couple of things to look for:

  1. Some standardization will always occur: VP may be returned as Vice President, etc. For this reason, you’ll probably want to review the data manually rather than relying exclusively on an Excel formula.

  2. Keep an eye out for old data on fields that change frequently like job title and Alexa rank. Vendors refresh their dataset at different rates, which is a common reason for incorrect information.
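An automated pass can narrow down which rows need that manual review. This sketch normalizes case and punctuation and expands a small, hypothetical abbreviation map before comparing a vendor's value with your verified value, then returns the record IDs that still disagree. It does not replace a human check; it only tells you where to look first.

```python
import re
from collections.abc import Mapping

# Hypothetical abbreviation map; extend it with whatever your vendors standardize.
ABBREVIATIONS = {
    "vp": "vice president",
    "svp": "senior vice president",
    "cto": "chief technology officer",
}


def normalize(value: str) -> str:
    """Lowercase, replace punctuation with spaces, and expand known abbreviations."""
    words = re.sub(r"[^\w\s]", " ", value.lower()).split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


def rows_to_review(truth: Mapping[str, str], vendor: Mapping[str, str]) -> list[str]:
    """IDs whose vendor value still differs from the verified value after normalization.

    Records the vendor didn't return are skipped; those are a coverage issue.
    """
    return [
        rid for rid, expected in truth.items()
        if rid in vendor and normalize(vendor[rid]) != normalize(expected)
    ]


verified = {"1": "VP, Marketing", "2": "Head of Sales", "3": "CTO"}
returned = {"1": "Vice President Marketing", "2": "Sales Manager", "3": "Chief Technology Officer"}
print(rows_to_review(verified, returned))  # ['2']
```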

Data that scales with you

While this may seem like an involved decision-making process, the ultimate goal should be choosing a data partner that will grow with you and that you can confidently integrate into your business processes from day one. Doing so will ensure you don’t have to worry about poor data affecting your sales and marketing efforts or customer experience in the future. Putting in the effort now to create a proper data test will pay dividends for years to come.

