80-point increase in industry categorization coverage
When we launched company categories (late last spring), they covered ~30% of companies. We could generate categories for most public companies and a good portion of highly visible private companies as well.
Customers quickly started using them for signup routing, lead scoring, and segmentation analysis. All powerful use cases that drive immediate value.
As we looked at expanding coverage beyond public and prominent private companies, we found that self-reported sources were often incorrect and had depressingly low coverage.
Enter some machine learning
Over the past few months we've built an ML categorization system. It is trained on ~140 unique categories, which lets us apply relevant sectors, industry groups, industries, and sub-industries to any website based on the text of the site itself.
Those tags are then mapped to a standardized industry hierarchy (similar to GICS).
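To make the mapping step concrete, here is a minimal sketch of resolving predicted tags into a GICS-like path. It assumes the model emits a list of tags ordered by confidence and that a lookup table keyed by sub-industry exists; the `HIERARCHY` table, `Category` type, and `map_to_hierarchy` function are hypothetical illustrations, not the production system.

```python
from typing import NamedTuple, Optional


class Category(NamedTuple):
    sector: str
    industry_group: str
    industry: str
    sub_industry: str


# Hypothetical lookup table: sub-industry tag -> full path in a GICS-like hierarchy.
# Only one illustrative entry is shown; a real table would cover every category.
HIERARCHY = {
    "Finance": Category(
        sector="Financials",
        industry_group="Diversified Financial Services",
        industry="Diversified Financial Services",
        sub_industry="Finance",
    ),
}


def map_to_hierarchy(predicted_tags: list[str]) -> Optional[Category]:
    """Return the hierarchy path for the first tag that is a known sub-industry.

    Assumes predicted_tags is ordered by model confidence (an assumption of this sketch).
    Tags with no hierarchy entry (e.g. a broad "Internet" tag) are skipped.
    """
    for tag in predicted_tags:
        if tag in HIERARCHY:
            return HIERARCHY[tag]
    return None


print(map_to_hierarchy(["Internet", "Finance"]).sector)  # Financials
```

Keying the table on the most specific level means each prediction only has to name a sub-industry; the broader levels follow from the lookup.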
For example, coinbase.com returned no industry when we were pulling from public data sources, just a general internet tag. With the new machine learning system we can now apply these categories and tags to it:
```json
{
  "category": {
    "sector": "Financials",
    "industryGroup": "Diversified Financial Services",
    "industry": "Diversified Financial Services",
    "subIndustry": "Finance"
  },
  "tags": [
    "Finance",
    "Financial Services",
    "Internet"
  ]
}
```
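As an illustration of the signup routing use case, the sketch below reads a payload with the `category` and `tags` fields shown above and picks a queue by sector. The queue names and the `route_signup` function are hypothetical; it also assumes `category` may be missing or null when a company can't be categorized.

```python
import json

# Example payload mirroring the category and tags fields shown above.
payload = json.loads("""
{
  "category": {
    "sector": "Financials",
    "industryGroup": "Diversified Financial Services",
    "industry": "Diversified Financial Services",
    "subIndustry": "Finance"
  },
  "tags": ["Finance", "Financial Services", "Internet"]
}
""")

# Hypothetical mapping from sector to a sales queue.
QUEUES = {"Financials": "fintech-team"}


def route_signup(company: dict) -> str:
    """Pick a queue from the company's sector, falling back when it is unknown."""
    # Treat a missing or null "category" as an empty object.
    sector = (company.get("category") or {}).get("sector")
    return QUEUES.get(sector, "general-queue")


print(route_signup(payload))             # fintech-team
print(route_signup({"category": None}))  # general-queue
```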
It's safe to say that moving to machine learning has had an enormous effect on our accuracy and coverage.
In a random sample of 5,000 private companies with fewer than 50 employees, our previous system could categorize only 15%. The new system categorizes 95% of them. A massive upgrade for the Enrichment API.
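The jump from 15% to 95% is easy to misread, so here is the arithmetic on the 5,000-company sample. It uses only the figures stated above: the gain is 80 percentage points, and in absolute terms the new system categorizes roughly 6.3 times as many companies.

```python
SAMPLE_SIZE = 5_000
OLD_PCT, NEW_PCT = 15, 95  # coverage rates from the sample above

old_count = SAMPLE_SIZE * OLD_PCT // 100  # companies categorized before
new_count = SAMPLE_SIZE * NEW_PCT // 100  # companies categorized now
point_gain = NEW_PCT - OLD_PCT            # difference in percentage points
multiple = new_count / old_count          # relative improvement

print(old_count, new_count, point_gain, round(multiple, 1))  # 750 4750 80 6.3
```

Integer percentages keep the counts exact and avoid floating-point rounding in the intermediate values.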
For all of you who lived through the previous drought of categorization attributes, El Niño is here.
See the full list of our updated industries
PRO TIP:
These categories are also searchable via the Discovery API where you can build highly targeted company lists.