How ZoomInfo streamlined data quality across billions of records using Telmai’s ML-powered DQ monitoring across Iceberg, GCS, BigQuery, and Snowflake

Discover how ZoomInfo ensures data accuracy at scale, leveraging Telmai’s ML-powered data quality monitoring to process billions of records across Iceberg, GCS, and Snowflake to deliver reliable business insights to over 35,000 clients.

ZoomInfo (NASDAQ: ZI) is a market leader in data and intelligence for B2B sales and marketing teams. ZoomInfo’s AI-powered go-to-market platform helps more than 35,000 businesses find, acquire, and grow their customers.

With over 1.5 billion data points processed daily, including over 110 million published company profiles and 420 million professional profiles, ZoomInfo’s success is built on delivering accurate, reliable, and timely data and signals to its customers. To maintain this high standard, ZoomInfo has partnered with Telmai to transform its approach to data quality. Through Telmai’s AI-powered real-time data quality monitoring, automated anomaly detection, and remediation workflows, ZoomInfo can swiftly detect and address data quality issues, minimize operational costs, and ensure its clients consistently receive high-quality, actionable insights that drive business growth.

Key success metrics

  • Data quality insights by profiling a large critical data asset: ~3 hours
  • Time to deployment in VPC environment: ~2 weeks
  • Time to value post-deployment, i.e., first DQ tickets in JIRA: <1 day

Overview

Founded in 2007, ZoomInfo has established itself as a leader in B2B go-to-market intelligence, helping thousands of enterprises worldwide reach their ideal customers through comprehensive, real-time data insights. With a vast customer base and annual product revenue north of a billion dollars, ZoomInfo’s journey demonstrates the transformative impact of effectively using data as a strategic asset.

Notable takeaways:

  • Time-to-value for both onboarding and resolving DQ issues via Telmai
  • Ensuring customer experience and trust through consistent, accurate, and reliable data flow
  • Automated proactive data quality monitoring in a complex data ecosystem
  • ML-driven anomaly detection and automated remediation workflows at scale
  • A scalable architecture that combines Telmai connections to a federated query system (Trino) with open data formats (Iceberg) and data warehouses across multiple clouds

Challenge: moving past reactive processes to manage and automate data quality across billions of records in a complex data ecosystem

At ZoomInfo, data quality forms the foundation of its product offerings, driving the business’s ability to provide accurate, timely insights to its customers. As Ethan Peck, Director of Data Engineering at ZoomInfo, pointed out, any lapses in data quality could directly impact customer experience and trust, a risk they cannot afford.

Managing data at this level of complexity and scale made the task even more challenging. Every day, ZoomInfo processes terabytes of data from over half a billion contacts and over a hundred million companies. Much of this data is semi-structured, with hundreds of intricate, nested attributes. ZoomInfo invested significant resources, capital, and time to ensure the highest data quality for its customers. However, these efforts had limited scalability, which forced the business to react to unknown data quality issues rather than proactively identify new opportunities to improve the quality of its data products.

While their existing workflows were designed to address known issues, they needed help catching unknown or unpredictable data quality issues in real time. Known data quality issues were addressed through traditional rule-based frameworks, but unexpected anomalies proved challenging to tackle and often went unnoticed until the compounding impact of these issues, often referred to as ‘drift,’ became noticeable to some customers. ZoomInfo is continuously raising the bar in the industry through its commitment to ensuring the highest quality of data. Integrating Telmai into ZoomInfo’s workflows not only reinforces this commitment but also fortifies their status as industry leaders, keeping them well ahead of their competitors.

Evaluation: leveraging advanced solutions to automate data quality and anomaly detection at scale

To overcome this challenge, ZoomInfo evaluated various data quality and observability vendors. Most of these tools could not handle the sheer scale of data being processed or the requirements of ZoomInfo’s massive data ecosystem. Further, most SaaS-based data observability tools offload processing to the vendor’s infrastructure via excessive queries, which is resource-intensive and creates a pricing model that drives up operational costs as more data is processed.

Building on lessons from previous evaluations, ZoomInfo sought an advanced solution to automate data quality at scale while addressing real-time issues without spiking cloud costs. Telmai’s ML-powered data quality monitoring aims to enhance ZoomInfo’s data quality workflows, making it possible to detect issues proactively and resolve anomalies at the record level through automated methods.

Telmai’s cloud-native infrastructure, built on Google Cloud Platform (GCP), perfectly aligns with ZoomInfo’s existing architecture, allowing for a smooth deployment. Operating entirely within ZoomInfo’s GCP environment, Telmai allowed ZoomInfo to maintain complete control over its data without any unnecessary external data transfers, eliminating the need to offload processing tasks to third-party infrastructure, a common drawback with other solutions.

Solution: ML-driven data quality monitoring at scale with automated profiling, anomaly detection, and remediation workflows

Focusing on ease of use and scalability, Telmai provided ZoomInfo with a seamless way to ensure data quality across its complex ecosystem, with minimal time and effort needed for initial setup. Connecting and scanning new data sources, regardless of size or format, was completed within minutes, a sharp contrast to the weeks or months required by other solutions. Telmai was able to onboard complex tables with hundreds of nested attributes and enable automated profiling and monitoring of billions of records across terabytes of data without disrupting ongoing operations.

Telmai’s intuitive interface allowed various teams at ZoomInfo to adopt the platform without extensive training. Both business and data engineering teams could quickly scan datasets and receive precise alerts, identifying and resolving record-level data quality issues in real time. In one case, Telmai’s ML-powered anomaly detection flagged unexpected numeric values in name fields, spotting just three problematic records out of millions. By providing precise and relevant alerts, Telmai helps teams avoid alert fatigue and focus on critical data quality concerns.
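To make the name-field example concrete, here is a minimal sketch of one way to surface such records. It is not Telmai’s ML method: it swaps in a simple pattern-frequency heuristic that collapses each value into a coarse character "shape" and flags shapes that are rare in the dataset. The function names, the `first_name` field, the sample data, and the 1% threshold are all assumptions made for illustration.

```python
from collections import Counter

def shape(value: str) -> str:
    """Collapse a string into a coarse pattern: letter runs -> 'a', digit runs -> '9'."""
    out = []
    for ch in value:
        if ch.isdigit():
            token = "9"
        elif ch.isalpha():
            token = "a"
        else:
            token = ch  # keep punctuation and spaces as-is
        # Merge consecutive identical tokens so "Alice" and "Bob" share one shape.
        if not out or out[-1] != token:
            out.append(token)
    return "".join(out)

def rare_shape_records(records, field, max_share=0.01):
    """Return records whose field shape occurs in at most max_share of all rows."""
    shapes = [shape(str(r.get(field, ""))) for r in records]
    counts = Counter(shapes)
    total = len(records)
    return [r for r, s in zip(records, shapes) if counts[s] / total <= max_share]

# Hypothetical sample: 997 ordinary names plus three values containing digits.
records = [{"id": i, "first_name": "Alice"} for i in range(997)]
records += [
    {"id": 997, "first_name": "J0hn"},
    {"id": 998, "first_name": "12345"},
    {"id": 999, "first_name": "Ann3"},
]

suspicious = rare_shape_records(records, "first_name")
print([r["id"] for r in suspicious])  # [997, 998, 999]
```

Because the check relies on how often a pattern appears rather than on a hand-written rule about digits, it needs no prior knowledge of what a valid name looks like, which is the property the paragraph above attributes to anomaly detection in general.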

ZoomInfo has assessed Telmai’s data quality binning feature and its potential to enhance their operational workflows. The feature automatically distinguishes valid data from suspicious records, and the ZoomInfo team is excited to eventually use it to help ensure a continuous flow of accurate and reliable data through their pipeline by flagging potentially problematic records for further investigation.
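The general idea of binning can be sketched in a few lines. This is an illustrative assumption about the pattern, not Telmai’s implementation or API: records are split into a valid bin that continues through the pipeline and a suspicious bin held back for investigation. The `bin_records` helper, the `last_name` field, and the check itself are hypothetical.

```python
from typing import Callable, Iterable

def bin_records(records: Iterable[dict], is_suspicious: Callable[[dict], bool]):
    """Split records into a 'valid' bin and a 'suspicious' bin."""
    valid, suspicious = [], []
    for record in records:
        (suspicious if is_suspicious(record) else valid).append(record)
    return valid, suspicious

def missing_or_numeric_name(record: dict) -> bool:
    """Hypothetical check: the name is empty or contains a digit."""
    name = str(record.get("last_name") or "")
    return not name.strip() or any(ch.isdigit() for ch in name)

batch = [
    {"id": 1, "last_name": "Nguyen"},
    {"id": 2, "last_name": "Sm1th"},
    {"id": 3, "last_name": ""},
    {"id": 4, "last_name": "Okafor"},
]

valid, suspicious = bin_records(batch, missing_or_numeric_name)
print([r["id"] for r in valid])       # [1, 4]
print([r["id"] for r in suspicious])  # [2, 3]
```

Keeping the suspicious bin separate, rather than dropping those records, lets downstream consumers receive clean data immediately while the flagged rows remain available for review.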

ZoomInfo is also evaluating Telmai’s integration capabilities with federated query systems like Trino, combined with open data formats like Iceberg, to create a scalable data architecture that improves data accessibility, quality, and adaptability, positioning them to handle evolving data needs effectively.

Why Telmai

While various vendors were considered, ZoomInfo ultimately selected Telmai for the following reasons:

  • Proactive data quality monitoring over billions of records and terabytes of semi-structured data sources without sampling
  • Use of AI to identify data patterns and detect unpredictable errors without prior knowledge about the data
  • Ease of integration with data sources and technologies irrespective of size or format
  • Automated remediation, with corrective actions triggered alongside relevant alerts sent to stakeholders
  • Visual investigator to understand data patterns and drill down to the record level
  • Performance and scale at low cloud cost
