Getting your data AI-ready starts with open data standards

Experts break down how open data standards like Iceberg, Delta, and Hudi drive AI scalability by improving interoperability and governance. Learn key takeaways from the webinar on ensuring data quality in AI ecosystems.


Anoop Gopalam

February 24, 2025

AI models are only as effective as the data they rely on, yet many organizations struggle with poor data quality, lack of interoperability, and rigid architectures. As AI adoption accelerates, ensuring data consistency and governance through open standards is becoming a key priority for data leaders.

This challenge was the focus of the second episode of the Data Quality Series, where Ravit Jain and Mona Rakibe hosted a panel discussion with industry leaders Piethein Strengholt, Scott Haines, and Pankaj Yawale. The conversation explored how open data standards are essential for building scalable, AI-ready data ecosystems.

In this blog, we highlight key takeaways from the discussion, diving into how enterprises can scale AI adoption by improving data quality, interoperability, and governance through open standards.

What does open data architecture mean and why does it matter?

The discussion started with a simple but essential question: What does open data architecture mean, and why is it important in today’s AI-driven world?

Scott Haines set the stage by explaining that open architectures are about flexibility and interoperability. He explained how, in the past, enterprises were forced into proprietary ecosystems, where integrating a new tool required lengthy, complex implementations. “A true open architecture means you shouldn’t need a solutions architect to integrate a new tool into your data pipeline. It should just work.” That’s the promise of open formats like Iceberg, Delta, and Hudi—they let businesses store data once and access it from multiple systems without friction.
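To make the "store once" idea concrete, here is a minimal local sketch of a PySpark job writing an Iceberg table. The catalog name, warehouse path, and Iceberg runtime version are illustrative placeholders, not details from the webinar.

```python
from pyspark.sql import SparkSession

# Illustrative local setup: the catalog name, warehouse path, and runtime
# version are placeholders you would swap for your own environment.
spark = (
    SparkSession.builder
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2")
    .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.local.type", "hadoop")
    .config("spark.sql.catalog.local.warehouse", "/tmp/warehouse")
    .getOrCreate()
)

# Write once in an open format; any engine that speaks Iceberg (Spark,
# Trino, Snowflake, DuckDB, ...) can then read the same files, no copies.
spark.sql("""
    CREATE TABLE IF NOT EXISTS local.db.events (id BIGINT, label STRING)
    USING iceberg
""")
spark.sql("INSERT INTO local.db.events VALUES (1, 'signup'), (2, 'purchase')")
```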

Pankaj Yawale added that open architectures help businesses avoid vendor lock-in, allowing them to choose the best tools for different use cases. “Why limit yourself to one vendor’s ecosystem when you can make your data accessible across multiple platforms?” he asked. For ZoomInfo, this approach has been game-changing: adopting open table formats has made its data queryable across Snowflake, BigQuery, Starburst, and other platforms, ensuring the company isn’t locked into any single stack.

However, while open architectures provide greater control and flexibility, Piethein Strengholt pointed out that they also come with challenges, particularly around metadata management and governance. “We have open catalogs, but we lack true open metadata standards,” he explained. AI models don’t just need access to data—they need to understand the context in which that data was created. Without consistent metadata and lineage tracking, even the most advanced AI systems risk misinterpreting information or producing unreliable insights.

How do open table formats improve AI readiness?

The discussion shifted to how open table formats like Iceberg, Delta, and Hudi enhance AI readiness. The consensus was clear: AI models require not just large datasets, but also consistent, well-structured, and interoperable data to provide accurate insights.

For Pankaj, schema evolution is one of the biggest advantages of open table formats when it comes to AI. Traditional data warehouses and closed systems struggle with schema changes, making it difficult to adapt to new business needs without breaking existing pipelines. “With open formats like Iceberg, you can evolve your schema without disrupting downstream systems. That flexibility is critical for AI applications that require constantly evolving datasets.”
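As a rough sketch of what that looks like in practice (continuing the Spark session from the earlier snippet), adding a column to an Iceberg table is a metadata-only commit: existing data files are untouched and downstream readers keep working.

```python
# Metadata-only change: old data files are not rewritten, existing queries
# keep running, and old rows simply read NULL for the new column.
spark.sql("ALTER TABLE local.db.events ADD COLUMN source STRING")
spark.sql("INSERT INTO local.db.events VALUES (3, 'purchase', 'mobile')")
spark.sql("SELECT id, label, source FROM local.db.events").show()
```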

Scott built on this, emphasizing that AI pipelines depend on high-quality, well-governed data. In traditional closed architectures, AI teams often struggle to access data stored in proprietary formats, leading to data duplication and inefficiencies. Open table formats, on the other hand, allow AI models to ingest data from multiple sources without extra transformation steps. “The ability to store data once and query it from multiple platforms makes a huge difference,” Scott explained. “With open formats, AI teams don’t have to keep making copies of the same dataset just to work with it in different tools.”
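One hedged illustration of that point: a completely separate process can scan the table the Spark job above wrote, here with PyIceberg, without any export or duplication step. The metadata path below is a placeholder; in practice a shared catalog resolves the current version for you.

```python
from pyiceberg.table import StaticTable

# Load the table directly from its metadata file (path is illustrative;
# a shared catalog would normally resolve the current version for you).
table = StaticTable.from_metadata(
    "/tmp/warehouse/db/events/metadata/v4.metadata.json"
)

# Scan the same files Spark wrote straight into pandas for a training or
# feature-engineering step: no copy, no extra transformation job.
df = table.scan(row_filter="id > 1").to_pandas()
print(df.head())
```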

But while schema evolution and accessibility improve AI readiness, Piethein cautioned that data governance is just as important. “AI models don’t just need data; they need the right data with clear lineage and governance,” he pointed out. Open table formats help ensure that data is versioned, auditable, and traceable, but enterprises still need strong metadata management practices to prevent AI systems from being trained on incomplete or outdated data.
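To make “versioned, auditable, and traceable” concrete, here is a small sketch, again continuing the Spark session above. Every Iceberg commit produces a snapshot that can be inspected and queried against; the snapshot ID below is a placeholder.

```python
# The snapshots metadata table is the audit trail: one row per commit.
spark.sql("""
    SELECT snapshot_id, committed_at, operation
    FROM local.db.events.snapshots
""").show()

# Time travel: query the table exactly as it looked at an earlier commit,
# e.g. to reproduce the data a model was trained on (ID is a placeholder).
spark.sql("SELECT * FROM local.db.events VERSION AS OF 1234567890123456789").show()
```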

How can enterprises maintain data consistency and governance in open architectures?

With open table formats making AI-ready data more accessible and scalable, the next challenge is governance. If organizations don’t enforce data consistency, metadata tracking, and lineage management, they risk AI models training on incomplete or conflicting data. So how can enterprises ensure that open architectures remain well-governed while still benefiting from flexibility?

Piethein was quick to point out that governance in open architectures isn’t just about enforcing rules—it’s about ensuring that data remains meaningful and traceable across different systems. Returning to his earlier point about the lack of true open metadata standards, he stressed that the challenge isn’t just making data available, but making sure AI models and analysts understand its context. Without consistent metadata management and lineage tracking, enterprises risk feeding AI models incomplete or misinterpreted data.

For Pankaj, the key to data consistency in open architectures lies in real-time validation and anomaly detection. Since open formats allow multiple platforms to interact with the same dataset, schema drifts, missing values, or duplicate records can easily creep in. “We put data quality gates at multiple stages of our pipelines,” Pankaj explained. “It’s not just about checking data at the final stage; we validate it at every step so that bad data never makes it to production.”
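As a simplified sketch of what a gate between stages might look like, here is a plain-Python check with invented column names and rules; the panelists’ production gates (and dedicated tooling) are far more sophisticated.

```python
import pandas as pd

EXPECTED_COLUMNS = {"id", "label", "source"}  # hypothetical data contract

def quality_gate(df: pd.DataFrame, stage: str) -> pd.DataFrame:
    """Fail fast between pipeline stages instead of at the very end."""
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"[{stage}] schema drift, missing columns: {missing}")
    if df["id"].isna().any() or df["id"].duplicated().any():
        raise ValueError(f"[{stage}] null or duplicate keys detected")
    return df

# The same gate runs after ingestion, after each transformation, and
# before publishing, so bad records never reach production or training.
raw = pd.DataFrame({"id": [1, 2], "label": ["a", "b"], "source": ["web", "app"]})
validated = quality_gate(raw, stage="ingest")
```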

Scott emphasized that data governance isn’t just a technical problem—it’s an organizational one. Too often, governance is treated as an afterthought, rather than baked into the data strategy from the start. “If governance is handled manually, it won’t scale,” he warned. Nike focuses on automating governance policies within its open data architecture so that data consistency is enforced programmatically rather than relying on manual interventions.
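One way to read “enforced programmatically” is governance-as-code: policies written as automated checks that run in CI or orchestration rather than as manual reviews. Below is a hypothetical sketch (not Nike’s actual setup, and continuing the Spark session above) that fails when a table is missing required ownership metadata.

```python
# Hypothetical policy: every production table must declare an owner and a
# retention period in its table properties.
REQUIRED_PROPS = {"owner", "retention.days"}

props = {
    row["key"]: row["value"]
    for row in spark.sql("SHOW TBLPROPERTIES local.db.events").collect()
}
violations = REQUIRED_PROPS - props.keys()
if violations:
    raise RuntimeError(f"governance check failed, missing properties: {violations}")
```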

What’s next for open data standards and AI-driven decision-making?

While open formats have already enhanced data accessibility and flexibility, the shift toward true interoperability and standardized metadata frameworks is still unfolding across the industry.

For Scott, the future of open data standards is all about seamless interoperability across cloud providers and AI platforms. He pointed out that while formats like Iceberg, Delta, and Hudi have solved many data accessibility challenges, enterprises still struggle with metadata fragmentation. “We’re seeing AWS, Snowflake, and Microsoft Fabric all doubling down on open data formats,” Scott noted. “In a few years, proprietary data storage may become the exception, not the norm.” He predicts that as cloud providers continue to compete, they will further invest in making open data formats a native part of their ecosystems, making data portability even easier.

Piethein Strengholt believes that while table formats are evolving rapidly, there’s still a critical gap in metadata standardization. “We need truly open metadata standards,” he emphasized. Right now, different platforms handle metadata in different ways, making cross-platform AI decision-making more complex than it needs to be. He envisions a future where metadata, lineage tracking, and governance policies become standardized across platforms, enabling AI models to better understand the full context of the data they process.

Pankaj stated that the biggest shift will be in real-time AI-driven decision-making. He highlighted how open table formats are already enabling organizations to process massive datasets more efficiently, but the next step is making AI models more responsive to live data changes. “The more we standardize how we store and access data, the faster AI systems can adapt to new information,” he explained. This means AI won’t just work with historical data—it will be able to make real-time, adaptive decisions based on the latest available insights.

Conclusion

AI-ready data starts with open standards, but success depends on governance. While open table formats improve data flexibility and scalability, metadata tracking, interoperability, and real-time consistency still need industry-wide standardization.

As cloud providers continue to double down on open data architectures, enterprises that prioritize data consistency, metadata management, and schema evolution will gain a competitive edge in AI adoption.

Want to dive deeper into these insights? Click here to check out the full interview for an in-depth look at how open data standards are shaping AI-ready data ecosystems and solving key governance challenges.

Passionate about data quality? Get expert insights and guides delivered straight to your inbox – click here to subscribe to our newsletter now.

