Skip to main content

Why AI Infrastructure Will Repeat Data Infrastructure's Growing Pains (And What We Can Do About It)

·6 mins

As someone who spent the last 18 months shepherding Apache Airflow through its evolution to 3.0, I’ve been watching the current AI infrastructure boom with a mix of excitement and concern. The patterns emerging in 2025 feel remarkably similar to the data infrastructure landscape of 2018-2020, and not always in good ways.

After mapping the AI infrastructure landscape and exploring governance frameworks, I keep returning to one central question: Are we learning from data infrastructure’s evolution, or are we doomed to repeat its mistakes?

We’re witnessing the same cycle of fragmentation, reinvention, and eventual consolidation that defined the early days of modern data infrastructure. AI infrastructure will mature eventually. The real question is whether we’ll learn from data infrastructure’s growing pains or repeat the same mistakes at an even larger scale.

The Parallel Universe of Infrastructure Evolution #

In 2018, every data team was building their own orchestration layer. Airflow was gaining traction, but so were dozens of other workflow tools. Teams were spinning up custom Spark clusters, building bespoke ETL pipelines, and creating one-off solutions for data quality monitoring. The tools existed, but the ecosystem was a patchwork of incompatible solutions.

Diagram showing parallel evolution timelines of Data Infrastructure (2018-2022) and AI Infrastructure (2024-2028). Data infrastructure progressed through Fragmentation, Tool Proliferation, Consolidation, Governance, and Maturity stages. AI infrastructure shows similar pattern: Vector DB Explosion, LLMOps Proliferation, Integration Challenge, Governance Gap, and Decision Point. Warning indicators highlight current AI infrastructure risks at Integration Challenge and Governance Gap stages, while a success indicator marks Data Infrastructure’s completed maturity. Connecting lines show parallel evolution patterns between the two timelines.
Infrastructure evolution follows predictable patterns. Data infrastructure’s 2018-2022 journey from fragmentation to maturity mirrors AI infrastructure’s current trajectory. We’re approaching the critical governance gap phase—the same inflection point where data infrastructure teams faced their biggest challenges.

Fast-forward to today’s AI infrastructure landscape, and the parallels are striking. Vector databases have become the new data lakes—essential but overwhelming in their variety. AI platforms are proliferating like orchestrators did in 2019. Every team is building their own RAG implementation, their own model serving layer, their own AI observability stack.

The fundamental dynamic is identical: powerful new capabilities are emerging faster than the standards and governance needed to use them responsibly at scale.

Features Are Easy, Trust Is Hard #

Here’s the lesson that took me years to fully internalize through my work at Astronomer (a managed service for Apache Airflow)—from onboarding enterprise customers as a solutions architect to shaping Airflow 3 strategy as a product manager: building features is the easy part. Building trust is what separates toys from production-grade infrastructure, and is so much harder.

You can spin up a vector database over a weekend. You can build a decent RAG pipeline in a few days. But getting an enterprise to bet their customer-facing AI applications on your infrastructure? That requires proven reliability, robust governance, and seamless integration with existing systems.

This is where the AI infrastructure space is making the same mistakes we made in early data infrastructure. There’s tremendous focus on feature velocity and technical capabilities, but insufficient attention to the harder problems of trust, interoperability, and long-term sustainability.

This connects directly to the governance gaps I see across the AI infrastructure stack. Teams are building capabilities without the frameworks needed to operate them reliably at scale.

The Interoperability Imperative #

The winners in data infrastructure weren’t necessarily the most feature-rich tools—they were the ones that played well with others. Airflow succeeded because it integrated with everything: your existing databases, your cloud providers, your monitoring tools, your security systems. dbt won because it fit seamlessly into existing data team workflows rather than requiring a complete rebuild.

The same principle applies to AI infrastructure, but the blast radius is bigger. Data pipeline failures delay dashboards and reports. AI system failures make wrong decisions at scale, potentially affecting real users in real time.

Yet we’re seeing the same fragmentation patterns emerge. Teams are building isolated AI stacks that don’t integrate well with existing data infrastructure, security systems, or governance frameworks. The result is a proliferation of point solutions that work great in demos but create operational nightmares at scale.

The Governance Gap #

Perhaps the most concerning parallel is the governance gap. In the early days of data infrastructure, teams focused on building capabilities first and figuring out governance later. This led to data quality disasters, privacy incidents, and regulatory compliance nightmares that we’re still dealing with today.

AI infrastructure is making the same mistake, but with potentially larger consequences. We’re deploying models without robust monitoring. We’re building AI workflows without clear lineage tracking. We’re creating systems that make decisions we can’t easily explain or audit.

The regulatory environment around AI is evolving rapidly, but most AI infrastructure stacks aren’t designed with compliance in mind. When the inevitable governance requirements materialize, many teams will find themselves needing to rebuild their AI infrastructure from the ground up.

Learning From Open Source Evolution #

One area where AI infrastructure might avoid data infrastructure’s mistakes is in open source sustainability. The data infrastructure ecosystem saw numerous promising open source projects burn out their maintainers or get acquired and abandoned. Projects struggled with the classic open source challenge: how to balance community needs with sustainable development.

The AI infrastructure space has a chance to get this right from the beginning. We’re seeing some promising models emerge, from foundation-backed projects to innovative commercial open source approaches. But we need to be intentional about building sustainable ecosystems rather than just sustainable products.

The Path Forward: Lessons for Builders #

Based on what I learned helping guide Airflow’s evolution, here are the key lessons AI infrastructure builders should internalize:

Start with integration, not isolation. The most successful infrastructure tools are the ones that enhance existing workflows rather than requiring complete rewrites. Design for interoperability from day one.

Governance is a feature, not an afterthought. Build in monitoring, lineage tracking, and auditability from the beginning. It’s much harder to retrofit these capabilities later.

Community health matters more than feature velocity. Sustainable open source projects require intentional community building and maintainer sustainability, not just rapid feature development.

Trust is earned through reliability, not marketing. Focus on building rock-solid core capabilities before adding flashy features. Enterprise adoption depends on proven reliability under real-world conditions.

The Stakes Are Higher This Time #

Data infrastructure’s growing pains were painful, but they were mostly contained within data teams and analytics use cases. AI infrastructure failures have the potential for much broader impact, affecting customer-facing applications and real-time decision-making systems.

This means we can’t afford to take as long to mature as data infrastructure did. We need governance frameworks before we have major production incidents. We need sustainable community models before we have another wave of maintainer burnout.

The good news is that we have a playbook. The data infrastructure space’s evolution provides a clear roadmap for what works and what doesn’t. The question is whether we’ll use it.

The AI infrastructure space is moving faster than data infrastructure ever did, but the fundamental challenges around trust, governance, and ecosystem health remain the same. We have an opportunity to learn from history rather than repeat it. The question is whether we’ll take it.