What's a Data Lake and What Does It Mean For My Open Source Stack?
Data lakes on open table formats like Iceberg are a popular way to manage large datasets for analytics, data science, and AI. This talk explains how data lakes work and how to adapt open source analytic stacks to use them. First, we'll tour projects like Arrow, Iceberg, and Unity Catalog that make data lakes possible. Next, we'll see how analytic engines like DuckDB, ClickHouse, and Spark are adapting. Finally, we'll survey a few projects that enable applications written in Python, Golang, or Rust to deliver fast queries. You'll have to build the app yourself, but this talk will show you a path to using data lakes and open source successfully.
Data lakes are becoming a cornerstone of modern analytics, powering data science, AI, and large-scale reporting. But for many developers, the ecosystem of tools and terminology (Iceberg, Arrow, catalogs, and engines) feels confusing and fragmented. This talk cuts through the noise and explains how open source projects are making data lakes usable, flexible, and powerful.
We’ll start by exploring the open table formats and standards, such as Iceberg and Arrow, that form the foundation of today’s data lakes. Then we’ll look at how popular analytic engines, including DuckDB, ClickHouse, and Spark, are evolving to take advantage of these formats. Finally, we’ll survey projects that make it easier for applications written in languages like Python, Golang, and Rust to query data lakes efficiently.
Designed for a broad technical audience, this session provides a clear map of the open source data lake landscape. You don’t need prior experience with data lakes, just curiosity about how to integrate these technologies into your stack and a willingness to explore what’s next.
Takeaways:
- Understand what data lakes are and why open table formats matter.
- Learn the role of Iceberg, Arrow, and Unity Catalog in modern architectures.
- See how engines like DuckDB, Spark, and ClickHouse adapt to data lakes.
- Discover emerging tools for Python, Golang, and Rust applications.
- Gain clarity in a fast-evolving, vendor-heavy ecosystem.