LH-150
Sessions in LH-150
What's a Data Lake and What Does It Mean For My Open Source Stack?
Data lakes on open table formats like Iceberg are a popular way to manage large datasets for analytics, data science, and AI. This talk explains how data lakes work and how to adapt open source analytic stacks to use them. First, we'll tour projects like Arrow, Iceberg, and Unity Catalog that make data lakes possible. Next, we'll see how analytic engines like DuckDB, ClickHouse, and Spark are adapting. Finally, we'll survey a few projects that enable applications written in Python, Golang, or Rust to deliver fast queries. You'll have to build the app yourself, but this talk will show you a path to use data lakes and open source successfully.
The Tip of the Iceberg
A deep dive into the Iceberg table format, examining the rationale for its creation, internal mechanics, and advanced capabilities. Drawing from years of production experience, this talk offers both theoretical foundations and practical insights for engineers considering adopting Iceberg.
Pub/Sub for Tables: Shift Left for Data Integration
Traditional data pipelines are costly and hard to operate and maintain. Tabsdata introduces Pub/Sub for Tables, a declarative approach to data movement that lets teams publish structured datasets. This model shifts data quality and governance left, simplifying both the development and operation of data stacks.
Data Debt: The Hidden Malaise Impacting Your Business
This talk explores how hidden data quality issues silently erode business metrics, and covers strategies to detect, prevent, and fix data debt at scale.
Strands and AgentCore
In this session, we will learn techniques for building agents with Strands (an agent development kit open-sourced by AWS), the challenges of moving agents into production, and how to securely deploy and scale agents on AWS with AgentCore.