The Tip of the Iceberg

Data Engineering Intermediate
at LH-150

A deep dive into the Iceberg table format, examining the rationale for its creation, internal mechanics, and advanced capabilities. Drawing from years of production experience, this talk offers both theoretical foundations and practical insights for engineers considering adopting Iceberg.

This talk is an introduction to the Iceberg table format. It examines why table formats like Iceberg came to be, takes a deep dive into how Iceberg works, and tours some of its more advanced features. It also covers the care and feeding that Iceberg tables need and some of Iceberg's current rough edges, all informed by multiple years of running Iceberg in production with clients. In this talk, we'll cover:

  • The origins of Iceberg, and the motivation for table formats generally
  • How Iceberg works internally, with an emphasis on how it provides transactional semantics on top of object storage
  • Some of the fancier features of Iceberg, including branching, time travel, and the write-audit-publish pattern
  • Compaction, garbage collection, and the small-files problem in Iceberg
  • The importance of data catalogs when working with Iceberg

I hope the audience will come away knowing more about Iceberg than they did before, and with a better idea of whether Iceberg is a good fit for their systems.