Domain Modeling in the Age of AI: Why the Semantic Layer Is the New Bottleneck

Data Engineering | Intermediate

The hardest part of scaling a data platform is not infrastructure; it is getting teams to agree on what a customer is. This is the story of how domain modeling, not tooling, moved the needle at Zillow.

The hardest problem in scaling a data platform is not choosing between Kafka and Kinesis or picking a lakehouse format. It is getting six teams to agree on what a customer is, and whether that is the same thing as a user, a consumer, or an account.

This talk draws on domain modeling work for a large platform organization at Zillow, where multiple acquired brands produce overlapping data: the same concepts under different names, and the same names with different meanings. After years of pipelines, governance tooling, and quality dashboards, what actually moved the needle was the unglamorous work of domain modeling: classifying data into clear categories, standardizing naming, and building a shared vocabulary that producers and consumers both understand.

The talk covers the pain points that forced the issue (dashboards that undercounted because teams named the same action differently, schemas that broke across teams, a data dictionary nobody used) and what came out of it: an event classification taxonomy, naming conventions that match how teams already think, a living domain dictionary seeded from production data, and an AI-assisted pipeline that makes naming and boundary gaps immediately visible.

What you will take away:

  • A practical classification framework you can adapt to your own organization
  • The standardization fights worth having, and the ones to let go
  • How to make the case for domain modeling when leadership wants features, not foundations
  • A realistic picture of where AI helps and where it gives a false sense of progress