OpenSearch: (Just About) Everything You Need to Know About Its Architecture

Data Engineering

Seth Muthukaruppan

Consultant at Instaclustr

OpenSearch is a fully open source search engine and analytics suite for ingesting, searching, visualizing, and analyzing your data. This Apache 2.0-licensed, community-driven collection of technologies harnesses an architecture that combines the strengths of Elasticsearch 7.10.2, Kibana 7.10.2, and Apache Lucene. With OpenSearch, users gain a distributed framework featuring horizontal scalability, high availability, and database-like capabilities.

Attendees at this DataCon LA presentation will come away understanding OpenSearch's architecture and its building-block technology components, including:

-- Apache Lucene utilization. Learn how OpenSearch uses this high-performance Java-based search library, and in particular Lucene's inverted index, to deliver fast search results (while supporting natural language, wildcard, fuzzy, and proximity searches).
-- OpenSearch cluster architecture. An OpenSearch cluster is a distributed and horizontally scalable collection of nodes, which are differentiated based on the operations they perform. Attendees will learn the specific functions of master, master-eligible, data, client, and ingest nodes.
-- Data organization. Understand how OpenSearch organizes data into indices (which contain documents, which contain fields).
-- Internal data structures. Get an in-depth look at how OpenSearch achieves scalability and reliability by breaking up indices into shards and segments and by using translogs.
-- Aggregations. See how OpenSearch enables its advanced built-in analytics capabilities through the power of aggregations.
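
As a rough illustration of three of the ideas above (documents made of fields, an inverted index, and an aggregation), here is a minimal Python sketch. It is a toy model under stated assumptions, not how Lucene or OpenSearch are implemented: the sample documents, the whitespace tokenizer, and the function names build_inverted_index, search, and terms_aggregation are all hypothetical and chosen only for this example.

```python
from collections import Counter, defaultdict

# Hypothetical sample documents; each document is a set of named fields.
DOCS = {
    1: {"title": "fast search with lucene", "category": "search"},
    2: {"title": "scaling search clusters", "category": "ops"},
    3: {"title": "lucene segments explained", "category": "search"},
}


def build_inverted_index(docs, field):
    """Map each whitespace-separated term in `field` to the IDs of docs containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in doc[field].lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents that contain every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


def terms_aggregation(docs, doc_ids, field):
    """Count how many of the given documents fall under each value of `field`."""
    return Counter(docs[doc_id][field] for doc_id in doc_ids)


title_index = build_inverted_index(DOCS, "title")
hits = search(title_index, "search")
buckets = terms_aggregation(DOCS, hits, "category")
```

The lookup goes from term to documents rather than scanning every document, which is the basic reason an inverted index answers term queries quickly. The aggregation step then summarizes only the matching documents, loosely mirroring how a search and a bucket count can be combined in one request.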