Applying Probabilistic Algorithms

Data Engineering

Grant Kushida

Head of Engineering at Conversion Logic

We have seen dramatic improvements in job runtimes and associated costs by applying probabilistic algorithms where appropriate. With big-data jobs running at scale, computing exact answers is often overkill. Instead, we can frequently answer the question "accurately enough" by approximating a reasonably correct answer. For our use case (marketing analytics), we have seen benefits from:

- Approximate set membership (Bloom filter)
- Approximate cardinality (HyperLogLog)

This talk will focus on use cases, considerations, and impact, not on the details of the algorithms or their implementation.
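
For readers new to approximate set membership, here is a minimal Bloom filter sketch in Python. It is illustrative only and not the implementation used at Conversion Logic. The class name, the parameters, and the `user-N` identifiers are hypothetical, and the sizing uses the standard textbook formulas. The property it shows is that a Bloom filter never returns a false negative, but it may return a false positive at roughly the configured rate.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter; names and defaults are assumptions."""

    def __init__(self, expected_items: int, false_positive_rate: float) -> None:
        # Textbook sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(
            1,
            math.ceil(-expected_items * math.log(false_positive_rate) / math.log(2) ** 2),
        )
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


# Hypothetical usage: remember which user IDs have already been seen.
seen = BloomFilter(expected_items=100_000, false_positive_rate=0.01)
for i in range(1000):
    seen.add(f"user-{i}")

print("user-42" in seen)  # an added item is always reported as present
```

With these settings the filter takes about 117 KB for up to 100,000 items, whatever the length of the keys. That fixed, small footprint is the trade-off the abstract describes: a bounded error rate in exchange for much less memory.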