Applying Probabilistic Algorithms
Data Engineering
We have seen dramatic improvements in job runtimes and associated costs by applying probabilistic algorithms where appropriate. With big-data jobs running at scale, computing exact answers is often overkill; instead, we can often answer the question "accurately enough" with a reasonably correct approximation. For our use case (marketing analytics), we have benefited from:
- Approximate set membership (Bloom filter)
- Approximate cardinality (HyperLogLog)

This talk will focus on use cases, considerations, and impact, not on the details of the algorithms or their implementation.