Joinability Analysis at Scale with HyperLogLog
Speaker(s): John Myers
The need to operationalize and monetize data keeps growing. Large, disparate datasets are being correlated and joined to derive new insights, and analyzed for potential privacy concerns before data is shared publicly. Determining the joinability of extremely large datasets is non-trivial, as it often requires massive table scans to compute unique counts and set operations.
However, by combining Redis with streaming technologies and microservice architectures, real-time joinability analysis across multiple massive datasets is achievable using the lightweight HyperLogLog data structure and some basic math operations. This talk will extend recent research from Google into the Redis ecosystem and show how joinability analysis can be conducted quickly and efficiently on streaming data across multiple dimensions.
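
The abstract does not spell out the math, so the following is only a minimal sketch of one plausible reading: inclusion-exclusion over cardinality estimates. It assumes three counts are available: the distinct values in each column and the distinct values in their union. In Redis these could come from PFCOUNT on each HyperLogLog key and PFCOUNT over both keys together, which returns the cardinality of the union. The function names, the returned metrics, and the sample data are hypothetical, and exact Python sets stand in for HyperLogLog estimates so the example runs without a server.

```python
def intersection_estimate(count_a, count_b, count_union):
    """Estimate |A ∩ B| by inclusion-exclusion: |A| + |B| - |A ∪ B|."""
    estimate = count_a + count_b - count_union
    # HyperLogLog counts are approximate, so the raw value can fall
    # outside the valid range; clamp it to [0, min(|A|, |B|)].
    return max(0, min(estimate, count_a, count_b))


def joinability(count_a, count_b, count_union):
    """Derive simple joinability metrics from three cardinality counts."""
    inter = intersection_estimate(count_a, count_b, count_union)
    return {
        "intersection": inter,
        # Share of A's distinct keys that also appear in B, and vice versa.
        "containment_a_in_b": inter / count_a if count_a else 0.0,
        "containment_b_in_a": inter / count_b if count_b else 0.0,
        # Jaccard similarity of the two key sets.
        "jaccard": inter / count_union if count_union else 0.0,
    }


# Hypothetical key columns; exact sets replace HyperLogLog estimates here.
customers = set(range(0, 1000))
orders = set(range(750, 1250))

result = joinability(len(customers), len(orders), len(customers | orders))
print(result)
```

With real HyperLogLog counts the intersection inherits the estimation error of all three inputs, so small overlaps between large sets are the least reliable case.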