Sampling from data streams

Sampling from a continuous data stream is a useful technique for efficiently estimating properties of a potentially large body of data. The literature describes several sampling strategies of varying complexity. I'd like to introduce you to a rather simple one that is easy to implement, easy to reason about, and might take you a long way before you have to reach for more advanced solutions. I'm talking about Bernoulli sampling.
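To make the idea concrete before going into detail, here is a minimal sketch in Python. Bernoulli sampling keeps each element of the stream independently with a fixed probability, so it needs no buffer and no knowledge of the stream's length. The function name `bernoulli_sample` and the parameters `p` and `rng` are my own choices for illustration, not part of any particular library.

```python
import random
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def bernoulli_sample(
    stream: Iterable[T],
    p: float,
    rng: Optional[random.Random] = None,
) -> Iterator[T]:
    """Yield each element of `stream` independently with probability `p`.

    Hypothetical helper for illustration. Because this is a generator,
    the range check on `p` runs when iteration starts.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in the range [0, 1]")
    rng = rng or random.Random()
    for item in stream:
        # random() returns a float in [0.0, 1.0), so p = 0 keeps nothing
        # and p = 1 keeps everything.
        if rng.random() < p:
            yield item


if __name__ == "__main__":
    # Keep roughly 10% of one million simulated events.
    sample = list(bernoulli_sample(range(1_000_000), p=0.1))
    print(f"kept {len(sample)} of 1000000 elements")
```

Note that the sample size is itself random: on average it is `p` times the number of elements seen, but it varies from run to run. That is the price for the simplicity of deciding about each element in isolation.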


Hi there! I'm Markus!

I'm an independent freelance IT consultant, a well-known expert in Apache Kafka and Apache Solr, a software architect (iSAQB certified), and a trainer.
