Sampling from data streams

Sampling from a continuous data stream is a useful technique for efficiently estimating properties of a potentially large body of data. The literature describes a number of sampling strategies that vary in complexity. I'd like to introduce you to a rather simple one that is easy to implement, easy to reason about, and might take you a long way before you need a more advanced solution: Bernoulli sampling.
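To give a first impression of how little code this takes, here is a minimal sketch in Python. It assumes the textbook definition of Bernoulli sampling, where every element of the stream is kept independently with a fixed probability. The function name `bernoulli_sample` and the parameters `p` and `rng` are made up for this illustration and are not taken from any particular library.

```python
import random
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def bernoulli_sample(
    stream: Iterable[T],
    p: float,
    rng: Optional[random.Random] = None,
) -> Iterator[T]:
    """Lazily yield each element of `stream` with probability `p`.

    Hypothetical helper for illustration: `p` is the inclusion
    probability, `rng` an optional random source for reproducibility.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if rng is None:
        rng = random.Random()
    for item in stream:
        # One independent coin flip per element; random() is in [0.0, 1.0),
        # so p = 0 keeps nothing and p = 1 keeps everything.
        if rng.random() < p:
            yield item


if __name__ == "__main__":
    # Keep roughly 10% of a stream of 10,000 integers.
    sample = list(bernoulli_sample(range(10_000), p=0.1))
    print(len(sample))
```

Because every element is decided independently, the sampler needs no state beyond the random source and never has to know the length of the stream. The trade-off is that the sample size itself is random: for a stream of n elements it is n * p only in expectation.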


