Duplicate insensitive cardinality estimation in distributed databases

From wiki.eser.org

Jump to: navigation, search

Contents

[edit] approach: Flajolet-Martin sketches (fm sketches)

[edit] videos

[edit] papers

[edit] implementations

  1. Berkeley "P2" distributed dataflow overlay network project
  2. AT&T research sketch library
  3. AT&T research implementation of 'count sketch
  4. Rutgers MassDAL Public Code Bank: Sketches, Frequent Items, Changes (Deltoids)
    • java and C source code downloads of various sketching algorithm implementations

[edit] count sketches

  • proposed in 2002
  • problem: in sublinear space, estimate the number of distinct item IDs in a data set with only one pass
    • must support union operator of results from subsets
    • must support small per item processing overhead
    • must support small space relative to stream space
    • note: exact count is impossible without linear space

[edit] papers and presentations

[edit] implementations of count sketches

Personal tools