    Basic MultiGet support for partitioned filters (#6757) · bae6f586
    Authored by Peter Dillinger
    Summary:
    In MultiGet, access each applicable filter partition only once
    per batch, rather than for each applicable key. Also,
    
    * Fix Bloom stats for MultiGet
    * Fix/refactor MultiGetContext::Range::KeysLeft, including
    * Add efficient BitsSetToOne implementation
    * Assert that MultiGetContext::Range does not go beyond shift range
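
    The once-per-batch access described above can be sketched roughly as
    follows. This is an illustration only, not the code in this change: the
    type PartitionedFilterSketch and the functions FindPartition,
    LoadPartition, ProbePerKey and ProbePerBatch are hypothetical, and the
    sketch assumes the batch keys are already sorted so that keys covered by
    the same partition are adjacent.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a partitioned filter (not a RocksDB type):
// partition i covers keys <= upper_bounds[i] and > upper_bounds[i - 1].
struct PartitionedFilterSketch {
  std::vector<std::string> upper_bounds;  // sorted ascending
  mutable std::size_t loads = 0;          // simulated partition accesses

  // Index of the partition covering `key`, or upper_bounds.size() if none.
  std::size_t FindPartition(const std::string& key) const {
    return static_cast<std::size_t>(
        std::lower_bound(upper_bounds.begin(), upper_bounds.end(), key) -
        upper_bounds.begin());
  }

  // Stands in for fetching one filter partition (e.g. from block cache).
  void LoadPartition(std::size_t /*idx*/) const { ++loads; }
};

// Per-key approach: one partition access for every applicable key.
std::size_t ProbePerKey(const PartitionedFilterSketch& f,
                        const std::vector<std::string>& sorted_keys) {
  f.loads = 0;
  for (const auto& key : sorted_keys) {
    const std::size_t p = f.FindPartition(key);
    if (p < f.upper_bounds.size()) f.LoadPartition(p);
  }
  return f.loads;
}

// Per-batch approach: with sorted keys, a partition is accessed only when
// the covering partition changes, i.e. once per applicable partition.
std::size_t ProbePerBatch(const PartitionedFilterSketch& f,
                          const std::vector<std::string>& sorted_keys) {
  f.loads = 0;
  std::size_t prev = f.upper_bounds.size();  // sentinel: nothing loaded yet
  for (const auto& key : sorted_keys) {
    const std::size_t p = f.FindPartition(key);
    if (p == f.upper_bounds.size()) continue;  // no partition covers key
    if (p != prev) {
      f.LoadPartition(p);
      prev = p;
    }
  }
  return f.loads;
}
```

    With three partitions bounded by "d", "h" and "m" and a batch of six
    keys spread across them, the per-key loop makes six accesses and the
    per-batch loop makes three.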
    
    Performance test: Generate db:
    
        $ ./db_bench --benchmarks=fillrandom --num=15000000 --cache_index_and_filter_blocks -bloom_bits=10 -partition_index_and_filters=true
        ...
    
    Before (middle performing run of three; note some missing Bloom stats):
    
        $ ./db_bench --use-existing-db --benchmarks=multireadrandom --num=15000000 --cache_index_and_filter_blocks --bloom_bits=10 --threads=16 --cache_size=20000000 -partition_index_and_filters -batch_size=32 -multiread_batched -statistics --duration=20 2>&1 | egrep 'micros/op|block.cache.filter.hit|bloom.filter.(full|use)|number.multiget'
        multireadrandom :      26.403 micros/op 597517 ops/sec; (548427 of 671968 found)
        rocksdb.block.cache.filter.hit COUNT : 83443275
        rocksdb.bloom.filter.useful COUNT : 0
        rocksdb.bloom.filter.full.positive COUNT : 0
        rocksdb.bloom.filter.full.true.positive COUNT : 7931450
        rocksdb.number.multiget.get COUNT : 385984
        rocksdb.number.multiget.keys.read COUNT : 12351488
        rocksdb.number.multiget.bytes.read COUNT : 793145000
        rocksdb.number.multiget.keys.found COUNT : 7931450
    
    After (middle performing run of three):
    
        $ ./db_bench_new --use-existing-db --benchmarks=multireadrandom --num=15000000 --cache_index_and_filter_blocks --bloom_bits=10 --threads=16 --cache_size=20000000 -partition_index_and_filters -batch_size=32 -multiread_batched -statistics --duration=20 2>&1 | egrep 'micros/op|block.cache.filter.hit|bloom.filter.(full|use)|number.multiget'
        multireadrandom :      21.024 micros/op 752963 ops/sec; (705188 of 863968 found)
        rocksdb.block.cache.filter.hit COUNT : 49856682
        rocksdb.bloom.filter.useful COUNT : 45684579
        rocksdb.bloom.filter.full.positive COUNT : 10395458
        rocksdb.bloom.filter.full.true.positive COUNT : 9908456
        rocksdb.number.multiget.get COUNT : 481984
        rocksdb.number.multiget.keys.read COUNT : 15423488
        rocksdb.number.multiget.bytes.read COUNT : 990845600
        rocksdb.number.multiget.keys.found COUNT : 9908456
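
    One way to compare the two runs is to normalize the counters, since
    the runs processed different numbers of keys in the same duration.
    A small sketch of that arithmetic, using only figures copied from the
    output above (the struct and function names are made up for
    illustration):

```cpp
#include <cassert>
#include <cmath>

// Figures copied from the "Before" and "After" db_bench output.
struct RunStats {
  double ops_per_sec;
  double filter_cache_hits;  // rocksdb.block.cache.filter.hit
  double keys_read;          // rocksdb.number.multiget.keys.read
};

constexpr RunStats kBefore{597517.0, 83443275.0, 12351488.0};
constexpr RunStats kAfter{752963.0, 49856682.0, 15423488.0};

// Filter block cache hits per key read by MultiGet.
constexpr double FilterHitsPerKey(const RunStats& s) {
  return s.filter_cache_hits / s.keys_read;
}

// Relative throughput gain of `after` over `before`, as a fraction.
constexpr double ThroughputGain(const RunStats& before,
                                const RunStats& after) {
  return after.ops_per_sec / before.ops_per_sec - 1.0;
}
```

    By this measure, filter block cache hits drop from roughly 6.8 to
    roughly 3.2 per key read.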
    
    So that's about 26% higher throughput (752963 vs. 597517 ops/sec), even for random keys.

    Pull Request resolved: https://github.com/facebook/rocksdb/pull/6757
    
    Test Plan: unit test included
    
    Reviewed By: anand1976
    
    Differential Revision: D21243256
    
    Pulled By: pdillinger
    
    fbshipit-source-id: 5644a1468d9e8c8575be02f4e04bc5d62dbbb57f