1. 21 Apr, 2018 2 commits
  2. 10 Apr, 2018 3 commits
  3. 24 Mar, 2018 3 commits
  4. 17 Mar, 2018 2 commits
    • update history and bump patch number · b4fc156f
      Andrew Kryczka authored
    • Fix WAL corruption from checkpoint/backup race condition · 34ccab02
      Andrew Kryczka authored
      Summary:
      `Writer::WriteBuffer` was always called at the beginning of checkpoint/backup. But that log writer has no internal synchronization, which meant the same buffer could be flushed twice in a race condition case, causing a WAL entry to be duplicated. Then subsequent WAL entries would be at unexpected offsets, causing the 32KB block boundaries to be overlapped and manifesting as a corruption.
      
      This PR fixes the behavior to only use `WriteBuffer` (via `FlushWAL`) in checkpoint/backup when manual WAL flush is enabled. In that case, users are responsible for providing synchronization between WAL flushes. We can also consider removing the call entirely.
      Closes https://github.com/facebook/rocksdb/pull/3603
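
      The double flush described above can be sketched with a toy writer. `ToyWriter` and `RacyDoubleFlush` are hypothetical names, not RocksDB's `log::Writer`; the two unsynchronized flushes are interleaved by hand so the result is deterministic.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <string>

      // Hypothetical stand-in for a buffered log writer with no internal locking.
      struct ToyWriter {
        std::string buffer;  // bytes not yet written to the file
        std::string file;    // bytes already written

        void Append(const std::string& record) { buffer += record; }

        // A flush consists of two steps; another thread can run between them.
        std::string SnapshotBuffer() const { return buffer; }
        void WriteAndClear(const std::string& snapshot) {
          file += snapshot;
          buffer.clear();
        }
      };

      // Replays the racy interleaving: both flushers read the buffer before
      // either one clears it, so the same bytes are written twice.
      // Returns the resulting file size.
      std::size_t RacyDoubleFlush(ToyWriter& w) {
        const std::string seen_by_a = w.SnapshotBuffer();
        const std::string seen_by_b = w.SnapshotBuffer();
        w.WriteAndClear(seen_by_a);
        w.WriteAndClear(seen_by_b);
        return w.file.size();
      }
      ```

      In this toy model every later record starts one record length further into the file than a reader expects, which is the kind of offset shift the summary blames for overlapping the 32KB block boundaries.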
      
      Differential Revision: D7277447
      
      Pulled By: ajkr
      
      fbshipit-source-id: 1b15bd7fd930511222b075418c10de0aaa70a35a
  5. 15 Mar, 2018 4 commits
  6. 09 Mar, 2018 1 commit
  7. 01 Mar, 2018 2 commits
  8. 28 Feb, 2018 2 commits
    • skip CompactRange flush based on memtable contents · 3ae00472
      Andrew Kryczka authored
      Summary:
      CompactRange has a call to Flush because we guarantee that, at the time it's called, all existing keys in the range will be pushed through the user's compaction filter. However, previously the flush was done blindly, so it'd happen even if the memtable does not contain keys in the range specified by the user. This caused unnecessarily many L0 files to be created, leading to write stalls in some cases. This PR checks the memtable's contents, and decides to flush only if it overlaps with `CompactRange`'s range.
      
      - Move the memtable overlap check logic from `ExternalSstFileIngestionJob` to `ColumnFamilyData::RangesOverlapWithMemtables`
      - Reuse the above logic in `CompactRange` and skip flushing if no overlap
      Closes https://github.com/facebook/rocksdb/pull/3520
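
      A minimal sketch of the overlap check, assuming the memtable can be modeled as an ordered map of user keys. `ToyMemtable` and `RangeOverlapsMemtable` are hypothetical; the real `ColumnFamilyData::RangesOverlapWithMemtables` works on internal keys and is not reproduced here.

      ```cpp
      #include <cassert>
      #include <map>
      #include <string>

      // Hypothetical stand-in for a memtable: an ordered map of user keys.
      using ToyMemtable = std::map<std::string, std::string>;

      // Returns true if any key in `mem` lies in the inclusive range [begin, end].
      bool RangeOverlapsMemtable(const ToyMemtable& mem, const std::string& begin,
                                 const std::string& end) {
        auto it = mem.lower_bound(begin);  // first key >= begin
        return it != mem.end() && it->first <= end;
      }

      // Flush only when the range to compact touches buffered keys.
      bool ShouldFlushBeforeCompactRange(const ToyMemtable& mem,
                                         const std::string& begin,
                                         const std::string& end) {
        return RangeOverlapsMemtable(mem, begin, end);
      }
      ```

      With keys `b` and `d` buffered, a compaction of `[e, z]` would skip the flush in this sketch, while `[a, b]` would still trigger it.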
      
      Differential Revision: D7018897
      
      Pulled By: ajkr
      
      fbshipit-source-id: a3c6b1cfae56687b49dd89ccac7c948e53545934
    • Update comments in DB::Close() · c287c098
      Siying Dong authored
      Summary: Closes https://github.com/facebook/rocksdb/pull/3543
      
      Differential Revision: D7093251
      
      Pulled By: siying
      
      fbshipit-source-id: 4066b82c95ecb65866c5842d68ab13ab9f85d567
  9. 27 Feb, 2018 3 commits
    • Adding CentOS 7 Vagrantfile & build script · d6336563
      Istvan Szukacs authored
      Summary:
      I have updated the Vagrantfile to have an entry for CentOS 7. Also created a simple build script which is pretty similar to the one in Beringei.
      
      How to test:
      ```
      vagrant up centos7
      ```
      Todo:
      
      Implement -j X for the build.
      Closes https://github.com/facebook/rocksdb/pull/3530
      
      Differential Revision: D7090739
      
      Pulled By: ajkr
      
      fbshipit-source-id: 9f9eda5b507568993543d08de7ce168dfc12282e
    • DB::Open should fail on tmpfs when use_direct_reads=true · ad05cbb1
      Zhongyi Xie authored
      Summary:
      Before:
      
      > $ TEST_TMPDIR=/dev/shm ./db_bench -use_direct_reads=true -benchmarks=readrandomwriterandom -num=10000000 -reads=100000 -write_buffer_size=1048576 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 -max_background_jobs=12 -readwritepercent=50 -key_size=16 -value_size=48 -threads=32
      DB path: [/dev/shm/dbbench]
      put error: IO error: While open a file for random read: /dev/shm/dbbench/000007.sst: Invalid argument
      put error: IO error: While open a file for random read: /dev/shm/dbbench/000007.sst: Invalid argument
      put error: IO error: While open a file for random read: /dev/shm/dbbench/000007.sst: Invalid argument
      put error: IO error: While open a file for random read: /dev/shm/dbbench/000007.sst: Invalid argument
      put error: IO error: While open a file for random read: /dev/shm/dbbench/000007.sst: Invalid argument
      put error: IO error: While open a file for random read: /dev/shm/dbbench/000007.sst: Invalid argument
      put error: IO error: While open a file for random read: /dev/shm/dbbench/000007.sst: Invalid argument
      put error: IO error: While open a file for random read: /dev/shm/dbbench/000007.sst: Invalid argument
      put error: IO error: While open a file for random read: /dev/shm/dbbench/000007.sst: Invalid argument
      db_bench: tpp.c:84: __pthread_tpp_change_priority: Assertion `new_prio == -1 || (new_prio >= fifo_min_prio && new_prio <= fifo_max_prio)' failed.
      put error: IO error: While open a file for random read: /dev/shm/dbbench/000007.sst: Invalid argument
      put error: IO error: While open a file for random read: /dev/shm/dbbench/000007.sst: Invalid argument
      
      After:
      > TEST_TMPDIR=/dev/shm ./db_bench -use_direct_reads=true -benchmarks=readrandomwriterandom -num=10000000 -reads=100000 -write_buffer_size=1048576 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 -max_background_jobs=12 -readwritepercent=50 -key_size=16 -value_size=48 -threads=32
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      open error: Not implemented: Direct I/O is not supported by the specified DB.
      Closes https://github.com/facebook/rocksdb/pull/3539
      
      Differential Revision: D7082658
      
      Pulled By: miasantreble
      
      fbshipit-source-id: f9d9c6ec3b5e9e049cab52154940ee101ba4d342
    • Fix a memory leak in WindowsThread · 7eb292da
      Dmitri Smirnov authored
      Summary:
      `_endthreadex` does not return, so destructors for objects on the
        stack do not run. This creates a memory leak.
        We remove the calls, since `_endthreadex` is called automatically after the
        thread procedure returns, i.e. when the thread exits.
      Closes https://github.com/facebook/rocksdb/pull/3542
      
      Differential Revision: D7088713
      
      Pulled By: ajkr
      
      fbshipit-source-id: 749ecafc6a9572f587f76e516547e07734349a54
  10. 24 Feb, 2018 2 commits
  11. 23 Feb, 2018 5 commits
  12. 22 Feb, 2018 2 commits
    • BackupEngine gluster-friendly file naming convention · b0929776
      Andrew Kryczka authored
      Summary:
      Use the rsync tempfile naming convention in our `BackupEngine`. The temp file follows the format, `.<filename>.<suffix>`, which is later renamed to `<filename>`. We fix `tmp` as the `<suffix>` as we don't need to use random bytes for now. The benefit is gluster treats this tempfile naming convention specially and applies hashing only to `<filename>`, so the file won't need to be linked or moved when it's renamed. Our gluster team suggested this will make things operationally easier.
      Closes https://github.com/facebook/rocksdb/pull/3463
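
      The naming convention can be sketched with two small helpers. `TempFileName` and `FinalFileName` are hypothetical, not part of the `BackupEngine` API; the second one only mirrors the idea that the name to hash is `<filename>`, and does not reproduce gluster's actual logic.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <string>

      // Builds the rsync-style temporary name ".<filename>.<suffix>".
      std::string TempFileName(const std::string& filename,
                               const std::string& suffix = "tmp") {
        return "." + filename + "." + suffix;
      }

      // Recovers "<filename>" from ".<filename>.<suffix>". Returns an empty
      // string if `temp_name` does not follow the convention.
      std::string FinalFileName(const std::string& temp_name) {
        const std::size_t last_dot = temp_name.rfind('.');
        if (temp_name.size() < 4 || temp_name[0] != '.' || last_dot == 0 ||
            last_dot == std::string::npos) {
          return "";
        }
        return temp_name.substr(1, last_dot - 1);
      }
      ```

      Because the final name is recoverable from the temporary one, a rename from the temporary name to `<filename>` does not change the part of the name that is hashed.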
      
      Differential Revision: D6893333
      
      Pulled By: ajkr
      
      fbshipit-source-id: fd7622978f4b2487fce33cde40dd3124f16bcaa8
    • WritePrepared Txn: fix non-emptied PreparedHeap bug · 828211e9
      Maysam Yabandeh authored
      Summary:
      Under a certain sequence of accesses to PreparedHeap, a bug prevented the heap from being emptied. This would result in performance issues when the heap content is moved to old_prepared_ after max_evicted_seq_ advances past the orphan prepared sequence numbers. The patch fixes the bug and adds more unit tests. It also adds more logging for the unlikely scenarios.
      Closes https://github.com/facebook/rocksdb/pull/3526
      
      Differential Revision: D7038486
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: f1e40bea558f67b03d2a29131fcb8734c65fce97
  13. 21 Feb, 2018 4 commits
    • Add rocksdb.iterator.internal-key property · 8ada876d
      Sagar Vemuri authored
      Summary:
      Added a new iterator property: `rocksdb.iterator.internal-key` to get the internal-key (converted to user key) at which the iterator stopped.
      Closes https://github.com/facebook/rocksdb/pull/3525
      
      Differential Revision: D7033694
      
      Pulled By: sagar0
      
      fbshipit-source-id: d51e6c00f5e9d766c6276ef79774b81c6c5216f8
    • save redundant key lookup in map of locked keys · e9c31ab1
      jsteemann authored
      Summary:
      In case it is found that a key is already marked as locked in a
      stripe's map of locked keys, it is not necessary to look it up
      again using `std::unordered_map<std::string, ...>::at(const std::string&)`.
      
      Instead, we can use the already found position using the iterator
      produced by the previous `find` operation. Reusing the iterator
      will avoid having to hash the key again and do additional "random"
      memory lookups in the map of keys (though the data will very
      likely sit available in caches here already due to the previous
      find operation)
      Closes https://github.com/facebook/rocksdb/pull/3505
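
      The before/after pattern can be sketched as follows. `LockInfo`, `LockMap` and both functions are hypothetical stand-ins, not the code in the lock manager.

      ```cpp
      #include <cassert>
      #include <string>
      #include <unordered_map>

      // Hypothetical per-key lock record.
      struct LockInfo {
        int holder_txn_id;
      };

      using LockMap = std::unordered_map<std::string, LockInfo>;

      // Before: two hash lookups for the same key.
      int HolderTwoLookups(const LockMap& keys, const std::string& key) {
        if (keys.find(key) != keys.end()) {
          return keys.at(key).holder_txn_id;  // hashes `key` a second time
        }
        return -1;  // key is not locked
      }

      // After: one lookup; the iterator returned by `find` is reused.
      int HolderOneLookup(const LockMap& keys, const std::string& key) {
        auto it = keys.find(key);
        if (it != keys.end()) {
          return it->second.holder_txn_id;
        }
        return -1;  // key is not locked
      }
      ```

      Both functions return the same result; the second one hashes the key once instead of twice.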
      
      Differential Revision: D7036446
      
      Pulled By: sagar0
      
      fbshipit-source-id: cced51547b2bd2d49394f6bc8c5896f09fa80f68
    • fix handling of empty string as checkpoint directory · 1960e73e
      Andrew Kryczka authored
      Summary:
      - made `CreateCheckpoint` properly return `InvalidArgument` when called with an empty directory. Previously it triggered an assertion failure due to a bug in the logic.
      - made `ldb` set empty `checkpoint_dir` if that's what the user specifies, so that we can use it to properly test `CreateCheckpoint` in the future.
      
      Differential Revision: D6874562
      
      fbshipit-source-id: dcc1bd41768261d9338987fa7711444289707ed7
    • fix shift UBSAN error in col_buf_encoder.cc · 5263da63
      Igor Sugak authored
      Summary:
      Add a static cast so that the left shift is performed on an unsigned type.
      
      make ubsan_check
      Closes https://github.com/facebook/rocksdb/pull/3517
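
      A minimal sketch of the kind of cast involved, assuming a byte-wise decode; `DecodeFixed32LE` is a hypothetical example and not the code from col_buf_encoder.cc.

      ```cpp
      #include <cassert>
      #include <cstdint>

      // Assembles a little-endian 32-bit value from four bytes. Each byte is
      // cast to uint32_t before shifting: without the cast, `b[3]` is promoted
      // to a signed int, and `b[3] << 24` can overflow it, which is undefined
      // behavior before C++20 and is reported by UBSAN.
      uint32_t DecodeFixed32LE(const unsigned char* b) {
        return static_cast<uint32_t>(b[0]) |
               (static_cast<uint32_t>(b[1]) << 8) |
               (static_cast<uint32_t>(b[2]) << 16) |
               (static_cast<uint32_t>(b[3]) << 24);
      }
      ```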
      
      Reviewed By: sagar0
      
      Differential Revision: D7016044
      
      Pulled By: igorsugak
      
      fbshipit-source-id: baf72f6197edd8f7220d010b15a23d6de6a72c49
  14. 17 Feb, 2018 3 commits
    • Fix build with USE_RTTI=0 · ab446dc2
      Po-Chuan Hsieh authored
      Summary:
      utilities/column_aware_encoding_util.cc:61:23: error: cannot use dynamic_cast with -fno-rtti
        table_reader_.reset(dynamic_cast<BlockBasedTable*>(table_reader.release()));
                            ^
      1 error generated.
      
      It was added as a [local patch](https://svnweb.freebsd.org/ports/head/databases/rocksdb/files/patch-utilities-column_aware_encoding_util.cc) on FreeBSD since RocksDB 5.8.
      It also fixes #2707.
      Closes https://github.com/facebook/rocksdb/pull/3514
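
      One common way to keep such a downcast compiling with `-fno-rtti` is sketched below. This is an illustration under assumptions, not necessarily the change made in this commit; `CheckedDowncast`, `ToyTableReader` and `ToyBlockBasedTable` are hypothetical names.

      ```cpp
      #include <cassert>

      // Verifies the cast with dynamic_cast when RTTI is available and asserts
      // are enabled; otherwise it is a plain, unchecked static_cast.
      template <class Derived, class Base>
      Derived* CheckedDowncast(Base* base) {
      #if defined(__cpp_rtti) || defined(__GXX_RTTI) || defined(_CPPRTTI)
        assert(base == nullptr || dynamic_cast<Derived*>(base) != nullptr);
      #endif
        return static_cast<Derived*>(base);
      }

      // Toy class hierarchy used only to exercise the helper.
      struct ToyTableReader {
        virtual ~ToyTableReader() = default;
      };
      struct ToyBlockBasedTable : ToyTableReader {
        int id = 42;
      };
      ```

      The caller must already know the dynamic type, since the fallback path performs no runtime check.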
      
      Differential Revision: D7005571
      
      Pulled By: siying
      
      fbshipit-source-id: 351a9055d21d0accdd7a932e8e7bfcd3c8e22068
    • WritePrepared Txn: optimizations for sysbench update_noindex · c178da05
      Maysam Yabandeh authored
      Summary:
      These are optimizations that we applied to improve sysbench's update_noindex performance.
      1. Make use of the LIKELY compiler hint
      2. Move std::atomic to the subclass
      3. Make use of skip_prepared in non-2pc transactions.
      Closes https://github.com/facebook/rocksdb/pull/3512
      
      Differential Revision: D7000075
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 1ab8292584df1f6305a4992973fb1b7933632181
    • Fix deadlock in ColumnFamilyData::InstallSuperVersion() · 97307d88
      Mike Kolupaev authored
      Summary:
      Deadlock: a memtable flush holds DB::mutex_ and calls ThreadLocalPtr::Scrape(), which locks ThreadLocalPtr mutex; meanwhile, a thread exit handler locks ThreadLocalPtr mutex and calls SuperVersionUnrefHandle, which tries to lock DB::mutex_.
      
      This deadlock is hit all the time on our workload. It blocks our release.
      
      In general, the problem is that ThreadLocalPtr takes an arbitrary callback and calls it while holding a lock on a global mutex. The same global mutex is (at least in some cases) locked by almost all ThreadLocalPtr methods, on any instance of ThreadLocalPtr. So, there'll be a deadlock if the callback tries to do anything to any instance of ThreadLocalPtr, or waits for another thread to do so.
      
      So, probably the only safe way to use ThreadLocalPtr callbacks is to do only simple and lock-free things in them.
      
      This PR fixes the deadlock by making sure that local_sv_ never holds the last reference to a SuperVersion, and therefore SuperVersionUnrefHandle never has to do any nontrivial cleanup.
      
      I also searched for other uses of ThreadLocalPtr to see if they may have similar bugs. There's only one other use, in transaction_lock_mgr.cc, and it looks fine.
      Closes https://github.com/facebook/rocksdb/pull/3510
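
      The invariant behind the fix can be illustrated with a toy reference count: as long as the owner keeps its own reference, the thread-local handle never drops the count to zero, so the exit handler has no cleanup that would need `DB::mutex_`. `ToySuperVersion` and `UnrefFromThreadExit` are hypothetical, and the real locking is not modeled.

      ```cpp
      #include <atomic>
      #include <cassert>

      // Toy reference-counted object; hypothetical stand-in for SuperVersion.
      struct ToySuperVersion {
        std::atomic<int> refs{0};

        void Ref() { refs.fetch_add(1, std::memory_order_relaxed); }

        // Returns true when the caller dropped the last reference and would
        // therefore be responsible for cleanup.
        bool Unref() { return refs.fetch_sub(1, std::memory_order_acq_rel) == 1; }
      };

      // Models a thread-exit handler that runs while a global mutex is held,
      // so it must not take any other lock. It only decrements the counter.
      // Returns true if the invariant held, i.e. this was not the last reference.
      bool UnrefFromThreadExit(ToySuperVersion* sv) {
        const bool was_last = sv->Unref();
        return !was_last;
      }
      ```

      In this model the owner's final `Unref()` is the only call that can return true, and it runs outside the exit handler.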
      
      Reviewed By: sagar0
      
      Differential Revision: D7005346
      
      Pulled By: al13n321
      
      fbshipit-source-id: 37575591b84f07a891d6659e87e784660fde815f
  15. 16 Feb, 2018 2 commits
    • fix advance reservation of arena block addresses · 0454f781
      Andrew Kryczka authored
      Summary:
      Calling `std::vector::reserve()` with a size larger than the current capacity causes memory to be reallocated and then data to be moved. It was called prior to adding every block. This reallocation could happen a huge number of times, e.g., for users with large index blocks.
      
      Instead, we can simply use `std::vector::emplace_back()` in such a way that preserves the no-memory-leak guarantee, while letting the vector decide when to reallocate space. Now I see reallocation/moving happen O(logN) times, rather than O(N) times, where N is the final size of vector.
      Closes https://github.com/facebook/rocksdb/pull/3508
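
      A small sketch of both points, using hypothetical helpers rather than the arena code itself: `CountReallocations` compares the two growth strategies, and `AddBlock` shows one ordering of `emplace_back()` and `new` that does not leak if either step throws.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <vector>

      // Counts how often the vector's capacity changes while adding `n`
      // entries. `reserve_each_time` mimics the old behavior of calling
      // reserve(size() + 1) before every insertion.
      std::size_t CountReallocations(std::size_t n, bool reserve_each_time) {
        std::vector<char*> blocks;
        std::size_t reallocations = 0;
        std::size_t last_capacity = blocks.capacity();
        for (std::size_t i = 0; i < n; ++i) {
          if (reserve_each_time) {
            blocks.reserve(blocks.size() + 1);
          }
          blocks.emplace_back(nullptr);
          if (blocks.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = blocks.capacity();
          }
        }
        return reallocations;
      }

      // Creates the vector slot before calling `new`. If `emplace_back`
      // throws, nothing has been allocated yet; if `new` throws, only a null
      // slot is left behind.
      char* AddBlock(std::vector<char*>& blocks, std::size_t bytes) {
        blocks.emplace_back(nullptr);
        char* block = new char[bytes];
        blocks.back() = block;
        return block;
      }
      ```

      The exact counts depend on the standard library's growth policy, so the sketch only relies on the per-insert `reserve()` variant reallocating more often.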
      
      Differential Revision: D6994228
      
      Pulled By: ajkr
      
      fbshipit-source-id: ab7c11e13ff37c8c6c8249be7a79566a4068cd27
    • Legocastle job to report lite build binary size to scuba · 989d1231
      Yi Wu authored
      Summary:
      Add a legocastle job to continuously build the last 10 commits every 4 hours and report lite build binary size to scuba.
      Closes https://github.com/facebook/rocksdb/pull/3511
      
      Differential Revision: D7001730
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 7c8ca87c46d663c786a0d32be69ebbe7b19a5eb9