1. 18 Jan, 2018 1 commit
    • fix live WALs purged while file deletions disabled · 46e599fc
      Andrew Kryczka authored
      When calling `DisableFileDeletions` followed by `GetSortedWalFiles`, we guarantee the files returned by the latter call won't be deleted until after file deletions are re-enabled. However, `GetSortedWalFiles` didn't omit files already planned for deletion via `PurgeObsoleteFiles`, so the guarantee could be broken.
      We fix it by making `GetSortedWalFiles` wait for the number of pending purges to hit zero if file deletions are disabled. This condition is eventually met since `PurgeObsoleteFiles` is guaranteed to be called for the existing pending purges, and new purges cannot be scheduled while file deletions are disabled. Once the condition is met, `GetSortedWalFiles` simply returns the contents of the DB and archive directories, which nobody can delete (except for the deletion scheduler, for which I plan to fix this bug later) until deletions are re-enabled.
      Closes https://github.com/facebook/rocksdb/pull/3341
      Differential Revision: D6681131
      Pulled By: ajkr
      fbshipit-source-id: 90b1e2f2362ea9ef715623841c0826611a817634
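The wait-for-zero-pending-purges idea described in this commit can be sketched with a mutex and a condition variable. This is a minimal illustration, not RocksDB's code: the class `PurgeTracker` and all of its member names are hypothetical, and the real logic lives inside the DB implementation.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the DB's bookkeeping; not RocksDB's DBImpl.
class PurgeTracker {
 public:
  // Deletions stay disabled until every Disable call is matched by an Enable.
  void DisableFileDeletions() {
    std::lock_guard<std::mutex> lock(mu_);
    ++disable_count_;
  }

  void EnableFileDeletions() {
    std::lock_guard<std::mutex> lock(mu_);
    if (disable_count_ > 0) --disable_count_;
  }

  // Returns false, and schedules nothing, while deletions are disabled.
  bool SchedulePurge() {
    std::lock_guard<std::mutex> lock(mu_);
    if (disable_count_ > 0) return false;
    ++pending_purges_;
    return true;
  }

  // Called by the purge worker once a scheduled purge has run.
  void FinishPurge() {
    std::lock_guard<std::mutex> lock(mu_);
    if (pending_purges_ > 0) --pending_purges_;
    cv_.notify_all();
  }

  // Blocks until no purge is pending. It only waits when deletions are
  // disabled, because otherwise new purges could keep arriving.
  void WaitForPendingPurges() {
    std::unique_lock<std::mutex> lock(mu_);
    if (disable_count_ == 0) return;
    cv_.wait(lock, [this] { return pending_purges_ == 0; });
  }

  int pending_purges() {
    std::lock_guard<std::mutex> lock(mu_);
    return pending_purges_;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  int disable_count_ = 0;
  int pending_purges_ = 0;
};
```

In this sketch a caller would invoke `DisableFileDeletions()`, then `WaitForPendingPurges()`, and only then list the directories, so that no already-scheduled purge can remove a file from the returned list.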
  2. 17 Jan, 2018 5 commits
    • fix DBTest.AutomaticConflictsWithManualCompaction · 266d85fb
      Andrew Kryczka authored
      After af92d4ad, only an exclusive manual compaction can have a conflict. dc360df8 updated the conflict-checking test case accordingly. But we missed the point that an exclusive manual compaction can only conflict with automatic compactions scheduled after it, since it waits on pending automatic compactions before it begins running.
      This PR updates the test case to ensure the automatic compactions are scheduled after the manual compaction starts but before it finishes, thus ensuring a conflict. I also cleaned up the test case to use less space, as I saw it cause an out-of-space error on Travis.
      Closes https://github.com/facebook/rocksdb/pull/3375
      Differential Revision: D6735162
      Pulled By: ajkr
      fbshipit-source-id: 020530a4e150a4786792dce7cec5d66b420cb884
    • Fix multiple build failures · dc360df8
      Yi Wu authored
      * Fix DBTest.CompactRangeWithEmptyBottomLevel lite build failure
      * Fix DBTest.AutomaticConflictsWithManualCompaction failure introduced by #3366
      * Fix BlockBasedTableTest::IndexUncompressed, which should be disabled if snappy is disabled
      * Fix ASAN failure with DBBasicTest::DBClose test
      Closes https://github.com/facebook/rocksdb/pull/3373
      Differential Revision: D6732313
      Pulled By: yiwu-arbug
      fbshipit-source-id: 1eb9b9d9a8d795f56188fa9770db9353f6fdedc5
    • Issue #3370 Broken CMakeLists.txt · bf6f03f3
      Bartek Wrona authored
      Issue #3370 Simple fixes to make the RocksDB project also work as a submodule of a larger project.
      Closes https://github.com/facebook/rocksdb/pull/3372
      Differential Revision: D6729595
      Pulled By: ajkr
      fbshipit-source-id: eee2589e7a7c4322873dff8510eebd050301c54c
    • Avoid too frequent MaybeScheduleFlushOrCompaction() call · af92d4ad
      Sunguck Lee authored
      If there's a manual compaction in the queue, then "HaveManualCompaction(compaction_queue_.front())" will return true, and this causes overly frequent calls to MaybeScheduleFlushOrCompaction().
      Closes https://github.com/facebook/rocksdb/pull/3366
      Differential Revision: D6729575
      Pulled By: ajkr
      fbshipit-source-id: 96da04f8fd33297b1ccaec3badd9090403da29b0
    • Add a Close() method to DB to return status when closing a db · d0f1b49a
      Anand Ananthabhotla authored
      Currently, the only way to close an open DB is to destroy the DB
      object. There is no way for the caller to know the status. In one
      instance, the destructor encountered an error due to failure to
      close a log file on HDFS. In order to prevent silent failures, we add
      DB::Close() that calls CloseImpl() which must be implemented by its
      descendants.
      The main failure point in the destructor is closing the log file. This
      patch also adds a Close() entry point to Logger in order to get status.
      When DBOptions::info_log is allocated and owned by the DBImpl, it is
      explicitly closed by DBImpl::CloseImpl().
      Closes https://github.com/facebook/rocksdb/pull/3348
      Differential Revision: D6698158
      Pulled By: anand1976
      fbshipit-source-id: 9468e2892553eb09c4c41b8723f590c0dbd8ab7d
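The pattern this commit describes, an explicit `Close()` that reports a status plus a destructor that can only close silently, might look roughly like the sketch below. Everything here is an illustrative assumption: `Status`, `Store`, and `LogBackedStore` are hypothetical stand-ins and not RocksDB's `Status`, `DB`, or `DBImpl`.

```cpp
#include <string>

// Hypothetical minimal status type; RocksDB has its own Status class.
class Status {
 public:
  Status() : ok_(true) {}
  static Status OK() { return Status(); }
  static Status IOError(const std::string& msg) {
    Status s;
    s.ok_ = false;
    s.message_ = msg;
    return s;
  }
  bool ok() const { return ok_; }
  const std::string& message() const { return message_; }

 private:
  bool ok_;
  std::string message_;
};

// Hypothetical base class: Close() is the public entry point and delegates
// to CloseImpl(), which each subclass must implement.
class Store {
 public:
  virtual ~Store() {}

  // Explicit close: the caller sees the result. In this sketch, a repeated
  // call after the first one does nothing and returns OK.
  Status Close() {
    if (closed_) return Status::OK();
    closed_ = true;
    return CloseImpl();
  }

 protected:
  virtual Status CloseImpl() = 0;

 private:
  bool closed_ = false;
};

// Hypothetical subclass whose log file may fail to close.
class LogBackedStore : public Store {
 public:
  explicit LogBackedStore(bool log_close_fails)
      : log_close_fails_(log_close_fails) {}

  // Fallback path: if the caller never called Close(), close here.
  // Any error is dropped, which is the silent failure being avoided.
  ~LogBackedStore() override { Close(); }

 protected:
  Status CloseImpl() override {
    if (log_close_fails_) return Status::IOError("failed to close log file");
    return Status::OK();
  }

 private:
  bool log_close_fails_;
};
```

A caller that cares about the outcome calls `Close()` and checks the returned status before destroying the object; the destructor remains as a safety net.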
  3. 13 Jan, 2018 4 commits
  4. 12 Jan, 2018 7 commits
    • fix Gemfile.lock nokogiri dependencies · 6d7e3b9f
      Andrew Kryczka authored
      I installed the ruby dependencies and ran `bundle update nokogiri`. It depends on a newer version of "mini_portile2" which I missed in 9c2f64e1. Now `bundle install` works again.
      Closes https://github.com/facebook/rocksdb/pull/3361
      Differential Revision: D6710164
      Pulled By: ajkr
      fbshipit-source-id: 9a08d6cc6400ef495b715b3d68b04ce3f3367031
    • Consider an increase to buffer size when reading option file, from 4K to 8K. · 45828c72
      Peter (Stig) Edwards authored
      Hello and thank you for RocksDB,
      While looking into the buffered I/O used when an `OPTIONS` file is read, I noticed that the `OPTIONS` files produced by RocksDB 5.8.8 (and head of master) were just over 4096 bytes in size. As a result, the version of glibc I am using (glibc-2.17-196.el7), on the filesystem used, was passed a 4K buffer for the `fread_unlocked` call, and 2 read system calls with a 4096-byte buffer were needed to read the contents of the `OPTIONS` file.
      If the buffer size is increased to 8192, then 1 read system call is enough to read the contents.
      As I think the buffer size is only used for reading `OPTIONS` files, and I thought it likely that `OPTIONS` files have increased in size (as more options are added), I thought I would suggest an increase.
      [ If the comments at the top of the `OPTIONS` file and the white space at the start of lines are removed, then the size can be reduced to under 4K, but as more options are added the size seems likely to grow again. ]
      Create a new database:
      > ./ldb --create_if_missing --db=/tmp/rdb_tmp put 1 1
      The OPTIONS file is 4252 bytes:
      > stat /tmp/rdb_tmp/OPTIONS* | head -n 2
        File: ‘/tmp/rdb_tmp/OPTIONS-000005’
        Size: 4252            Blocks: 16         IO Block: 4096   regular file
      Before, with the 4096-byte buffer, 2 read system calls are used:
      > strace -f ./ldb --try_load_options --db=/tmp/rdb_tmp get DOES_NOT_EXIST 2>&1 |
          grep -A 1 'RocksDB option file'
      read(3, "# This is a RocksDB option file."..., 4096) = 4096
      read(3, "e\n  metadata_block_size=4096\n  c"..., 4096) = 156
      ltrace shows 4096 passed to fread_unlocked
      > ltrace -S -f ./ldb --try_load_options --db=/tmp/rdb_tmp get DOES_NOT_EXIST 2>&1 |
          grep -C 3 'RocksDB option file'
      [pid 51013] fread_unlocked(0x7ffd5fbf2d50, 1, 4096, 0x7fd2e084e780 <unfinished ...>
      [pid 51013] fstat@SYS(3, 0x7ffd5fbf28f0)         = 0
      [pid 51013] mmap@SYS(nil, 4096, 3, 34, -1, 0)    = 0x7fd2e318c000
      [pid 51013] read@SYS(3, "# This is a RocksDB option file."..., 4096) = 4096
      [pid 51013] <... fread_unlocked resumed> )       = 4096
      After, with the 8192-byte buffer, 1 read system call returns the whole file:
      > strace -f ./ldb --try_load_options --db=/tmp/rdb_tmp get DOES_NOT_EXIST 2>&1 | grep -A 1 'RocksDB option file'
      read(3, "# This is a RocksDB option file."..., 8192) = 4252
      read(3, "", 4096)                       = 0
      ltrace shows 8192 passed to fread_unlocked
      > ltrace -S -f ./ldb --try_load_options --db=/tmp/rdb_tmp get DOES_NOT_EXIST 2>&1 | grep -C 3 'RocksDB option file'
      [pid 146611] fread_unlocked(0x7ffcfba382f0, 1, 8192, 0x7fc4e844e780 <unfinished ...>
      [pid 146611] fstat@SYS(3, 0x7ffcfba380f0)        = 0
      [pid 146611] mmap@SYS(nil, 4096, 3, 34, -1, 0)   = 0x7fc4eaee0000
      [pid 146611] read@SYS(3, "# This is a RocksDB option file."..., 8192) = 4252
      [pid 146611] read@SYS(3, "", 4096)               = 0
      [pid 146611] <... fread_unlocked resumed> )      = 4252
      [pid 146611] feof(0x7fc4e844e780)                = 1
      Closes https://github.com/facebook/rocksdb/pull/3294
      Differential Revision: D6653684
      Pulled By: ajkr
      fbshipit-source-id: 222f25f5442fefe1dcec18c700bd9e235bb63491
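The arithmetic behind the traces above is a ceiling division of the file size by the buffer size. The helper below is a small sketch of that calculation; the function name `DataReadsNeeded` is made up for illustration and does not appear in RocksDB.

```cpp
#include <cstddef>

// Number of read() calls that return data when a file of `file_size` bytes
// is read through a buffer of `buffer_size` bytes (ceiling division).
// The final zero-byte read that signals end of file is not counted.
// `buffer_size` must be non-zero.
std::size_t DataReadsNeeded(std::size_t file_size, std::size_t buffer_size) {
  return (file_size + buffer_size - 1) / buffer_size;
}
```

For the 4252-byte `OPTIONS` file in this commit, `DataReadsNeeded(4252, 4096)` is 2 (4096 bytes, then 156 bytes) and `DataReadsNeeded(4252, 8192)` is 1.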
    • Fix memleak when DB::DeleteFile() · 0a7ba0e5
      Changli Gao authored
      Because the corresponding read_first_record_cache_ item wasn't
      erased, memory leaked.
      Closes https://github.com/facebook/rocksdb/pull/1712
      Differential Revision: D4363654
      Pulled By: ajkr
      fbshipit-source-id: 7da1adcfc8c380e4ffe05b8769fc2221ad17a225
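The kind of leak described here, a cache entry that outlives the file it describes, can be illustrated with a small model. This is only a sketch: `WalCache` and its members are hypothetical, and the map merely stands in for the `read_first_record_cache_` named in the commit message.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

// Hypothetical model of a component that tracks files and caches the first
// sequence number read from each one.
class WalCache {
 public:
  void AddFile(uint64_t number) { files_.insert(number); }

  // Remember the first record's sequence number for a known file.
  void CacheFirstRecord(uint64_t number, uint64_t first_sequence) {
    if (files_.count(number) != 0) first_record_cache_[number] = first_sequence;
  }

  // Deleting a file must also drop its cache entry; otherwise the entry
  // stays in the map for the lifetime of the object.
  bool DeleteFile(uint64_t number) {
    if (files_.erase(number) == 0) return false;
    first_record_cache_.erase(number);
    return true;
  }

  std::size_t CachedEntries() const { return first_record_cache_.size(); }

 private:
  std::unordered_set<uint64_t> files_;
  std::unordered_map<uint64_t, uint64_t> first_record_cache_;
};
```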
    • Update Gemfile.lock · 9c2f64e1
      Andrew Kryczka authored
      bump nokogiri number
      Closes https://github.com/facebook/rocksdb/pull/3358
      Differential Revision: D6708596
      Pulled By: ajkr
      fbshipit-source-id: 6662c3ba4994374ecf8a13928e915b655a980b70
    • add WriteBatch::WriteBatch(std::string&&) · 204af1ec
      Bo Liu authored
      to save a string copy for some use cases.
      The change is pretty straightforward, please feel free to let me know if you want to suggest any tests for it.
      Closes https://github.com/facebook/rocksdb/pull/3349
      Differential Revision: D6706828
      Pulled By: yiwu-arbug
      fbshipit-source-id: 873ce4442937bdc030b395c7f99228eda7f59eb7
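How an rvalue-reference constructor saves a string copy can be shown with a stand-in class. The `Batch` type below is hypothetical and only mirrors the shape of the constructor added in this commit; it is not RocksDB's `WriteBatch`.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for a batch that owns a serialized representation.
class Batch {
 public:
  // Copies the caller's string into the batch.
  explicit Batch(const std::string& rep) : rep_(rep) {}

  // Takes over the caller's string; for strings too long for the small-string
  // buffer, this transfers the heap allocation instead of copying the bytes.
  explicit Batch(std::string&& rep) : rep_(std::move(rep)) {}

  const std::string& Data() const { return rep_; }

 private:
  std::string rep_;
};
```

A caller that no longer needs its string can write `Batch b(std::move(payload));` and should treat `payload` as unspecified (but valid) afterwards.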
    • Add Jenkins for PPC64le build status badge · d4da02d1
      Adam Retter authored
      Summary: Closes https://github.com/facebook/rocksdb/pull/3356
      Differential Revision: D6706909
      Pulled By: sagar0
      fbshipit-source-id: 6e4757d9eceab3e8a6c1b83c1be4108e86576cb2
    • FreeBSD build support for RocksDB and RocksJava · a53c571d
      Adam Retter authored
      Tested on a clean FreeBSD 11.01 x64.
      Closes https://github.com/facebook/rocksdb/pull/1423
      Closes https://github.com/facebook/rocksdb/pull/3357
      Differential Revision: D6705868
      Pulled By: sagar0
      fbshipit-source-id: cbccbbdafd4f42922512ca03619a5d5583a425fd
  5. 11 Jan, 2018 4 commits
  6. 10 Jan, 2018 6 commits
  7. 09 Jan, 2018 2 commits
  8. 06 Jan, 2018 4 commits
  9. 05 Jan, 2018 1 commit
    • Remove assert(s.ok()) from ::DeleteFile · 1c9ada59
      Maysam Yabandeh authored
      DestroyDB, which is used in tests, loops over the files returned by ::GetChildren and deletes them one by one. Such files might already have been deleted in the file system (during DeleteObsoleteFileImpl, for example), with the actual deletion happening after a delay, sometimes before ::DeleteFile is called on the file name. We have some test failures where FaultInjectionTestEnv::DeleteFile fails on assert(s.ok()) during DestroyDB. This patch removes the assert statement to fix that.
      Closes https://github.com/facebook/rocksdb/pull/3324
      Differential Revision: D6659545
      Pulled By: maysamyabandeh
      fbshipit-source-id: 4c9552fbcd494dcf3e61d475c11fc965c4388b2c
  10. 04 Jan, 2018 2 commits
  11. 03 Jan, 2018 1 commit
    • Speed up BlockTest.BlockReadAmpBitmap · ccc095a0
      Siying Dong authored
      BlockTest.BlockReadAmpBitmap is too slow and times out in some environments. Speed it up by:
      (1) improving the way the verification is done, which makes it 5 times faster;
      (2) running fewer tests for large blocks, which cuts the time by another 10 times.
      Now it finishes in a similar time to other tests.
      Closes https://github.com/facebook/rocksdb/pull/3313
      Differential Revision: D6643711
      Pulled By: siying
      fbshipit-source-id: c2397d666eab5421a78ca87e1e45491e0f832a6d
  12. 22 Dec, 2017 1 commit
    • Disable onboard cache for compaction output · b5c99cc9
      burtonli authored
      FILE_FLAG_WRITE_THROUGH is the Windows API flag for disabling the device's on-board cache, which should be disabled if the user doesn't need the system cache.
      There was a perf issue related to this: during memtable flush, the high-percentile latency jumped significantly. During profiling, we found that high-latency (P99.9) read requests got queue-jumped by write requests from memtable flush and waited 80ms or even longer, even when overall SSD I/O throughput was relatively low.
      After enabling FILE_FLAG_WRITE_THROUGH, we reran the test and found that high-percentile latency dropped a lot without observable impact on writes.
      Scenario 1: 40MB/s + 40MB/s R/W compaction throughput
      Percentile | Original | FILE_FLAG_WRITE_THROUGH | Percentage reduction
      P99.9 | 56.897 ms | 35.593 ms | -37.4%
      P99 | 3.905 ms | 3.896 ms | -2.8%
      Scenario 2: 14MB/s + 14MB/s R/W compaction throughput, cohosted with 100+ other rocksdb instances that have manually triggered memtable flush operations (the memtable is tiny), creating a lot of randomized small-file write operations during the test.
      Percentile | Original | FILE_FLAG_WRITE_THROUGH | Percentage reduction
      P99.9 | 86.227 ms | 50.436 ms | -41.5%
      P99 | 8.415 ms | 3.356 ms | -60.1%
      Closes https://github.com/facebook/rocksdb/pull/3225
      Differential Revision: D6624174
      Pulled By: miasantreble
      fbshipit-source-id: 321b86aee9d74470840c70e5d0d4fa9880660a91
  13. 21 Dec, 2017 2 commits
    • fix ForwardIterator reference to temporary object · f00e176c
      Andrew Kryczka authored
      Fixes the following ASAN error:
      ==2108042==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7fc50ae9b868 at pc 0x7fc5112aff55 bp 0x7fff9eb9dc10 sp 0x7fff9eb9dc08
      === How to use this, how to get the raw stack trace, and more: fburl.com/ASAN ===
      READ of size 8 at 0x7fc50ae9b868 thread T0
      SCARINESS: 23 (8-byte-read-stack-use-after-scope)
           #0 rocksdb/dbformat.h:164                   rocksdb::InternalKeyComparator::user_comparator() const
           #1 librocksdb_src_rocksdb_lib.so+0x1429a7d  rocksdb::RangeDelAggregator::InitRep(std::vector<...> const&)
           #2 librocksdb_src_rocksdb_lib.so+0x142ceae  rocksdb::RangeDelAggregator::AddTombstones(std::unique_ptr<...>)
           #3 librocksdb_src_rocksdb_lib.so+0x1382d88  rocksdb::ForwardIterator::RebuildIterators(bool)
           #4 librocksdb_src_rocksdb_lib.so+0x1382362  rocksdb::ForwardIterator::ForwardIterator(rocksdb::DBImpl*, rocksdb::ReadOptions const&, rocksdb::ColumnFamilyData*, rocksdb::SuperVersion*)
           #5 librocksdb_src_rocksdb_lib.so+0x11f433f  rocksdb::DBImpl::NewIterator(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*)
           #6 rocksdb/src/include/rocksdb/db.h:382     rocksdb::DB::NewIterator(rocksdb::ReadOptions const&)
           #7 rocksdb/db_range_del_test.cc:807         rocksdb::DBRangeDelTest_TailingIteratorRangeTombstoneUnsupported_Test::TestBody()
          #18 rocksdb/db_range_del_test.cc:1006        main
      Address 0x7fc50ae9b868 is located in stack of thread T0 at offset 104 in frame
           #0 librocksdb_src_rocksdb_lib.so+0x13825af  rocksdb::ForwardIterator::RebuildIterators(bool)
      Closes https://github.com/facebook/rocksdb/pull/3300
      Differential Revision: D6612989
      Pulled By: ajkr
      fbshipit-source-id: e7ea2ed914c1b80a8a29d71d92440a6bd9cbcc80
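A stack-use-after-scope report like the one above typically arises when an object keeps a reference to something whose scope has already ended. The sketch below shows one common way to avoid that class of bug by owning a copy instead. It is an assumption-laden illustration: `Comparator`, `Aggregator`, and `MakeAggregator` are hypothetical, and this is not necessarily how the commit itself fixed `ForwardIterator`.

```cpp
#include <string>

// Hypothetical comparator type.
struct Comparator {
  std::string name;
};

// Problematic shape (shown only as a comment): the member is a reference,
// so the aggregator must not outlive the comparator it was built from.
//   struct Aggregator { const Comparator& cmp; };

// Safer shape: the aggregator owns a copy, so its comparator lives exactly
// as long as the aggregator does.
class Aggregator {
 public:
  explicit Aggregator(const Comparator& cmp) : cmp_(cmp) {}
  const std::string& ComparatorName() const { return cmp_.name; }

 private:
  Comparator cmp_;  // held by value, not as const Comparator&
};

// The local comparator goes out of scope on return. With the by-value
// member this is fine; with a reference member the result would dangle.
Aggregator MakeAggregator() {
  Comparator local{"bytewise"};
  return Aggregator(local);
}
```

The alternative fix is to keep the reference member and make sure the referenced object is stored somewhere that outlives the aggregator.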
    • Blog post for WritePrepared Txn · 02a2c117
      Maysam Yabandeh authored
      Blog post to introduce the next generation of transaction engine at RocksDB.
      Closes https://github.com/facebook/rocksdb/pull/3296
      Differential Revision: D6612932
      Pulled By: maysamyabandeh
      fbshipit-source-id: 5bfa91ce84e937f5e4346bbda5a4725d0a7fd131