1. 18 Jan, 2018 1 commit
    • fix live WALs purged while file deletions disabled · 46e599fc
      Andrew Kryczka authored
      Summary:
      When calling `DisableFileDeletions` followed by `GetSortedWalFiles`, we guarantee the files returned by the latter call won't be deleted until after file deletions are re-enabled. However, `GetSortedWalFiles` didn't omit files already planned for deletion via `PurgeObsoleteFiles`, so the guarantee could be broken.
      
      We fix it by making `GetSortedWalFiles` wait for the number of pending purges to hit zero if file deletions are disabled. This condition is eventually met since `PurgeObsoleteFiles` is guaranteed to be called for the existing pending purges, and new purges cannot be scheduled while file deletions are disabled. Once the condition is met, `GetSortedWalFiles` simply returns the contents of the DB and archive directories, which nobody can delete (except for the deletion scheduler, for which I plan to fix this bug later) until deletions are re-enabled.
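      
      The waiting rule described above can be sketched with a small toy model. This is not RocksDB code: `ToyPurgeGate` and all of its members are hypothetical names, and the sketch only assumes that purges are counted when scheduled and that scheduling is refused while deletions are disabled.
      
      ```cpp
      #include <cassert>
      #include <condition_variable>
      #include <mutex>
      
      // Toy model (hypothetical, not RocksDB code): tracks whether file
      // deletions are disabled and how many purges are still pending.
      class ToyPurgeGate {
       public:
        void DisableFileDeletions() {
          std::lock_guard<std::mutex> lock(mu_);
          ++disable_count_;
        }
      
        void EnableFileDeletions() {
          std::lock_guard<std::mutex> lock(mu_);
          if (disable_count_ > 0) --disable_count_;
        }
      
        // Returns false while deletions are disabled, so no new purge starts.
        bool SchedulePurge() {
          std::lock_guard<std::mutex> lock(mu_);
          if (disable_count_ > 0) return false;
          ++pending_purges_;
          return true;
        }
      
        // Called once a previously scheduled purge has finished running.
        void FinishPurge() {
          std::lock_guard<std::mutex> lock(mu_);
          assert(pending_purges_ > 0);
          --pending_purges_;
          cv_.notify_all();
        }
      
        // What a GetSortedWalFiles-like call would do before listing files:
        // if deletions are disabled, block until no purge is pending.
        void WaitForPendingPurges() {
          std::unique_lock<std::mutex> lock(mu_);
          cv_.wait(lock, [this] {
            return disable_count_ == 0 || pending_purges_ == 0;
          });
        }
      
        int pending_purges() {
          std::lock_guard<std::mutex> lock(mu_);
          return pending_purges_;
        }
      
       private:
        std::mutex mu_;
        std::condition_variable cv_;
        int disable_count_ = 0;
        int pending_purges_ = 0;
      };
      ```
      
      In this model the wait always terminates for the reason given in the summary: the pending count can only go down while deletions are disabled.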
      Closes https://github.com/facebook/rocksdb/pull/3341
      
      Differential Revision: D6681131
      
      Pulled By: ajkr
      
      fbshipit-source-id: 90b1e2f2362ea9ef715623841c0826611a817634
  2. 17 Jan, 2018 5 commits
    • fix DBTest.AutomaticConflictsWithManualCompaction · 266d85fb
      Andrew Kryczka authored
      Summary:
      After af92d4ad, only exclusive manual compactions can have a conflict. dc360df8 updated the conflict-checking test case accordingly. But we missed the point that an exclusive manual compaction can only conflict with automatic compactions scheduled after it, since it waits on pending automatic compactions before it begins running.
      
      This PR updates the test case to ensure the automatic compactions are scheduled after the manual compaction starts but before it finishes, thus ensuring a conflict. I also cleaned up the test case to use less space, as I saw it cause an out-of-space error on Travis.
      Closes https://github.com/facebook/rocksdb/pull/3375
      
      Differential Revision: D6735162
      
      Pulled By: ajkr
      
      fbshipit-source-id: 020530a4e150a4786792dce7cec5d66b420cb884
    • Fix multiple build failures · dc360df8
      Yi Wu authored
      Summary:
      * Fix DBTest.CompactRangeWithEmptyBottomLevel lite build failure
      * Fix DBTest.AutomaticConflictsWithManualCompaction failure introduced by #3366
      * Fix BlockBasedTableTest::IndexUncompressed, which should be disabled if snappy is disabled
      * Fix ASAN failure with DBBasicTest::DBClose test
      Closes https://github.com/facebook/rocksdb/pull/3373
      
      Differential Revision: D6732313
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 1eb9b9d9a8d795f56188fa9770db9353f6fdedc5
    • Issue #3370 Broken CMakeLists.txt · bf6f03f3
      Bartek Wrona authored
      Summary:
      Issue #3370: simple fixes so that the RocksDB project also works as a submodule of a larger project.
      Closes https://github.com/facebook/rocksdb/pull/3372
      
      Differential Revision: D6729595
      
      Pulled By: ajkr
      
      fbshipit-source-id: eee2589e7a7c4322873dff8510eebd050301c54c
    • Avoid too frequent MaybeScheduleFlushOrCompaction() call · af92d4ad
      Sunguck Lee authored
      Summary:
      If there's a manual compaction in the queue, then "HaveManualCompaction(compaction_queue_.front())" will return true, and this causes MaybeScheduleFlushOrCompaction() to be called too frequently.
      
      https://github.com/facebook/rocksdb/issues/3198
      Closes https://github.com/facebook/rocksdb/pull/3366
      
      Differential Revision: D6729575
      
      Pulled By: ajkr
      
      fbshipit-source-id: 96da04f8fd33297b1ccaec3badd9090403da29b0
    • Add a Close() method to DB to return status when closing a db · d0f1b49a
      Anand Ananthabhotla authored
      Summary:
      Currently, the only way to close an open DB is to destroy the DB
      object. There is no way for the caller to know the status. In one
      instance, the destructor encountered an error due to failure to
      close a log file on HDFS. In order to prevent silent failures, we add
      DB::Close() that calls CloseImpl() which must be implemented by its
      descendants.
      The main failure point in the destructor is closing the log file. This
      patch also adds a Close() entry point to Logger in order to get status.
      When DBOptions::info_log is allocated and owned by the DBImpl, it is
      explicitly closed by DBImpl::CloseImpl().
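      
      The pattern described above (a public Close() that forwards to a CloseImpl() supplied by descendants, so that the caller sees a status instead of a silent destructor failure) might look roughly like the following sketch. `ToyStatus`, `ToyDB` and `ToyDBImpl` are hypothetical stand-ins rather than the RocksDB classes, and making a second Close() a successful no-op is an assumption of this sketch.
      
      ```cpp
      #include <cassert>
      #include <string>
      
      // Minimal stand-in for a status object (hypothetical, not rocksdb::Status).
      struct ToyStatus {
        bool ok;
        std::string message;
      };
      
      class ToyDB {
       public:
        virtual ~ToyDB() {}
      
        // Public entry point: lets the caller observe the result of closing.
        // Assumption of this sketch: closing twice is a successful no-op.
        ToyStatus Close() {
          if (closed_) return ToyStatus{true, ""};
          closed_ = true;
          return CloseImpl();
        }
      
        bool closed() const { return closed_; }
      
       protected:
        // Descendants implement the actual shutdown work here.
        virtual ToyStatus CloseImpl() = 0;
      
       private:
        bool closed_ = false;
      };
      
      class ToyDBImpl : public ToyDB {
       public:
        explicit ToyDBImpl(bool log_close_fails)
            : log_close_fails_(log_close_fails) {}
      
        // The destructor still closes the DB if the caller did not, but any
        // error is lost here, which is the silent failure Close() avoids.
        ~ToyDBImpl() override { Close(); }
      
       protected:
        ToyStatus CloseImpl() override {
          if (log_close_fails_) {
            return ToyStatus{false, "failed to close log file"};
          }
          return ToyStatus{true, ""};
        }
      
       private:
        bool log_close_fails_;
      };
      ```
      
      A caller that cares about the outcome calls Close() explicitly and checks the returned status before destroying the object.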
      Closes https://github.com/facebook/rocksdb/pull/3348
      
      Differential Revision: D6698158
      
      Pulled By: anand1976
      
      fbshipit-source-id: 9468e2892553eb09c4c41b8723f590c0dbd8ab7d
  3. 13 Jan, 2018 4 commits
  4. 12 Jan, 2018 7 commits
    • fix Gemfile.lock nokogiri dependencies · 6d7e3b9f
      Andrew Kryczka authored
      Summary:
      I installed the ruby dependencies and ran `bundle update nokogiri`. It depends on a newer version of "mini_portile2" which I missed in 9c2f64e1. Now `bundle install` works again.
      Closes https://github.com/facebook/rocksdb/pull/3361
      
      Differential Revision: D6710164
      
      Pulled By: ajkr
      
      fbshipit-source-id: 9a08d6cc6400ef495b715b3d68b04ce3f3367031
    • Consider an increase to buffer size when reading option file, from 4K to 8K. · 45828c72
      Peter (Stig) Edwards authored
      Summary:
      Hello and thank you for RocksDB,
      
      While looking into the buffered I/O used when an `OPTIONS` file is read, I noticed that the `OPTIONS` files produced by RocksDB 5.8.8 (and head of master) were just over 4096 bytes in size. With the version of glibc I am using (glibc-2.17-196.el7), on the filesystem used, this results in a 4K buffer being passed to the `fread_unlocked` call and 2 `read` system calls with a 4096-byte buffer being used to read the contents of the `OPTIONS` file.
      
        If the buffer size is increased to 8192 then 1 system call read is used to read the contents.
      
        As I think the buffer size is just used for reading `OPTIONS` files, and I thought it likely that `OPTIONS` files have increased in size (as more options are added), I thought I would suggest an increase.
      
      [  If the comments from the top of the `OPTIONS` file are removed, and white space from the start of lines is removed then the size can be reduced to be under 4K, but as more options are added the size seems likely to grow again. ]
      
      Create a new database:
      
      ```
      > ./ldb --create_if_missing --db=/tmp/rdb_tmp put 1 1
      OK
      ```
      
      The OPTIONS file is 4252 bytes:
      
      ```
      > stat /tmp/rdb_tmp/OPTIONS* | head -n 2
        File: ‘/tmp/rdb_tmp/OPTIONS-000005’
        Size: 4252            Blocks: 16         IO Block: 4096   regular file
      ```
      
      Before, the contents are read through a 4096-byte buffer in 2 `read` system calls:
      
      ```
      > strace -f ./ldb --try_load_options --db=/tmp/rdb_tmp get DOES_NOT_EXIST 2>&1 |
          grep -A 1 'RocksDB option file'
      read(3, "# This is a RocksDB option file."..., 4096) = 4096
      read(3, "e\n  metadata_block_size=4096\n  c"..., 4096) = 156
      ```
      
      ltrace shows 4096 passed to fread_unlocked
      
      ```
      > ltrace -S -f ./ldb --try_load_options --db=/tmp/rdb_tmp get DOES_NOT_EXIST 2>&1 |
          grep -C 3 'RocksDB option file'
      [pid 51013] fread_unlocked(0x7ffd5fbf2d50, 1, 4096, 0x7fd2e084e780 <unfinished ...>
      [pid 51013] fstat@SYS(3, 0x7ffd5fbf28f0)         = 0
      [pid 51013] mmap@SYS(nil, 4096, 3, 34, -1, 0)    = 0x7fd2e318c000
      [pid 51013] read@SYS(3, "# This is a RocksDB option file."..., 4096) = 4096
      [pid 51013] <... fread_unlocked resumed> )       = 4096
      ...
      ```
      
      After, the contents are read through an 8192-byte buffer in 1 `read` system call (followed by an empty read at end of file):
      
      ```
      > strace -f ./ldb --try_load_options --db=/tmp/rdb_tmp get DOES_NOT_EXIST 2>&1 | grep -A 1 'RocksDB option file'
      read(3, "# This is a RocksDB option file."..., 8192) = 4252
      read(3, "", 4096)                       = 0
      ```
      
      ltrace shows 8192 passed to fread_unlocked
      
      ```
      > ltrace -S -f ./ldb --try_load_options --db=/tmp/rdb_tmp get DOES_NOT_EXIST 2>&1 | grep -C 3 'RocksDB option file'
      [pid 146611] fread_unlocked(0x7ffcfba382f0, 1, 8192, 0x7fc4e844e780 <unfinished ...>
      [pid 146611] fstat@SYS(3, 0x7ffcfba380f0)        = 0
      [pid 146611] mmap@SYS(nil, 4096, 3, 34, -1, 0)   = 0x7fc4eaee0000
      [pid 146611] read@SYS(3, "# This is a RocksDB option file."..., 8192) = 4252
      [pid 146611] read@SYS(3, "", 4096)               = 0
      [pid 146611] <... fread_unlocked resumed> )      = 4252
      [pid 146611] feof(0x7fc4e844e780)                = 1
      ```
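      
      The chunking arithmetic behind these traces can be reproduced with a small sketch. It only models reading a 4252-byte payload in buffer-sized pieces from an in-memory stream; it does not model glibc internals or system calls, and `CountDataReads` is a hypothetical helper name.
      
      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <sstream>
      #include <string>
      #include <vector>
      
      // Counts how many buffer-sized reads return data while consuming
      // `contents` from an in-memory stream.
      int CountDataReads(const std::string& contents, std::size_t buffer_size) {
        std::istringstream in(contents);
        std::vector<char> buffer(buffer_size);
        int reads = 0;
        for (;;) {
          in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
          if (in.gcount() <= 0) break;  // nothing left to read
          ++reads;
        }
        return reads;
      }
      ```
      
      For a 4252-byte payload, `CountDataReads` returns 2 with a 4096-byte buffer and 1 with an 8192-byte buffer, matching the number of data-carrying reads in the traces above.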
      Closes https://github.com/facebook/rocksdb/pull/3294
      
      Differential Revision: D6653684
      
      Pulled By: ajkr
      
      fbshipit-source-id: 222f25f5442fefe1dcec18c700bd9e235bb63491
    • Fix memleak when DB::DeleteFile() · 0a7ba0e5
      Changli Gao authored
      Summary:
      Because the corresponding read_first_record_cache_ item wasn't
      erased, memory leaked.
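      
      As an illustration of this kind of leak, the toy model below keeps a per-file cache next to the set of live files. Everything except the member name `read_first_record_cache_` is hypothetical, and the key and value types are assumptions made for the sketch.
      
      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      #include <set>
      #include <unordered_map>
      
      // Toy model (hypothetical, not RocksDB code) of a manager that caches
      // the first sequence number of each log file.
      class ToyWalManager {
       public:
        void AddFile(std::uint64_t log_number, std::uint64_t first_sequence) {
          live_files_.insert(log_number);
          read_first_record_cache_[log_number] = first_sequence;
        }
      
        // Leaky variant: forgets the file but keeps its cache entry forever.
        void DeleteFileLeaky(std::uint64_t log_number) {
          live_files_.erase(log_number);
        }
      
        // Fixed variant: erases the cache entry together with the file.
        void DeleteFile(std::uint64_t log_number) {
          live_files_.erase(log_number);
          read_first_record_cache_.erase(log_number);
        }
      
        std::size_t CacheSize() const { return read_first_record_cache_.size(); }
        std::size_t FileCount() const { return live_files_.size(); }
      
       private:
        std::set<std::uint64_t> live_files_;
        std::unordered_map<std::uint64_t, std::uint64_t> read_first_record_cache_;
      };
      ```
      
      With the leaky variant the cache grows with every file ever created, even though the number of live files stays bounded.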
      Closes https://github.com/facebook/rocksdb/pull/1712
      
      Differential Revision: D4363654
      
      Pulled By: ajkr
      
      fbshipit-source-id: 7da1adcfc8c380e4ffe05b8769fc2221ad17a225
    • Update Gemfile.lock · 9c2f64e1
      Andrew Kryczka authored
      Summary:
      Bump the nokogiri version number.
      Closes https://github.com/facebook/rocksdb/pull/3358
      
      Differential Revision: D6708596
      
      Pulled By: ajkr
      
      fbshipit-source-id: 6662c3ba4994374ecf8a13928e915b655a980b70
    • add WriteBatch::WriteBatch(std::string&&) · 204af1ec
      Bo Liu authored
      Summary:
      to save a string copy for some use cases.
      
      The change is pretty straightforward, please feel free to let me know if you want to suggest any tests for it.
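      
      A rough sketch of why an rvalue-reference constructor saves a copy, using a hypothetical `ToyBatch` class rather than the real WriteBatch: the `const std::string&` overload has to copy the bytes, while the `std::string&&` overload can take over the caller's string.
      
      ```cpp
      #include <cassert>
      #include <string>
      #include <utility>
      
      // Toy stand-in (hypothetical) for a batch that owns a serialized string.
      class ToyBatch {
       public:
        // Copies the caller's string; the caller keeps its own copy.
        explicit ToyBatch(const std::string& rep) : rep_(rep) {}
      
        // Takes ownership of the caller's string instead of copying it.
        explicit ToyBatch(std::string&& rep) : rep_(std::move(rep)) {}
      
        const std::string& Data() const { return rep_; }
      
       private:
        std::string rep_;
      };
      ```
      
      A caller that no longer needs its serialized string can write `ToyBatch batch(std::move(payload));` to select the second overload.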
      Closes https://github.com/facebook/rocksdb/pull/3349
      
      Differential Revision: D6706828
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 873ce4442937bdc030b395c7f99228eda7f59eb7
    • Add Jenkins for PPC64le build status badge · d4da02d1
      Adam Retter authored
      Summary: Closes https://github.com/facebook/rocksdb/pull/3356
      
      Differential Revision: D6706909
      
      Pulled By: sagar0
      
      fbshipit-source-id: 6e4757d9eceab3e8a6c1b83c1be4108e86576cb2
    • FreeBSD build support for RocksDB and RocksJava · a53c571d
      Adam Retter authored
      Summary:
      Tested on a clean FreeBSD 11.01 x64.
      
      Closes https://github.com/facebook/rocksdb/pull/1423
      Closes https://github.com/facebook/rocksdb/pull/3357
      
      Differential Revision: D6705868
      
      Pulled By: sagar0
      
      fbshipit-source-id: cbccbbdafd4f42922512ca03619a5d5583a425fd
  5. 11 Jan, 2018 4 commits
  6. 10 Jan, 2018 6 commits
  7. 09 Jan, 2018 2 commits
  8. 06 Jan, 2018 4 commits
  9. 05 Jan, 2018 1 commit
    • Remove assert(s.ok()) from ::DeleteFile · 1c9ada59
      Maysam Yabandeh authored
      Summary:
      DestroyDB, which is used in tests, loops over the files returned by ::GetChildren and deletes them one by one. Such files might already be slated for deletion (during DeleteObsoleteFileImpl, for example) and then actually get deleted, with a delay, sometimes before ::DeleteFile is called on the file name. We have some test failures where FaultInjectionTestEnv::DeleteFile fails on assert(s.ok()) during DestroyDB. This patch removes the assert statement to fix that.
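      
      The race described above can be illustrated with a toy model in which a file disappears between listing and deletion. `ToyEnv`, `ToyCode` and `DestroyAll` are hypothetical names; the sketch only shows a DestroyDB-like loop that tolerates a "not found" result instead of asserting on it.
      
      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <set>
      #include <string>
      #include <vector>
      
      enum class ToyCode { kOk, kNotFound };
      
      // Toy in-memory environment (hypothetical, not a RocksDB Env).
      class ToyEnv {
       public:
        void CreateFile(const std::string& name) { files_.insert(name); }
      
        std::vector<std::string> GetChildren() const {
          return std::vector<std::string>(files_.begin(), files_.end());
        }
      
        // Reports kNotFound instead of asserting when the file is already gone.
        ToyCode DeleteFile(const std::string& name) {
          return files_.erase(name) > 0 ? ToyCode::kOk : ToyCode::kNotFound;
        }
      
        std::size_t FileCount() const { return files_.size(); }
      
       private:
        std::set<std::string> files_;
      };
      
      // DestroyDB-like loop: deletes every name in a possibly stale listing and
      // returns how many deletions reported "not found".
      int DestroyAll(ToyEnv* env, const std::vector<std::string>& listing) {
        int not_found = 0;
        for (const std::string& name : listing) {
          if (env->DeleteFile(name) == ToyCode::kNotFound) ++not_found;
        }
        return not_found;
      }
      ```
      
      If a delayed deletion lands after the listing is taken, the loop still finishes and simply reports one missing file.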
      Closes https://github.com/facebook/rocksdb/pull/3324
      
      Differential Revision: D6659545
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 4c9552fbcd494dcf3e61d475c11fc965c4388b2c
  10. 04 Jan, 2018 2 commits
  11. 03 Jan, 2018 1 commit
    • Speed up BlockTest.BlockReadAmpBitmap · ccc095a0
      Siying Dong authored
      Summary:
      BlockTest.BlockReadAmpBitmap is too slow and times out in some environments. Speed it up by:
      (1) improving the way the verification is done; this makes it 5 times faster.
      (2) running fewer tests for large blocks; this cuts the run time by another 10 times.
      Now it finishes in a time similar to that of other tests.
      Closes https://github.com/facebook/rocksdb/pull/3313
      
      Differential Revision: D6643711
      
      Pulled By: siying
      
      fbshipit-source-id: c2397d666eab5421a78ca87e1e45491e0f832a6d
  12. 22 Dec, 2017 1 commit
    • Disable onboard cache for compaction output · b5c99cc9
      burtonli authored
      Summary:
      FILE_FLAG_WRITE_THROUGH is the Windows API flag for disabling the device on-board cache, which should be disabled if the user doesn't need the system cache.
      There was a perf issue related to this: during memtable flush, high-percentile latency jumps significantly. During profiling, we found that those high-latency (P99.9) read requests got queue-jumped by write requests from memtable flush and took 80ms or even more to wait, even when overall SSD IO throughput was relatively low.
      
      After enabling FILE_FLAG_WRITE_THROUGH, we reran the test and found that high-percentile latency drops a lot without observable impact on writes.
      
      Scenario 1: 40MB/s + 40MB/s  R/W compaction throughput
      
      Percentile | Original  | FILE_FLAG_WRITE_THROUGH | Percentage reduction
      -----------|-----------|-------------------------|---------------------
      P99.9      | 56.897 ms | 35.593 ms               | -37.4%
      P99        | 3.905 ms  | 3.896 ms                | -2.8%
      
      Scenario 2: 14MB/s + 14MB/s R/W compaction throughput, cohosted with 100+ other rocksdb instances that have manually triggered memtable flush operations (the memtable is tiny), creating a lot of randomized small-file write operations during the test.
      
      Percentile | Original  | FILE_FLAG_WRITE_THROUGH | Percentage reduction
      -----------|-----------|-------------------------|---------------------
      P99.9      | 86.227 ms | 50.436 ms               | -41.5%
      P99        | 8.415 ms  | 3.356 ms                | -60.1%
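      
      The percentage column appears to be the relative change from the original latency to the latency with FILE_FLAG_WRITE_THROUGH. A minimal sketch of that calculation, with `PercentChange` as a hypothetical helper name:
      
      ```cpp
      #include <cassert>
      #include <cmath>
      
      // Relative change from `original` to `updated`, in percent.
      // A negative result means the value went down.
      double PercentChange(double original, double updated) {
        return (updated - original) / original * 100.0;
      }
      ```
      
      For the two P99.9 rows, `PercentChange(56.897, 35.593)` is about -37.4 and `PercentChange(86.227, 50.436)` is about -41.5.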
      Closes https://github.com/facebook/rocksdb/pull/3225
      
      Differential Revision: D6624174
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 321b86aee9d74470840c70e5d0d4fa9880660a91
  13. 21 Dec, 2017 2 commits
    • fix ForwardIterator reference to temporary object · f00e176c
      Andrew Kryczka authored
      Summary:
      Fixes the following ASAN error:
      
      ```
      ==2108042==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7fc50ae9b868 at pc 0x7fc5112aff55 bp 0x7fff9eb9dc10 sp 0x7fff9eb9dc08
      === How to use this, how to get the raw stack trace, and more: fburl.com/ASAN ===
      READ of size 8 at 0x7fc50ae9b868 thread T0
      SCARINESS: 23 (8-byte-read-stack-use-after-scope)
           #0 rocksdb/dbformat.h:164                   rocksdb::InternalKeyComparator::user_comparator() const
           #1 librocksdb_src_rocksdb_lib.so+0x1429a7d  rocksdb::RangeDelAggregator::InitRep(std::vector<...> const&)
           #2 librocksdb_src_rocksdb_lib.so+0x142ceae  rocksdb::RangeDelAggregator::AddTombstones(std::unique_ptr<...>)
           #3 librocksdb_src_rocksdb_lib.so+0x1382d88  rocksdb::ForwardIterator::RebuildIterators(bool)
           #4 librocksdb_src_rocksdb_lib.so+0x1382362  rocksdb::ForwardIterator::ForwardIterator(rocksdb::DBImpl*, rocksdb::ReadOptions const&, rocksdb::ColumnFamilyData*, rocksdb::SuperVersion*)
           #5 librocksdb_src_rocksdb_lib.so+0x11f433f  rocksdb::DBImpl::NewIterator(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*)
           #6 rocksdb/src/include/rocksdb/db.h:382     rocksdb::DB::NewIterator(rocksdb::ReadOptions const&)
           #7 rocksdb/db_range_del_test.cc:807         rocksdb::DBRangeDelTest_TailingIteratorRangeTombstoneUnsupported_Test::TestBody()
          #18 rocksdb/db_range_del_test.cc:1006        main
      
      Address 0x7fc50ae9b868 is located in stack of thread T0 at offset 104 in frame
           #0 librocksdb_src_rocksdb_lib.so+0x13825af  rocksdb::ForwardIterator::RebuildIterators(bool)
      ```
      Closes https://github.com/facebook/rocksdb/pull/3300
      
      Differential Revision: D6612989
      
      Pulled By: ajkr
      
      fbshipit-source-id: e7ea2ed914c1b80a8a29d71d92440a6bd9cbcc80
    • Blog post for WritePrepared Txn · 02a2c117
      Maysam Yabandeh authored
      Summary:
      Blog post introducing the next-generation transaction engine in RocksDB.
      Closes https://github.com/facebook/rocksdb/pull/3296
      
      Differential Revision: D6612932
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 5bfa91ce84e937f5e4346bbda5a4725d0a7fd131