1. 21 Apr, 2018 1 commit
  2. 23 Feb, 2018 2 commits
  3. 17 Jan, 2018 1 commit
    • fix DBTest.AutomaticConflictsWithManualCompaction · 266d85fb
      Andrew Kryczka authored
      Summary:
      After af92d4ad, only exclusive manual compaction can have a conflict. dc360df8 updated the conflict-checking test case accordingly. But we missed the point that an exclusive manual compaction can only conflict with automatic compactions scheduled after it, since it waits on pending automatic compactions before it begins running.
      
      This PR updates the test case to ensure the automatic compactions are scheduled after the manual compaction starts but before it finishes, thus ensuring a conflict. I also cleaned up the test case to use less space, as I saw it cause an out-of-space error on Travis.
      Closes https://github.com/facebook/rocksdb/pull/3375
      
      Differential Revision: D6735162
      
      Pulled By: ajkr
      
      fbshipit-source-id: 020530a4e150a4786792dce7cec5d66b420cb884
  4. 18 Oct, 2017 1 commit
    • arena: derive alignment unit from std::max_align_t · c0208dff
      Nikhil Benesch authored
      Summary:
      As raised in #2265, the arena allocator will return memory that is improperly aligned to store a `std::function` on macOS. Oddly, I'm unable to tickle this bug without adding a `std::function` field to `struct ReadOptions`—but my proposal in #2265 does exactly that.
      
      In any case, here's a simple reproduction. Apply this bogus patch to get a `std::function` into `struct ReadOptions`:
      
      ```
      --- a/include/rocksdb/options.h
      +++ b/include/rocksdb/options.h
      @@ -1035,6 +1035,8 @@ struct ReadOptions {
         // Default: 0
         uint64_t max_skippable_internal_keys;
      
      +  std::function<void()> foo;
      +
         ReadOptions();
         ReadOptions(bool cksum, bool cache);
       };
      ```
      
      then compile `db_properties_test` *with ubsan* and run `ReadLatencyHistogramByLevel`:
      
      ```
      $ make COMPILE_WITH_UBSAN=1 db_properties_test
      $ ./db_properties_test --gtest_filter=DBPropertiesTest.ReadLatencyHistogramByLevel
      ```
      
      ubsan will complain about several misaligned accesses:
      
      ```
      Note: Google Test filter = DBPropertiesTest.ReadLatencyHistogramByLevel
      [==========] Running 1 test from 1 test case.
      [----------] Global test environment set-up.
      [----------] 1 test from DBPropertiesTest
      [ RUN      ] DBPropertiesTest.ReadLatencyHistogramByLevel
      util/coding.h:372:12: runtime error: load of misaligned address 0x00010d85516c for type 'const unsigned long', which requires 8 byte alignment
      0x00010d85516c: note: pointer points here
        01 00 34 57 00 00 00 00  02 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  78 24 82 0a 01 00 00 00
                    ^
      util/coding.h:362:3: runtime error: store to misaligned address 0x7fff5733fac4 for type 'unsigned long', which requires 8 byte alignment
      0x7fff5733fac4: note: pointer points here
        01 00 00 00 00 00 00 00  02 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  80 1d 96 0d 01 00 00 00
                    ^
      util/coding.h:372:12: runtime error: load of misaligned address 0x00010d85516c for type 'const unsigned long', which requires 8 byte alignment
      0x00010d85516c: note: pointer points here
        01 00 34 57 00 00 00 00  02 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  78 24 82 0a 01 00 00 00
                    ^
      version_set.cc:854: runtime error: constructor call on misaligned address 0x00010dbfa5c8 for type 'rocksdb::(anonymous namespace)::LevelFileIteratorState', which requires 16 byte alignment
      0x00010dbfa5c8: note: pointer points here
       00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
                    ^
      version_set.cc:512: runtime error: constructor call on misaligned address 0x00010dbfa5c8 for type 'rocksdb::(anonymous namespace)::LevelFileIteratorState', which requires 16 byte alignment
      0x00010dbfa5c8: note: pointer points here
       00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
                    ^
      version_set.cc:505: runtime error: constructor call on misaligned address 0x00010dbfa5e8 for type 'rocksdb::ReadOptions', which requires 16 byte alignment
      0x00010dbfa5e8: note: pointer points here
       00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
                    ^
      options.h:931: runtime error: constructor call on misaligned address 0x00010dbfa5e8 for type 'rocksdb::ReadOptions', which requires 16 byte alignment
      0x00010dbfa5e8: note: pointer points here
       00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
                    ^
      options.h:931: runtime error: constructor call on misaligned address 0x00010dbfa628 for type 'std::__1::function<void ()>', which requires 16 byte alignment
      0x00010dbfa628: note: pointer points here
       00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
                    ^
      functional:1583: runtime error: constructor call on misaligned address 0x00010dbfa628 for type 'std::__1::function<void ()>', which requires 16 byte alignment
      0x00010dbfa628: note: pointer points here
       00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
                    ^
      /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/functional:1585:9: runtime error: member access within misaligned address 0x00010dbfa628 for type 'std::__1::function<void ()>', which requires 16 byte alignment
      0x00010dbfa628: note: pointer points here
       00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
                    ^
      /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/functional:1585:9: runtime error: store to misaligned address 0x00010dbfa648 for type '__base *' (aka '__base<void ()> *'), which requires 16 byte alignment
      0x00010dbfa648: note: pointer points here
       00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
                    ^
      db/version_set.cc:864:29: runtime error: upcast of misaligned address 0x00010dbfa5c8 for type 'rocksdb::(anonymous namespace)::LevelFileIteratorState', which requires 16 byte alignment
      0x00010dbfa5c8: note: pointer points here
       00 00 00 00  a0 db 70 0a 01 00 00 00  00 00 00 00 00 00 00 00  90 14 98 0d 01 00 00 00  00 00 00 00
                    ^
      db/version_set.cc:521:12: runtime error: member access within misaligned address 0x00010dbfa5c8 for type 'rocksdb::(anonymous namespace)::LevelFileIteratorState', which requires 16 byte alignment
      0x00010dbfa5c8: note: pointer points here
       00 00 00 00  a0 db 70 0a 01 00 00 00  00 00 00 00 00 00 00 00  90 14 98 0d 01 00 00 00  00 00 00 00
                    ^
      db/version_set.cc:521:12: runtime error: load of misaligned address 0x00010dbfa5d8 for type 'rocksdb::TableCache *', which requires 16 byte alignment
      0x00010dbfa5d8: note: pointer points here
       00 00 00 00  90 14 98 0d 01 00 00 00  00 00 00 00 00 00 00 00  01 01 ff ff ff ff ff ff  00 00 00 00
                    ^
      db/version_set.cc:522:9: runtime error: member access within misaligned address 0x00010dbfa5c8 for type 'rocksdb::(anonymous namespace)::LevelFileIteratorState', which requires 16 byte alignment
      0x00010dbfa5c8: note: pointer points here
       00 00 00 00  a0 db 70 0a 01 00 00 00  00 00 00 00 00 00 00 00  90 14 98 0d 01 00 00 00  00 00 00 00
                    ^
      db/version_set.cc:522:9: runtime error: reference binding to misaligned address 0x00010dbfa5e8 for type 'const rocksdb::ReadOptions', which requires 16 byte alignment
      0x00010dbfa5e8: note: pointer points here
       00 00 00 00  01 01 ff ff ff ff ff ff  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
                    ^
      db/version_set.cc:522:24: runtime error: member access within misaligned address 0x00010dbfa5c8 for type 'rocksdb::(anonymous namespace)::LevelFileIteratorState', which requires 16 byte alignment
      0x00010dbfa5c8: note: pointer points here
       00 00 00 00  a0 db 70 0a 01 00 00 00  00 00 00 00 00 00 00 00  90 14 98 0d 01 00 00 00  00 00 00 00
                    ^
      db/version_set.cc:522:38: runtime error: member access within misaligned address 0x00010dbfa5c8 for type 'rocksdb::(anonymous namespace)::LevelFileIteratorState', which requires 16 byte alignment
      0x00010dbfa5c8: note: pointer points here
       00 00 00 00  a0 db 70 0a 01 00 00 00  00 00 00 00 00 00 00 00  90 14 98 0d 01 00 00 00  00 00 00 00
                    ^
      db/version_set.cc:522:57: runtime error: member access within misaligned address 0x00010dbfa5c8 for type 'rocksdb::(anonymous namespace)::LevelFileIteratorState', which requires 16 byte alignment
      0x00010dbfa5c8: note: pointer points here
       00 00 00 00  a0 db 70 0a 01 00 00 00  00 00 00 00 00 00 00 00  90 14 98 0d 01 00 00 00  00 00 00 00
                    ^
      db/version_set.cc:522:57: runtime error: load of misaligned address 0x00010dbfa678 for type 'rocksdb::RangeDelAggregator *', which requires 16 byte alignment
      0x00010dbfa678: note: pointer points here
       01 00 00 00  d0 a1 bf 0d 01 00 00 00  00 00 00 00 00 00 00 00  f8 db 70 0a 01 00 00 00  00 00 00 00
                    ^
      db/version_set.cc:523:54: runtime error: member access within misaligned address 0x00010dbfa5c8 for type 'rocksdb::(anonymous namespace)::LevelFileIteratorState', which requires 16 byte alignment
      0x00010dbfa5c8: note: pointer points here
       00 00 00 00  a0 db 70 0a 01 00 00 00  00 00 00 00 00 00 00 00  90 14 98 0d 01 00 00 00  00 00 00 00
                    ^
      db/version_set.cc:523:54: runtime error: load of misaligned address 0x00010dbfa668 for type 'rocksdb::HistogramImpl *', which requires 16 byte alignment
      0x00010dbfa668: note: pointer points here
       01 00 00 00  c8 88 a5 0d 01 00 00 00  00 00 00 00 01 00 00 00  d0 a1 bf 0d 01 00 00 00  00 00 00 00
                    ^
      db/version_set.cc:524:9: runtime error: member access within misaligned address 0x00010dbfa5c8 for type 'rocksdb::(anonymous namespace)::LevelFileIteratorState', which requires 16 byte alignment
      0x00010dbfa5c8: note: pointer points here
       00 00 00 00  a0 db 70 0a 01 00 00 00  00 00 00 00 00 00 00 00  90 14 98 0d 01 00 00 00  00 00 00 00
                    ^
      db/version_set.cc:524:47: runtime error: member access within misaligned address 0x00010dbfa5c8 for type 'rocksdb::(anonymous namespace)::LevelFileIteratorState', which requires 16 byte alignment
      0x00010dbfa5c8: note: pointer points here
       00 00 00 00  a0 db 70 0a 01 00 00 00  00 00 00 00 00 00 00 00  90 14 98 0d 01 00 00 00  00 00 00 00
                    ^
      db/version_set.cc:524:62: runtime error: member access within misaligned address 0x00010dbfa5c8 for type 'rocksdb::(anonymous namespace)::LevelFileIteratorState', which requires 16 byte alignment
      0x00010dbfa5c8: note: pointer points here
       00 00 00 00  a0 db 70 0a 01 00 00 00  00 00 00 00 00 00 00 00  90 14 98 0d 01 00 00 00  00 00 00 00
                    ^
      db/table_cache.cc:228:33: runtime error: reference binding to misaligned address 0x00010dbfa5e8 for type 'const rocksdb::ReadOptions', which requires 16 byte alignment
      0x00010dbfa5e8: note: pointer points here
       00 00 00 00  01 01 ff ff ff ff ff ff  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
                    ^
      table/block_based_table_reader.cc:1554:41: runtime error: reference binding to misaligned address 0x00010dbfa5e8 for type 'const rocksdb::ReadOptions', which requires 16 byte alignment
      0x00010dbfa5e8: note: pointer points here
       00 00 00 00  01 01 ff ff ff ff ff ff  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
                    ^
      table/block_based_table_reader.cc:1396:21: runtime error: reference binding to misaligned address 0x00010dbfa5e8 for type 'const rocksdb::ReadOptions', which requires 16 byte alignment
      0x00010dbfa5e8: note: pointer points here
       00 00 00 00  01 01 ff ff ff ff ff ff  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
                    ^
      include/rocksdb/options.h:931:8: runtime error: reference binding to misaligned address 0x00010dbfa628 for type 'const std::function<void ()>', which requires 16 byte alignment
      0x00010dbfa628: note: pointer points here
       00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
                    ^
      /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/functional:1584:13: runtime error: load of misaligned address 0x00010dbfa648 for type '__base *const' (aka '__base<void ()> *const'), which requires 16 byte alignment
      0x00010dbfa648: note: pointer points here
       00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  c8 a5 97 0d 01 00 00 00  38 36 9b 0d
                    ^
      table/block_based_table_reader.cc:1555:24: runtime error: reference binding to misaligned address 0x00010dbfa5e8 for type 'const rocksdb::ReadOptions', which requires 16 byte alignment
      0x00010dbfa5e8: note: pointer points here
       00 00 00 00  01 01 ff ff ff ff ff ff  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
                    ^
      db/table_cache.cc:244:54: runtime error: load of misaligned address 0x00010dbfa618 for type 'const bool', which requires 16 byte alignment
      0x00010dbfa618: note: pointer points here
       00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
                    ^
      db/table_cache.cc:246:49: runtime error: reference binding to misaligned address 0x00010dbfa5e8 for type 'const rocksdb::ReadOptions', which requires 16 byte alignment
      0x00010dbfa5e8: note: pointer points here
       00 00 00 00  01 01 ff ff ff ff ff ff  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
                    ^
      db/version_set.cc:532:12: runtime error: member access within misaligned address 0x00010dbfa5c8 for type 'rocksdb::(anonymous namespace)::LevelFileIteratorState', which requires 16 byte alignment
      0x00010dbfa5c8: note: pointer points here
       00 00 00 00  a0 db 70 0a 01 00 00 00  00 00 00 00 00 00 00 00  90 14 98 0d 01 00 00 00  00 00 00 00
                    ^
      db/version_set.cc:532:12: runtime error: member access within misaligned address 0x00010dbfa5e8 for type 'const rocksdb::ReadOptions', which requires 16 byte alignment
      0x00010dbfa5e8: note: pointer points here
       00 00 00 00  01 01 ff ff ff ff ff ff  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
                    ^
      db/version_set.cc:532:26: runtime error: load of misaligned address 0x00010dbfa5f8 for type 'const rocksdb::Slice *const', which requires 16 byte alignment
      0x00010dbfa5f8: note: pointer points here
       00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
                    ^
      version_set.cc:493: runtime error: member call on misaligned address 0x00010dbfa5c8 for type 'rocksdb::(anonymous namespace)::LevelFileIteratorState', which requires 16 byte alignment
      0x00010dbfa5c8: note: pointer points here
       00 00 00 00  a0 db 70 0a 01 00 00 00  00 00 00 00 00 00 00 00  90 14 98 0d 01 00 00 00  00 00 00 00
                    ^
      version_set.cc:493: runtime error: member call on misaligned address 0x00010dbfa5e8 for type 'rocksdb::ReadOptions', which requires 16 byte alignment
      0x00010dbfa5e8: note: pointer points here
       00 00 00 00  01 01 ff ff ff ff ff ff  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
                    ^
      options.h:931: runtime error: member call on misaligned address 0x00010dbfa5e8 for type 'rocksdb::ReadOptions', which requires 16 byte alignment
      0x00010dbfa5e8: note: pointer points here
       00 00 00 00  01 01 ff ff ff ff ff ff  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
                    ^
      options.h:931: runtime error: member call on misaligned address 0x00010dbfa628 for type 'std::__1::function<void ()>', which requires 16 byte alignment
      0x00010dbfa628: note: pointer points here
       00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
                    ^
      functional:1765: runtime error: member call on misaligned address 0x00010dbfa628 for type 'std::__1::function<void ()>', which requires 16 byte alignment
      0x00010dbfa628: note: pointer points here
       00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
                    ^
      /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/functional:1766:9: runtime error: member access within misaligned address 0x00010dbfa628 for type 'std::__1::function<void ()>', which requires 16 byte alignment
      0x00010dbfa628: note: pointer points here
       00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
                    ^
      /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/functional:1766:9: runtime error: load of misaligned address 0x00010dbfa648 for type '__base *' (aka '__base<void ()> *'), which requires 16 byte alignment
      0x00010dbfa648: note: pointer points here
       00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  c8 a5 97 0d 01 00 00 00  38 36 9b 0d
                    ^
      /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/functional:1766:27: runtime error: member access within misaligned address 0x00010dbfa628 for type 'std::__1::function<void ()>', which requires 16 byte alignment
      0x00010dbfa628: note: pointer points here
       00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
                    ^
      /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/functional:1768:14: runtime error: member access within misaligned address 0x00010dbfa628 for type 'std::__1::function<void ()>', which requires 16 byte alignment
      0x00010dbfa628: note: pointer points here
       00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
                    ^
      /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/functional:1768:14: runtime error: load of misaligned address 0x00010dbfa648 for type '__base *' (aka '__base<void ()> *'), which requires 16 byte alignment
      0x00010dbfa648: note: pointer points here
       00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  c8 a5 97 0d 01 00 00 00  38 36 9b 0d
                    ^
      [       OK ] DBPropertiesTest.ReadLatencyHistogramByLevel (1599 ms)
      [----------] 1 test from DBPropertiesTest (1599 ms total)
      
      [----------] Global test environment tear-down
      [==========] 1 test from 1 test case ran. (1599 ms total)
      [  PASSED  ] 1 test.
      ```
      
      So it seems the root cause is that the internal implementation of `std::function` on macOS (and perhaps with libc++ generally?) requires 16-byte aligned memory, but the arena allocator only guarantees that the returned memory will be `sizeof(void*)` aligned, which is only 8-byte alignment on my machine. This patch solves the problem by adjusting the allocator to derive the necessary alignment from `alignof(std::max_align_t)`, which is properly 16 bytes on my machine.
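
      A minimal sketch of the alignment arithmetic described above, not the actual arena code. `kPointerAlign`, `kMaxAlign`, `IsAligned`, and `AlignUp` are hypothetical names, and 64-bit addresses are assumed:

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>

      // Hypothetical names for illustration; not the RocksDB arena API.
      // Both values are platform-dependent (8 and 16 on the author's machine).
      constexpr std::size_t kPointerAlign = sizeof(void*);
      constexpr std::size_t kMaxAlign = alignof(std::max_align_t);

      // Returns true if addr is a multiple of align.
      inline bool IsAligned(std::uint64_t addr, std::size_t align) {
        return addr % align == 0;
      }

      // Rounds addr up to the next multiple of align, which must be a power of two.
      inline std::uint64_t AlignUp(std::uint64_t addr, std::size_t align) {
        assert(align != 0 && (align & (align - 1)) == 0);
        const std::uint64_t mask = static_cast<std::uint64_t>(align) - 1;
        return (addr + mask) & ~mask;
      }
      ```

      With the address ubsan reported for `LevelFileIteratorState`, `IsAligned(0x10dbfa5c8, 8)` holds but `IsAligned(0x10dbfa5c8, 16)` does not, and `AlignUp(0x10dbfa5c8, 16)` yields `0x10dbfa5d0`.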
      
      As I mentioned in #2265, none of RocksDB's tests will cause this unaligned access to actually abort the process, but, on macOS, linking CockroachDB against a version of RocksDB with the above patch and letting it run for just a few seconds will cause a SIGABRT.
      
      ```
      Process 19792 stopped
      * thread #2, stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
          frame #0: 0x0000000004f5e78f cockroach`DBNewIter + 95
      cockroach`DBNewIter:
      ->  0x4f5e78f <+95>:  callq  *0x28(%rax)
          0x4f5e792 <+98>:  jmp    0x4f5e79e                 ; <+110>
          0x4f5e794 <+100>: movq   -0x50(%rbp), %rcx
          0x4f5e798 <+104>: movq   %rax, %rdi
      (lldb) bt
      * thread #2, stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
        * frame #0: 0x0000000004f5e78f cockroach`DBNewIter + 95
      ```
      
      I'd get you a backtrace, but [Go doesn't include cgo debug information on macOS](https://github.com/golang/go/issues/6942). I've also tried building against libc++ on Linux, where debug information would be available, but I can't seem to trigger the bug there.
      
      In any case, this PR both fixes the segfault in CockroachDB and fixes the warnings reported by ubsan.
      Closes https://github.com/facebook/rocksdb/pull/2347
      
      Differential Revision: D5108596
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: bd5e4323b2ce915ed4fe78e123cb8996aec75a00
  5. 22 Jul, 2017 2 commits
  6. 18 Jul, 2017 1 commit
  7. 16 Jul, 2017 1 commit
  8. 25 Jun, 2017 1 commit
    • Optimize for serial commits in 2PC · 499ebb3a
      Maysam Yabandeh authored
      Summary:
      Throughput: 46k tps in our sysbench settings (details to be filled in later)
      
      The idea is to have the simplest change that gives us a reasonable boost
      in 2PC throughput.
      
      Major design changes:
      1. The WAL file internal buffer is not flushed after each write. Instead
      it is flushed before critical operations (WAL copy via fs) or when
      FlushWAL is called by MySQL. Flushing the WAL buffer is also protected
      via mutex_.
      2. Use two sequence numbers: last seq, and last seq for write. Last seq
      is the last visible sequence number for reads. Last seq for write is the
      next sequence number that should be used to write to WAL/memtable. This
      allows a memtable write to proceed in parallel with WAL writes.
      3. BatchGroup is not used for writes. This means that we can have
      parallel writers, which changes a major assumption in the code base. To
      accommodate that: i) allow only one WriteImpl that intends to write to
      the memtable at a time, via mem_mutex_, which is fine since in 2PC almost
      all of the memtable writes come via the group commit phase, which is
      serial anyway; ii) make all the parts of the code base that assumed they
      were the only writer (via EnterUnbatched) also acquire mem_mutex_;
      iii) protect stat updates via a stat_mutex_.
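
      The two-counter idea in item 2 can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `SeqCounters`, `ReserveForWrite`, `Publish`, and `LastVisible` are hypothetical names, and starting the write counter at 1 is an arbitrary choice.

      ```cpp
      #include <atomic>
      #include <cassert>
      #include <cstdint>

      // Hypothetical illustration of "last seq" vs "last seq for write".
      class SeqCounters {
       public:
        // Reserves `count` sequence numbers for a WAL/memtable write and
        // returns the first one. Writers can reserve concurrently.
        std::uint64_t ReserveForWrite(std::uint64_t count) {
          assert(count > 0);
          return next_for_write_.fetch_add(count);
        }

        // Makes everything up to `seq` visible to readers. Never moves backwards.
        void Publish(std::uint64_t seq) {
          std::uint64_t cur = last_visible_.load();
          while (cur < seq && !last_visible_.compare_exchange_weak(cur, seq)) {
            // compare_exchange_weak reloaded cur; retry until published or stale.
          }
        }

        // Last sequence number that reads are allowed to see.
        std::uint64_t LastVisible() const { return last_visible_.load(); }

       private:
        std::atomic<std::uint64_t> next_for_write_{1};
        std::atomic<std::uint64_t> last_visible_{0};
      };
      ```

      Keeping the two counters separate lets a writer reserve numbers before its data is readable, which is what would allow WAL and memtable writes to overlap.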
      
      Note: the first commit has the approach figured out but is not clean.
      Submitting the PR anyway to get early feedback on the approach. If
      we are ok with the approach I will go ahead with these updates:
      0) Rebase with Yi's pipelining changes
      1) Currently batching is disabled by default to make sure that it will be
      consistent with all unit tests. Will make this optional via a config.
      2) A couple of unit tests are disabled. They need to be updated with the
      serial commit of 2PC taken into account.
      3) Replacing BatchGroup with mem_mutex_ got a bit ugly as it requires
      releasing mutex_ beforehand (the same way EnterUnbatched does). This
      needs to be cleaned up.
      Closes https://github.com/facebook/rocksdb/pull/2345
      
      Differential Revision: D5210732
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 78653bd95a35cd1e831e555e0e57bdfd695355a4
  9. 23 Jun, 2017 1 commit
    • Fix Data Race Between CreateColumnFamily() and GetAggregatedIntProperty() · 6837a176
      Siying Dong authored
      Summary:
      CreateColumnFamily() releases the DB mutex (to write the options file) between adding the column family to the set and installing the super version, so if users call GetAggregatedIntProperty() in the middle, the super version will be null and the process will crash. Fix it by skipping column families that have no super version installed.
      
      Maybe we should also fix the problem of releasing the lock when reading the options file, but that is riskier, so I'm doing a quick and safer fix and we can investigate it later.
      Closes https://github.com/facebook/rocksdb/pull/2475
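
      The shape of the fix described above might look like the following sketch. The types and the `num_entries` field are hypothetical stand-ins, not RocksDB's internal classes:

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <vector>

      // Hypothetical stand-ins for illustration only.
      struct SuperVersion {
        std::uint64_t num_entries;
      };

      struct ColumnFamily {
        // Stays nullptr until the super version has been installed.
        const SuperVersion* super_version;
      };

      // Sums a property across column families, skipping any column family
      // that is still in the middle of being created.
      std::uint64_t AggregateNumEntries(const std::vector<ColumnFamily>& cfs) {
        std::uint64_t sum = 0;
        for (const ColumnFamily& cf : cfs) {
          if (cf.super_version == nullptr) {
            continue;  // Not fully created yet; dereferencing would crash.
          }
          sum += cf.super_version->num_entries;
        }
        return sum;
      }
      ```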
      
      Differential Revision: D5298053
      
      Pulled By: siying
      
      fbshipit-source-id: 4b3c8f91c60400b163fcc6cda8a0c77723be0ef6
  10. 25 May, 2017 2 commits
    • fix column_family_test asan · a99fb992
      Andrew Kryczka authored
      Summary:
      Stop calling Close() at the end of tests holding a compaction pressure token, since it causes the write controller to be deleted while it's still needed. These calls were pointless anyway, since Close() is already called in the test's destructor.
      Closes https://github.com/facebook/rocksdb/pull/2367
      
      Differential Revision: D5125906
      
      Pulled By: ajkr
      
      fbshipit-source-id: 6cad8673e5546a82ff602ac0ba59cc3f68dbde46
    • Introduce max_background_jobs mutable option · bb01c188
      Andrew Kryczka authored
      Summary:
      - `max_background_flushes` and `max_background_compactions` are still supported for backwards compatibility
      - `base_background_compactions` is completely deprecated. Now we just throttle to one background compaction when there's no pressure.
      - `max_background_jobs` is added to automatically partition the concurrent background jobs into flushes vs compactions. Currently it's very simple as we just allocate one-fourth of the jobs to flushes, and the remaining can be used for compactions.
      - The test cases that set `base_background_compactions > 1` needed to be updated. I just grab the pressure token such that the desired number of compactions can be scheduled.
      Closes https://github.com/facebook/rocksdb/pull/2205
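
      The one-fourth split described above can be sketched as follows. `BgJobLimits` and `PartitionBackgroundJobs` are hypothetical names, and clamping each share to at least one job is an assumption made here so that both kinds of work can always make progress:

      ```cpp
      #include <algorithm>
      #include <cassert>

      // Hypothetical result type for illustration.
      struct BgJobLimits {
        int max_flushes;
        int max_compactions;
      };

      // Splits max_background_jobs: one-fourth to flushes, the rest to compactions.
      BgJobLimits PartitionBackgroundJobs(int max_background_jobs) {
        assert(max_background_jobs >= 1);
        BgJobLimits limits;
        // Assumption: never let either share drop to zero.
        limits.max_flushes = std::max(1, max_background_jobs / 4);
        limits.max_compactions =
            std::max(1, max_background_jobs - limits.max_flushes);
        return limits;
      }
      ```

      For example, `PartitionBackgroundJobs(8)` gives 2 flushes and 6 compactions.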
      
      Differential Revision: D4937461
      
      Pulled By: ajkr
      
      fbshipit-source-id: df52cbbd497e13bbc9a60560a5ac2a2526b3f1f9
  11. 19 May, 2017 1 commit
  12. 11 May, 2017 1 commit
  13. 08 May, 2017 1 commit
    • Add bulk create/drop column family API · 2cd00773
      Yi Wu authored
      Summary:
      Add DB::CreateColumnFamilies() and DB::DropColumnFamilies() to bulk create/drop column families. This addresses the problem that creating/dropping 1k column families takes minutes. The bottleneck is that we persist the options file for every single column family create/drop, and then parse the persisted options file for verification, which takes a lot of CPU time.
      
      The new APIs simply create/drop column families individually and persist the options file once at the end. This brings creating 1k column families down to within ~0.1s. A further improvement would be to merge the manifest writes into one IO.
      Closes https://github.com/facebook/rocksdb/pull/2248
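
      A toy model of why the bulk call is cheaper, assuming the cost is dominated by options-file writes. `ToyDb` and its members are hypothetical and only count writes; this is not the RocksDB API or its implementation:

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <string>
      #include <vector>

      // Hypothetical toy DB that only counts options-file writes.
      class ToyDb {
       public:
        // Single create: persists the options file on every call.
        void CreateColumnFamily(const std::string& name) {
          AddColumnFamily(name);
          PersistOptionsFile();
        }

        // Bulk create: adds each column family, then persists once at the end.
        void CreateColumnFamilies(const std::vector<std::string>& names) {
          for (const std::string& name : names) {
            AddColumnFamily(name);
          }
          if (!names.empty()) {
            PersistOptionsFile();
          }
        }

        int options_file_writes() const { return options_file_writes_; }
        std::size_t num_column_families() const { return column_families_.size(); }

       private:
        void AddColumnFamily(const std::string& name) {
          assert(!name.empty());
          column_families_.push_back(name);
        }
        void PersistOptionsFile() { ++options_file_writes_; }

        std::vector<std::string> column_families_;
        int options_file_writes_ = 0;
      };
      ```

      Creating N column families one at a time costs N options-file writes in this model, while the bulk call costs one.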
      
      Differential Revision: D5001578
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: d4e00bda671451e0b314c13e12ad194b1704aa03
  14. 06 Apr, 2017 1 commit
  15. 05 Apr, 2017 1 commit
    • Level-based L0->L0 compaction · d659faad
      Andrew Kryczka authored
      Summary:
      Level-based L0->L0 compaction operates on spans of files that aren't currently being compacted. It reduces the number of L0 files, thus making write stall conditions harder to reach.
      
      - L0->L0 is triggered when the base level is unavailable due to pending compactions.
      - L0->L0 always outputs one file of at most `max_level0_burst_file_size` bytes.
      - Subcompactions are disabled for L0->L0 since we want to output one file.
      - Input files are chosen as the longest span of available files that will fit within the size limit. This minimizes the number of files in L0.
      Closes https://github.com/facebook/rocksdb/pull/2027
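
      Read literally, the input selection rule in the last bullet could be sketched with a sliding window. This is an illustration of that rule only, not the compaction picker in this PR; `L0File`, `Span`, and `LongestAvailableSpan` are hypothetical names, and `max_bytes` stands in for `max_level0_burst_file_size`:

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      #include <vector>

      // Hypothetical types for illustration.
      struct L0File {
        std::uint64_t size;
        bool being_compacted;
      };

      struct Span {
        std::size_t begin;  // inclusive
        std::size_t end;    // exclusive
      };

      // Finds the longest run of consecutive files that are not being
      // compacted and whose total size is at most max_bytes.
      Span LongestAvailableSpan(const std::vector<L0File>& files,
                                std::uint64_t max_bytes) {
        Span best{0, 0};
        std::size_t begin = 0;
        std::uint64_t total = 0;
        for (std::size_t end = 0; end < files.size(); ++end) {
          if (files[end].being_compacted) {
            // A busy file breaks the span; restart after it.
            begin = end + 1;
            total = 0;
            continue;
          }
          total += files[end].size;
          while (total > max_bytes) {
            // Shrink from the left until the window fits again.
            assert(begin <= end);
            total -= files[begin].size;
            ++begin;
          }
          if (end + 1 - begin > best.end - best.begin) {
            best = Span{begin, end + 1};
          }
        }
        return best;
      }
      ```

      An empty result (`begin == end`) means no available file fits within the limit.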
      
      Differential Revision: D4760318
      
      Pulled By: ajkr
      
      fbshipit-source-id: 9d07183
  16. 30 Mar, 2017 1 commit
  17. 14 Feb, 2017 1 commit
    • Remove disableDataSync option · eb912a92
      Sagar Vemuri authored
      Summary:
      Remove disableDataSync and the other similarly named disable_data_sync options.
      This is being done to simplify options, and also because the performance gains of this feature can be achieved by other methods.
      Closes https://github.com/facebook/rocksdb/pull/1859
      
      Differential Revision: D4541292
      
      Pulled By: sagar0
      
      fbshipit-source-id: 5b3a6ca
  18. 07 Feb, 2017 1 commit
    • Windows thread · 0a4cdde5
      Dmitri Smirnov authored
      Summary:
      Introduce new methods into the public threadpool interface:
      - Allow submission of std::functions, as they allow greater flexibility.
      - Add joining methods to the implementation to join scheduled and submitted jobs, with
        an option to cancel jobs that did not start executing.
      - Remove ugly `#ifdefs` between the pthread and std implementations; make it uniform.
      - Introduce pimpl for a drop-in replacement of the implementation.
      - Introduce the rocksdb::port::Thread typedef, which is a replacement for std::thread. On POSIX, Thread defaults to std::thread, as before.
      - Implement WindowsThread, which allocates memory in a more controllable manner than the Windows std::thread, with a replaceable implementation.
      - There should be no functionality changes.
      Closes https://github.com/facebook/rocksdb/pull/1823
      
      Differential Revision: D4492902
      
      Pulled By: siying
      
      fbshipit-source-id: c74cb11
  19. 24 Nov, 2016 1 commit
    • Improve Write Stalling System · cd7c4143
      Siying Dong authored
      Summary:
      The current write stalling system suffers from a lack of positive feedback when the restricted rate is already too low. Users sometimes get stuck at a very low slowdown value. With this diff, we add positive feedback (increasing the slowdown value) if we recover from the slowdown state back to normal. To keep the positive feedback from pushing the slowdown value too high, we issue negative feedback every time we are close to the stop condition. Experiments show that it is easier to reach a relative balance than before.
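
      A rough sketch of such a two-sided feedback rule, treating the slowdown value as the restricted write rate. The names, the 25% increase, the 20% decrease, and the clamping bounds are all assumptions chosen for illustration; the message does not state the actual factors:

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstdint>

      // Hypothetical feedback signals for illustration.
      enum class StallFeedback {
        kRecoveredToNormal,  // Left the slowdown state: positive feedback.
        kNearStop,           // Close to the stop condition: negative feedback.
      };

      // Returns the adjusted slowdown value, clamped to [min_value, max_value].
      std::uint64_t AdjustSlowdownValue(std::uint64_t value, StallFeedback feedback,
                                        std::uint64_t min_value,
                                        std::uint64_t max_value) {
        assert(min_value <= max_value);
        std::uint64_t next = value;
        if (feedback == StallFeedback::kRecoveredToNormal) {
          next = value + value / 4;  // Assumed step: raise by 25%.
        } else {
          next = value - value / 5;  // Assumed step: lower by 20%.
        }
        return std::min(max_value, std::max(min_value, next));
      }
      ```

      With bounds of 100 and 10000, a value of 1000 moves to 1250 on recovery and to 800 when close to the stop condition.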
      
      Also increase the level0_stop_writes_trigger default from 24 to 32. Since the level0_slowdown_writes_trigger default is 20, a stop trigger of 24 leaves a buffer of only four files in which to slow down writes. To avoid hitting the stop condition within four files when 20 files have already accumulated, the slowdown value must be very low, which is almost the same as a stop. It also doesn't give the slowdown value enough time to converge. Increasing it to 32 will smooth out the system.
      Closes https://github.com/facebook/rocksdb/pull/1562
      
      Differential Revision: D4218519
      
      Pulled By: siying
      
      fbshipit-source-id: 95e4088
      cd7c4143
  20. 17 Nov, 2016 1 commit
  21. 10 Nov, 2016 1 commit
  22. 22 Oct, 2016 1 commit
  23. 20 Oct, 2016 1 commit
    • sdong's avatar
      column_family_test: disable some tests in LITE · fb2e4129
      sdong authored
      Summary: Some tests in column_family_test depend on functions that are not available in the LITE build, which sometimes causes flakiness. Disable them.
      
      Test Plan: Run those tests in LITE build.
      
      Reviewers: yiwu, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D65271
      fb2e4129
  24. 15 Oct, 2016 1 commit
  25. 24 Sep, 2016 1 commit
    • Yi Wu's avatar
      Split DBOptions into ImmutableDBOptions and MutableDBOptions · 9ed928e7
      Yi Wu authored
      Summary: Use ImmutableDBOptions/MutableDBOptions internally and DBOptions only for user-facing APIs. MutableDBOptions is barely a placeholder for now. I'll start to move options to MutableDBOptions in following diffs.
      
      Test Plan:
        make all check
      
      Reviewers: yhchiang, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D64065
      9ed928e7
  26. 14 Sep, 2016 1 commit
    • Yi Wu's avatar
      Refactor MutableCFOptions · 81747f1b
      Yi Wu authored
      Summary:
      * Change constructor of MutableCFOptions to depend only on ColumnFamilyOptions.
      * Move `max_subcompactions`, `compaction_options_fifo` and `compaction_pri` to ImmutableCFOptions to make it clear that they are immutable.
      
      Test Plan: existing unit tests.
      
      Reviewers: yhchiang, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D63945
      81747f1b
  27. 08 Sep, 2016 1 commit
    • sdong's avatar
      Fix Flaky ColumnFamilyTest.FlushCloseWALFiles · 67036c04
      sdong authored
      Summary: In ColumnFamilyTest.FlushCloseWALFiles, there is a small window in which the flush has finished but the log writer is not yet closed, causing the assert failure. Fix it by explicitly waiting for the flush job to finish.
      
      Test Plan: Run the test many times in high parallelism.
      
      Subscribers: leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D63423
      67036c04
  28. 02 Sep, 2016 1 commit
    • sdong's avatar
      Merge options source_compaction_factor, max_grandparent_overlap_bytes and... · 32149059
      sdong authored
      Merge options source_compaction_factor, max_grandparent_overlap_bytes and expanded_compaction_factor into max_compaction_bytes
      
      Summary: To reduce number of options, merge source_compaction_factor, max_grandparent_overlap_bytes and expanded_compaction_factor into max_compaction_bytes.
      
      Test Plan: Add two new unit tests. Run all existing tests, including jtest.
      
      Reviewers: yhchiang, igor, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D59829
      32149059
  29. 11 Aug, 2016 1 commit
    • sdong's avatar
      read_options.background_purge_on_iterator_cleanup to cover forward iterator... · 56dd0341
      sdong authored
      read_options.background_purge_on_iterator_cleanup to cover forward iterator and log file closing too.
      
      Summary: With read_options.background_purge_on_iterator_cleanup=true, File deletion and closing can still happen in forward iterator, or WAL file closing. Cover those cases too.
      
      Test Plan: I am adding unit tests.
      
      Reviewers: andrewkr, IslamAbdelRahman, yiwu
      
      Reviewed By: yiwu
      
      Subscribers: leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D61503
      56dd0341
  30. 03 Aug, 2016 2 commits
    • Yi Wu's avatar
      Ignore write stall triggers when auto-compaction is disabled · ee027fc1
      Yi Wu authored
      Summary:
      My understanding is that the purpose of write stall triggers is to wait for auto-compaction to catch up. Without auto-compaction, we don't need to stall writes.
      
      Also with this diff, flush/compaction conditions are recalculated on any dynamic option change. Previously the conditions were recalculated only when write stall options were changed.
      
      Test Plan: See the new test. Removed two tests that are no longer valid.
      
      Reviewers: IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D61437
      ee027fc1
    • Jay Edgar's avatar
      Add a GetComparator() function to the ColumnFamilyHandle base class so that... · cdc4eb68
      Jay Edgar authored
      Add a GetComparator() function to the ColumnFamilyHandle base class so that the user's comparator can be retrieved.
      
      Summary: MyRocks is adding support for the use of the SstFileWriter, which needs a comparator.  It would be more convenient to get the comparator from the column family (which already has to have it) than to have the caller keep track of it.
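      
      As a rough illustration of the convenience described above, the sketch below has a writer-like helper fetch its comparator from the column family handle rather than requiring the caller to pass one separately. All types here are simplified stand-ins written for this example (they are not the RocksDB headers), and `SortedWriterSketch`, `HandleSketch` and `BytewiseComparatorSketch` are hypothetical names.
      
      ```cpp
      #include <cassert>
      #include <string>
      
      // Simplified stand-in for a comparator interface (assumption).
      struct Comparator {
        virtual ~Comparator() = default;
        virtual int Compare(const std::string& a, const std::string& b) const = 0;
      };
      
      // Orders keys by plain byte-wise string comparison.
      struct BytewiseComparatorSketch : Comparator {
        int Compare(const std::string& a, const std::string& b) const override {
          return a.compare(b);
        }
      };
      
      // Simplified stand-in for the handle base class with the new accessor.
      struct ColumnFamilyHandle {
        virtual ~ColumnFamilyHandle() = default;
        virtual const Comparator* GetComparator() const = 0;
      };
      
      class HandleSketch : public ColumnFamilyHandle {
       public:
        explicit HandleSketch(const Comparator* cmp) : cmp_(cmp) {}
        const Comparator* GetComparator() const override { return cmp_; }
      
       private:
        const Comparator* cmp_;
      };
      
      // Hypothetical writer that needs a comparator to verify key order.
      // It takes the comparator from the handle instead of a second argument.
      class SortedWriterSketch {
       public:
        explicit SortedWriterSketch(const ColumnFamilyHandle& cf)
            : cmp_(cf.GetComparator()) {}
      
        // Accepts a key only if it sorts strictly after the previous one.
        bool Add(const std::string& key) {
          if (has_last_ && cmp_->Compare(last_, key) >= 0) return false;
          last_ = key;
          has_last_ = true;
          return true;
        }
      
       private:
        const Comparator* cmp_;
        std::string last_;
        bool has_last_ = false;
      };
      ```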
      
      Test Plan: Standard tests (adding one for the new method)
      
      Reviewers: IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D61155
      cdc4eb68
  31. 26 Jul, 2016 1 commit
    • sdong's avatar
      Ignore stale logs while restarting DBs · 2a6d0cde
      sdong authored
      Summary:
      Stale log files can be deleted out of order. This can happen for various reasons. One of the reasons is that no data is ever inserted into a column family and we have an optimization to update its log number, but not all the old log files are cleaned up (the case shown in the unit tests added). It can also happen when we simply delete multiple log files out of order.
      
      This causes data corruption because we simply increase the seqID after processing the next row, and we may end up writing data with a smaller seqID than what is already flushed to memtables.
      
      In DB recovery, for the oldest files we are replaying, if a file contains no data for any column family, we ignore the sequence IDs in that file.
      
      Test Plan: Add two unit tests that fail without the fix.
      
      Reviewers: IslamAbdelRahman, igor, yiwu
      
      Reviewed By: yiwu
      
      Subscribers: hermanlee4, yoshinorim, leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D60891
      2a6d0cde
  32. 22 Jul, 2016 1 commit
    • sdong's avatar
      Need to make sure log file synced before flushing memtable of one column family · d5a51d4d
      sdong authored
      Summary: Multiput atomicity is broken across multiple column families if we don't sync the WAL before flushing one column family. The WAL file may contain a write batch with a write to a key in the CF being flushed and a write to a key in another CF. If we don't sync the WAL before flushing and the machine crashes after the flush, the write batch will be only partially recovered: the data for the other CFs is lost.
      
      Test Plan: Add a new unit test which will fail without the diff.
      
      Reviewers: yhchiang, IslamAbdelRahman, igor, yiwu
      
      Reviewed By: yiwu
      
      Subscribers: yiwu, leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D60915
      d5a51d4d
  33. 14 Jul, 2016 1 commit
  34. 07 Jul, 2016 1 commit
    • sdong's avatar
      Add More Logging to track total_log_size · a00bf1b3
      sdong authored
      Summary: We saw instances where total_log_size differs from the real value, but I'm not able to reproduce it. Add more logging to help debug it when it happens again.
      
      Test Plan: Run the unit test and see the logging.
      
      Reviewers: andrewkr, yhchiang, igor, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D60081
      a00bf1b3
  35. 08 Jun, 2016 1 commit
    • Anirban Rahut's avatar
      Adding test for contiguous WAL detection · a73b26f6
      Anirban Rahut authored
      Summary:
      Add a test to detect that when the WAL gets truncated,
      sequence numbers are checked to be contiguous.
      
      This test is put in ColumnFamilyTest as it has the necessary
      infrastructure/functions for flushing column families, which
      we use to ensure 2 active WAL files
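      
      A contiguity check of the kind this test exercises might look like the sketch below. It is only an illustration under assumed names: `BatchHeader` and `FirstSequenceGap` are hypothetical, and the model (each batch carries a first sequence number and an entry count, and the next batch must start where the previous one ended) is an assumption about how the records are laid out.
      
      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      #include <vector>
      
      // Hypothetical header of one replayed write batch.
      struct BatchHeader {
        uint64_t first_seq;  // sequence number of the first entry
        uint32_t count;      // number of entries in the batch
      };
      
      // Returns the index of the first batch whose first_seq does not follow
      // directly from the previous batch, or batches.size() if there is no gap.
      std::size_t FirstSequenceGap(const std::vector<BatchHeader>& batches) {
        for (std::size_t i = 1; i < batches.size(); ++i) {
          const BatchHeader& prev = batches[i - 1];
          if (batches[i].first_seq != prev.first_seq + prev.count) {
            return i;
          }
        }
        return batches.size();
      }
      ```
      
      A truncated log would show up in this model as a jump between the last batch of one file and the first batch of the next.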
      
      Test Plan:
      This is a test; no feature has been added.
      This test fails today and is therefore disabled.
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: lgalanis, dhruba, andrewkr, pritamdamania
      
      Differential Revision: https://reviews.facebook.net/D59253
      a73b26f6
  36. 20 May, 2016 1 commit