1. 20 Oct, 2021 1 commit
  2. 14 Oct, 2021 1 commit
  3. 27 Aug, 2021 1 commit
  4. 25 Aug, 2021 1 commit
  5. 15 Aug, 2021 1 commit
  6. 09 Nov, 2020 1 commit
  7. 29 Oct, 2020 3 commits
  8. 13 Oct, 2020 3 commits
  9. 24 Jul, 2020 2 commits
  10. 16 Jul, 2020 3 commits
    • 48bfca38
    • Only check for python location once (#7123) · c8b9556c
      Adam Retter authored
      Summary:
      This fixes an issue introduced in 0c56fc4d whereby the location of Python was evaluated many times, leading to excessive logging of unknown Python locations on CentOS 6.
      
      The location is now only checked once.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7123
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D22532274
      
      Pulled By: ajkr
      
      fbshipit-source-id: cade71b4b46e9a23d63ecb4dd36a4ac8ae217970
    • Report corrupted keys during compaction (#7124) · 30bfa2a4
      Yanqin Jin authored
      Summary:
      Currently, RocksDB lets compaction go through even in the case of
      corrupted keys, the number of which is reported in CompactionJobStats.
      However, RocksDB does not check this value. We should let compaction run
      in a stricter mode.
      
      Temporarily disable two tests that allow corrupted keys in compaction.
      With this PR, the two tests will assert(false) and terminate. We still
      need to investigate the recommended google-test way of doing this;
      death tests (EXPECT_DEATH) in gtest currently produce warnings.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7124
      
      Test Plan: make check
      
      Reviewed By: ajkr
      
      Differential Revision: D22530722
      
      Pulled By: riversand963
      
      fbshipit-source-id: 6a5a6a992028c6d4f92cb74693c92db462ae4ad6
  11. 10 Jul, 2020 4 commits
    • bump version to 6.11.3 · 3e1eb99a
      Andrew Kryczka authored
    • Fix data race to VersionSet::io_status_ (#7034) · 3514cf51
      Yanqin Jin authored
      Summary:
      After https://github.com/facebook/rocksdb/issues/6949 , VersionSet::io_status_ can be concurrently accessed by multiple
      threads without a lock, causing the TSAN test to fail. For example, a bg flush thread
      resets io_status_ before calling LogAndApply(), while another thread already in
      the process of LogAndApply() reads io_status_. This is a bug.
      
      We do not have to reset io_status_ each time we call LogAndApply(). io_status_
      is part of the state of VersionSet, and it indicates the outcome of preceding
      MANIFEST/CURRENT file IO operations. Its value should be updated only when:
      
      1. MANIFEST/CURRENT file IO fails for the first time.
      2. MANIFEST/CURRENT file IO succeeds as part of recovering from a prior
         failure without a process restart, e.g. by calling Resume().
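      The two update rules above can be sketched as a small state holder (a minimal illustration with made-up names, not the actual VersionSet code):
      
      ```cpp
      #include <string>
      
      // Sketch: the shared IO status is written only on the first failure,
      // or when a recovery attempt (e.g. Resume()) succeeds. It is never
      // reset before each LogAndApply(), which is what caused the race.
      struct ManifestIoState {
        bool ok = true;
        std::string msg;
      
        // Rule 1: record only the first MANIFEST/CURRENT IO failure.
        void OnIoResult(bool success, const std::string& error) {
          if (!success && ok) {
            ok = false;
            msg = error;
          }
          // Later failures, and ordinary successes, leave the state alone.
        }
      
        // Rule 2: a successful recovery clears the recorded error.
        void OnResumeSuccess() {
          ok = true;
          msg.clear();
        }
      };
      ```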
      
      Test Plan (devserver):
      COMPILE_WITH_TSAN=1 make check
      COMPILE_WITH_TSAN=1 make db_test2
      ./db_test2 --gtest_filter=DBTest2.CompactionStall
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7034
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D22247137
      
      Pulled By: riversand963
      
      fbshipit-source-id: 77b83e05390f3ee3cd2d96d3fdd6fe4f225e3216
    • First step towards handling MANIFEST write error (#6949) · dd63f04c
      Yanqin Jin authored
      Summary:
      This PR provides preliminary support for handling IO error during MANIFEST write.
      File write/sync is not guaranteed to be atomic. If we encounter an IOError while writing/syncing to the MANIFEST file, we cannot be sure about the state of the MANIFEST file. The version edits may or may not have reached the file. During cleanup, if we delete the newly-generated SST files referenced by the pending version edit(s), but the version edit(s) have actually been persisted to the MANIFEST, then the next recovery attempt will process the version edit(s) and fail, since the SST files have already been deleted.
      One approach is to truncate the MANIFEST after a write/sync error, so that it is safe to delete the SST files. However, file truncation may not be supported on certain file systems. Therefore, we take the following approach.
      If an IOError is detected during MANIFEST write/sync, we disable file deletions for the faulty database. Depending on whether the IOError is retryable (as set by the underlying file system), either RocksDB or the application can call `DB::Resume()`, or simply shut down and restart. During `Resume()`, RocksDB will try to switch to a new MANIFEST and write all existing in-memory version storage to the new file. If this succeeds, then RocksDB may proceed. Once all recovery is completed, file deletions will be re-enabled.
      Note that multiple threads can call `LogAndApply()` at the same time, though only one of them will actually perform the MANIFEST write, possibly batching the version edits of the other threads. When the leading MANIFEST writer finishes, all of the MANIFEST-writing threads in this batch will see the same IOError. They will all call `ErrorHandler::SetBGError()`, in which file deletion will be disabled.
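      The disable/re-enable flow described above can be sketched as follows (illustrative names only; the real logic lives in `ErrorHandler` and `DB::Resume()`):
      
      ```cpp
      // Sketch: on a MANIFEST write/sync IOError, file deletions are disabled
      // so SST files referenced by pending version edits survive; Resume()
      // switches to a fresh MANIFEST and re-enables deletions only if that
      // write succeeds.
      struct DbErrorState {
        bool file_deletions_enabled = true;
        bool has_bg_error = false;
      
        void OnManifestIoError() {
          has_bg_error = true;
          file_deletions_enabled = false;  // keep SSTs pending edits refer to
        }
      
        // Returns true if the DB is healthy after the call.
        bool Resume(bool new_manifest_write_ok) {
          if (!has_bg_error) return true;
          if (!new_manifest_write_ok) return false;  // stay in error state
          has_bg_error = false;
          file_deletions_enabled = true;
          return true;
        }
      };
      ```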
      
      Possible future directions:
      - Add an `ErrorContext` structure so that it is easier to pass more info to `ErrorHandler`. Currently, as in this example, a new `BackgroundErrorReason` has to be added.
      
      Test plan (dev server):
      make check
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6949
      
      Reviewed By: anand1976
      
      Differential Revision: D22026020
      
      Pulled By: riversand963
      
      fbshipit-source-id: f3c68a2ef45d9b505d0d625c7c5e0c88495b91c8
    • Update Flush policy in PartitionedIndexBuilder on switching from user-key to internal-key mode (#7096) · 115c9113
      Akanksha Mahajan authored
      Summary:
      When format_version is high enough to support user keys and
      there are index entries for the same user key that span multiple data
      blocks, the builder changes from user-key mode to internal-key mode. But the
      flush policy is not reset to point to the block builder of the internal-key
      index. After this switch, no entries are added to the user-key index
      partition result, so the block flush is never triggered.
      
      Fix: 1. After adding the entry to sub_builder_index_, if there is a switch
      from user-key to internal-key mode, the flush policy is updated to point to
      the block builder of the internal-key index partition.
      2. Set sub_builder_index_->seperator_is_key_plus_seq_ = true if
      seperator_is_key_plus_seq_ is set to true, so that subsequent partitions
      can also use internal-key mode.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7096
      
      Test Plan: make check -j64
      
      Reviewed By: ajkr
      
      Differential Revision: D22416598
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 01fc2dc07ea1b32f8fb803995ebe6e9a3fbe67ac
  12. 08 Jul, 2020 3 commits
  13. 24 Jun, 2020 3 commits
  14. 19 Jun, 2020 3 commits
    • Fix a bug that causes iterator to return wrong result in a rare data race (#6973) · 1910560c
      sdong authored
      Summary:
      The bug fixed in https://github.com/facebook/rocksdb/pull/1816/ is now applicable to iterators too. This was not an issue until https://github.com/facebook/rocksdb/pull/2886 caused the regression. If a put and a DB flush happen just between the iterator reading the latest sequence number and referencing the super version, an empty result or an older value can be returned for the key, which is wrong.
      Fix it in the same way as the fix in https://github.com/facebook/rocksdb/issues/1816, that is, get the sequence number after referencing the super version.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6973
      
      Test Plan: Will run stress tests for a while to make sure there is no general regression.
      
      Reviewed By: ajkr
      
      Differential Revision: D22029348
      
      fbshipit-source-id: 94390f93630906796d6e2fec321f44a920953fd1
    • Fail recovery when MANIFEST record checksum mismatch (#6996) · 61cc9ef7
      Yanqin Jin authored
      Summary:
      https://github.com/facebook/rocksdb/issues/5411 refactored `VersionSet::Recover` but introduced a bug, explained as follows.
      Before, once a checksum mismatch happened, `reporter` would set `s` to non-OK, and Recover would therefore stop processing the MANIFEST any further.
      ```
      // Correct
      // Inside Recover
      LogReporter reporter;
      reporter.status = &s;
      log::Reader reader(..., reporter);
      while (reader.ReadRecord() && s.ok()) {
      ...
      }
      ```
      The bug is that the local variable `s` in `ReadAndRecover` won't be updated by `reporter` while reading the MANIFEST. It is possible that the reader sees a checksum mismatch in a record, but `ReadRecord` internally retries the read and finds the next valid record. The mismatched record is then ignored and no error is reported.
      ```
      // Incorrect
      // Inside Recover
      LogReporter reporter;
      reporter.status = &s;
      log::Reader reader(..., reporter);
      s = ReadAndRecover(reader, ...);
      
      // Inside ReadAndRecover
        Status s;  // Shadows the s in Recover.
        while (reader.ReadRecord() && s.ok()) {
         ...
        }
      ```
      `LogReporter` can use a separate `log_read_status` to track the errors while reading the MANIFEST. RocksDB can process more MANIFEST entries only if `log_read_status.ok()`.
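      The shadowing pitfall and the fixed pattern can be sketched with simplified types (illustrative only; the real code uses `rocksdb::Status` and `log::Reader`):
      
      ```cpp
      // The reporter writes through a pointer, so the processing loop must
      // test the very variable the reporter points to; a shadowing local
      // `s` in an inner function silently swallows read errors.
      struct SimpleStatus {
        bool ok_ = true;
        bool ok() const { return ok_; }
      };
      
      struct SimpleReporter {
        SimpleStatus* status = nullptr;
        void ReportCorruption() { status->ok_ = false; }
      };
      
      // Fixed pattern: a dedicated log_read_status tracks read errors, and
      // the loop consults it, so a checksum mismatch stops recovery.
      int ProcessRecords(int corrupt_at, int num_records) {
        SimpleStatus log_read_status;
        SimpleReporter reporter;
        reporter.status = &log_read_status;
        int processed = 0;
        for (int i = 0; i < num_records && log_read_status.ok(); ++i) {
          if (i == corrupt_at) {
            reporter.ReportCorruption();  // simulated checksum mismatch
            continue;
          }
          ++processed;
        }
        return processed;  // stops at the first corrupted record
      }
      ```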
      Test plan (devserver):
      make check
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6996
      
      Reviewed By: ajkr
      
      Differential Revision: D22105746
      
      Pulled By: riversand963
      
      fbshipit-source-id: b22f717a423457a41ca152a242abbb64cf91fc38
    • Fix the bug that compressed cache is disabled in read-only DBs (#6990) · f5d4dbbe
      sdong authored
      Summary:
      Compressed block cache is disabled in https://github.com/facebook/rocksdb/pull/4650 for no good reason. Re-enable it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6990
      
      Test Plan: Add a unit test to make sure a general function works with read-only DB + compressed block cache.
      
      Reviewed By: ltamasi
      
      Differential Revision: D22072755
      
      fbshipit-source-id: 2a55df6363de23a78979cf6c747526359e5dc7a1
  15. 18 Jun, 2020 1 commit
  16. 17 Jun, 2020 2 commits
    • Fix a bug of overwriting return code (#6989) · a818699f
      Yanqin Jin authored
      Summary:
      In best-efforts recovery, an error that is not Corruption, IOError::kNotFound, or IOError::kPathNotFound will be silently overwritten. Fix this by checking all non-OK cases and returning early.
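      The early-return pattern can be sketched with a made-up status enum (the real code works with `rocksdb::Status` and IOError subcodes):
      
      ```cpp
      enum class Code { kOk, kCorruption, kNotFound, kPathNotFound, kIOError };
      
      // Best-efforts recovery tolerates only the listed error kinds; every
      // other non-OK code is returned immediately instead of being
      // overwritten by later processing.
      Code HandleFileError(Code s) {
        if (s == Code::kCorruption || s == Code::kNotFound ||
            s == Code::kPathNotFound) {
          return Code::kOk;  // tolerated: skip the file, keep recovering
        }
        if (s != Code::kOk) {
          return s;  // propagate early; do not silently overwrite
        }
        return Code::kOk;
      }
      ```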
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6989
      
      Test Plan: make check
      
      Reviewed By: ajkr
      
      Differential Revision: D22071418
      
      Pulled By: riversand963
      
      fbshipit-source-id: 5a4ea5dfb1a41f41c7a3fdaf62b163007b42f04b
    • Let best-efforts recovery ignore CURRENT file (#6970) · 97a69f43
      Yanqin Jin authored
      Summary:
      Best-efforts recovery does not check the content of CURRENT file to determine which MANIFEST to recover from. However, it still checks the presence of CURRENT file to determine whether to create a new DB during `open()`. Therefore, we can tweak the logic in `open()` a little bit so that best-efforts recovery does not rely on CURRENT file at all.
      
      Test plan (dev server):
      make check
      ./db_basic_test --gtest_filter=DBBasicTest.RecoverWithNoCurrentFile
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6970
      
      Reviewed By: anand1976
      
      Differential Revision: D22013990
      
      Pulled By: riversand963
      
      fbshipit-source-id: db552a1868c60ed70e1f7cd252a3a076eb8ea58f
  17. 13 Jun, 2020 2 commits
    • update HISTORY.md for 6.11 release (#6972) · af58d927
      Andrew Kryczka authored
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6972
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D22021953
      
      Pulled By: ajkr
      
      fbshipit-source-id: 4debbafe45b5939fd28549230eebf6006eb43440
    • Maintain the set of linked SSTs in BlobFileMetaData (#6945) · 83833637
      Levi Tamasi authored
      Summary:
      The `FileMetaData` objects associated with table files already contain the
      number of the oldest blob file referenced by the SST in question. This patch
      adds the inverse mapping to `BlobFileMetaData`, namely the set of table file
      numbers for which the oldest blob file link points to the given blob file (these
      are referred to as *linked SSTs*). This mapping will be used by the GC logic.
      
      Implementation-wise, the patch builds on the `BlobFileMetaDataDelta`
      functionality introduced in https://github.com/facebook/rocksdb/pull/6835: newly linked/unlinked SSTs are
      accumulated in `BlobFileMetaDataDelta`, and the changes to the linked SST set
      are applied in one shot when the new `Version` is saved. The patch also reworks
      the blob file related consistency checks in `VersionBuilder` so they validate the
      consistency of the forward table file -> blob file links and the backward blob file ->
      table file links for blob files that are part of the `Version`.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6945
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D21912228
      
      Pulled By: ltamasi
      
      fbshipit-source-id: c5bc7acf6e729a8fccbb12672dd5cd00f6f000f8
  18. 12 Jun, 2020 4 commits
    • Fail point-in-time WAL recovery upon IOError reading WAL (#6963) · 717749f4
      Yanqin Jin authored
      Summary:
      If `options.wal_recovery_mode == WALRecoveryMode::kPointInTimeRecovery`, RocksDB stops replaying WAL once hitting an error and discards the rest of the WAL. This can lead to data loss if the error occurs at an offset smaller than the last sync'ed offset.
      Ideally, RocksDB point-in-time recovery should permit recovery if the error occurs after the last synced offset, while failing recovery if the error occurs before it. However, RocksDB does not track the synced offsets of WALs, so it cannot tell whether an error occurs before or after the last synced offset. An error can be one of the following.
      - WAL record checksum mismatch. This can result from either corruption of synced data or dropping of unsynced data during shutdown, and we cannot tell which. In order not to defeat the original motivation to permit the latter case, we keep the original behavior of point-in-time WAL recovery.
      - IOError. This indicates that the whole WAL file, including its synced portion, may be unavailable. Therefore, we choose to change the behavior of point-in-time recovery and fail the database recovery.
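      The decision above can be sketched with illustrative enums (not the real WALRecoveryMode plumbing): a checksum mismatch truncates replay, because the tail may simply be unsynced data dropped at shutdown, while an IOError fails recovery, since even the synced part of the WAL may be unreadable.
      
      ```cpp
      enum class WalReadError { kNone, kChecksumMismatch, kIOError };
      enum class RecoveryAction { kContinue, kStopReplay, kFailRecovery };
      
      // Point-in-time recovery: map each WAL read outcome to an action.
      RecoveryAction OnWalReadError(WalReadError e) {
        switch (e) {
          case WalReadError::kNone:
            return RecoveryAction::kContinue;
          case WalReadError::kChecksumMismatch:
            return RecoveryAction::kStopReplay;   // keep DB, discard the rest
          case WalReadError::kIOError:
            return RecoveryAction::kFailRecovery; // synced data may be lost
        }
        return RecoveryAction::kFailRecovery;
      }
      ```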
      
      Test plan (devserver):
      make check
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6963
      
      Reviewed By: ajkr
      
      Differential Revision: D22011083
      
      Pulled By: riversand963
      
      fbshipit-source-id: f9cbf29a37dc5cc40d3fa62f89eed1ad67ca1536
    • Revisit the handling of the case when a file is re-added to the same level (#6939) · d854abad
      Levi Tamasi authored
      Summary:
      https://github.com/facebook/rocksdb/pull/6901 subtly changed the handling of the corner case
      when a table file is deleted from a level, then re-added to the same level. (Note: this
      should be extremely rare; one scenario that comes to mind is a trivial move followed by
      a call to `ReFitLevel` that moves the file back to the original level.) Before that change,
      a new `FileMetaData` object was created as a result of this sequence; after the change,
      the original `FileMetaData` was essentially resurrected (since the deletion and the addition
      simply cancel each other out with the change). This patch restores the original behavior,
      which is more intuitive considering the interface, and in sync with how trivial moves are handled.
      (Also note that `FileMetaData` contains some mutable data members, the values of which
      might be different in the resurrected object and the freshly created one.)
      The PR also fixes a bug in this area: with the original pre-6901 code, `VersionBuilder`
      would add the same file twice to the same level in the scenario described above.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6939
      
      Test Plan: `make check`
      
      Reviewed By: ajkr
      
      Differential Revision: D21905580
      
      Pulled By: ltamasi
      
      fbshipit-source-id: da07ae45384ecf3c6c53506d106432d88a7ec9df
    • Turn DBTest2.CompressionFailures into a parameterized test (#6968) · 722ebba8
      Levi Tamasi authored
      Summary:
      `DBTest2.CompressionFailures` currently tests many configurations
      sequentially using nested loops, which often leads to timeouts
      in our test system. The patch turns it into a parameterized test
      instead.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6968
      
      Test Plan: `make check`
      
      Reviewed By: siying
      
      Differential Revision: D22006954
      
      Pulled By: ltamasi
      
      fbshipit-source-id: f71f2f7108086b7651ecfce3d79a7fab24620b2c
    • Ingest SST files with checksum information (#6891) · b3585a11
      Zhichao Cao authored
      Summary:
      Applications can ingest SST files with file checksum information, such that during ingestion, the DB is able to check data integrity and verify the identity of the SST file. The PR introduces generate_and_verify_file_checksum in IngestExternalFileOption to control whether the ingested checksum information should be verified against a freshly generated checksum.
      
          1. If the generate_and_verify_file_checksum option is *FALSE*: *1)* if the DB does not enable SST file checksums, the ingested checksum information is ignored; *2)* if the DB enables SST file checksums and the ingested checksum function name matches the DB's checksum function name, we trust the ingested checksum and store it in the MANIFEST. If the checksum function name does not match, we treat that as an error and fail the IngestExternalFile() call.
          2. If the generate_and_verify_file_checksum option is *TRUE*: *1)* if the DB does not enable SST file checksums, the ingested checksum information is ignored; *2)* if the DB enables SST file checksums, we use the DB's checksum generator to calculate the checksum for each ingested SST file after it is copied or moved, then compare the result with the ingested checksum information: _A)_ if the checksum function names do not match, _verification always reports true_ and we store the DB-generated checksum information in the MANIFEST; _B)_ if the checksum function names match and the checksums match, ingestion continues and the checksum information is stored in the MANIFEST. Otherwise, file ingestion is terminated and file corruption is reported.
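      The decision table above can be sketched as follows (a simplified illustration with made-up names; the real option and flow live in IngestExternalFileOptions and the ingestion job):
      
      ```cpp
      #include <string>
      
      enum class Outcome { kIgnore, kStoreIngested, kStoreGenerated, kFail };
      
      // Decide what to do with an ingested file's checksum, per the rules
      // in the commit message.
      Outcome DecideChecksum(bool db_checksum_enabled, bool generate_and_verify,
                             const std::string& db_func,
                             const std::string& ingested_func,
                             bool checksums_match) {
        if (!db_checksum_enabled) return Outcome::kIgnore;
        if (!generate_and_verify) {
          // Trust the ingested checksum only if the function names agree.
          return db_func == ingested_func ? Outcome::kStoreIngested
                                          : Outcome::kFail;
        }
        if (db_func != ingested_func) {
          // Cannot compare: store the DB-generated checksum instead.
          return Outcome::kStoreGenerated;
        }
        return checksums_match ? Outcome::kStoreIngested : Outcome::kFail;
      }
      ```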
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6891
      
      Test Plan: added unit test, pass make asan_check
      
      Reviewed By: pdillinger
      
      Differential Revision: D21935988
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 7b55f486632db467e76d72602218d0658aa7f6ed
  19. 11 Jun, 2020 1 commit