- 20 Oct, 2021 1 commit
-
-
rui xia authored
-
- 14 Oct, 2021 1 commit
-
-
rui xia authored
-
- 27 Aug, 2021 1 commit
-
-
rui xia authored
-
- 25 Aug, 2021 1 commit
-
-
rui xia authored
-
- 15 Aug, 2021 1 commit
-
-
jiexray authored
-
- 09 Nov, 2020 1 commit
-
-
Yun Tang authored
-
- 29 Oct, 2020 3 commits
-
-
Jason Volk authored
Summary: TryMerge() overzealously creates one huge file read request in an attempt to merge smaller disjoint requests. For example, ~30 input requests of ~100 bytes output as 1 request of 100 MiB causing alarmingly large read throughputs to be repeatedly observed by the environment. Signed-off-by:
Jason Volk <jason@zemos.net> Pull Request resolved: https://github.com/facebook/rocksdb/pull/6979 Reviewed By: siying Differential Revision: D22668892 Pulled By: cheng-chang fbshipit-source-id: 7506fe9621b7f1a747dadf6b8ddb1b1a141c1937
-
azagrebin authored
-
azagrebin authored
[FLINK-10471] Add Apache Flink specific compaction filter to evict expired state which has time-to-live
-
- 13 Oct, 2020 3 commits
-
-
Andrew Kryczka authored
-
Andrew Kryczka authored
Backports part of 75d3b6fdf0aa1007c4d26382f65be0adf4519a37.
-
Andrew Kryczka authored
Backports part of 75d3b6fdf0aa1007c4d26382f65be0adf4519a37.
-
- 24 Jul, 2020 2 commits
-
-
Yanqin Jin authored
-
Yanqin Jin authored
Summary: During memtable lookup, an unrecognized value type should be reported as Status::Corruption. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7121 Test Plan: make check Reviewed By: cheng-chang Differential Revision: D22512124 Pulled By: riversand963 fbshipit-source-id: 9b97be7d9b230c5aae9205f96054420e5ea09066
-
- 16 Jul, 2020 3 commits
-
-
Andrew Kryczka authored
-
Adam Retter authored
Summary: This fixes an issue introduced in 0c56fc4d whereby the location of Python is evaluated many times and leads to excessive logging of unknown python locations of CentOS 6. The location is now only checked once. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7123 Reviewed By: zhichao-cao Differential Revision: D22532274 Pulled By: ajkr fbshipit-source-id: cade71b4b46e9a23d63ecb4dd36a4ac8ae217970
-
Yanqin Jin authored
Summary: Currently, RocksDB lets compaction to go through even in case of corrupted keys, the number of which is reported in CompactionJobStats. However, RocksDB does not check this value. We should let compaction run in a stricter mode. Temporarily disable two tests that allow corrupted keys in compaction. With this PR, the two tests will assert(false) and terminate. Still need to investigate what is the recommended google-test way of doing it. Death test (EXPECT_DEATH) in gtest has warnings now. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7124 Test Plan: make check Reviewed By: ajkr Differential Revision: D22530722 Pulled By: riversand963 fbshipit-source-id: 6a5a6a992028c6d4f92cb74693c92db462ae4ad6
-
- 10 Jul, 2020 4 commits
-
-
Andrew Kryczka authored
-
Yanqin Jin authored
Summary: After https://github.com/facebook/rocksdb/issues/6949 , VersionSet::io_status_ can be concurrently accessed by multiple threads without lock, causing tsan test to fail. For example, a bg flush thread resets io_status_ before calling LogAndApply(), while another thread already in the process of LogAndApply() reads io_status_. This is a bug. We do not have to reset io_status_ each time we call LogAndApply(). io_status_ is part of the state of VersionSet, and it indicates the outcome of preceding MANIFEST/CURRENT files IO operations. Its value should be updated only when: 1. MANIFEST/CURRENT files IO fail for the first time. 2. MANIFEST/CURRENT files IO succeed as part of recovering from a prior failure without process restart, e.g. calling Resume(). Test Plan (devserver): COMPILE_WITH_TSAN=1 make check COMPILE_WITH_TSAN=1 make db_test2 ./db_test2 --gtest_filter=DBTest2.CompactionStall Pull Request resolved: https://github.com/facebook/rocksdb/pull/7034 Reviewed By: zhichao-cao Differential Revision: D22247137 Pulled By: riversand963 fbshipit-source-id: 77b83e05390f3ee3cd2d96d3fdd6fe4f225e3216
-
Yanqin Jin authored
Summary: This PR provides preliminary support for handling IO error during MANIFEST write. File write/sync is not guaranteed to be atomic. If we encounter an IOError while writing/syncing to the MANIFEST file, we cannot be sure about the state of the MANIFEST file. The version edits may or may not have reached the file. During cleanup, if we delete the newly-generated SST files referenced by the pending version edit(s), but the version edit(s) actually are persistent in the MANIFEST, then next recovery attempt will process the version edits(s) and then fail since the SST files have already been deleted. One approach is to truncate the MANIFEST after write/sync error, so that it is safe to delete the SST files. However, file truncation may not be supported on certain file systems. Therefore, we take the following approach. If an IOError is detected during MANIFEST write/sync, we disable file deletions for the faulty database. Depending on whether the IOError is retryable (set by underlying file system), either RocksDB or application can call `DB::Resume()`, or simply shutdown and restart. During `Resume()`, RocksDB will try to switch to a new MANIFEST and write all existing in-memory version storage in the new file. If this succeeds, then RocksDB may proceed. If all recovery is completed, then file deletions will be re-enabled. Note that multiple threads can call `LogAndApply()` at the same time, though only one of them will be going through the process MANIFEST write, possibly batching the version edits of other threads. When the leading MANIFEST writer finishes, all of the MANIFEST writing threads in this batch will have the same IOError. They will all call `ErrorHandler::SetBGError()` in which file deletion will be disabled. Possible future directions: - Add an `ErrorContext` structure so that it is easier to pass more info to `ErrorHandler`. Currently, as in this example, a new `BackgroundErrorReason` has to be added. Test plan (dev server): make check Pull Request resolved: https://github.com/facebook/rocksdb/pull/6949 Reviewed By: anand1976 Differential Revision: D22026020 Pulled By: riversand963 fbshipit-source-id: f3c68a2ef45d9b505d0d625c7c5e0c88495b91c8
-
Akanksha Mahajan authored
Update Flush policy in PartitionedIndexBuilder on switching from user-key to internal-key mode (#7096) Summary: When format_version is high enough to support user-key and there are index entries for same user key that spans multiple data blocks then it changes from user-key mode to internal-key mode. But the flush policy is not reset to point to Block Builder of internal-keys. After this switch, no entries are added to user key index partition result, thus it never triggers flushing the block. Fix: 1. After adding the entry in sub_builder_index_, if there is a switch from user-key to internal-key, then flush policy is updated to point to Block Builder of internal-keys index partition. 2. Set sub_builder_index_->seperator_is_key_plus_seq_ = true if seperator_is_key_plus_seq_ is set to true so that subsequent partitions can also use internal key mode. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7096 Test Plan: make check -j64 Reviewed By: ajkr Differential Revision: D22416598 Pulled By: akankshamahajan15 fbshipit-source-id: 01fc2dc07ea1b32f8fb803995ebe6e9a3fbe67ac
-
- 08 Jul, 2020 3 commits
-
-
Andrew Kryczka authored
-
Peter Dillinger authored
Summary: Fix a Facebook internal build Test Plan: buck build @mode/opt :c_test :c_test_bin (was compilation failure, now "not found") buck build @mode/dev :c_test :c_test_bin (still passes)
-
sdong authored
Summary: This is to fix special logic to run tests inside FB. Buck test is broken after moving to cpp_unittest(). Move c_test back to the previous approach. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7076 Test Plan: Watch the Sandcastle run Reviewed By: ajkr Differential Revision: D22370096 fbshipit-source-id: 4a464d0903f2c76ae2de3a8ad373ffc9bedec64c
-
- 24 Jun, 2020 3 commits
-
-
Levi Tamasi authored
Summary: The constant `kNoExpiration` is currently defined in an internal/implementation header (`blob_log_format.h`); the patch moves it to the public header `blob_db.h` so it is accessible to users. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7018 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D22191354 Pulled By: ltamasi fbshipit-source-id: 98c8012a83b999a3f1a30e955ce6bb71ba29dc5c
-
Andrew Kryczka authored
-
Andrew Kryczka authored
-
- 19 Jun, 2020 3 commits
-
-
sdong authored
Summary: The bug fixed in https://github.com/facebook/rocksdb/pull/1816/ is now applicable to iterator too. This was not an issue but https://github.com/facebook/rocksdb/pull/2886 caused the regression. If a put and DB flush happens just between iterator to get latest sequence number and getting super version, empty result for the key or an older value can be returned, which is wrong. Fix it in the same way as the fix in https://github.com/facebook/rocksdb/issues/1816, that is to get the sequence number after referencing the super version. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6973 Test Plan: Will run stress tests for a while to make sure there is no general regression. Reviewed By: ajkr Differential Revision: D22029348 fbshipit-source-id: 94390f93630906796d6e2fec321f44a920953fd1
-
Yanqin Jin authored
Summary: https://github.com/facebook/rocksdb/issues/5411 refactored `VersionSet::Recover` but introduced a bug, explained as follows. Before, once a checksum mismatch happens, `reporter` will set `s` to be non-ok. Therefore, Recover will stop processing the MANIFEST any further. ``` // Correct // Inside Recover LogReporter reporter; reporter.status = &s; log::Reader reader(..., reporter); while (reader.ReadRecord() && s.ok()) { ... } ``` The bug is that, the local variable `s` in `ReadAndRecover` won't be updated by `reporter` while reading the MANIFEST. It is possible that the reader sees a checksum mismatch in a record, but `ReadRecord` retries internally read and finds the next valid record. The mismatched record will be ignored and no error is reported. ``` // Incorrect // Inside Recover LogReporter reporter; reporter.status = &s; log::Reader reader(..., reporter); s = ReadAndRecover(reader, ...); // Inside ReadAndRecover Status s; // Shadows the s in Recover. while (reader.ReadRecord() && s.ok()) { ... } ``` `LogReporter` can use a separate `log_read_status` to track the errors while reading the MANIFEST. RocksDB can process more MANIFEST entries only if `log_read_status.ok()`. Test plan (devserver): make check Pull Request resolved: https://github.com/facebook/rocksdb/pull/6996 Reviewed By: ajkr Differential Revision: D22105746 Pulled By: riversand963 fbshipit-source-id: b22f717a423457a41ca152a242abbb64cf91fc38
-
sdong authored
Summary: Compressed block cache is disabled in https://github.com/facebook/rocksdb/pull/4650 for no good reason. Re-enable it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6990 Test Plan: Add a unit test to make sure a general function works with read-only DB + compressed block cache. Reviewed By: ltamasi Differential Revision: D22072755 fbshipit-source-id: 2a55df6363de23a78979cf6c747526359e5dc7a1
-
- 18 Jun, 2020 1 commit
-
-
Andrew Kryczka authored
Summary: The 6.11.fb branch is already cut so I will also backport this PR to that branch. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6994 Reviewed By: riversand963 Differential Revision: D22084532 Pulled By: ajkr fbshipit-source-id: 0b025f738cc31c65c673cbf89302359e88a34d19
-
- 17 Jun, 2020 2 commits
-
-
Yanqin Jin authored
Summary: In best-efforts recovery, an error that is not Corruption or IOError::kNotFound or IOError::kPathNotFound will be overwritten silently. Fix this by checking all non-ok cases and return early. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6989 Test Plan: make check Reviewed By: ajkr Differential Revision: D22071418 Pulled By: riversand963 fbshipit-source-id: 5a4ea5dfb1a41f41c7a3fdaf62b163007b42f04b
-
Yanqin Jin authored
Summary: Best-efforts recovery does not check the content of CURRENT file to determine which MANIFEST to recover from. However, it still checks the presence of CURRENT file to determine whether to create a new DB during `open()`. Therefore, we can tweak the logic in `open()` a little bit so that best-efforts recovery does not rely on CURRENT file at all. Test plan (dev server): make check ./db_basic_test --gtest_filter=DBBasicTest.RecoverWithNoCurrentFile Pull Request resolved: https://github.com/facebook/rocksdb/pull/6970 Reviewed By: anand1976 Differential Revision: D22013990 Pulled By: riversand963 fbshipit-source-id: db552a1868c60ed70e1f7cd252a3a076eb8ea58f
-
- 13 Jun, 2020 2 commits
-
-
Andrew Kryczka authored
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6972 Reviewed By: zhichao-cao Differential Revision: D22021953 Pulled By: ajkr fbshipit-source-id: 4debbafe45b5939fd28549230eebf6006eb43440
-
Levi Tamasi authored
Summary: The `FileMetaData` objects associated with table files already contain the number of the oldest blob file referenced by the SST in question. This patch adds the inverse mapping to `BlobFileMetaData`, namely the set of table file numbers for which the oldest blob file link points to the given blob file (these are referred to as *linked SSTs*). This mapping will be used by the GC logic. Implementation-wise, the patch builds on the `BlobFileMetaDataDelta` functionality introduced in https://github.com/facebook/rocksdb/pull/6835: newly linked/unlinked SSTs are accumulated in `BlobFileMetaDataDelta`, and the changes to the linked SST set are applied in one shot when the new `Version` is saved. The patch also reworks the blob file related consistency checks in `VersionBuilder` so they validate the consistency of the forward table file -> blob file links and the backward blob file -> table file links for blob files that are part of the `Version`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6945 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D21912228 Pulled By: ltamasi fbshipit-source-id: c5bc7acf6e729a8fccbb12672dd5cd00f6f000f8
-
- 12 Jun, 2020 4 commits
-
-
Yanqin Jin authored
Summary: If `options.wal_recovery_mode == WALRecoveryMode::kPointInTimeRecovery`, RocksDB stops replaying WAL once hitting an error and discards the rest of the WAL. This can lead to data loss if the error occurs at an offset smaller than the last sync'ed offset. Ideally, RocksDB point-in-time recovery should permit recovery if the error occurs after last synced offset while fail recovery if error occurs before the last synced offset. However, RocksDB does not track the synced offset of WALs. Consequently, RocksDB does not know whether an error occurs before or after the last synced offset. An error can be one of the following. - WAL record checksum mismatch. This can result from both corruption of synced data and dropping of unsynced data during shutdown. We cannot be sure which one. In order not to defeat the original motivation to permit the latter case, we keep the original behavior of point-in-time WAL recovery. - IOError. This means the WAL can be bad, an indicator of whole file becoming unavailable, not to mention synced part of the WAL. Therefore, we choose to modify the behavior of point-in-time recovery and fail the database recovery. Test plan (devserver): make check Pull Request resolved: https://github.com/facebook/rocksdb/pull/6963 Reviewed By: ajkr Differential Revision: D22011083 Pulled By: riversand963 fbshipit-source-id: f9cbf29a37dc5cc40d3fa62f89eed1ad67ca1536
-
Levi Tamasi authored
Summary: https://github.com/facebook/rocksdb/pull/6901 subtly changed the handling of the corner case when a table file is deleted from a level, then re-added to the same level. (Note: this should be extremely rare; one scenario that comes to mind is a trivial move followed by a call to `ReFitLevel` that moves the file back to the original level.) Before that change, a new `FileMetaData` object was created as a result of this sequence; after the change, the original `FileMetaData` was essentially resurrected (since the deletion and the addition simply cancel each other out with the change). This patch restores the original behavior, which is more intuitive considering the interface, and in sync with how trivial moves are handled. (Also note that `FileMetaData` contains some mutable data members, the values of which might be different in the resurrected object and the freshly created one.) The PR also fixes a bug in this area: with the original pre-6901 code, `VersionBuilder` would add the same file twice to the same level in the scenario described above. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6939 Test Plan: `make check` Reviewed By: ajkr Differential Revision: D21905580 Pulled By: ltamasi fbshipit-source-id: da07ae45384ecf3c6c53506d106432d88a7ec9df
-
Levi Tamasi authored
Summary: `DBTest2.CompressionFailures` currently tests many configurations sequentially using nested loops, which often leads to timeouts in our test system. The patch turns it into a parameterized test instead. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6968 Test Plan: `make check` Reviewed By: siying Differential Revision: D22006954 Pulled By: ltamasi fbshipit-source-id: f71f2f7108086b7651ecfce3d79a7fab24620b2c
-
Zhichao Cao authored
Summary: Application can ingest SST files with file checksum information, such that during ingestion, DB is able to check data integrity and identify of the SST file. The PR introduces generate_and_verify_file_checksum to IngestExternalFileOption to control if the ingested checksum information should be verified with the generated checksum. 1. If generate_and_verify_file_checksum options is *FALSE*: *1)* if DB does not enable SST file checksum, the checksum information ingested will be ignored; *2)* if DB enables the SST file checksum and the checksum function name matches the checksum function name in DB, we trust the ingested checksum, store it in Manifest. If the checksum function name does not match, we treat that as an error and fail the IngestExternalFile() call. 2. If generate_and_verify_file_checksum options is *TRUE*: *1)* if DB does not enable SST file checksum, the checksum information ingested will be ignored; *2)* if DB enable the SST file checksum, we will use the checksum generator from DB to calculate the checksum for each ingested SST files after they are copied or moved. Then, compare the checksum results with the ingested checksum information: _A)_ if the checksum function name does not match, _verification always report true_ and we store the DB generated checksum information in Manifest. _B)_ if the checksum function name mach, and checksum match, ingestion continues and stores the checksum information in the Manifest. Otherwise, terminate file ingestion and report file corruption. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6891 Test Plan: added unit test, pass make asan_check Reviewed By: pdillinger Differential Revision: D21935988 Pulled By: zhichao-cao fbshipit-source-id: 7b55f486632db467e76d72602218d0658aa7f6ed
-
- 11 Jun, 2020 1 commit
-
-
Levi Tamasi authored
Summary: This is required so that the test cases can safely be run in parallel. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6962 Test Plan: `make check` Reviewed By: zhichao-cao Differential Revision: D21980060 Pulled By: ltamasi fbshipit-source-id: 616b7a0b686155d3874848b9098c67ad3f47efcc
-