Skip to content
  • Yanqin Jin's avatar
    Fail recovery when MANIFEST record checksum mismatch (#6996) · 61cc9ef7
    Yanqin Jin authored
    Summary:
    https://github.com/facebook/rocksdb/issues/5411 refactored `VersionSet::Recover` but introduced a bug, explained as follows.
    Before, once a checksum mismatch happens, `reporter` will set `s` to be non-ok. Therefore, Recover will stop processing the MANIFEST any further.
    ```
    // Correct
    // Inside Recover
    LogReporter reporter;
    reporter.status = &s;
    log::Reader reader(..., reporter);
    while (reader.ReadRecord() && s.ok()) {
    ...
    }
    ```
    The bug is that, the local variable `s` in `ReadAndRecover` won't be updated by `reporter` while reading the MANIFEST. It is possible that the reader sees a checksum mismatch in a record, but `ReadRecord` retries internally read and finds the next valid record. The mismatched record will be ignored and no error is reported.
    ```
    // Incorrect
    // Inside Recover
    LogReporter reporter;
    reporter.status = &s;
    log::Reader reader(..., reporter);
    s = ReadAndRecover(reader, ...);
    
    // Inside ReadAndRecover
      Status s;  // Shadows the s in Recover.
      while (reader.ReadRecord() && s.ok()) {
       ...
      }
    ```
    `LogReporter` can use a separate `log_read_status` to track the errors while reading the MANIFEST. RocksDB can process more MANIFEST entries only if `log_read_status.ok()`.
    Test plan (devserver):
    make check
    Pull Request resolved: https://github.com/facebook/rocksdb/pull/6996
    
    Reviewed By: ajkr
    
    Differential Revision: D22105746
    
    Pulled By: riversand963
    
    fbshipit-source-id: b22f717a423457a41ca152a242abbb64cf91fc38
    61cc9ef7