1. 01 May, 2018 1 commit
    • Avoid directory renames in BackupEngine · 747c8532
      Andrew Kryczka authored
      Summary:
      We used to name private directories like "1.tmp" while BackupEngine populated them, and then rename without the ".tmp" suffix (i.e., rename "1.tmp" to "1") after all files were copied. On glusterfs, directory renames like this require operations across many hosts, and partial failures have caused operational problems.
      
      Fortunately, we don't need to rename private directories at all. We already have a meta-file that uses the tempfile-rename pattern to commit a backup atomically after all its files have been successfully copied, so we can copy private files directly to their final location and skip the directory rename entirely.
      Closes https://github.com/facebook/rocksdb/pull/3749
      
      Differential Revision: D7705610
      
      Pulled By: ajkr
      
      fbshipit-source-id: fd724a28dd2bf993ce323a5f2cb7e7d6980cc346
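The tempfile-rename commit pattern this summary relies on can be sketched as below. This is a standalone illustration using std::filesystem with made-up names, not RocksDB's actual BackupEngine code:

```cpp
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Write a backup's meta-file under a temporary name, then rename it into
// place. rename() within one filesystem is atomic on POSIX, so the backup
// "commits" only at the rename: readers see either the complete meta-file
// or none at all. Private data files can therefore be copied directly to
// their final directory, with no directory rename needed.
void CommitMetaFile(const fs::path& meta_dir, const std::string& backup_id,
                    const std::string& contents) {
  const fs::path tmp = meta_dir / (backup_id + ".tmp");
  const fs::path final_path = meta_dir / backup_id;
  {
    std::ofstream out(tmp);
    out << contents;
  }  // close (and flush) before renaming
  fs::rename(tmp, final_path);  // atomic commit point
}
```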
  2. 17 Apr, 2018 1 commit
  3. 07 Mar, 2018 1 commit
    • Disallow compactions if there isn't enough free space · 0a3db28d
      amytai authored
      Summary:
      This diff handles cases where compaction causes an ENOSPC error.
      This does not handle the corner case where another background job starts while compaction is running and that job triggers ENOSPC, although users can provision for such background jobs with SstFileManager::SetCompactionBufferSize.
      It also does not handle the case where compaction has finished and some other background job independently triggers ENOSPC.
      
      Usage: the functionality lives in SstFileManager. In particular, users should call SstFileManager::SetMaxAllowedSpaceUsage, which sets the reference high-watermark for deciding whether to cancel compactions.
      Closes https://github.com/facebook/rocksdb/pull/3449
      
      Differential Revision: D7016941
      
      Pulled By: amytai
      
      fbshipit-source-id: 8965ab8dd8b00972e771637a41b4e6c645450445
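The high-watermark decision described above can be sketched as follows. The names here are illustrative stand-ins, not the real SstFileManager internals:

```cpp
#include <cstdint>

// A compaction is allowed only if current disk usage plus the compaction's
// estimated output (plus a user-provisioned buffer for other background
// jobs) stays under the configured maximum allowed space.
struct SpaceLimiter {
  uint64_t max_allowed_space = 0;  // 0 means the limit is disabled
  uint64_t compaction_buffer = 0;  // reserve for concurrent background jobs

  bool CompactionAllowed(uint64_t current_usage,
                         uint64_t estimated_output_size) const {
    if (max_allowed_space == 0) return true;  // no limit configured
    return current_usage + estimated_output_size + compaction_buffer <=
           max_allowed_space;
  }
};
```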
  4. 06 Mar, 2018 1 commit
  5. 23 Feb, 2018 2 commits
  6. 22 Feb, 2018 1 commit
    • BackupEngine gluster-friendly file naming convention · b0929776
      Andrew Kryczka authored
      Summary:
      Use the rsync tempfile naming convention in our `BackupEngine`. The temp file follows the format, `.<filename>.<suffix>`, which is later renamed to `<filename>`. We fix `tmp` as the `<suffix>` as we don't need to use random bytes for now. The benefit is gluster treats this tempfile naming convention specially and applies hashing only to `<filename>`, so the file won't need to be linked or moved when it's renamed. Our gluster team suggested this will make things operationally easier.
      Closes https://github.com/facebook/rocksdb/pull/3463
      
      Differential Revision: D6893333
      
      Pulled By: ajkr
      
      fbshipit-source-id: fd7622978f4b2487fce33cde40dd3124f16bcaa8
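The naming convention above can be sketched as a pair of helpers. This is an illustration of the `.<filename>.<suffix>` format with `tmp` fixed as the suffix, not RocksDB's code:

```cpp
#include <string>

// Build the rsync-style temporary name: ".<filename>.tmp". Gluster hashes
// only the final name, so renaming ".MANIFEST-000001.tmp" to
// "MANIFEST-000001" lands on the same brick and needs no link or move.
std::string TempFileName(const std::string& filename) {
  return "." + filename + ".tmp";
}

// Recover the final name by stripping the leading "." and trailing ".tmp".
std::string FinalFileName(const std::string& temp_name) {
  const std::string suffix = ".tmp";
  return temp_name.substr(1, temp_name.size() - 1 - suffix.size());
}
```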
  7. 13 Sep, 2017 1 commit
    • support opening zero backups during engine init · f5148ade
      Andrew Kryczka authored
      Summary:
      There are internal users who open BackupEngine for writing new backups only, and they don't care whether old backups can be read or not. The condition `BackupableDBOptions::max_valid_backups_to_open == 0` should be supported (previously in df74b775 I made the mistake of choosing 0 as a special value to disable the limit).
      Closes https://github.com/facebook/rocksdb/pull/2819
      
      Differential Revision: D5751599
      
      Pulled By: ajkr
      
      fbshipit-source-id: e73ac19eb5d756d6b68601eae8e43407ee4f2752
  8. 08 Sep, 2017 1 commit
  9. 01 Sep, 2017 1 commit
    • fix backup engine when latest backup corrupt · b97685ae
      Andrew Kryczka authored
      Summary:
      Backup engine is intentionally openable even when some backups are corrupt. Previously the engine could write new backups as long as the most recent backup wasn't corrupt. This PR makes the backup engine able to create new backups even when the most recent one is corrupt.
      
      We now maintain two ID instance variables:
      
      - `latest_backup_id_` is used when creating backup to choose the new ID
      - `latest_valid_backup_id_` is used when restoring latest backup since we want most recent valid one
      Closes https://github.com/facebook/rocksdb/pull/2804
      
      Differential Revision: D5734148
      
      Pulled By: ajkr
      
      fbshipit-source-id: db440707b31df2c7b084188aa5f6368449e10bcf
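The two-ID bookkeeping described above can be sketched as below. The variable names follow the commit message, but the surrounding types are simplified stand-ins:

```cpp
#include <cstdint>
#include <vector>

struct BackupRecord {
  uint32_t id;
  bool corrupt;
};

struct BackupIds {
  uint32_t latest_backup_id = 0;        // basis for the next new backup's ID
  uint32_t latest_valid_backup_id = 0;  // what restoring the latest should use
};

// Scan all known backups once, tracking both the newest ID (corrupt or not,
// so new backups never reuse an ID) and the newest valid ID (for restore).
BackupIds ScanBackups(const std::vector<BackupRecord>& backups) {
  BackupIds ids;
  for (const auto& b : backups) {
    if (b.id > ids.latest_backup_id) ids.latest_backup_id = b.id;
    if (!b.corrupt && b.id > ids.latest_valid_backup_id) {
      ids.latest_valid_backup_id = b.id;
    }
  }
  return ids;
}
```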
  10. 22 Jul, 2017 2 commits
  11. 17 Jul, 2017 1 commit
    • CodeMod: Prefer ADD_FAILURE() over EXPECT_TRUE(false), et cetera · f1a056e0
      Yedidya Feldblum authored
      Summary:
      CodeMod: Prefer `ADD_FAILURE()` over `EXPECT_TRUE(false)`, et cetera.
      
      The tautologically-conditioned and tautologically-contradicted boolean expectations/assertions have better alternatives: unconditional passes and failures.
      
      Reviewed By: Orvid
      
      Differential Revision:
      D5432398
      
      Tags: codemod, codemod-opensource
      
      fbshipit-source-id: d16b447e8696a6feaa94b41199f5052226ef6914
  12. 16 Jul, 2017 1 commit
  13. 25 Jun, 2017 1 commit
    • Optimize for serial commits in 2PC · 499ebb3a
      Maysam Yabandeh authored
      Summary:
      Throughput: 46k tps in our sysbench settings (details to be filled in later)
      
      The idea is to have the simplest change that gives us a reasonable boost
      in 2PC throughput.
      
      Major design changes:
      1. The WAL file internal buffer is not flushed after each write. Instead
      it is flushed before critical operations (WAL copy via fs) or when
      FlushWAL is called by MySQL. Flushing the WAL buffer is also protected
      via mutex_.
      2. Use two sequence numbers: last seq, and last seq for write. Last seq
      is the last visible sequence number for reads. Last seq for write is the
      next sequence number that should be used to write to WAL/memtable. This
      allows a memtable write to proceed in parallel with WAL writes.
      3. BatchGroup is not used for writes. This means we can have
      parallel writers, which changes a major assumption in the code base. To
      accommodate that, we i) allow only one WriteImpl that intends to write to
      the memtable via mem_mutex_--which is fine since in 2PC almost all memtable writes
      come via the group commit phase, which is serial anyway; ii) make all the
      parts of the code base that assumed they were the only writer (via
      EnterUnbatched) also acquire mem_mutex_; iii) protect stat updates
      via a stat_mutex_.
      
      Note: the first commit has the approach figured out but is not clean.
      Submitting the PR anyway to get early feedback on the approach. If
      we are OK with the approach I will go ahead with these updates:
      0) Rebase with Yi's pipelining changes
      1) Currently batching is disabled by default to make sure that it will be
      consistent with all unit tests. Will make this optional via a config.
      2) A couple of unit tests are disabled. They need to be updated with the
      serial commit of 2PC taken into account.
      3) Replacing BatchGroup with mem_mutex_ got a bit ugly as it requires
      releasing mutex_ beforehand (the same way EnterUnbatched does). This
      needs to be cleaned up.
      Closes https://github.com/facebook/rocksdb/pull/2345
      
      Differential Revision: D5210732
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 78653bd95a35cd1e831e555e0e57bdfd695355a4
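The two-counter scheme in point 2 above can be sketched as follows. This is heavily simplified (the real code coordinates this with the write path and mutexes), and the method names are made up:

```cpp
#include <atomic>
#include <cstdint>

// Writers reserve sequence numbers from last_seq_for_write_ up front, so
// memtable and WAL writes can proceed in parallel; last_seq_ advances only
// once a write completes, so readers never observe a half-finished write.
class SequenceNumbers {
 public:
  // Reserve `count` sequence numbers for a write batch; returns the first.
  uint64_t AllocateForWrite(uint64_t count) {
    return last_seq_for_write_.fetch_add(count) + 1;
  }

  // Publish completed writes up to `seq`, making them visible to reads.
  void Publish(uint64_t seq) { last_seq_.store(seq); }

  uint64_t LastVisible() const { return last_seq_.load(); }

 private:
  std::atomic<uint64_t> last_seq_{0};            // last visible to reads
  std::atomic<uint64_t> last_seq_for_write_{0};  // last reserved for writes
};
```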
  14. 28 Apr, 2017 1 commit
  15. 25 Apr, 2017 1 commit
    • Reunite checkpoint and backup core logic · e5e545a0
      Andrew Kryczka authored
      Summary:
      These code paths forked when checkpoint was introduced by copy/pasting the core backup logic. Over time they diverged and bug fixes were sometimes applied to one but not the other (like fix to include all relevant WALs for 2PC), or it required extra effort to fix both (like fix to forge CURRENT file). This diff reunites the code paths by extracting the core logic into a function, CreateCustomCheckpoint(), that is customizable via callbacks to implement both checkpoint and backup.
      
      Related changes:
      
      - flush_before_backup is now forcibly enabled when 2PC is enabled
      - Extracted CheckpointImpl class definition into a header file. This is so the function, CreateCustomCheckpoint(), can be called by internal rocksdb code but not exposed to users.
      - Implemented more functions in DummyDB/DummyLogFile (in backupable_db_test.cc) that are used by CreateCustomCheckpoint().
      Closes https://github.com/facebook/rocksdb/pull/1932
      
      Differential Revision: D4622986
      
      Pulled By: ajkr
      
      fbshipit-source-id: 157723884236ee3999a682673b64f7457a7a0d87
  16. 20 Apr, 2017 1 commit
    • Limit backups opened · df74b775
      Andrew Kryczka authored
      Summary:
      This was requested by a customer who wants to proactively monitor whether any valid backups are available. The existing performance was poor because Open() serially reads every small meta-file (one per backup), which was slow on HDFS.
      
      Now we only read the minimum number of meta-files to find `max_valid_backups_to_open` valid backups. The customer mentioned above can just set it to one.
      Closes https://github.com/facebook/rocksdb/pull/2151
      
      Differential Revision: D4882564
      
      Pulled By: ajkr
      
      fbshipit-source-id: cb0edf9e8ac693e4d5f24902e725a011ed8c0c2f
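The early-exit scan described above can be sketched as below. The function and parameter names are illustrative, apart from `max_valid_backups_to_open`, which is the real option name:

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Read meta-files from the newest backup downward and stop as soon as
// `max_valid_backups_to_open` valid backups are found, instead of reading
// every meta-file at Open(). `meta_file_ok` stands in for one (slow on
// HDFS) meta-file read-and-validate.
std::vector<uint32_t> FindValidBackups(
    const std::vector<uint32_t>& ids_newest_first,
    const std::function<bool(uint32_t)>& meta_file_ok,
    uint32_t max_valid_backups_to_open) {
  std::vector<uint32_t> valid;
  for (uint32_t id : ids_newest_first) {
    if (valid.size() >= max_valid_backups_to_open) break;  // stop early
    if (meta_file_ok(id)) valid.push_back(id);
  }
  return valid;
}
```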
  17. 06 Apr, 2017 1 commit
  18. 04 Apr, 2017 1 commit
  19. 31 Mar, 2017 1 commit
  20. 23 Feb, 2017 1 commit
    • Gracefully handle previous backup interrupted · 5040414e
      Andrew Kryczka authored
      Summary:
      As the last step in backup creation, the .tmp directory is renamed omitting the .tmp suffix. In case the process terminates before this, the .tmp directory will be left behind. Even if this happens, we want future backups to succeed, so I added some checks/cleanup for this case.
      Closes https://github.com/facebook/rocksdb/pull/1896
      
      Differential Revision: D4597323
      
      Pulled By: ajkr
      
      fbshipit-source-id: 48900d8
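The cleanup described above can be sketched as follows. This is a standalone std::filesystem illustration, not RocksDB's Env-based implementation:

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// On engine startup, remove any leftover "<id>.tmp" private directories
// from a backup that was interrupted before its final rename. Such a
// directory was never committed, so it is safe to discard.
void CleanupInterruptedBackups(const fs::path& private_dir) {
  if (!fs::exists(private_dir)) return;
  for (const auto& entry : fs::directory_iterator(private_dir)) {
    const std::string name = entry.path().filename().string();
    if (name.size() > 4 && name.compare(name.size() - 4, 4, ".tmp") == 0) {
      fs::remove_all(entry.path());  // incomplete backup remnant
    }
  }
}
```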
  21. 07 Feb, 2017 1 commit
    • Windows thread · 0a4cdde5
      Dmitri Smirnov authored
      Summary:
      introduce new methods into the public threadpool interface:
      - allow submission of std::functions, as they allow greater flexibility.
      - add Joining methods to the implementation to join scheduled and submitted jobs, with
        an option to cancel jobs that did not start executing.
      - remove ugly `#ifdefs` between the pthread and std implementations, making them uniform.
      - introduce a pimpl for a drop-in replacement of the implementation.
      - introduce a rocksdb::port::Thread typedef as a replacement for std::thread. On POSIX, Thread remains std::thread as before.
      - implement WindowsThread, which allocates memory in a more controllable manner than Windows std::thread, with a replaceable implementation.
      - there should be no functionality changes.
      Closes https://github.com/facebook/rocksdb/pull/1823
      
      Differential Revision: D4492902
      
      Pulled By: siying
      
      fbshipit-source-id: c74cb11
  22. 02 Jan, 2017 1 commit
  23. 15 Dec, 2016 1 commit
    • Fail BackupEngine::Open upon meta-file read error · 83f9a6fd
      Andrew Kryczka authored
      Summary:
      We used to treat any failure to read a backup's meta-file as if the backup were corrupted; however, we should distinguish corruption errors from errors in the backup Env. This fixes an issue where callers would get inconsistent results from GetBackupInfo() if they called it on an engine that encountered Env error during initialization. Now we fail Initialize() in this case so callers cannot invoke GetBackupInfo() on such engines.
      Closes https://github.com/facebook/rocksdb/pull/1654
      
      Differential Revision: D4318573
      
      Pulled By: ajkr
      
      fbshipit-source-id: f7a7c54
  24. 17 Nov, 2016 1 commit
  25. 15 Sep, 2016 1 commit
  26. 10 Jun, 2016 1 commit
    • Backup Options · 56887f6c
      Wanning Jiang authored
      Summary: Back up the options file to the private directory

      Test Plan:
      backupable_db_test.cc, BackupOptions
      	   Modify DB options by calling OpenDB three times. Check that the latest options file is in the right place. Also check that no redundant files are backed up.
      
      Reviewers: andrewkr
      
      Reviewed By: andrewkr
      
      Subscribers: leveldb, dhruba, andrewkr
      
      Differential Revision: https://reviews.facebook.net/D59373
  27. 04 Jun, 2016 1 commit
    • Adding support for sharing throttler between multiple backup and restores · 1147e5b0
      Uddipta Maity authored
      Summary:
      RocksDB backup and restore rate limiting is currently done per backup/restore,
      so it is difficult to control the rate across multiple backups/restores. With this
      change, a throttler can be provided. If a throttler is provided, it is used;
      otherwise, a new throttler is created based on the rate limits specified
      in the options.
      
      Test Plan: Added unit tests
      
      Reviewers: ldemailly, andrewkr, sdong
      
      Reviewed By: andrewkr
      
      Subscribers: igor, yiwu, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D56265
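The provided-or-created resolution logic above can be sketched as below. The `Throttler` class and function names are illustrative, not RocksDB's actual API:

```cpp
#include <cstdint>
#include <memory>

// Stand-in for a rate limiter; only the configured rate is modeled here.
class Throttler {
 public:
  explicit Throttler(uint64_t bytes_per_sec) : rate_(bytes_per_sec) {}
  uint64_t rate() const { return rate_; }

 private:
  uint64_t rate_;
};

// If the caller supplies a throttler, use it (the same instance can be
// shared across several backup/restore jobs to cap their combined rate).
// Otherwise, create a per-job throttler from the rate in the options.
std::shared_ptr<Throttler> ResolveThrottler(
    std::shared_ptr<Throttler> provided, uint64_t rate_limit_from_options) {
  if (provided) return provided;
  if (rate_limit_from_options == 0) return nullptr;  // unlimited
  return std::make_shared<Throttler>(rate_limit_from_options);
}
```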
  28. 20 May, 2016 1 commit
  29. 12 May, 2016 1 commit
  30. 11 May, 2016 1 commit
    • Isolate db env and backup Env in unit tests · e61ba052
      Andrew Kryczka authored
      Summary:
      - Used ChrootEnv so the database and backup Envs are isolated in the filesystem.
      - Removed DifferentEnvs test since now every test uses different Envs
      
      Depends on D57543
      
      Test Plan:
      - ran backupable_db_test
      - verified backupable_db_test now catches the bug when D57159 is backed out (this bug previously passed through the test cases, which motivated this change)
      
      Reviewers: sdong, lightmark, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D57615
  31. 27 Apr, 2016 1 commit
    • Fix BackupableDBTest · eb739808
      Islam AbdelRahman authored
      Summary: Fix BackupableDBTest.NoDoubleCopy and BackupableDBTest.DifferentEnvs by mocking the db files in db_env instead of backup_env_
      
      Test Plan: make check -j64
      
      Reviewers: sdong, andrewkr
      
      Reviewed By: andrewkr
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D57273
  32. 16 Apr, 2016 1 commit
    • Fix backupable_db_test test cases that can't run by itself · cea8ed97
      sdong authored
      Summary:
      Several backupable_db_test cases fail when run standalone because of missing directories. Fix this by:
      (1) having the garbage collector skip the shared directory if it doesn't exist
      (2) having BackupableDBTest.Issue921Test create the parent directory of the backup directory first.
      
      Test Plan: Run the tests individually and make sure they pass
      
      Subscribers: leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D56829
  33. 02 Apr, 2016 1 commit
    • Rocksdb backup can store optional application specific metadata · b55e2165
      Uddipta Maity authored
      Summary:
      RocksDB's backup engine maintains metadata about backups in separate files, but
      there was no way to add extra application-specific data to it. This adds support
      for that.
      In some use cases, applications decide whether to restore a backup based on some
      metadata. This will help those cases decide cheaply whether to restore or
      not.
      
      Test Plan:
      Added a unit test. Existing ones are passing
      
      Sample meta file for BinaryMetadata test-
      
      ```
      
      1459454043
      0
      metadata 6162630A64656600676869
      2
      private/1/MANIFEST-000001 crc32 1184723444
      private/1/CURRENT crc32 3505765120
      
      ```
      
      Reviewers: sdong, ldemailly, andrewkr
      
      Reviewed By: andrewkr
      
      Subscribers: andrewkr, dhruba, ldemailly
      
      Differential Revision: https://reviews.facebook.net/D56007
  34. 11 Mar, 2016 1 commit
    • Cleanup stale manifests outside of full purge · d9620239
      Andrew Kryczka authored
      Summary:
      - Keep track of obsolete manifests in VersionSet
      - Updated FindObsoleteFiles() to put obsolete manifests in the JobContext for later use by PurgeObsoleteFiles()
      - Added test case that verifies a stale manifest is deleted by a non-full purge
      
      Test Plan:
        $ ./backupable_db_test --gtest_filter=BackupableDBTest.ChangeManifestDuringBackupCreation
      
      Reviewers: IslamAbdelRahman, yoshinorim, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D55269
  35. 04 Mar, 2016 1 commit
  36. 02 Mar, 2016 1 commit
    • Get file attributes in bulk for VerifyBackup and CreateNewBackup · f8e90e87
      Andrew Kryczka authored
      Summary:
      For VerifyBackup(), backup files can be spread across "shared/",
      "shared_checksum/", and "private/" subdirectories, so we have to
      bulk get all three.
      
      For CreateNewBackup(), we make two separate bulk calls: one for the
      data files and one for WAL files.
      
      There is also a new helper function, ExtendPathnameToSizeBytes(),
      that translates the file attributes vector to a map. I decided to leave
      GetChildrenFileAttributes()'s (from D53781) return type as vector to
      keep it consistent with GetChildren().
      
      Depends on D53781.
      
      Test Plan:
      verified relevant unit tests
      
        $ ./backupable_db_test
      
      Reviewers: IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D53919
  37. 01 Mar, 2016 1 commit
    • Handle concurrent manifest update and backup creation · 69c471bd
      Andrew Kryczka authored
      Summary:
      Fixed two related race conditions in backup creation.
      
      (1) CreateNewBackup() uses DB::DisableFileDeletions() to prevent table files
      from being deleted while it is copying; however, the MANIFEST file could still
      rotate during this time. The fix is to stop deleting the old manifest in the
      rotation logic. It will be deleted safely later when PurgeObsoleteFiles() runs
      (can only happen when file deletions are enabled).
      
      (2) CreateNewBackup() did not account for the CURRENT file being mutable.
      This is significant because the files returned by GetLiveFiles() contain a
      particular manifest filename, but the manifest to which CURRENT refers can
      change at any time. This causes problems when CURRENT changes between the call
      to GetLiveFiles() and when it's copied to the backup directory. To work around this, I
      manually forge a CURRENT file referring to the manifest filename returned in
      GetLiveFiles().
      
      (2) also applies to the checkpointing code, so let me know if this approach is
      good and I'll make the same change there.
      
      Test Plan:
      new test for roll manifest during backup creation.
      
      running the test before this change:
      
        $ ./backupable_db_test --gtest_filter=BackupableDBTest.ChangeManifestDuringBackupCreation
        ...
        IO error: /tmp/rocksdbtest-9383/backupable_db/MANIFEST-000001: No such file or directory
      
      running the test after this change:
      
        $ ./backupable_db_test --gtest_filter=BackupableDBTest.ChangeManifestDuringBackupCreation
        ...
        [ RUN      ] BackupableDBTest.ChangeManifestDuringBackupCreation
        [       OK ] BackupableDBTest.ChangeManifestDuringBackupCreation (2836 ms)
      
      Reviewers: IslamAbdelRahman, anthony, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D54711
  38. 25 Feb, 2016 1 commit
    • Reorder instance variables in backup test for proper destruction order · 69c98f04
      Andrew Kryczka authored
      Summary:
      As titled. This fixes the tsan error caused by logger_ being used in
      backup_engine_'s destructor. It does not fix the transient unit test failure,
      which is caused by MANIFEST file changing while backup is happening.
      
      Test Plan:
      verified the tsan error no longer happens on either success or
      failure.
      
        $ COMPILE_WITH_TSAN=1 make -j32 backupable_db_test
        $ while ./backupable_db_test --gtest_filter=BackupableDBTest.CorruptionsTest ; do : ; done
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D54669