This project is mirrored from https://github.com/openzfs/zfs.git.
  1. 28 Jan, 2023 2 commits
  2. 26 Jan, 2023 2 commits
    • Prefetch on deadlists merge · dc5c8006
      Alexander Motin authored
      
      
      During snapshot deletion ZFS may issue several reads for each deadlist
      to merge them into the next snapshot's or the pool's bpobj.  The number
      of deadlists increases with the number of snapshots.  On HDD pools this
      may take significant time, during which the sync thread is blocked.
      
      This patch introduces prescient prefetch of the required blocks for up
      to 128 deadlists ahead.  Tests show that the time required to delete a
      dataset with 720 snapshots and a randomly overwritten file on a wide
      HDD pool drops from 75-85 to 22-28 seconds.
      
      Reviewed-by: Allan Jude <allan@klarasystems.com>
      Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Signed-off-by: Alexander Motin <mav@FreeBSD.org>
      Sponsored by: iXsystems, Inc.
      Issue #14276
      Closes #14402
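
      The look-ahead described in this commit can be pictured with a small
      sketch.  This is not the OpenZFS code: `process_with_prefetch`, the
      callback type, and the `trace_*` helpers are hypothetical names, and
      only the 128-entry window comes from the commit message.  The idea is
      to issue reads for entries up to 128 positions ahead of the one being
      merged, so that later reads are likely to be cached when needed.

      ```c
      #include <stddef.h>

      /* Window size taken from the commit message. */
      #define	PREFETCH_AHEAD	128

      /* Hypothetical callback type: act on entry idx. */
      typedef void (*item_fn_t)(size_t idx, void *arg);

      /*
       * Process entries 0..count-1 in order.  Before processing entry i,
       * make sure a prefetch has been issued for every entry up to
       * i + PREFETCH_AHEAD (bounded by count).
       */
      static void
      process_with_prefetch(size_t count, item_fn_t prefetch,
          item_fn_t process, void *arg)
      {
      	size_t issued = 0;	/* entries [0, issued) were prefetched */

      	for (size_t i = 0; i < count; i++) {
      		size_t limit = i + 1 + PREFETCH_AHEAD;
      		if (limit > count)
      			limit = count;
      		while (issued < limit)
      			prefetch(issued++, arg);
      		process(i, arg);
      	}
      }

      /* Sketch-only bookkeeping used to observe the window. */
      typedef struct trace {
      	size_t prefetched;	/* prefetches issued so far */
      	size_t processed;	/* entries processed so far */
      	size_t max_lead;	/* most entries prefetched beyond current */
      } trace_t;

      static void
      trace_prefetch(size_t idx, void *arg)
      {
      	trace_t *t = arg;
      	(void) idx;
      	t->prefetched++;
      }

      static void
      trace_process(size_t idx, void *arg)
      {
      	trace_t *t = arg;
      	(void) idx;
      	/* Entries already prefetched that lie past the current one. */
      	size_t lead = t->prefetched - t->processed - 1;
      	if (lead > t->max_lead)
      		t->max_lead = lead;
      	t->processed++;
      }
      ```

      Calling `process_with_prefetch(720, trace_prefetch, trace_process, &t)`
      on a zeroed `trace_t` prefetches and processes each of the 720 entries
      once, and the lead never exceeds 128.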
    • Improve resilver ETAs · c85ac731
      Brian Behlendorf authored
      
      
      When resilvering, the estimated time remaining is calculated using
      the average issue rate over the current pass, where the current
      pass starts when a scan was started or, if the pool was
      exported/imported, restarted.

      For dRAID pools in particular this can result in wildly optimistic
      estimates, since the issue rate will be very high while
      non-degraded regions of the pool are scanned.  Once repair I/O
      starts being issued, performance drops to a realistic number, but
      the estimated performance is still significantly skewed.
      
      To address this we redefine a pass such that it starts after a
      scanning phase completes so the issue rate is more reflective of
      recent performance.  Additionally, the zfs_scan_report_txgs
      module option can be set to reset the pass statistics more often.
      
      Reviewed-by: Akash B <akash-b@hpe.com>
      Reviewed-by: Tony Hutter <hutter2@llnl.gov>
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Closes #14410
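
      The arithmetic behind such an estimate can be sketched as follows.
      This is an illustration under stated assumptions, not the OpenZFS
      implementation: `pass_stats_t`, `eta_seconds`, and `pass_reset` are
      hypothetical names.  It shows why resetting the pass after the
      scanning phase matters: the rate is averaged only over the time since
      the last reset.

      ```c
      #include <stdint.h>

      /* Hypothetical per-pass statistics; not the OpenZFS structures. */
      typedef struct pass_stats {
      	uint64_t pass_start_sec;	/* when the current pass began */
      	uint64_t pass_issued;		/* bytes issued during this pass */
      } pass_stats_t;

      /*
       * Estimate the seconds remaining from the average issue rate of the
       * current pass.  Returns UINT64_MAX when no rate is available yet.
       */
      static uint64_t
      eta_seconds(const pass_stats_t *ps, uint64_t now_sec, uint64_t bytes_left)
      {
      	if (now_sec <= ps->pass_start_sec)
      		return (UINT64_MAX);

      	uint64_t elapsed = now_sec - ps->pass_start_sec;
      	uint64_t rate = ps->pass_issued / elapsed;	/* bytes per second */

      	if (rate == 0)
      		return (UINT64_MAX);
      	return (bytes_left / rate);
      }

      /* Begin a new pass, for example when the scanning phase completes. */
      static void
      pass_reset(pass_stats_t *ps, uint64_t now_sec)
      {
      	ps->pass_start_sec = now_sec;
      	ps->pass_issued = 0;
      }
      ```

      After a reset, a high rate observed during the scanning phase no
      longer contributes to the average, so the estimate follows the slower
      repair I/O.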
  3. 25 Jan, 2023 4 commits
  4. 24 Jan, 2023 4 commits
    • Wait for txg sync if the last DRR_FREEOBJECTS might result in a hole · 37a27b43
      David Hedberg authored
      
      
      If we receive a DRR_FREEOBJECTS as the first entry in an object range,
      this might end up producing a hole if the freed objects were the
      only existing objects in the block.
      
      If the txg starts syncing before we've processed any following
      DRR_OBJECT records, this leads to a possible race where the backing
      arc_buf_t gets its psize set to 0 in the arc_write_ready() callback
      while still being referenced from a dirty record in the open txg.
      
      To prevent this, we insert a txg_wait_synced call if the first
      record in the range was a DRR_FREEOBJECTS that actually
      resulted in one or more freed objects.
      
      Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Signed-off-by: David Hedberg <david.hedberg@findity.com>
      Sponsored by: Findity AB
      Closes #11893
      Closes #14358
    • Reject streams that set ->drr_payloadlen to unreasonably large values · 73968def
      Richard Yao authored
      
      
      In the zstream code, Coverity reported:
      
      "The argument could be controlled by an attacker, who could invoke the
      function with arbitrary values (for example, a very high or negative
      buffer size)."
      
      It did not report this in the kernel. This is likely because the
      userspace code stored this in an int before passing it into the
      allocator, while the kernel code stored it in a uint32_t.
      
      However, this did reveal a potentially real problem. On 32-bit systems
      and systems with only 4GB of physical memory or less in general, it is
      possible to pass a large enough value that the system will hang. Even
      worse, on Linux systems, the kernel memory allocator is not able to
      support allocations up to the maximum 4GB allocation size that this
      allows.
      
      This had already been limited in userspace to 64MB by
      `ZFS_SENDRECV_MAX_NVLIST`, but we need a hard limit in the kernel to
      protect systems. After some discussion, we settle on 256MB as a hard
      upper limit. Attempting to receive a stream that requires more memory
      than that will result in E2BIG being returned to user space.
      
      Reported-by: Coverity (CID-1529836)
      Reported-by: Coverity (CID-1529837)
      Reported-by: Coverity (CID-1529838)
      Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
      Closes #14285
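
      A bounds check of the kind this commit describes might look like the
      sketch below.  The macro and function names are hypothetical, and
      treating exactly 256MB as acceptable is an assumption; the commit
      message states only the 256MB hard limit and the E2BIG result.

      ```c
      #include <errno.h>
      #include <stdint.h>

      /* Hypothetical name for the 256MB hard limit from the commit message. */
      #define	SKETCH_MAX_PAYLOAD	(256ULL * 1024 * 1024)

      /*
       * Validate a stream-supplied payload length before allocating memory
       * for it.  Returns 0 if the length is acceptable, E2BIG otherwise.
       */
      static int
      check_payloadlen(uint32_t payloadlen)
      {
      	if (payloadlen > SKETCH_MAX_PAYLOAD)
      		return (E2BIG);
      	return (0);
      }
      ```

      Keeping the length in an unsigned 32-bit type, as the kernel code
      does, rules out negative sizes; the explicit limit rules out the
      remaining oversized values.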
    • Configure zed's diagnosis engine with vdev properties · 69f024a5
      rob-wing authored
      
      
      Introduce four new vdev properties:
          checksum_n
          checksum_t
          io_n
          io_t
      
      These properties can be used for configuring the thresholds of zed's
      diagnosis engine and are interpreted as N events in T seconds.
      
      When this property is set to a non-default value on a top-level vdev,
      those thresholds will also apply to its leaf vdevs. This behavior can be
      overridden by explicitly setting the property on the leaf vdev.
      
      Note that these properties do not persist across vdev replacement. For
      this reason, it is advisable to set the property on the top-level vdev
      instead of the leaf vdev.
      
      The default values for zed's diagnosis engine (10 events, 600 seconds)
      remain unchanged.
      
      Reviewed-by: Tony Hutter <hutter2@llnl.gov>
      Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
      Reviewed-by: Allan Jude <allan@klarasystems.com>
      Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
      Sponsored-by: Seagate Technology LLC
      Closes #13805
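
      One way to read "N events in T seconds" is as a sliding-window
      threshold.  The sketch below is an assumption-laden illustration, not
      zed's code: `event_window_t`, `window_init`, and `window_record` are
      hypothetical names, timestamps are assumed to be non-decreasing, and
      whether the boundary case (exactly T seconds) counts is a guess.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>
      #include <string.h>

      #define	WINDOW_MAX_N	64	/* sketch-only upper bound on N */

      typedef struct event_window {
      	uint32_t n;			/* events required (N) */
      	uint64_t t_sec;			/* window length in seconds (T) */
      	uint64_t stamps[WINDOW_MAX_N];	/* ring of the last n timestamps */
      	uint32_t count;			/* events stored, at most n */
      	uint32_t next;			/* ring slot for the next event */
      } event_window_t;

      static void
      window_init(event_window_t *w, uint32_t n, uint64_t t_sec)
      {
      	assert(n >= 1 && n <= WINDOW_MAX_N);
      	memset(w, 0, sizeof (*w));
      	w->n = n;
      	w->t_sec = t_sec;
      }

      /*
       * Record an event at now_sec.  Returns true when the most recent n
       * events all fall within t_sec seconds of each other.
       */
      static bool
      window_record(event_window_t *w, uint64_t now_sec)
      {
      	w->stamps[w->next] = now_sec;
      	w->next = (w->next + 1) % w->n;
      	if (w->count < w->n)
      		w->count++;
      	if (w->count < w->n)
      		return (false);

      	/* After advancing, next indexes the oldest of the last n events. */
      	uint64_t oldest = w->stamps[w->next];
      	return (now_sec - oldest <= w->t_sec);
      }
      ```

      With `window_init(&w, 10, 600)` the sketch mirrors the default
      thresholds named in the commit message: the tenth event reports true
      only if it arrives within 600 seconds of the first.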
    • free_blocks(): Fix reports from 2016 PVS Studio FreeBSD report · f091db92
      Richard Yao authored
      In 2016, the authors of PVS Studio ran it on the FreeBSD kernel, which
      identified a number of bugs / cleanup opportunities in the FreeBSD ZFS kernel
      code. A few of them persist to the present day:
      
      https://reviews.freebsd.org/D5245
      
      
      
      Note that the scan was done against
      freebsd/freebsd-src@46763fd4ca8a37f836c9bf2333f9d687509278f3.
      
      In particular, we have the following in free_blocks():
      
      \sys\cddl\contrib\opensolaris\uts\common\fs\zfs\dnode_sync.c (174): error V547: Expression '__left >= __right' is always true. Unsigned type value is always >= 0.
      \sys\cddl\contrib\opensolaris\uts\common\fs\zfs\dnode_sync.c (171): error V634: The priority of the '*' operation is higher than that of the '<<' operation. It's possible that parentheses should be used in the expression.
      \sys\cddl\contrib\opensolaris\uts\common\fs\zfs\dnode_sync.c (175): error V547: Expression '__left >= __right' is always true. Unsigned type value is always >= 0.
      
      A couple of assertions accidentally typecast the arguments they check to
      unsigned in such a way that the result is always true. Also, parentheses
      are missing around `1<<epbs` in `(db->db_blkid * 1<<epbs)`. This works
      out to be okay due to multiplication not caring what order of operations
      we use, but it is better to fix it to be `(db->db_blkid << epbs)`.
      
      A few of the function local variables probably never should have been
      32-bit in the first place, so we make them 64-bit. We also replace the
      existing assertions with additional assertions to ensure that 64-bit
      unsigned arithmetic is safe.
      
      Reviewed-by: Alexander Motin <mav@FreeBSD.org>
      Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
      Closes #14407
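
      The two points raised in this commit message can be checked with a
      short sketch.  The function names are hypothetical and the code is not
      taken from dnode_sync.c.  In C, `*` binds more tightly than `<<`, so
      `blkid * 1 << epbs` parses as `(blkid * 1) << epbs`; for values that
      do not overflow, that equals both `blkid * (1 << epbs)` and
      `blkid << epbs`.  The last helper shows why a `>= 0` check on a value
      converted to unsigned cannot fail: a negative difference wraps to a
      large positive number.

      ```c
      #include <stdint.h>

      /* As written in the report: parsed as (blkid * 1) << epbs. */
      static uint64_t
      first_unparenthesized(uint64_t blkid, int epbs)
      {
      	return (blkid * 1 << epbs);
      }

      /* The form the parentheses would have expressed. */
      static uint64_t
      first_parenthesized(uint64_t blkid, int epbs)
      {
      	return (blkid * ((uint64_t)1 << epbs));
      }

      /* The simplified form chosen in the commit message. */
      static uint64_t
      first_shift(uint64_t blkid, int epbs)
      {
      	return (blkid << epbs);
      }

      /*
       * Convert a signed difference to unsigned.  A negative result wraps
       * around, so comparing the converted value with ">= 0" is always true.
       */
      static uint64_t
      as_unsigned_diff(int64_t left, int64_t right)
      {
      	return ((uint64_t)(left - right));
      }
      ```

      For example, all three `first_*` helpers return 40 for a block ID of 5
      and an `epbs` of 3, while `as_unsigned_diff(1, 2)` yields `UINT64_MAX`
      rather than a negative number.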
  5. 21 Jan, 2023 5 commits
  6. 20 Jan, 2023 1 commit
  7. 18 Jan, 2023 8 commits
  8. 14 Jan, 2023 1 commit
    • Linux ppc64le ieee128 compat: Do not redefine __asm on external headers · d27c7ba6
      Richard Yao authored
      
      
      There is an external assembly declaration extension in GNU C that glibc
      uses when building with ieee128 floating point support on ppc64le.
      Marking that as volatile makes no sense, so the build breaks.
      
      It does not make sense to only mark this as volatile on Linux, since if
      we do not want the compiler reordering things on Linux, we do not want
      the compiler reordering things on any other platform, so we stop
      treating Linux specially and just manually inline the CPP macro so that
      we can eliminate it. This should fix the build on ppc64le.
      
      Tested-by: @gyakovlev 
      Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
      Closes #14308
      Closes #14384
  9. 13 Jan, 2023 10 commits
  10. 12 Jan, 2023 3 commits