  • Fix dRAID self-healing short columns · 93c8e91f
    Brian Behlendorf authored
    
    
    When dRAID performs a normal read operation only the data columns
    in the raid map are read from disk.  This is enough information to
    calculate the checksum, verify it, and return the needed data to the
    application.  It's only in the event of a checksum failure that the
    additional parity and any empty columns must be read since they are
    required for parity reconstruction.
    
    Reading these additional columns is handled by vdev_raidz_read_all()
    which calls vdev_draid_map_alloc_empty() to expand the raid_map_t
    and submit IOs for the missing columns.  This all works correctly,
    but it fails to account for any "short" columns.  These are data
    columns which are padded with an empty skip sector at the end.
    Since that empty sector is not needed for a normal read, it's not
    read when the column is first read from disk.  However, like the
    parity and empty columns, the skip sector is needed to perform
    reconstruction.
    
    The fix is to mark any "short" columns as never being read by clearing
    the rc_tried flag when expanding the raid_map_t.  This will cause
    the entire column to be re-read from disk in the event of a checksum
    failure, allowing the self-healing functionality to repair the block.
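
    As a rough illustration of the idea only, and not the code from this
    change, the sketch below models a column with a simplified,
    hypothetical struct (sketch_col_t) and clears its "tried" flag when
    the column is shorter than a full column.  The names sketch_col_t,
    sc_size, sc_tried and sketch_mark_short_columns() are assumptions
    made for the example; only the notion of clearing rc_tried comes
    from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one column of a raid map. */
typedef struct sketch_col {
	size_t	sc_size;	/* bytes of data held by this column */
	bool	sc_tried;	/* true once the column was read from disk */
} sketch_col_t;

/*
 * Clear the "tried" flag on every column that is shorter than a full
 * column, so that a later read-all pass reads it again, this time
 * including its trailing skip sector.  Returns the number of columns
 * that were marked for re-read.
 */
size_t
sketch_mark_short_columns(sketch_col_t *cols, size_t ncols, size_t full_size)
{
	size_t marked = 0;

	assert(cols != NULL);

	for (size_t i = 0; i < ncols; i++) {
		if (cols[i].sc_size < full_size && cols[i].sc_tried) {
			cols[i].sc_tried = false;
			marked++;
		}
	}

	return (marked);
}
```

    In this model a full-size column keeps its flag and is not read
    again, while a short column is treated as if it had never been read.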
    
    Note that this only affects the self-healing feature because when
    scrubbing a pool the parity, data, and empty columns are all read
    initially to verify their contents.  Furthermore, only blocks which
    contain "short" columns would be affected, and only when the memory
    backing the skip sector wasn't already zeroed out.
    
    This change extends the existing redundancy_raidz.ksh test case to
    verify self-healing (as well as resilver and scrub).  It then applies
    the same test case to dRAID with a slightly modified version of
    the test script called redundancy_draid.ksh.  The unused variable
    combrec was also removed from both test cases.
    
    Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #12010