    zed: protect against wait4()/fork() races to the global PID table · 3bd6b0e0
    Ahelenia Ziemiańska authored
    
    
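    This can be very easily triggered by adding a sleep(1) before
    the wait4() on a PID-starved system: the reaper thread would reap a
    child before the parent had added its entry to the table, so the
    lookup failed (the "(null)" eid=0 lines below), the entry added
    afterwards was never removed, and stale entries accumulated until a
    reused PID made avl_add() trip its duplicate-key assertion: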
    
      Invoking "all-debug.sh" eid=3021 pid=391
      Finished "(null)" eid=0 pid=391 time=0.002432s exit=0
      Invoking "all-syslog.sh" eid=3021 pid=336
      Finished "(null)" eid=0 pid=336 time=0.002432s exit=0
      Invoking "history_event-zfs-list-cacher.sh" eid=3021 pid=347
      Invoking "all-debug.sh" eid=3022 pid=349
      Finished "history_event-zfs-list-cacher.sh" eid=3021 pid=347
                                                  time=0.001669s exit=0
      Finished "(null)" eid=0 pid=349 time=0.002404s exit=0
      Invoking "all-syslog.sh" eid=3022 pid=370
      Finished "(null)" eid=0 pid=370 time=0.002427s exit=0
      Invoking "history_event-zfs-list-cacher.sh" eid=3022 pid=391
      avl_find(tree, new_node, &where) == NULL
      ASSERT at ../../module/avl/avl.c:641:avl_add()
      Thread 1 "zed" received signal SIGABRT, Aborted.
    
    By holding this wider lock across both operations, we make
    [wait, remove] and [fork, add] atomic with respect to each other:
    slowing down the reaper thread now just causes some zombies
    to accumulate until it can get to them.
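A minimal sketch of the locking pattern described above, under stated assumptions: a flat array stands in for zed's PID-keyed AVL tree, and `launch_child()`, `reap_one()`, `table_lock` and `MAX_CHILDREN` are hypothetical names, not zed's. One mutex is held across fork() plus the insert, and across wait4() plus the removal, so the reaper can never see a PID whose entry is not yet in the table. Polling with WNOHANG is one assumed way to avoid holding the lock through a blocking wait; the real zed code may differ.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical fixed-size stand-in for zed's PID-keyed AVL tree. */
#define MAX_CHILDREN 64
static pid_t table[MAX_CHILDREN];
static int table_len;
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Caller must hold table_lock. Returns the slot index, or -1. */
static int table_find(pid_t pid)
{
	for (int i = 0; i < table_len; i++)
		if (table[i] == pid)
			return i;
	return -1;
}

/* [fork, add]: the entry exists before anyone can wait4() for the child. */
pid_t launch_child(void)
{
	pid_t pid = -1;

	pthread_mutex_lock(&table_lock);
	if (table_len < MAX_CHILDREN) {
		pid = fork();
		if (pid == 0)
			_exit(0);	/* child: placeholder for exec'ing a zedlet */
		if (pid > 0) {
			/* Same invariant avl_add() asserts: no duplicate PIDs. */
			assert(table_find(pid) < 0);
			table[table_len++] = pid;
		}
	}
	pthread_mutex_unlock(&table_lock);
	return pid;
}

/*
 * [wait, remove]: reap one exited child, if any, and drop its entry
 * while still holding the lock. WNOHANG keeps the lock from being held
 * across a blocking wait. Returns the reaped PID, 0 if no child has
 * exited yet, or -1 on error; *found says whether it had an entry.
 */
pid_t reap_one(bool *found)
{
	int status;
	struct rusage usage;
	pid_t pid;

	*found = false;
	pthread_mutex_lock(&table_lock);
	pid = wait4(-1, &status, WNOHANG, &usage);
	if (pid > 0) {
		int i = table_find(pid);
		if (i >= 0) {
			table[i] = table[--table_len];	/* swap-remove */
			*found = true;
		}
	}
	pthread_mutex_unlock(&table_lock);
	return pid;
}
```

With this ordering a slow reaper only delays reaping: exited children stay zombies, which also keeps their PIDs from being reused, so a stale entry can no longer collide with a new one.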
    
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed-by: Don Brady <don.brady@delphix.com>
    Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
    Closes #11963
    Closes #11965