The up/down monitor on our new metricsinfra Prometheus flaps often because the node exporter gets wedged. Repeated investigations point to NFS hangs on the client side as the cause.
Based on https://github.com/prometheus/node_exporter/issues/578 and the later https://github.com/prometheus/node_exporter/pull/1166, it seems the most upstream aims to do about this is make the exporter return 503s instead of blowing up the host during such events. That is a reasonable thing to do, but it also makes the host report as down, and we lose all of its metrics while that lasts.
It seems like a good idea to monitor NFS at the host level instead of on the client, so that the somewhat frequent NFS client hangs do not wedge our entire monitoring setup.
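
For illustration only, here is a minimal sketch of an NFS check that cannot wedge its caller, wherever it ends up running. It is not what the node exporter does, and nothing here is an existing tool: the helper names `probe` and `nfs_mount_responsive`, the 5-second timeout, and the assumption that a plain `stat` on the path is enough to touch the NFS server are all hypothetical. The idea is to run the filesystem call in a child process and give up after a deadline, so a hang shows up as a failed check rather than a stuck monitor.

```python
import subprocess


def probe(cmd, timeout_s=5.0):
    """Return True if cmd exits with status 0 within timeout_s seconds."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return proc.wait(timeout=timeout_s) == 0
    except subprocess.TimeoutExpired:
        # Deliberately no wait() after kill(): a child blocked on a hung
        # NFS call may not die until the mount recovers, and waiting for
        # it would wedge this probe as well.
        proc.kill()
        return False


def nfs_mount_responsive(path, timeout_s=5.0):
    """Hypothetical check: does a stat on path return before the deadline?

    Assumes that stat on the given path actually reaches the NFS server;
    cached attributes could make a hung mount look healthy.
    """
    return probe(["stat", path], timeout_s)
```

A caller could export the boolean result as its own metric, so that a hung mount shows up as one failing check while the rest of the host's metrics keep flowing.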