Jun 11

Patch for the VMware ESXi 5.5 Update 1 NFS connectivity issues

OK, this is really important. When Update 1 for ESXi 5.5 was released, customers connecting to datastores via the NFS protocol started experiencing intermittent connectivity loss and, in particular, All Paths Down (APD) conditions.

What is ‘All Paths Down’?

For those who are unfamiliar with this “condition”, APD occurs on an ESXi host when a storage device is removed from the host in an uncontrolled manner, or the device simply fails, leaving the VMkernel with no working path to it and no way of knowing whether the loss is temporary. The datastore stops accepting I/O from the virtual machines for the duration of the APD condition. As a result, Windows virtual machines start blue-screening and filesystems on Linux VMs go read-only. The condition can be permanent or temporary; either way, bad stuff happens.

Fortunately, a patch has just been released for this.

Patch notes

This patch resolves the following issues:

PR1242103: When you run ESXi 5.5 Update 1, the ESXi host intermittently loses connectivity to NFS storage and an All Paths Down (APD) condition to NFS volumes is observed. During and after the APD condition, the array still responds to ping and the netcat tests are also successful. There is no evidence to indicate a physical network or an NFS storage array issue.

  • Entries similar to the following are logged in the vobd.log file for volume named 12345678-abcdefg0 as an example:

    YYYY-04-01T14:35:08.075Z: [APDCorrelator] 9414268686us: [esx.problem.storage.apd.start] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down state.
    YYYY-04-01T14:36:55.274Z: No correlator for vob.vmfs.nfs.server.disconnect
    YYYY-04-01T14:36:55.274Z: [vmfsCorrelator] 9521467867us: [esx.problem.vmfs.nfs.server.disconnect] 12345678-abcdefg0-0000-000000000000 NFS-DS1
    YYYY-04-01T14:37:28.081Z: [APDCorrelator] 9553899639us: [vob.storage.apd.timeout] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.
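
If you want to check whether a host has hit this, one option is to scan vobd.log for the APD entries shown above. Below is a minimal Python sketch of that idea, not an official VMware tool. The function name `find_apd_events` and the regular expression are my own assumptions, built only from the sample lines in the patch notes; real log lines may vary.

```python
import re

# Pattern derived from the sample vobd.log lines above (an assumption,
# not a documented format):
# <timestamp>: [APDCorrelator] <n>us: [<event>] Device or filesystem
# with identifier [<device>] has entered ...
APD_LINE = re.compile(
    r"^\s*(?P<timestamp>\S+?): \[APDCorrelator\] \d+us: "
    r"\[(?P<event>[\w.]+)\] Device or filesystem with identifier "
    r"\[(?P<device>[^\]]+)\]"
)


def find_apd_events(lines):
    """Return (timestamp, event, device) tuples for APDCorrelator entries.

    Hypothetical helper: lines that do not match the pattern are skipped.
    """
    events = []
    for line in lines:
        match = APD_LINE.match(line)
        if match:
            events.append(
                (match["timestamp"], match["event"], match["device"])
            )
    return events
```

To use it, copy vobd.log off the host and pass the open file to the function, for example `find_apd_events(open("vobd.log"))`. An `esx.problem.storage.apd.start` event followed by a `vob.storage.apd.timeout` event for the same identifier matches the pattern described in the patch notes.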

Where can I get it?