There have been a number of improvements made to the base hypervisor, most notably around Auto Deploy.
For those who are not familiar with Auto Deploy or have not had a chance to play with it, it essentially gives you the power to rapidly deploy new vSphere hosts into your environment and bring them up to a patch level you define. In a cloud-computing context, and in particular Infrastructure as a Service, this is a BIG step forward: as provisioning becomes more and more automated, time to market is a key metric for measuring how well your cloud-computing business is running.
Without further ado, let’s dive into the new announcements.
The vSphere 5.1 hypervisor has undergone a number of enhancements, including:
- Local ESXi shell users now automatically get full shell access. It is no longer necessary to share a single root account, which improves your audit trail.
- SNMPv3 is now supported, bringing authentication and encryption support to the host-monitoring infrastructure (a quick configuration sketch follows this list).
- Auto Deploy now offers a stateless caching mode that caches the boot image so a host can boot from the last known-good image should the Auto Deploy infrastructure be unavailable. I see this as a potential turning point in the adoption of Auto Deploy. Note that a dedicated boot device is required for this feature to function.
- Auto Deploy can now be leveraged for stateful installs. This may be beneficial to accounts that already have PXE in place but want to continue using traditional stateful methods.
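As a rough sketch of configuring the new SNMPv3 support from the ESXi shell: the commands below use the esxcli system snmp namespace, but treat the exact option names and the SHA1/AES128 values as my assumptions and verify them against your 5.1 build before scripting anything.
esxcli system snmp set --authentication SHA1 --privacy AES128
esxcli system snmp set --enable true
esxcli system snmp get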
vCenter Server is the primary point of management for most environments and it too has been enhanced and tuned for this new release. Some of the new additions include:
- The vSphere Web Client is now the primary point of management. It was noted during a session @ VMworld last week that the vSphere Client will no longer see development or receive new features.
- An interesting new feature of the Web Client is the ability to pause a task and resume it from the “Work in Progress” task section. This is helpful if you need to gather additional information to complete a task without cancelling it and starting over.
- The Web Client does NOT need to be installed on the same server as vCenter Server, so you can scale out your vCenter services across servers.
- Support for OpenLDAP & NIS authentication when using the Web Client (not the traditional vSphere Client); this will make Linux-only environments happy.
- Single Sign-On. Read the PDF for more details (the traditional vSphere Client is not supported).
- The Web Client can track multiple vCenter Servers and their inventory objects using the updated Inventory Service, so you can now manage multiple vCenter environments from a single pane of glass without Linked Mode, unless you wish to share permissions and licenses across them.
Outside of the obvious scalability improvements (64 vCPUs, 256 pCPUs, >1M IOPS), vSphere has undergone a number of refinements to improve performance and management.
vSphere can now reduce the memory overhead of a VM by swapping the per-VM overhead memory reservation out to disk. This can increase overall consolidation ratios and improve VM-per-host densities, but the administrator must manually configure the system swap location to take advantage of the feature.
Use the following CLI command in your kickstart install, or run it post-install:
esxcli sched swap system set -d true -n <datastore name>
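To verify the configuration afterwards, the matching get command should echo the datastore you selected (an assumption based on the usual esxcli set/get pairing; confirm it exists in your build):
esxcli sched swap system get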
If you have previously read and implemented the recommendations in the Best Practices for Performance Tuning of Latency-Sensitive Workloads in vSphere VMs white paper, you will know it can be a manual and administratively intensive process (outside of PowerCLI). vSphere 5.1 now offers a checkbox that applies the relevant .vmx settings for you, saving a number of manual steps.
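For context, the manual route means editing per-VM .vmx entries along these lines; treat them purely as illustrative examples of the kind of tuning the paper discusses rather than a definitive list, and check your version of the paper for the settings that apply to you:
ethernet0.coalescingScheme = "disabled"
monitor_control.halt_desched = "false"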
The traditional vMotion and Storage vMotion (svMotion) have been combined into one operation, offering the ability to vMotion a VM between hosts that do NOT share common storage.
This means that two servers using direct attached storage (DAS) can vMotion a VM between them.
Consider this feature beneficial for migration scenarios, but there is a catch: the svMotion portion of the operation occurs across the “Management Network” vmkernel interface.
So if you are using an HP BladeSystem/Virtual Connect infrastructure, you may want to review your design if you have followed any of the Virtual Connect guides that call a 100Mbit Management FlexNIC a “best practice”. A 1GbE management interface is what I recommend.
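Before relying on that interface for migrations, it is worth a quick sanity check of what your uplinks are actually negotiating; the first command below shows the link speed of each vmnic and the second lists the host's vmkernel interfaces:
esxcli network nic list
esxcli network ip interface list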
While on the svMotion topic: vSphere 5.1 has moved from migrating the VMDKs within a VM serially to migrating them in parallel, provided the VMDKs reside on distinct datastores. With that, let's take a look at the storage improvements.
Storage is commonly the least understood topic, and it receives the least exciting but most useful features. I won't cover the new disk format as it is primarily View-related; however, there are other areas of improvement.
- High Availability will now restart VMs that encounter a Permanent Device Loss (PDL) state (as 5.0 U1 did too). Please understand that a PDL is much less common than an All-Paths-Down (APD) state, to which HA does NOT respond, but we may yet get there in the future. HA responding to PDLs is a step in the right direction.
- 16Gb FC HBAs are now supported. Where vSphere 5.0 supported 16Gb HBAs in 8Gb mode, vSphere 5.1 enables the full 16Gb throughput. An interesting tidbit confirmed by Emulex reps on the VMworld show floor: a 16Gb HBA running in 8Gb mode will outperform a comparable 8Gb HBA thanks to the 16Gb HBA's ASIC improvements in I/O processing.
- SMART monitoring has also been introduced via esxcli (but NOT vCenter) to examine disk error characteristics. It is targeted primarily at SSD monitoring and, for now, is command-line only (see the example after this list).
- Storage I/O Control (SIOC) can now automatically detect and set the congestion threshold to the 90% throughput mark. This is done using the SIOC injector, which measures latency against throughput and can dynamically tune the threshold to the characteristics of the underlying disks. It is very much a “set it and forget it” feature that adjusts to a changing environment.
- Additionally, the underlying SIOC injector has been improved in terms of where it measures latency. Instead of leveraging the datastore latency metric, which effectively ignores the storage stack above the datastore level, the new injector uses a value coined VmObservedLatency, measured higher up the virtualized storage stack as seen by the VMs themselves, to more accurately reflect the performance experienced by the application or user.
- The SIOC injector can now also detect common underlying disk striping configurations in order to avoid svMotioning VMs across datastores backed by the same spindles on the back-end of the array. The VMware vSphere Storage DRS Interoperability white paper includes recommendations on when and when _not_ to enable I/O load-balancing in an SDRS cluster, but obviously those recommendations were not always being followed.
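Here is the kind of esxcli workflow I am referring to for pulling SMART data; the naa identifier below is a placeholder for one of your own devices, and the attributes returned will vary by drive and controller:
esxcli storage core device list
esxcli storage core device smart get -d naa.xxxxxxxxxxxxxxxx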
Networking is another interesting topic, and the vast majority of improvements are focused on the vSphere Distributed Switch (vDS). I should call out that if you are using Enterprise Plus licensing you should take a serious look at the vDS, as the classic vSphere Standard Switch (vSS) is unlikely to evolve further; it is effectively at its maximum feature potential.
- Network Health Check (VLAN, MTU and failover-team validation) is a very welcome addition, as I have seen customer environments encounter HA events (and unplanned VM downtime) due to misconfigured teaming and/or switchports. You want this feature!!
- vDS management network rollback and recovery is the catalyst that will calm fears of a cluster-wide failure due to accidental misconfiguration in a fully vDS design. If a change causes the management network to lose connectivity, the vDS will automatically roll back the last change(s). A very impressive live demo of this feature was shown at VMworld. This removes one of the last hurdles to what I see as the beginning of majority adoption of the vDS over the vSS.
- vDS Distributed Port Auto Expand: while a nice touch in itself, the PDF has some helpful information on selecting the best vDS “Port Binding” method for your environment. Static Binding is the default and likely the best candidate for the majority of environments out there. Consider that a traditional server has a fixed cabling configuration into a physical switch; the cables do not move. Static binding is akin to this: a fixed configuration that does not depend on vCenter to power on VMs.
- Dynamic Binding is deprecated.
- Ephemeral binding is a “plug-and-pray” method with no fixed binding; you lose vCenter performance history and statistics and increase troubleshooting complexity. Not recommended for most environments.
- There are a number of other great features, but I want to point out one last addition that mitigates a risk that has been flying under the radar in most environments: the BPDU filter. If your VMware environment is connected to a network that leverages the Spanning Tree Protocol (STP), then prior to vSphere 5.1 it was possible to take a host's VM networking offline even when following VMware's own switchport configuration guidelines.
- VMware recommends that host-facing switchports NOT participate in STP by enabling PortFast and BPDU Guard, which prevents accidental layer 2 bridging loops from causing a network disruption. The problem is that a VM with two or more vNICs attached could bridge its interfaces and introduce a loop. When this happens, BPDU packets are sent out and a properly configured switch will err-disable the attached port, taking the VM offline, and eventually every other vmnic attached to the switch as VMware's NIC failover moves the offending VM from uplink to uplink. Consider this a denial-of-service risk.
- Now, with vSphere 5.1, you can enable the advanced setting Net.BlockGuestBPDU, which is disabled by default and applies to both the vSS & vDS. This is the only new feature I can see that has made its way into the vSS, and I would highly recommend that any environment that uses STP and has no design intention of leveraging VM-based bridging enable this setting (see the example below).
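For reference, this is how I would expect to turn it on from the ESXi shell on each host; confirm the option path against the advanced settings list in your own environment before rolling it out:
esxcli system settings advanced set -o /Net/BlockGuestBPDU -i 1
esxcli system settings advanced list -o /Net/BlockGuestBPDU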