Yesterday, VMware announced this year’s recipients of the prestigious vExpert 2013 title, an award that has been around since 2009 to recognise individuals for their contributions to the global virtualization and cloud community. The list is put together by VMware, in particular John Troyer (@JohnTroyer) and the VMware Social Media & Community Team. No easy task in my mind, as there are a lot of great VMware practitioners and evangelists out there.
Announcement link (just in case you don’t believe me 🙂)
For me, this is my very first year being recognised in this category, and I am very humbled and honoured to be recognised among some of the great evangelists in this field – some of whom are personal friends of mine, which makes it even more special.
Does it mean you’re now an expert?
No, not necessarily. The title is not based on what you know or how much you know. A great extract from the announcement page shows what it takes (and doesn’t take) to be recognised:
“A vExpert not a technical certification or even a general measure of VMware expertise. The judges selected people who were particularly engaged with their community and who had developed a substantial personal platform of influence in those communities. There were a lot of very smart, very accomplished people, even VCDXs, that weren’t named as vExpert this year” (Retrieved from http://blogs.vmware.com/vmtn/2013/05/vexpert-2013-awardees-announced.html)
Lastly, I would also like to extend my congratulations out to all of the other vExperts for 2013. Looking forward to meeting some of you over the next year.
Time for a technical deep dive – As you may or may not know, HP 3PAR boasts a sexy array-based software feature set. You’ve probably heard me rave on about it in other posts and podcasts I have done. I’ve worn this fan-boy cap for a while now and spoken about our leading thin suite.
Some 3PAR software features perhaps don’t get as much limelight as our thinning capability or wide striping but they are extremely supportive and tell a great benefit story, especially in the area of High Availability (HA) or Business Continuity (BC). These features are array based and are called peer persistence, peer motion, persistent ports and persistent cache.
The names appear similar but they refer to different solutions, so let’s take a look at some of these concepts and how they work, particularly with the vSphere platform, starting with peer persistence.
Spotlight on HP 3PAR Peer Persistence
Another way to look at peer persistence is to tie it back to something that has been around for a while in the virtual world – VMware vMotion technology.
So what vMotion does for local site workloads, peer persistence does for geographically disparate data centres, meaning it offers transparent site switchover without the application knowing it has happened. It provides an HA configuration between these sites in a metro-cluster configuration.
Peer persistence allows you to extend this coverage across two sites, enabling use of your environment and load balancing your virtual machines (VMs) across disparate arrays. Truly a new way to think of a federated data center in this context IMO. Note that this implies an active/active configuration at the array level but not at the volume level: volumes are active/passive, which implies host paths to both the primary volume and the secondary volume.
Transparent switchover and Zero RPO/RTO
What does it mean to offer “transparent site failover”? In the context of 3PAR peer persistence, it simply means that workloads residing in a VMware cluster can shift from site 1 to site 2 without downtime. This makes aggressive RPO/RTOs an achievable reality.
This is a particularly good fit for mission-critical applications or services that require an RTO/RPO of 0.
Note: I mention VMware here because this technology currently supports only VMware hosts (vSphere 5.0+) and not vMSC.
How does it do it?
Peer persistence leverages HP 3PAR remote copy synchronous to manage the transfer of data from the local site to the remote site and return the acknowledgement to the host operating system (vSphere in this instance). Today the switchover is a manual process executed via the 3PAR CLI (the automated process is coming later this year).
Working with VMware vSphere, this allows your ESXi cluster to virtually span across data centers. So in the above figure some VMs are being serviced by HP StoreServ Storage A and other VMs are being serviced by HP StoreServ Storage B, but all exist in the same VMware data center. Moving VMs between sites would typically need a reset, but peer persistence removes this limitation by continuously copying or shadowing the VM on the remote volume via RC and switching over the volumes logically.
I’ll write the high level process with the above figure 1 in mind without any pretty pictures:
- Host can logically see both HP StoreServ Storage arrays by means of stretch fabric.
- VM resides in Site 1, on Volume A. Volume A’s partner (Volume B) is being presented to Host B in Site 2 and can be considered active but not primary.
- The primary and secondary volumes are exported using different Target Port Groups supported by persona 11 (VMware).
This presentation is possible via Asymmetric Logical Unit Access (ALUA), which allows a SCSI device (Volume A in this instance) to be masked with the same traits (source WWN).
Provided this configuration exists, the process is:
- Controlled switchover is initiated manually by the user via the CLI on the primary array, using ‘setrcopygroup switchover <groupname>’.
- IO from the host to the primary array is blocked and in flight IO is allowed to drain.
- The remote copy group is stopped and snapshots are taken on the primary array.
- The primary array target port group is changed to transition state.
- The primary array sends a remote failover request to the secondary array.
- The secondary array target port group is changed to transition state.
- The secondary array takes a recovery point snapshot.
- The secondary array remote copy group changes state to become primary-reversed (pri-rev). At this point the secondary volume will become read/write.
- The secondary target port group is changed to active state.
- The secondary array returns a failover complete message to the primary array.
- From here, the primary array target port group is changed to standby state and any blocked IO is returned to the host with the following sense error: “NOT READY, LOGICAL UNIT NOT ACCESSIBLE, TARGET PORT IN STANDBY STATE”
- The host will perform SCSI inquiry requests to detect which target port groups have changed and which paths are now active. Getting your host multipathing configuration right is very important here!
- Volume B is now marked primary and hosts continue to access the volume via the same WWN as before. Host IO will now be serviced on the active path to the secondary array without the host application even knowing what happened!
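To make the ordering of those steps easier to follow, here is a minimal Python sketch that simply replays them against two in-memory objects. It is only an illustration, not how the array firmware works: the `Array` class, the `TpgState` names, the starting standby state of the secondary target port group and the `active_paths` helper are all my own assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum


class TpgState(Enum):
    """Hypothetical labels for the target port group states named in the steps."""
    ACTIVE = "active"
    TRANSITION = "transition"
    STANDBY = "standby"


@dataclass
class Array:
    """Toy model of one array's view of the remote copy group (illustrative only)."""
    name: str
    role: str            # "primary", "secondary" or "pri-rev"
    tpg: TpgState
    writable: bool
    snapshots: list = field(default_factory=list)


def controlled_switchover(primary: Array, secondary: Array) -> list:
    """Replay the switchover steps in order and return a log of what happened."""
    steps = []

    primary.writable = False
    steps.append("primary: block host IO and drain in-flight IO")

    primary.snapshots.append("switchover")
    steps.append("primary: stop remote copy group and take snapshots")

    primary.tpg = TpgState.TRANSITION
    steps.append("primary: target port group -> transition")

    steps.append("primary -> secondary: remote failover request")

    secondary.tpg = TpgState.TRANSITION
    steps.append("secondary: target port group -> transition")

    secondary.snapshots.append("recovery-point")
    steps.append("secondary: take recovery point snapshot")

    secondary.role = "pri-rev"
    secondary.writable = True
    steps.append("secondary: group becomes pri-rev, volume is read/write")

    secondary.tpg = TpgState.ACTIVE
    steps.append("secondary: target port group -> active")

    steps.append("secondary -> primary: failover complete")

    # Blocked IO goes back to the host with the NOT READY sense error,
    # which prompts the host to re-check its paths.
    primary.tpg = TpgState.STANDBY
    steps.append("primary: target port group -> standby, blocked IO returned")
    return steps


def active_paths(arrays: list) -> list:
    """Stand-in for the host's SCSI inquiry: which arrays now report active paths?"""
    return [a.name for a in arrays if a.tpg is TpgState.ACTIVE]


if __name__ == "__main__":
    site_a = Array("StoreServ A", "primary", TpgState.ACTIVE, True)
    site_b = Array("StoreServ B", "secondary", TpgState.STANDBY, False)
    for step in controlled_switchover(site_a, site_b):
        print(step)
    print("active paths:", active_paths([site_a, site_b]))
```

The point of the sketch is the sequencing: the secondary only becomes writable and active after the primary has stopped accepting IO, so at no point do both sides accept writes for the same volume.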
What you need
Here’s your list:
- The WAN link between your two data centers should not have more than 2.6ms latency. This is important as remote copy synchronous needs to be able to send the write and wait for an acknowledgement within a given time frame.
- One thing to note is that vMotion used to be supported only on networks with round-trip time (RTT) latencies of up to 5 ms, but VMware vSphere 5 introduced a new latency-aware Metro vMotion feature that increases the round-trip latency limit for vMotion networks from 5 ms to 10 ms. The 5 ms requirement seems to be a thing of the past, allowing cool things like spanned virtual data centres.
- The ESXi hosts must be configured using 3PAR host Persona 11.
- Host timeouts need to be less than 30 seconds. This time does not include the subsequent recover and reverse operations so leave some headroom.
- The WWNs of the volumes being replicated have to be the same. Therefore, ALUA is also a requirement.
- From a 3PAR-licensing point of view, you need 3PAR remote copy synchronous, and Peer persistence of course which is licensed on a per array basis as well.
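If you want to sanity-check an environment against this list before going any further, the requirements can be written down as a small checklist function. This is a rough Python sketch, not an HP tool: the `SiteConfig` fields and the licence labels are names I made up for illustration, and the thresholds are simply the numbers quoted in the list above.

```python
from dataclasses import dataclass

MAX_LATENCY_MS = 2.6        # WAN latency ceiling quoted in the list above
REQUIRED_PERSONA = 11       # 3PAR host persona for ESXi hosts
MAX_HOST_TIMEOUT_S = 30     # host timeout bound quoted in the list above
# Hypothetical labels for the two licences, not real licence keys.
REQUIRED_LICENSES = frozenset({"remote-copy-sync", "peer-persistence"})


@dataclass
class SiteConfig:
    """Illustrative description of a two-site setup (all field names assumed)."""
    latency_ms: float
    host_persona: int
    host_timeout_s: float
    primary_wwn: str
    secondary_wwn: str
    licenses: frozenset


def check_prerequisites(cfg: SiteConfig) -> list:
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems = []
    if cfg.latency_ms > MAX_LATENCY_MS:
        problems.append(f"WAN latency {cfg.latency_ms} ms exceeds {MAX_LATENCY_MS} ms")
    if cfg.host_persona != REQUIRED_PERSONA:
        problems.append(f"host persona is {cfg.host_persona}, expected {REQUIRED_PERSONA}")
    if cfg.host_timeout_s >= MAX_HOST_TIMEOUT_S:
        problems.append(f"host timeout {cfg.host_timeout_s} s is not below {MAX_HOST_TIMEOUT_S} s")
    if cfg.primary_wwn.lower() != cfg.secondary_wwn.lower():
        problems.append("primary and secondary volume WWNs differ")
    for name in sorted(REQUIRED_LICENSES - cfg.licenses):
        problems.append(f"missing licence: {name}")
    return problems


if __name__ == "__main__":
    cfg = SiteConfig(
        latency_ms=1.8,
        host_persona=11,
        host_timeout_s=25,
        primary_wwn="50002AC0001A0B1C",      # made-up example WWN
        secondary_wwn="50002ac0001a0b1c",
        licenses=frozenset({"remote-copy-sync", "peer-persistence"}),
    )
    print(check_prerequisites(cfg) or "all prerequisites met")
```

Returning a list of problems rather than a single true/false makes it easy to see everything that needs fixing in one pass.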
Peer persistence supportability
For now, HP 3PAR Peer persistence is only available for VMware clusters: vSphere 5.x and up. More platforms to be supported in the future.
So Peer Persistence is an HA solution that removes barriers traditionally found in physical data centers. It is not the single entity defining a virtual data centre, but simply one of the pillars supporting it, allowing virtualization and storage to no longer be constrained by physical elements as in the past. To this end, achieving a more aggressive BC plan is becoming more realistic.
For more on Peer persistence, please check out this service brief: HP 3PAR Peer Persistence—Achieve high availability in your multisite federated environment
Every major initiative for optimizing data center performance, decreasing TCO, increasing ROI, or maximizing productivity – including consolidation, virtualization, clouds, server upgrades, tiered storage, data analytics and BI tools – involves storage data migration.
Data has an incalculable value, and its loss can have significant impact. As Frost & Sullivan says in a recent Executive Brief, “one would expect that storage data migrations should be approached with the same attention a museum lavishes on a traveling Rembrandt exhibit.” To expand on this, in 2012 it was estimated that $8 billion was spent worldwide on data migration services.
A research white paper published in December 2011 entitled “Data Migration – 2011” by Philip Howard from Bloor Research shows that the average cost of a data migration project is $875,000, so extrapolating the value and criticality of these types of projects should be fairly straightforward. Overrunning the project budget, or rolling back a failed migration due to lack of planning, is a normal occurrence – in fact this same study proposes that the average cost of a project overrunning its budget is $268,000 – approximately 30% of the average cost of a data migration project.
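For anyone who wants to check the “approximately 30%” figure, the arithmetic is quick to reproduce in Python. The two inputs are the averages quoted above; the combined total is my own illustrative sum and not a number from the study.

```python
# Averages quoted from the Bloor Research paper above.
average_project_cost = 875_000
average_overrun_cost = 268_000

# Share of the average project cost that the average overrun represents.
overrun_share = average_overrun_cost / average_project_cost

# Illustrative total for an average project that also overruns by the average amount.
total_with_overrun = average_project_cost + average_overrun_cost

print(f"overrun share: {overrun_share:.1%}")
print(f"total with overrun: ${total_with_overrun:,}")
```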
Between 1999 and 2007, 84% of data migrations went over budget and over time; this is astronomical and costly – and it can get very tricky when trying to pinpoint just why a data migration project went over budget and over time. More often than not, it comes down to lack of experience and planning (and I do believe that experience and planning should come in the same sentence).
And there are potentially serious risks involved. Recent studies show that migration projects nearly always have unwanted surprises: 34% of migrations have missing or lost data, and a further 38% have some form of data corruption.
And probably the biggest risk associated with migrations is that 64% of migration projects have unexpected outage/downtime. Now, tie this back to a research paper put forward by Vision Solutions in 2011, which shows that the typical cost of downtime can reach nearly 6.5 million dollars per hour for some in the Brokerage service industry, and up to 2.8 million dollars per hour for those in the Energy Industry. To really understand this and put it in context, let’s have a look at some of the reasons why we migrate.
Why do we migrate data?
The migration of data isn’t typically something an IT manager or CIO does for fun; at the end of the day it will cost money and time. Ageing infrastructure or the need for a particular technology feature that’s not available on the current infrastructure are just a couple of the reasons why people migrate. In my experience, it’s all of the above. CIOs are constantly (or should be) looking at new and innovative ways to reduce footprint and drive down environmental costs, such as data centre space and power, as well as expose newer and greater technological advancements within a given product set. Newer product releases for infrastructure seldom take a step back when it comes to form factor and power draw.
So do customers who perform migrations achieve their overall goals? Not exactly… As I mentioned above, those undertaking DIY migrations typically have surprises which result in a heavier investment in staff to try and remediate those surprises, subsequently resulting in a project budget that is exceeded. Yes, 54% of the time a project budget is overrun due to these challenges, but I’m not here to throw stats at you – I’m here to raise the awareness that if not properly planned and executed, your data migration project (as big or small as it may be) will run into at least one of these surprises.
HP Data Migration Services can help you address those challenges and risks. Each data migration project has a storage and data migration project manager assigned to make sure everything goes smoothly. We understand that storage infrastructures are typically multivendor, which is why our service is vendor-agnostic. We work to keep costs down and help you avoid the common pitfalls and risks of data migration.
To learn more about the new HP Data Migration Service, check out this online presentation. You’ll learn about the typical project flow and your migration technology options. Data migration is not usually just a simple copy-and-paste exercise.
Read more about HP Storage Migration Consulting.
You can learn more about ways to ease the pain of data migration at HP Discover 2013.