This is a two-part blog post that provides an overview of Oracle ZFS Storage and how it relates to VMware environments. The first part (this blog post) introduces Oracle ZFS Storage and discusses the architectural fundamentals and how they benefit VMware workloads. The second part will look at VMware-specific support and integration, and why this appliance proves to be an excellent platform for hosting your virtual workloads. With that, let’s introduce Oracle ZFS Storage!
What is Oracle ZFS?
The Oracle ZFS Storage appliances are a set of highly scalable, resilient storage systems that provide a platform for mixed, high-performing workload types. Built on an SMP architecture designed to make full use of all CPUs and threads, and using the ZFS file system, which offers a multi-layer caching architecture to run applications as quickly and efficiently as possible, the appliances support multiple protocols ranging from file-level NFS and SMB to block-level Fibre Channel and iSCSI. Depending on your requirements, the appliances come in two flavours – the ZS3-2 and the ZS4-4 – and the key differences between these systems are predominantly hardware related (CPU cores, maximum supported storage for cache and disk, etc.). Take a look at the summary below to identify the major differences:
The Virtual Architecture of Oracle ZFS
There may be a misconception that Oracle ZFS Storage Appliances are only designed for Oracle database workloads, given the unique capabilities customers gain when combining Oracle storage with the Oracle database, subsequently allowing them to perform some extended operations. Whilst this IS true, the misconception that other workloads do not work well on these appliances is NOT true. In fact, given the random nature of virtualisation and cloud workloads, ZFS storage is a great choice for both Oracle and non-Oracle workloads. Why? Let’s start off by looking at the architecture of ZS appliances – Hybrid Storage Pools, data integrity, and high-performance RAID striping.
Hybrid Storage Pools
Hybrid Storage Pools (HSP) is a concept describing the virtual pool layered over the whole collection of drives at the controller level. Each controller in a ZS appliance owns this collection of drives, which can be added to a virtual pool, or HSP. From there, you can carve out block or file shares for your clients. Nice and easy!
End-to-end data integrity
Every block of data is checksummed, which protects your data against silent data corruption; the file system employs a 256-bit checksum for every block. Instead of storing the checksum with the block itself, ZFS stores it in the block’s parent. Every parent block pointer contains the checksums for all its children, so the entire pool can validate that the data is both accurate and recoverable.
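The idea of keeping each block’s checksum in its parent can be sketched as a small Merkle-style tree. The sketch below is illustrative only (the `Block` class and `verify` function are hypothetical, not ZFS code), and it uses SHA-256 simply because the text mentions a 256-bit checksum:

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

class Block:
    """A data block whose checksum lives in its parent, not with itself."""
    def __init__(self, data: bytes, children=None):
        self.data = data
        self.children = children or []
        # The parent records each child's checksum in its own block pointers.
        self.child_checksums = [sha256(c.data) for c in self.children]

def verify(block: Block) -> bool:
    """Walk the tree: every child must match the checksum its parent holds."""
    for child, expected in zip(block.children, block.child_checksums):
        if sha256(child.data) != expected:
            return False              # silent corruption detected
        if not verify(child):
            return False
    return True

leaf_a, leaf_b = Block(b"vm-data-1"), Block(b"vm-data-2")
root = Block(b"metadata", children=[leaf_a, leaf_b])
assert verify(root)
leaf_b.data = b"bit-flipped!"         # simulate silent on-disk corruption
assert not verify(root)
```

Because the checksum travels with the parent pointer rather than the corrupted block, a bad block can never vouch for itself – which is what makes the corruption detectable end to end.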
RAID-Z – High-performance striping
Oracle ZFS Storage combines the capabilities of RAID volume management and a file system, which allows intelligent decisions to be made about block placement, resilvering (RAID rebuilds), data repairs, etc. For example, if a disk needs to be replaced, only the ‘live’ data needs to be copied to the new drive, which can reduce rebuild times. As another example, if a drive were to misbehave and start handing back corrupt data, the metadata kept within ZFS allows the system to identify and correct problems on the fly, transparently to the application.
Fundamentally, ZFS offers software-based RAID called RAID-Z, which, alongside simple mirroring, comes in three flavours – RAID-Z1, RAID-Z2 and RAID-Z3:
- Mirrored: Writes are divided and written in full across two or three drives, depending on your redundancy requirements.
- RAID-Z1: Protects data against a single drive failure by storing data redundantly among multiple drives. RAID-Z1 is similar to standard RAID 5 but does not suffer the write penalty that RAID 5 encounters.
- RAID-Z2: Similar to RAID 6, offering double parity to tolerate two simultaneous disk failures; performance is equivalent to RAID-Z1.
- RAID-Z3: Similar to RAID-Z1 and RAID-Z2, but with a third parity device as added protection. This tolerates up to three drive failures, with performance similar to RAID-Z1 and RAID-Z2.
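The trade-off between these parity levels is easy to see with back-of-envelope arithmetic. The sketch below is a generic parity-stripe calculation, not anything ZFS-specific (the function name is made up for illustration):

```python
def raidz_usable_fraction(drives: int, parity: int) -> float:
    """Fraction of raw capacity left for data in a parity-protected group."""
    if not 1 <= parity <= 3:
        raise ValueError("RAID-Z parity level must be 1, 2 or 3")
    if drives <= parity:
        raise ValueError("need more drives than parity devices")
    return (drives - parity) / drives

# An 8-drive group at each protection level:
for name, parity in [("RAID-Z1", 1), ("RAID-Z2", 2), ("RAID-Z3", 3)]:
    usable = raidz_usable_fraction(8, parity)
    print(f"{name}: tolerates {parity} failure(s), "
          f"{usable:.1%} of raw capacity usable for data")
```

In other words, each extra parity level buys one more tolerated drive failure at the cost of one drive’s worth of capacity per group.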
All RAID types can be tailored to the workload, but overall the architecture is specifically designed for high-bandwidth, low-latency application requirements, which is why workloads such as Splunk, Microsoft SQL Server, Oracle RAC, and other database types work so well!
But how does this relate to virtualisation and cloud environments? Why is it so beneficial? Let’s explain in the next section.
Virtualisation and Private Cloud Workloads
Virtualisation and cloud workloads have fundamentally changed the type of stress placed on storage systems. Traditional storage architectures that relied on disk spindles (whether conventional or flash) for performance no longer meet the requirements of these new workload types. Pooling shared resources means that CIOs and CTOs want to maximise their technology investment by increasing the utilisation and efficiency of the infrastructure. This shift in mindset has influenced the rise of private cloud environments and converged infrastructure. Legacy storage infrastructure struggles with these new-found workloads, resulting in some companies retrofitting their architecture to meet the demands. So how does Oracle do it?
We include a massive dynamic random-access memory (DRAM) cache and an operating system that optimises its use, allowing up to 90 percent of I/O to be served from the fastest possible medium – one considerably faster than flash drives.
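Why a high DRAM hit rate matters so much comes down to a weighted average. The sketch below uses purely illustrative latency figures (not published Oracle numbers) to show how the cache hit rate dominates the effective service time:

```python
def effective_latency_us(dram_hit_rate: float,
                         dram_us: float = 0.1,
                         disk_us: float = 5000.0) -> float:
    """Weighted-average service time in microseconds.

    The DRAM and disk latencies are illustrative assumptions chosen
    only to show the shape of the trade-off.
    """
    return dram_hit_rate * dram_us + (1 - dram_hit_rate) * disk_us

# A 90% DRAM hit rate versus a 50% one, under the assumed latencies:
print(effective_latency_us(0.9))
print(effective_latency_us(0.5))
```

Every extra percentage point of DRAM hits removes a slice of the (much larger) disk latency from the average, which is why a cache-centric design pays off so disproportionately.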
Serving Virtual Workload I/Os via Hybrid Storage Pools
Back in 2008, Sun (later acquired by Oracle) devised a way to serve storage out of DRAM, resulting in faster response times for those tier-one application workloads. How does this happen?
When I/O requests come in, our intelligent adaptive cache manages the I/O from any workload type. This smart management results in up to 90% of incoming I/O requests being served out of fast DRAM, giving an extremely fast response time.
Once the I/O has been served, the data remains in DRAM and is either retained or demoted towards disk based on how recently and how frequently it is used – the Adaptive Replacement Cache intelligently adjusts its contents to suit the workload. So when data is requested from the ZFS appliance, it first looks to DRAM; if the data is there, it can be retrieved extremely quickly (DRAM access times are measured in nanoseconds) and returned to the requesting application. The even cooler thing here is that there are multiple levels of acceleration within the ZFS storage stack, which determine how best to service the I/O request. The first level is simply referred to as the ARC, or Adaptive Replacement Cache, which resides in DRAM; the second level (used when the data is not present in the first) is the L2ARC, which resides on SSD – offering a slower response time, but still as quick as you could expect from SSDs.
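The tiering idea can be sketched as a toy two-level cache. To be clear, this is not the real ARC algorithm (which balances separate recency and frequency lists with ghost lists); it is only a simplified, recency-based sketch of the promotion/demotion flow between a DRAM tier and an SSD tier:

```python
from collections import OrderedDict

class TwoTierCache:
    """Toy sketch of a DRAM tier (ARC) backed by an SSD tier (L2ARC)."""

    def __init__(self, arc_size: int, l2arc_size: int):
        self.arc = OrderedDict()      # hot tier: DRAM
        self.l2arc = OrderedDict()    # warm tier: SSD
        self.arc_size, self.l2arc_size = arc_size, l2arc_size

    def read(self, block_id, fetch_from_disk):
        if block_id in self.arc:                  # fastest path: DRAM hit
            self.arc.move_to_end(block_id)
            return self.arc[block_id], "ARC"
        if block_id in self.l2arc:                # second level: SSD hit
            data = self.l2arc.pop(block_id)
            self._promote(block_id, data)
            return data, "L2ARC"
        data = fetch_from_disk(block_id)          # slowest path: disk
        self._promote(block_id, data)
        return data, "disk"

    def _promote(self, block_id, data):
        self.arc[block_id] = data
        if len(self.arc) > self.arc_size:         # demote coldest DRAM block
            old_id, old_data = self.arc.popitem(last=False)
            self.l2arc[old_id] = old_data
            if len(self.l2arc) > self.l2arc_size:
                self.l2arc.popitem(last=False)    # evict from SSD entirely
```

Repeatedly requested blocks stay in the DRAM tier, cooling blocks slide down to SSD, and only genuinely cold data falls all the way back to disk – the same shape of behaviour the ARC/L2ARC pairing provides.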
One might argue that various all-flash array (AFA) models on the market today offer similar functionality and smarts within a system, which is correct; the difference here is that the SSDs that typically serve those I/Os are not as close to the CPU as the ARC, hence the quicker response time.
SMP for Performance
Oracle ZFS runs on a symmetric multiprocessing (SMP) operating system. A ZFS appliance is capable of running thousands of CPU threads simultaneously; these threads have full access to all I/O devices and are controlled by a single operating system instance that treats all processors equally. This prevents the system from running into CPU bottlenecks that may impact storage, and subsequently VM, performance. (Holy Kahuna!)
Achieving higher rates of VM Density
To achieve a better ROI on hardware in virtual and cloud environments, Oracle ZS excels at achieving high VM density ratios. How does it do this?
VM density, for the purposes of this article, can be defined as the number of virtual machines housed on the datastore; it does not relate to the number of virtual machines residing on the server architecture.
Generally speaking, the main challenge in achieving high VM density is I/O bottlenecks; the DRAM-centric architecture employed in the ZFS subsystem dramatically increases the number of VMs one can deploy per system, lowering costs and increasing efficiency overall! ZFS offers multiple storage profiles (mirrored, single-parity, double-parity and triple-parity) depending on your application performance and availability requirements.
Reducing storage footprint with De-duplication
Generally speaking, virtual workloads are great candidates for de-duplication: if you run multiple VMs of the same type on the same datastore within ZFS, they share many identical blocks within their VMDKs. The de-duplication engine recognises this and stores only one copy of each identical block, resulting in higher ROI and efficiency for your infrastructure. ZFS also offers compression to further reduce your storage footprint and maximise your investment.
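The core mechanic of block-level de-duplication is a checksum-keyed block table with reference counts. The sketch below is a minimal, hypothetical illustration of that idea (the `DedupStore` class is made up for this post, not ZFS code):

```python
import hashlib

class DedupStore:
    """Minimal sketch of block-level dedup: identical blocks are stored
    once and reference-counted."""

    def __init__(self):
        self.blocks = {}    # checksum -> block data (stored once)
        self.refcount = {}  # checksum -> number of references

    def write(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        if key not in self.blocks:
            self.blocks[key] = data               # first copy: store it
        self.refcount[key] = self.refcount.get(key, 0) + 1
        return key                                # callers keep a reference only

store = DedupStore()
# Ten identical guest-OS blocks, as you would see across ten cloned VMs...
refs = [store.write(b"guest-os-block") for _ in range(10)]
assert len(store.blocks) == 1   # ...consume the space of a single block
```

Ten clones of the same guest OS image therefore cost roughly one copy of the shared blocks plus per-VM deltas, which is where the density and ROI gains come from.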
Analysing using Oracle DTrace Analytics – What is it?
Oracle DTrace provides visual, real-time storage analytics for your virtual environment, allowing customers to better utilise storage resources by identifying, troubleshooting, and resolving storage bottlenecks. Sometimes these bottlenecks result in customers throwing more hardware at the problem to keep up; by leveraging these smart analytics, customers can squeeze the most out of their existing hardware before investing in new hardware. This is offered in the base default configuration, and Oracle ZFS can provide even further statistics using Advanced DTrace.
To enable Advanced DTrace, browse to Configuration > Preferences in the ZFS storage appliance GUI and tick the box entitled “Make available advanced analytics statistics”. Note, however, that this should only be used for troubleshooting purposes and should not be left on; you can get a fairly sufficient level of analytic results without it:
Once this has been ticked, an extended set of granular statistics becomes available on the ZFS Storage appliance.
What does Oracle DTrace look like?
From the example below, you can gauge the level of detail it can provide about system resources. This is the dashboard view; in practice, you can drill down further.
In terms of NFS connectivity, the following analytic counters are available for monitoring, troubleshooting, and so on:
- NFSv3 operations per second of type read broken down by latency
- NFSv3 operations per second of type write broken down by latency
- NFSv3 operations per second broken down by size
- NFSv3 operations per second broken down by type of operation
- NFSv3 operations per second of type read broken down by size
- NFSv3 operations per second of type write broken down by size
This also provides a good mechanism to show that all CPU resources in the system are being used (thanks, SMP!). Another example is diving a level deeper and determining where these read/write I/Os are coming from, which particular files they are attributed to, and so on – being able to observe these operations on a per-VM basis is extremely helpful.
Stay tuned for the next part in which I dive into some of the VMware related aspects of the ZFS appliance such as VASA, VAAI as well as the Storage Manager Plugin for VMware!