Journey to vSAN – A Technical Adventure

Imagine having just configured some LUNs on your new PowerVault MD3820i.  Encryption key management has been configured, and 20 SEDs (self-encrypting drives) are spinning and ready for use with vSphere.  There are 4 SSDs in the PowerVault to use for caching that just need to be configured.

“What do you mean these cache disks aren’t supported for use with SEDs?”  That was the question posed to Dell Support after being told we could not add cache disks for our SED LUNs in this array.

The SSDs were in the PowerVault, and the device recognized them without an issue.  We just couldn’t configure them as cache. Even the documentation from Dell mentioned this configuration is not supported.  As it turns out, the SSDs we received were not SEDs and could not be used as cache in combination with SEDs.

While the SAN with SEDs would meet the corporate encryption requirements, losing caching capabilities meant the storage would no longer meet our IOPS needs.  And that is where our story begins.

After opening a support case and trying a firmware update with no change, the next call was made to our vendor’s account team.   We were sold a solution that wasn’t even supported, and we needed to do something different to meet our operational requirements.

The vendor team wanted to try to find some solid-state SEDs to use for cache in the PowerVault before putting too much effort into engineering a different solution.  But their search for disks took a great deal of time.  After weeks of waiting with no drive to test, we pushed them to quote alternatives for a potential RMA of our current equipment while we began researching other options.

The boss suggested trying Infinio as a caching solution and asking the vendor to pay for it, so we began a trial.  Infinio is extremely simple to install in a vSphere environment, and their technical team is very good.  They walked us through the setup of the Infinio appliance, and we configured it to use host RAM for caching.  We then ran benchmarks using diskspd (the replacement for sqlio) inside different Windows VMs with Infinio activated and analyzed the results.  While Infinio did a great job of caching disk read requests, it is not a write caching solution.  Although write performance increased slightly, we were still relying solely on the write cache of the MD3820i controllers when writing to disk.  Our conclusion after testing was that Infinio would not allow us to meet our IOPS requirements.

There were other storage device options, but all were more expensive.  Our budget was spent, and there was no going back to management for more of it.

I couldn’t understand why none of our vendor reps had mentioned VMware vSAN as an option.  This was an avenue we hadn’t considered.  I mentioned this to the boss, discussed the potential to use VM Encryption instead of buying SEDs, and advised looking at ReadyNodes.

A few days later the idea was dismissed due to cost.  That didn’t make sense to me, and we still had no solution.  At this point I picked the brains of several community members to vet my idea from a cost standpoint.  One recommendation was to look at single-socket ReadyNodes to save on vSAN, vCenter, and vSphere licensing.  The servers would have an extra processor slot if we needed to add it later.  Our infrastructure as a whole was not CPU heavy, and of course the vendor had put two processors in the servers we ordered along with the PowerVault.

This licensing model spawned a series of new planning conversations.  There was talk of VxRail as an option, but it would have been too costly.  In the end, we found a solution that met our needs from a performance standpoint, spent no additional funds, and eliminated a single point of failure by using distributed storage.  See for yourself how the landscape changed.

The original BOM was:

  • Three dual-CPU PowerEdge servers with no local storage
  • PowerVault MD3820i
  • vSphere Standard for 6 sockets
  • vCenter Standard
  • Four PowerConnect switches

 

The new BOM was:

  • Four single-CPU PowerEdge servers (modified ReadyNodes), each with 2 disk groups (SSD for cache, HDDs for capacity)
  • vSAN Standard for 4 sockets
  • vSphere Enterprise Plus for 4 sockets
  • vCenter Standard
  • HyTrust KeyControl – 2-node license
  • Two PowerConnect switches
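
The socket arithmetic behind the two BOMs can be sketched in a few lines of Python. The host and CPU counts come straight from the lists above; the class name and variable names are hypothetical, and the sketch assumes per-socket licensing simply scales as hosts times CPUs per host. It does not model prices, which varied by quote.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical helper: a group of identical hosts."""
    hosts: int
    cpus_per_host: int

    @property
    def sockets(self) -> int:
        # Assumption: one per-socket license is needed for each populated CPU socket.
        return self.hosts * self.cpus_per_host


# Original BOM: three dual-CPU servers.
original = Cluster(hosts=3, cpus_per_host=2)

# New BOM: four single-CPU servers (modified ReadyNodes).
revised = Cluster(hosts=4, cpus_per_host=1)

# What the new cluster would need if the empty socket were populated later.
revised_expanded = Cluster(hosts=4, cpus_per_host=2)

print("original sockets:", original.sockets)          # 6
print("revised sockets:", revised.sockets)            # 4
print("revised, second CPU added:", revised_expanded.sockets)  # 8
```

Going from three dual-CPU hosts to four single-CPU hosts adds a host for vSAN while dropping the per-socket license count from 6 to 4, which is where the room for vSAN and the higher vSphere edition came from in our case.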

Lessons Learned

Technical validation is a big deal when making any purchase.  Sometimes the vendor doesn’t get it right.  Sometimes we as IT buyers don’t get it right.  When mistakes are made, consider all of the options before asking for more budget.  Be willing to work with your vendor (assuming you have a good relationship with them) to try to right the ship.  I guarantee both parties will learn and grow from the experience.

 
