The purpose of this operation was to encapsulate existing datastores into VPLEX to take advantage of the synchronous replication and high-availability features of VPLEX Metro between 2 sites.
Unfortunately, I ran into an issue that I really didn’t anticipate. This is why I thought it was important to share my experience, hopefully to help anyone planning to do the same. But before going any further, let me give you some info about the customer environment and VPLEX.
VPLEX is a virtual storage system that sits between your storage arrays and your hosts. It virtualizes all the arrays connected to it and presents your volumes to the hosts as a single storage system.
So basically, you attach any compatible storage to VPLEX and then the magic happens: all your hosts see just one array with the volumes you’d like to present. I won’t go into details on how to configure VPLEX, as this is out of the scope of this article.
One of the strengths of VPLEX is that it’s a great storage system, perhaps the best one for building a truly active-active infrastructure and migrating VMs easily between multiple sites. It is also well suited to data migrations between storage arrays.
There are 3 different VPLEX products:
- VPLEX Local: allows storage virtualization within one datacenter
- VPLEX Metro: allows storage virtualization and synchronous (active-active) connectivity between multiple sites
- VPLEX Geo: allows storage virtualization and asynchronous (active-active) connectivity between multiple sites
The customer environment is spread across 3 sites: 2 datacenters and 1 data room.
The hardware is identical in both datacenters:
- 1 DellEMC VNX 5300 and 1 VNX 5600
- 1 DellEMC VPLEX Single Engine VS2
- 7 Dell Chassis M1000E with about 55 M720 and M730 blades
- 2 Brocade 8 Gb/s SAN switches
- 107 ESXi 6.0 U2 servers
Now the software part:
- 1 vCenter Server 6.0U2
- DellEMC PowerPath VE 6.2
- 1 VPLEX Witness VM (in dataroom)
Here’s a visual of the VPLEX Metro infrastructure with VMware
To get things done properly, I gathered a lot of info and spent time with the storage team before the big day. I also prepared some scripts to automate things as much as possible.
I installed PowerPath VE 6.2 on all ESXi hosts and made sure that all paths were available. PowerPath VE is a great solution that handles multipathing better than VMware’s native multipathing when using DellEMC storage arrays.
As you can imagine, such an operation requires shutting down a LOT of production VMs, so it had to be done over a weekend.
So here are the steps I went through:
- Shut down VMs
- Remove VMs from inventory
- Unmount datastores
- Detach storage devices
- Encapsulate LUNs (datastores) into new VPLEX virtual volumes
- Put VPLEX virtual volumes into distributed volumes (that are replicated and synchronized across both VPLEX)
- Present VPLEX virtual volumes to ESXi hosts
- Run a “rescan storage” to find all storage devices and VMFS volumes
- Make sure that all ESXi hosts have access to datastores and see all paths
- Register VMs
- Start VMs
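The VMware-side steps are the easiest to script. As a minimal sketch (not the scripts I actually used), the Python below only prints the ESXi shell commands for unmounting the datastores and detaching their devices before the cutover, and for rescanning afterwards. Running them on each host (over SSH, for example) is up to you, and all host names, datastore labels and NAA IDs are hypothetical placeholders.

```python
"""Print the ESXi shell commands for the host-side steps of the encapsulation.

Sketch only: host names, datastore labels and NAA IDs are placeholders.
The script builds and prints commands; it does not connect to any host.
"""
from typing import Dict, List


def pre_cutover_commands(datastores: Dict[str, str]) -> List[str]:
    """Unmount each datastore, then detach its backing device.

    `datastores` maps a datastore label to the NAA ID of its backing device.
    All unmounts come first, since a device should not be detached while a
    datastore on it is still mounted.
    """
    cmds = [f"esxcli storage filesystem unmount -l {label}" for label in datastores]
    cmds += [
        f"esxcli storage core device set --state=off -d {naa}"
        for naa in datastores.values()
    ]
    return cmds


def post_presentation_commands() -> List[str]:
    """Rescan all adapters, then list filesystems to check what got mounted."""
    return [
        "esxcli storage core adapter rescan --all",
        "esxcli storage filesystem list",
    ]


def build_plan(hosts: List[str], datastores: Dict[str, str]) -> Dict[str, Dict[str, List[str]]]:
    """Return the commands to run on every host, grouped by phase."""
    return {
        host: {
            "before": pre_cutover_commands(datastores),
            "after": post_presentation_commands(),
        }
        for host in hosts
    }


if __name__ == "__main__":
    # Hypothetical inventory, for illustration only.
    hosts = ["esx01.example.local", "esx02.example.local"]
    datastores = {"DS-PROD-01": "naa.0000000000000000000000000000a001"}
    for host, phases in build_plan(hosts, datastores).items():
        print(f"# {host}")
        for phase, cmds in phases.items():
            print(f"#   {phase}")
            for cmd in cmds:
                print(f"    {cmd}")
```

Detaching has to be done on every host that sees the device, which is why the plan repeats the same commands for each host.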
There are other steps on the storage side, but I won’t describe them as I’m mainly focusing on VMware here.
Everything went fine until step 7 (presenting the VPLEX virtual volumes to the ESXi hosts), and that’s when the trouble began. Let me explain why.
After presenting the datastores to the ESXi hosts, I could see that all paths were available and all LUNs were detected properly. However, some datastores couldn’t be mounted on every host, only on some of them, and I couldn’t understand why. It was really weird: a given datastore was mounted on some ESXi hosts and not on others.
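In hindsight, a quick cross-check of which hosts had which datastores mounted would have made the pattern obvious sooner. Here is a minimal sketch, assuming you have already collected each host’s mounted datastore labels (for example from “esxcli storage filesystem list”); the inventory in the example is made up.

```python
from typing import Dict, Set


def missing_mounts(mounted: Dict[str, Set[str]], expected: Set[str]) -> Dict[str, Set[str]]:
    """Return, for each host, the expected datastores it has NOT mounted.

    Hosts with nothing missing are left out, so an empty result means that
    every host sees every expected datastore.
    """
    report = {}
    for host, labels in mounted.items():
        gap = expected - labels
        if gap:
            report[host] = gap
    return report


if __name__ == "__main__":
    # Made-up inventory for illustration.
    expected = {"DS-PROD-01", "DS-PROD-02", "DS-PROD-03"}
    mounted = {
        "esx01": {"DS-PROD-01", "DS-PROD-02", "DS-PROD-03"},
        "esx02": {"DS-PROD-01"},
        "esx03": {"DS-PROD-02", "DS-PROD-03"},
    }
    for host, gap in sorted(missing_mounts(mounted, expected).items()):
        print(host, "is missing", ", ".join(sorted(gap)))
```

With the sample data it prints “esx02 is missing DS-PROD-02, DS-PROD-03” and “esx03 is missing DS-PROD-01”.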
So I spent a long time trying to figure out what had happened: looking through logs, checking with the storage team that everything was fine on the VPLEX side, and so on. Another weird thing was that when I tried to add a datastore manually by selecting a LUN in the “Storage > New Datastore” wizard, the only available option was “Format the disk”; the other two, “Keep the existing signature” and “Assign a new signature”, were greyed out.
I was going crazy, so I decided to open a case with VMware to save time while I kept searching. After a while, I found a solution on a website.
Apparently, after existing datastores are encapsulated into VPLEX, ESXi no longer considers them regular VMFS volumes but “snapshots”. The fix is to mount these snapshot volumes, keeping their existing signature, on each ESXi host one by one through the CLI.
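My understanding of why this happens, as a simplified model rather than VMware’s actual code: VMFS records in its metadata the identifier of the device it was created on. Encapsulation keeps the data byte for byte but presents it under a new device identifier (the VPLEX virtual volume’s), so ESXi sees a mismatch, flags the volume as a snapshot/replica and does not mount it automatically. The class and field names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmfsVolume:
    """Hypothetical, simplified view of a VMFS volume's on-disk identity."""
    uuid: str                # VMFS UUID stored in the superblock
    recorded_device_id: str  # device ID recorded in the VMFS metadata


def classify(volume: VmfsVolume, current_device_id: str) -> str:
    """Return 'normal' if the volume sits on the device it recorded, else 'snapshot'.

    In this model a 'snapshot' volume stays unmounted until an administrator
    either mounts it with its existing signature or assigns it a new one.
    """
    if volume.recorded_device_id == current_device_id:
        return "normal"
    return "snapshot"


if __name__ == "__main__":
    # Placeholder IDs: the same data before and after encapsulation.
    ds = VmfsVolume(uuid="00000000-00000000-0000-000000000001",
                    recorded_device_id="naa.VNX-LUN-0001")
    print(classify(ds, "naa.VNX-LUN-0001"))   # seen directly through the VNX
    print(classify(ds, "naa.VPLEX-VV-0001"))  # seen through the VPLEX virtual volume
```

The example prints “normal” and then “snapshot”. Mounting with the existing signature keeps the VMFS UUID, so the VMs’ file paths stay valid; assigning a new signature changes the UUID and renames the datastore, which means re-registering the VMs.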
The first step was to check, host by host, that all volumes were detected, by running this command: “esxcfg-volume -l”. This took me a while, but fortunately every host saw all the volumes.
Then I mounted every datastore that was not accessible on a host by running “esxcfg-volume -M <datastore name>”. It is important to use an uppercase M so that the datastore mount persists across reboots (a lowercase -m mounts it only until the next reboot). Of course, I rebooted some hosts to make sure of it, and it worked!
Before I had finished mounting all the datastores, VMware support called me back with a similar command that does exactly the same thing. They explained that this is expected behavior with VPLEX when encapsulating existing datastores that contain data.
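Here is the command from VMware support: “esxcli storage vmfs snapshot mount -u <VMFS UUID>” (the -l option takes the datastore label instead, and “esxcli storage vmfs snapshot list” shows the snapshot volumes a host has detected).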
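With more than a hundred hosts, typing these by hand is slow. Below is a minimal sketch that builds the persistent mount commands per host, in either form, from a mapping of hosts to the datastores they are missing. The host names and labels are placeholders, and labels are shell-quoted in case they contain spaces. It only generates the commands; it doesn’t run them.

```python
import shlex
from typing import Dict, List, Set


def mount_commands(missing: Dict[str, Set[str]], use_esxcli: bool = True) -> Dict[str, List[str]]:
    """Build persistent snapshot-mount commands, per host, for unmounted datastores.

    `missing` maps a host name to the labels of the datastores it has not mounted.
    Both forms mount persistently: esxcli does unless --no-persist is given,
    and esxcfg-volume does with an uppercase -M.
    """
    template = (
        "esxcli storage vmfs snapshot mount -l {}" if use_esxcli else "esxcfg-volume -M {}"
    )
    return {
        host: [template.format(shlex.quote(label)) for label in sorted(labels)]
        for host, labels in missing.items()
    }


if __name__ == "__main__":
    # Placeholder data for illustration.
    missing = {"esx02": {"DS-PROD-02", "DS-PROD-03"}, "esx03": {"DS-PROD-01"}}
    for host, cmds in mount_commands(missing).items():
        print(f"# on {host}")
        print("\n".join(cmds))
```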
Once all the datastores were mounted, we powered on the VMs, but not all at once, to avoid generating too much I/O and really messing everything up. And we were good to go :).
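The staggered power-on can be expressed as waves of a fixed size with a pause between them. This is a sketch only: the wave size and pause are arbitrary, and `power_on` is a hypothetical callback you would wire to whatever tool you use (PowerCLI, pyVmomi, or “vim-cmd vmsvc/power.on” on the host).

```python
import time
from typing import Callable, List, Sequence


def waves(vms: Sequence[str], size: int) -> List[List[str]]:
    """Split the VM list into consecutive batches of at most `size` VMs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [list(vms[i:i + size]) for i in range(0, len(vms), size)]


def staggered_power_on(
    vms: Sequence[str],
    size: int,
    pause_s: float,
    power_on: Callable[[str], None],
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Power on one wave, wait `pause_s` seconds, then start the next wave.

    No pause is taken after the last wave. `power_on` is a hypothetical
    callback: plug in whatever actually starts a VM in your environment.
    """
    batches = waves(vms, size)
    for index, batch in enumerate(batches):
        for vm in batch:
            power_on(vm)
        if index < len(batches) - 1:
            sleep(pause_s)


if __name__ == "__main__":
    vms = [f"vm{n:03d}" for n in range(1, 8)]
    staggered_power_on(vms, size=3, pause_s=0.1, power_on=lambda vm: print("power on", vm))
```

Passing `sleep` as a parameter keeps the pacing logic testable without actually waiting.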
After a few days of monitoring, the customer confirmed that everything was fine, and a few months later I was asked to upgrade their vSphere environment to 6.5 🙂
Like I said earlier, I really didn’t anticipate such behavior from VPLEX. Anyway, it was a good experience, and here’s my advice for those of you planning the same kind of implementation.
1. If you’re building a new infrastructure, no worries. Here’s what you need to do:
- Create new LUNs on your primary storage array
- Put those LUNs into VPLEX volumes
- Present VPLEX volumes to your ESXi hosts
- Format the volumes to VMFS datastores
2. But if you’re using existing storage with data on it, follow the steps below:
- Create one or more new datastores to store VM data temporarily
- Migrate VMs with Storage vMotion to that temp datastore
- Properly unmount the existing datastores and detach storage devices (LUNs)
- Put the datastores into VPLEX virtual volumes
- Present VPLEX virtual volumes to ESXi hosts
- Format all volumes into VMFS datastores in vCenter
- Re-migrate all the VMs with Storage vMotion to the freshly formatted datastores
- Delete the temp datastore(s)
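One variation on the procedure above, assuming you convert the datastores one at a time rather than all at once: the temporary datastore then only needs room for the largest datastore’s used space instead of the sum of all of them. The hypothetical helper below checks that and prints the ordered steps; the step wording paraphrases the list above and the sizes are made up.

```python
from typing import Dict, List

# Steps paraphrased from the procedure above, applied to one datastore at a time.
STEPS = [
    "Storage vMotion all VMs from {ds} to {tmp}",
    "Unmount {ds} and detach its device on every host",
    "Encapsulate the LUN of {ds} in a VPLEX virtual volume (storage team)",
    "Present the virtual volume to the ESXi hosts and rescan",
    "Format the volume as a new VMFS datastore",
    "Storage vMotion the VMs from {tmp} to the new datastore",
]


def required_temp_gb(used_gb: Dict[str, float], rolling: bool = True) -> float:
    """Temp capacity needed: the largest datastore if rolling, else the sum."""
    if not used_gb:
        return 0.0
    return max(used_gb.values()) if rolling else sum(used_gb.values())


def rolling_plan(used_gb: Dict[str, float], tmp: str, tmp_free_gb: float) -> List[str]:
    """Return the ordered steps, or raise if the temp datastore is too small."""
    need = required_temp_gb(used_gb, rolling=True)
    if need > tmp_free_gb:
        raise ValueError(f"{tmp} needs at least {need} GB free, has {tmp_free_gb} GB")
    plan: List[str] = []
    for ds in used_gb:
        plan += [step.format(ds=ds, tmp=tmp) for step in STEPS]
    plan.append(f"Delete {tmp}")
    return plan


if __name__ == "__main__":
    used = {"DS-PROD-01": 1200.0, "DS-PROD-02": 800.0}  # made-up sizes
    print("rolling:", required_temp_gb(used), "GB; all at once:",
          required_temp_gb(used, rolling=False), "GB")
    for line in rolling_plan(used, "DS-TEMP", 1500.0):
        print("-", line)
```

The trade-off is time: a rolling conversion keeps the temporary datastore small but means more Storage vMotion passes in sequence.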
By the way, sorry guys, you won’t find many screenshots of the incident in this article. As you can imagine, I was a bit under pressure during this operation, and at the time I had no plans to write a blog post about it.
I hope you liked this article and that it will help many of you 😉