VMware Path Selection Policy, SSDs Limit in DS3500/DS3700...
The latest Tech Tips from Jim Latham, Sr. Systems Engineer…
1.VMware Path Selection Policy:
In setting up a new VMware installation with our DS3524 with 10Gbps iSCSI HICs, we configured MRU and saw no real issues when connected to 1Gb ports. When we moved to 10GbE ports, we started to see SCSI disconnects whenever we placed a high IO load, such as a storage VMotion or VM clone. We changed the path policy to round robin (RR), and all the errors and disconnects stopped. We have only done 4 isolated servers so far, but the results have been good. My problem is that the VMware HCL lists MRU as the path policy for the DS3500, and our customer may validate that we're configuring his systems "correctly".
There is a note on the HCL that talks about RR as the path selection policy, but it's more of a generic note that says "contact the storage array manufacturer for recommendations and instructions". This note appears on almost every storage array, so I don't believe it carries much weight.
http://www.vmware.com/resources/compatibility/detail.php?deviceCategory=san&productid=14700&releaseid=148&deviceCategory=san&partner=43&releases=148&arrayTypes=1&isSVA=1&page=1&display_interval=10&sortColumn=Partner&sortOrder=Asc
ANSWER
Please note that IBM currently supports MRU, and I'm not sure when that will be modified (further explained below). Because of that, my response in this Tip may be "stepping slightly out of bounds".
I'm currently on a crusade to get the VMware HCL updated to reflect both MRU and RR as valid path policies. I spoke to some NetApp/Engenio engineers, and it seemed reasonable that storage connected over IP would introduce higher latencies, especially if all traffic is going down a single path, as would be the case with MRU. They also said that our storage (NOT A STATEMENT ON IBM STORAGE, just splitting hairs) works with either MRU or RR as the path selection policy (with newer versions of VMware).
I believe the issue is that VMware has, over the years, improved its failover driver to be much more sophisticated. In the old days, it might not have recognized the difference between paths to the controller that owned the LUN and paths to the one that didn't. That all seems to have been cleared up in the 4.x timeframe.
As always, I hate putting myself in front of a speeding bus, and would not ask anyone else to do so either. So, for supporting documentation (and the reasoning behind my crusade), note the following:
• From the HCL at the URL above, the note alludes to the improvements made to ESX 4.0 and later.
o "Attention: Storage partners using ESX 4.0 or later may recommend VMW_PSP_RR for path failover policy for certain storage array models. If desired, contact the storage array manufacturer for recommendation and instruction to set VMW_PSP_RR appropriately."
• From a VMware Knowledge Base article describing multipath policies in ESX/ESXi 4.x and ESXi 5.x, which also seems to support the point that it's "safe" (listed as a Note):
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1011340
o "Switching to Round Robin from MRU or Fixed is safe and supported for all arrays, Please check with your vendor for supported Multipathing policies for your storage array. Switching to a unsupported pathing policy can cause an outage."
• From the 4.1 iSCSI SAN Configuration Guide (VMware, http://www.vmware.com/pdf/vsphere4/r41/vsp_41_iscsi_san_cfg.pdf)
o "If your array does not support the ALUA protocol [the DS products currently do not support ALUA] and you want your host to do automatic load balancing, configure your devices to use the Round Robin PSP."
• From the ESXi 5.0/vCenter Server 5.0 Storage Guide
(http://pubs.vmware.com/vsphere-50/topic/com.vmware.ICbase/PDF/vsphere-esxi-vcenter-server-50-storage-guide.pdf) in describing storage path policies:
o VMW_PSP_RR The host uses an automatic path selection algorithm rotating through all active paths when connecting to active-passive arrays, or through all available paths when connecting to active-active arrays. RR is the default for a number of arrays and can be used with both active-active and active-passive arrays to implement load balancing across paths for different LUNs.
• Another blog supporting the argument: http://www.boche.net/blog/index.php/2011/09/28/changing-the-default-vsphere-5-0-psp-to-round-robin/
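For anyone who wants to script the MRU-to-RR change discussed above, here is a minimal Python sketch that only builds the esxcli command line and does not run anything. The helper name `psp_set_command` and the device ID `naa.EXAMPLE` are hypothetical placeholders, and the esxcli syntax (one form for ESX/ESXi 4.x, another for ESXi 5.x) should be checked against the documentation for your exact release, and against your storage vendor's support statement, before use.

```python
def psp_set_command(device_id, esx_major):
    """Return the esxcli argument list that would set Round Robin on one device.

    device_id is a placeholder for a real device identifier (hypothetical here).
    esx_major is the major ESX/ESXi version (4 or 5); syntax is assumed per release.
    """
    if esx_major >= 5:
        # ESXi 5.x namespace: esxcli storage nmp device set
        return ["esxcli", "storage", "nmp", "device", "set",
                "--device", device_id, "--psp", "VMW_PSP_RR"]
    # ESX/ESXi 4.x namespace: esxcli nmp device setpolicy
    return ["esxcli", "nmp", "device", "setpolicy",
            "--device", device_id, "--psp", "VMW_PSP_RR"]


# Print the command for review; nothing is executed against a host.
print(" ".join(psp_set_command("naa.EXAMPLE", 5)))
```

Building the command as a list and printing it first makes it easy to review the change per LUN before running it on a host.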
Of course none of this is as nice as having the HCL updated, so we'll keep working on that. And if I get any updates, I'll let you know. In the meantime, hope this helps…
2.SSD Limits in DCS3700:
I understand (please correct me if I'm wrong) that the DCS3700 has a max cap of 20 SSD drives per storage subsystem. Is this same limitation true for the DS3524 as they both use the same controllers?
If you can verify that the 192-drive value in the charts is also applicable to the SSD performance numbers, that would be great. Again, I'm not sure if Snowmass supports 192 SSDs at once?
ANSWER
There is a limit of 20 SSDs in the storage subsystem (this limit is the same for DS5000, DS3500, and DCS3700). I just called a colleague in engineering to verify if the limit was lifted yet or not. The answer was "not".
I believe the thought is that 20 SSDs is more than enough to max out the controller, so going to a larger SSD count would not be advantageous. Given the current pricing of the SSDs, the priority to get the drive count very high isn’t there. The relative cost of additional controllers is small compared to the drives, and the additional drives don’t provide value except for “capacity”.
If you want to have a rack of SSDs, then put in a rack of controllers also, and you'll get a much larger aggregate performance number. The only thing that doesn't make sense is why 20? So, with the next firmware release (mid-year), I am told the number will increase to 24 (for a fully loaded DS3524).
Per the previous logic, I am told that 60 would be a waste, so even with the DCS3700, the number is only going up to 24.
That's the bad news.
The good news is that a rack of 20 DS3524s, each with 20 SSDs, should yield about 20 x 70K read IOPS (that's 1,400,000 IOPS!) and 20 x 18K write IOPS (360,000 IOPS), depending on IO size and assuming you can fit 20 systems in a rack.
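The rack arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the multiplication, using the per-system figures quoted in this Tip (about 70K read IOPS and 18K write IOPS per DS3524); the function name `rack_iops` is made up for illustration, and real numbers will vary with IO size.

```python
def rack_iops(systems, read_iops_per_system, write_iops_per_system):
    """Aggregate IOPS for a rack, assuming every system performs identically."""
    return systems * read_iops_per_system, systems * write_iops_per_system


# 20 systems per rack, using the approximate per-system figures from the text.
reads, writes = rack_iops(20, 70_000, 18_000)
print(f"{reads:,} read IOPS, {writes:,} write IOPS")
# prints "1,400,000 read IOPS, 360,000 write IOPS"
```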
FOR ADDITIONAL INFORMATION OR QUESTIONS PLEASE CONTACT:
Barry Kushin
Business Development Executive
IBM OEM Field Sales Americas
NetApp Inc.
Direct: 949-478-3510
Mobile: 949-230-2005
Stephanie Matthews / Strategic Marketing Consultant - IBM System x / 210-247-1361
Direct link: http://www.avnetadvantage.com/IBM/System-x/Strategies/#8799-3
Posted on February 27, 2012


