Guide to understanding LoginVSI test results and how to compare them
The use of LoginVSI as a VDI performance testing and validation tool has increased over the last several years. It’s really the only tool to offer these services from an independent party, so it has become the de facto option for vendors to showcase their solutions. Vendors use LoginVSI on a regular basis to show how their solution performs against a common set of tests, which makes them a candidate to be considered for your VDI projects.
To learn the basics on how to understand the results from LoginVSI tests you can refer to a post on their blog here. It’s an older post but still pretty valid since the data points have not changed that much over the versions. The danger enters when you are looking to take testing results from multiple vendors and compare them. You simply cannot take results from different tests and compare the data points without understanding how the testing was done, what was tested and how the differences in the tests affect the results. Also what should you be aware of that might affect the results?
So in this post, I will lay out a number of items to help you better understand, compare and interpret the LoginVSI results that are published. While someone may be publishing a very low VSIbase number and/or high densities, you need to be able to determine whether that means anything for your environment and whether it’s really valid for anyone.
I think most people understand that comparing results from a Citrix test to a VMware Horizon test is not apples to apples. There is a certain amount of overlap that can be accounted for, but to be fair you should be comparing tests with the same data points. Then there are the different types of desktops, the whole persistent versus non-persistent discussion and how these apply to each other. VMware and Citrix each offer two different non-persistent provisioning options within their products now, so comparing results gets even fuzzier. I don’t see many vendors running LoginVSI testing using persistent desktops, so that should not be much of a concern. But there are significant differences in how the different non-persistent provisioning options work that you should be aware of when interpreting results.
The version of the Citrix or VMware broker should be the same or very close in the different tests that you are comparing. Along with the many provisioning options explained below the version of the broker could affect the test results if one revision provided performance improvements that another did not provide.
Citrix offers two different provisioning methods for non-persistent desktops, which will be the focus for the majority of tests that you will encounter. Some vendors may provide results for both. The options are Machine Creation Services (MCS) and Provisioning Services (PVS). In short, MCS is a storage-based architecture while PVS is heavily network based and uses centralized caching points. Each option is explored a bit deeper in the following sections.
The PVS architecture is unique to Citrix and typically uses multiple PVS servers that are load balanced. The golden image for the VDI pool is loaded onto the PVS servers and presented as read-only. These PVS servers use memory within the server OS as a caching layer that allows commonly accessed blocks to be quickly returned to guests, improving performance. Each VDI/SBC virtual machine is referred to as a PVS target and is a VM with no persistent data or OS installed. These PVS targets boot the golden image via a network connection to one of the PVS servers.
The writes for each PVS target can be cached in a number of different ways. Each method has its pros and cons, and comparing results is invalid if the tests were run using different PVS write caching methods.
- Cache in device RAM – Each PVS target (VDI VM) will be assigned additional memory above what your OS/image requirements are and this is memory from the physical host supplying resources to all VMs running on it. The writes for each VM will go into the assigned RAM for that target and be persisted until the session is finished.
- Cache on device hard disk – In this option, the writes for each VM are stored on a local hard disk for each VM and this is typically the same storage that is being used for running all VDI virtual machines.
- Cache in device RAM with overflow on hard disk – This last option is a combination of the two previous methods. It typically is configured to provide a smaller amount of RAM for caching and if writes exceed this amount during a session it will begin to use hard disk for the overflow.
** When interpreting the results you cannot fairly compare a test that uses PVS with RAM cache versus a test that uses PVS with disk cache.
** It would also be incorrect to compare a test that uses PVS with any caching method to a test that uses MCS with any caching method.
** A point to question is why any modern hybrid or all-flash storage vendor would utilize PVS with RAM cache to showcase their storage solution. This virtually removes the storage solution from the testing and does not validate their solution. PVS is a legacy solution that was designed to hide the poor performance of legacy storage arrays.
** If a vendor tested with PVS and does not specifically explain how the write cache was set up and configured, you should be suspicious and request further details. If there are performance charts, look at the write performance. If it’s low or near zero, they are using RAM cache.
The MCS architecture takes a storage-focused approach to provisioning non-persistent desktops. The golden image is a shared VM that all of the virtual desktops in a pool will boot from. This golden image is read-only and all read requests are served by the storage that it’s sitting on. This is different from PVS in that there are no PVS servers serving read requests from a memory cache.
Until XenDesktop 7.9 all reads and writes from these desktop virtual machines were serviced by the storage they were running on. In 7.9 Citrix introduced the Cache in device RAM and Cache in RAM with overflow options for MCS as well, which use host memory as a caching layer for writes. This now allows the same write caching options for PVS and MCS, with the main difference being where the reads for the golden image are serviced.
** When interpreting the results you cannot fairly compare a test that uses MCS with RAM cache versus a test that uses MCS with disk cache.
** It would also be incorrect to compare a test that uses MCS with any caching method to a test that uses PVS with any caching method.
** A point to question is why any modern hybrid or all-flash storage vendor would utilize MCS with RAM cache to showcase their storage solution. This removes the storage solution from servicing all or most of the write traffic and does not validate their solution.
VMware also offers two different provisioning methods for non-persistent desktops, which will be the focus for the majority of tests that you will encounter. The different options are Linked Clones and Instant Clones. As of 2016, I don’t think you will see any vendors test anything but linked clones as instant clones are a new technology that is still maturing. In the future, I would expect that many vendors will begin to provide results for both. In short, linked clones is a storage based architecture while instant clones is a new method that removes several of the large storage spikes. Each option is explored a bit deeper in the following sections.
The linked clone provisioning method from VMware is very similar to the MCS option explained above from Citrix, but without the different write caching options. Linked clones use a golden image or replica that all read operations for the desktop pool are serviced from. Each desktop virtual machine has a delta disk and this is where all write operations are performed. This makes linked clones a provisioning method that is heavily affected by the performance of the storage platform used in your design.
There is one caching alternative available for linked clones that is called the View storage accelerator or Content Based Read Cache (CBRC). This can utilize up to 2GB of host memory to cache commonly accessed bits from the replica image for read operations.
** Take note of whether testing was done using the storage accelerator. It removes some of the read operations from the storage system, so the test cannot be fairly compared to another test that does not use the same feature.
** Likewise, if you do compare results and a vendor that does not use the storage accelerator provides better results than a vendor that does use it, that is something to be aware of.
The instant clone architecture is somewhat like a modernized version of linked clones. The philosophy is similar, but rather than each pool using a single replica image and pulling all reads from that single image on storage, instant clones create a replica VM for the pool image on every host. The replica on each host has the OS booted and is then placed in a stunned state. This allows each new virtual desktop, when created, to use the state of the replica as its starting point without the need to initially boot up the OS. This approach saves time and reduces storage peaks during provisioning and image update procedures.
For these reasons, it would not be fair to compare a test that used instant clones to one that used linked clones. The different provisioning methods can dramatically affect the provisioning times and the I/O behavior during steady state.
The version of Windows is important when comparing test results. While Windows version may not be as impactful as some of the other points discussed in this post, it is still something that must be taken seriously. I think that most will agree that Windows 7 or 10 are the primary Client OS versions that are already deployed or being deployed today. Deploying Windows 10 will result in about a 10-20% reduction in user density.
This is important when you are sizing your own environment, but it is also very relevant to this discussion of LoginVSI testing. Different versions of MS Office can dramatically affect the performance and density of tests. You can read more about the effects of different Office versions on RDSH/VDI user densities here; to save you the time reading, I will summarize. Office 2010 currently offers the best user density of the Office versions that are still widely deployed (although it is no longer in mainstream support). Using Office 2013 will result in a 20% reduction in density when compared to 2010. Office 2016 reduces density a further 5% below 2013, or 25% below 2010.
As you can see, given the effect Office can have on user densities, it would not be accurate to compare tests that used different Office versions. Let’s look at the following scenario: the vendor you prefer has published a report that meets all of your requirements.
- Vendor fulfills all your requirements
- VSIbase is attractive
- Density is 20% lower than other tests
- But vendor is using Office 2016 in their testing
- Same OS versions, same provisioning methods used
Based on these points, and given that all other testing points match, I would be comfortable that the lower density can be accounted for by the different Office version.
Just like it would not be correct to compare the towing capacity of a truck with a V6 engine to that of one with a V8 engine, comparing results from tests that use different Intel CPUs is also not apples to apples. In general, you should be comparing test results that showcase the same CPU generation. If that is not a possibility you can still look at the results, but there would be no way to account for potential differences in them.
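The Office adjustment in this scenario can be sketched as a quick calculation. This is only a rough illustration using the approximate percentages quoted in this post (Office 2013 about 20% below 2010, Office 2016 about 25% below 2010). The helper name and the example densities are made up for illustration and do not come from any published test.

```python
# Approximate density relative to Office 2010, taken from the rough
# percentages quoted in this post. Rules of thumb, not measured values.
OFFICE_DENSITY_FACTOR = {
    "2010": 1.00,
    "2013": 0.80,  # about 20% lower than 2010
    "2016": 0.75,  # about 25% lower than 2010
}


def normalize_to_office_2010(users_per_host: float, office_version: str) -> float:
    """Estimate what a published density might be on Office 2010 (hypothetical helper)."""
    return users_per_host / OFFICE_DENSITY_FACTOR[office_version]


# Hypothetical published results: vendor A tested Office 2016, vendor B Office 2010.
vendor_a = normalize_to_office_2010(120, "2016")
vendor_b = normalize_to_office_2010(150, "2010")
print(f"Vendor A normalized: {vendor_a:.0f} users/host")
print(f"Vendor B normalized: {vendor_b:.0f} users/host")
```

In this made-up example vendor A publishes a density 20% lower than vendor B, but once the Office version is accounted for the two land in the same range. Treat the output as a sanity check, not as a sizing number.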
- Intel Ivy Bridge processors
- Intel Haswell v3 processors
- Intel Broadwell v4 processors
Each of these CPU generations offers a performance increase over the previous one that affects both consolidation ratios and overall performance. These performance benefits are obvious for the virtual desktops, but if you are running a software-defined storage or hyperconverged (HCI) solution, the storage layer will also benefit from these CPU improvements.
Memory can also have an impact on your environment. Specific DIMM combinations can raise or, worse, lower the speed that the memory runs at (1866 vs 2133MHz is a 20% density difference, for example). A drop in memory speed is typically the result of populating too many DIMM slots in a server. If a configuration you are considering uses more than 512GB of host memory, you should check the vendor documentation to understand what will happen to the memory speed for the proposed configuration.
Like any software vendor LoginVSI makes changes to their software on a regular basis and these changes can affect the testing results. Different versions of the testing software could be using different applications, different testing methodologies or other factors. For these reasons, it is important to make sure that when comparing test results from different tests that they are at least on the same major version release. It would be unfair to compare a testing run using LoginVSI 3.5 to one using version 4.0. It is more acceptable to compare tests run on 4.0 and 4.5 as long as all other factors explained in this post are in alignment.
When it comes to LoginVSI testing or just any type of VDI testing most people are seeking two primary data points. The first is storage performance which historically was the major pain point in past deployments. The second data point is user density or the number of virtual desktops per host. The reality is that LoginVSI testing is valuable but can in no way be used to tell you exactly what your environment should be sized like.
To size your environment you will need to understand your use cases and their requirements, then combine those details with performance results that were collected from your actual environment. This will provide you with actual data points that can be used for sizing and the LoginVSI results along with a skilled EUC architect can then provide customized sizing for your environment.
What to watch for
All too often I’ve seen vendors size the VMs that they use for their LoginVSI tests at the bare minimum needed to pass the test. By providing the minimal amount of CPU and memory to each VM they can show a higher density of users per host while still passing the test. The danger here is that vendors who do this create a false sense of user density that confuses customers and architects. Saying that you can achieve 300 users per host while using a configuration that is not likely to be deployed into production by 99% of customers is worthless. In doing so they are either trying a bait and switch or proposing that you dramatically undersize your design. Either of these approaches would get you thrown out of my office if I were the customer.
So when looking at published test results, pay close attention to the size of the desktops used during the testing. A Windows 7 desktop with 1 vCPU and 1.5GB of memory may be enough to pass the test, but for the vast majority of use cases it is not going to provide a delightful user experience.
Today’s VM averages
With the above discussion on what to look out for in desktop sizing, I thought it would be helpful to level set on what acceptable sizing looks like in 2016. Since the release of Windows 7, and even more so when moving to Windows 8/10, 2 vCPU for each virtual desktop is the new normal. Today when sizing you should default to 2 vCPU and only move down to 1 vCPU or increase past 2 vCPU when you have valid testing data to support the decision.
The default starting point for modern Windows versions should not go below 2GB of memory unless valid testing has been done to support the request. While 2GB is the starting point, I took an informal survey of several EUC experts and the results showed that their current default sizing for VDI is 2 vCPU and 3-4GB of memory. In the end, the amount of memory will depend on the use case requirements and the applications they are using, but these data points help provide some guidance on what is acceptable and what is not when it comes to test results.
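To see how much desktop sizing alone can move a published density number, here is a minimal sketch that computes a memory-bound ceiling on desktops per host. It assumes no memory overcommit, ignores CPU and per-VM overhead, and uses a made-up 512GB host with 32GB reserved for the hypervisor. The function name and all of the numbers are illustrative assumptions.

```python
import math


def memory_bound_density(host_gb: float, vm_gb: float, reserved_gb: float = 0.0) -> int:
    """Upper bound on desktops per host if memory is the only limit (hypothetical helper).

    Assumes no memory overcommit and ignores CPU and per-VM overhead.
    """
    usable_gb = host_gb - reserved_gb
    return math.floor(usable_gb / vm_gb)


# Hypothetical host: 512GB of RAM with 32GB held back for the hypervisor.
for vm_gb in (1.5, 2, 3, 4):
    ceiling = memory_bound_density(512, vm_gb, reserved_gb=32)
    print(f"{vm_gb}GB desktops: at most {ceiling} per host")
```

The same host that could hold a few hundred 1.5GB desktops on paper holds far fewer at 3-4GB, which is why the VM size used in a test matters as much as the headline density.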
Friends don’t let friends deploy 1 vCPU desktops
Use Cases / Scenarios
So far I’ve covered a bunch of configuration points and hardware details that can affect test results. One of the last things to be aware of is that the tests you are comparing should use the same type of use case or scenario. These are commonly referred to as knowledge worker, task worker, developer, etc. They determine the applications used and how demanding the workload will be. Obviously a developer use case is far more demanding than a task worker who commonly uses 1 or 2 simple applications. Most tests are focused on the knowledge worker use case.
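The comparison points covered so far can be collected into a simple checklist. The sketch below is one way to do that, not a LoginVSI feature: the `PublishedRun` record and its field names are hypothetical, and the two example runs are invented. It simply lists which factors differ between two published tests, so you know what has to be accounted for before comparing numbers.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class PublishedRun:
    """Factors from this post that should match before comparing results (hypothetical record)."""
    broker: str
    provisioning: str      # e.g. PVS, MCS, linked clones, instant clones
    write_cache: str       # e.g. RAM, disk, RAM with overflow, none
    windows: str
    office: str
    cpu_generation: str
    loginvsi_major: int
    workload: str          # e.g. task worker, knowledge worker


def mismatches(a: PublishedRun, b: PublishedRun) -> list:
    """Return the names of the fields that differ between two runs."""
    da, db = asdict(a), asdict(b)
    return [name for name in da if da[name] != db[name]]


# Two invented runs that differ only in provisioning method and write cache.
run_a = PublishedRun("Citrix XenDesktop", "PVS", "RAM", "Windows 7",
                     "2013", "Haswell", 4, "knowledge worker")
run_b = PublishedRun("Citrix XenDesktop", "MCS", "disk", "Windows 7",
                     "2013", "Haswell", 4, "knowledge worker")

print(mismatches(run_a, run_b))  # prints ['provisioning', 'write_cache']
```

An empty list does not prove two tests are equivalent. It only means none of the factors listed here rule out the comparison.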
Scaling out designs
Typically you will see vendors testing in a range of 200 to 1000 desktops in a test run. There may be a few tests that use larger quantities, but they are not as common. The main thing to look for here is whether the vendor provides a detailed explanation of how you would scale from the tested amount to the amount that your end state design is projected to be. As an example, if I am looking at a test for 1000 desktops, I will need to understand how this vendor would scale and what my design would look like for the 20,000 desktops that my environment is projected to reach.
The default answer of most vendors will probably be it’s just a cookie cutter approach and you can just stamp out the same build as what was tested. This is not good enough and you should press them harder for real answers.
Questions to understand
These are several data points that you should understand when looking at larger designs or how you will scale from a starting amount to your future desired state.
- What are the cluster sizing limitations?
- For example, if I can only create clusters of 500 or 1000 users, I need 20-40 clusters to reach 20,000 desktops.
- As you scale, how does this affect the management story?
If you are evaluating platforms for your current or future VDI/EUC environment then you have probably been looking at LoginVSI results. When going through your normal solution evaluation process, be sure that you consider all of the points explained in this post, especially when trying to make sense of testing results. They will help you better understand how things were tested and whether someone is trying to spin things unfairly in their favor.
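The cluster math in the example above is a simple ceiling division. Here is a small sketch using the 20,000 desktop target and the 500 and 1000 user cluster limits from the example. The function name is made up.

```python
def clusters_needed(total_desktops: int, max_per_cluster: int) -> int:
    """Number of clusters required, rounding up for any partial cluster."""
    return -(-total_desktops // max_per_cluster)  # ceiling division on integers


for limit in (500, 1000):
    print(f"{limit} users per cluster: {clusters_needed(20_000, limit)} clusters")
```

Every one of those clusters is something you have to deploy, patch and monitor, which is why the cluster limit belongs next to the management question.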
About Brian Suhr
Brian is a VCDX5-DCV and a Sr. Tech Marketing Engineer at Nutanix and owner of this website. He is active in the VMware community and helps lead the Chicago VMUG group. Specializing in VDI and Cloud project designs. Awarded VMware vExpert status 6 years for 2016 - 2011. VCP3, VCP5, VCP5-Iaas, VCP-Cloud, VCAP-DTD, VCAP5-DCD, VCAP5-DCA, VCA-DT, VCP5-DT, Cisco UCS Design