Sunday, April 21, 2013

Understanding "Ready Time" in VMware vSphere - Some Excerpts From VMware Docs


Ready Time
(Originally published on Aug 27, 2008)
     Stated simply, ready time is the amount of time a VM wants to run but has not been provided CPU resources on which to execute. Somewhat confusingly, esxtop and VirtualCenter report ready time in two different forms. esxtop reports it in an easily consumed percentage format: a value of 5% means the VM spent 5% of its last sample period waiting for available CPU resources. VirtualCenter reports ready time as a time measurement. In VC's real-time data, which produces sample values every 20,000 ms, a value of 1,000 ms corresponds to a 5% ready time (1,000 ms / 20,000 ms = 5%)...
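Converting between the two formats is simple arithmetic: ready percentage = ready milliseconds / sample interval in milliseconds * 100. The Python sketch below assumes the millisecond value describes a single virtual CPU over one real-time sample. The function names are illustrative and are not part of any VMware tool or API.

```python
def ready_ms_to_percent(ready_ms: float, interval_ms: float = 20_000) -> float:
    """VirtualCenter-style ready time (ms per sample) -> esxtop-style percent.

    Assumes the value covers a single virtual CPU; the default interval is the
    20,000 ms real-time sample period mentioned in the text.
    """
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return ready_ms * 100.0 / interval_ms


def ready_percent_to_ms(ready_pct: float, interval_ms: float = 20_000) -> float:
    """esxtop-style percent -> milliseconds of ready time per sample."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return ready_pct * interval_ms / 100.0


print(ready_ms_to_percent(1_000))  # 5.0
print(ready_percent_to_ms(5))      # 1000.0
```

If you look at data with a longer sample interval, pass that interval explicitly. The same number of milliseconds then represents a smaller percentage.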
The most common cause of high ready time is trying to get too much work out of too little hardware. 

Co-scheduling SMP VMs in VMware ESX Server
(Originally published on May 2, 2008)
     For a multiprocessor VM (also known as an "SMP VM"), it is important to present the guest OS and applications executing within the VM with the illusion that they are running on a dedicated physical multiprocessor.  ESX Server faithfully implements this illusion by supporting near-synchronous coscheduling of the virtual CPUs within a single multiprocessor VM. 
     The term "coscheduling" refers to a technique used in concurrent systems for scheduling related processes to run on different processors at the same time.  This approach, alternately referred to as "gang scheduling", had historically been applied to running high-performance parallel applications, such as scientific computations.  VMware ESX Server pioneered a form of coscheduling that is optimized for running SMP VMs efficiently...
   ...at any particular point in time, each virtual CPU (VCPU) may be scheduled, descheduled, preempted, or blocked waiting for some event.  Without coscheduling, the VCPUs associated with an SMP VM would be scheduled independently, breaking the guest's assumptions regarding uniform progress.  We use the term "skew" to refer to the difference in execution rates between two or more VCPUs associated with an SMP VM...
     ...Relaxed coscheduling in ESX Server 3.x...instead of requiring all VCPUs to be co-started, only those VCPUs that are skewed must be co-started.  This ensures that when any VCPU is scheduled, all other VCPUs that are "behind" will also be scheduled, reducing skew.  This approach is called "relaxed" coscheduling, since only a subset of a VM's VCPUs must be scheduled simultaneously after skew has been detected...
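The following toy Python model makes "skew" and "relaxed" coscheduling concrete. It assumes that each VCPU's progress can be summarized as a single number and uses a hypothetical skew threshold. The class and function names are illustrative only, and the model does not reflect how ESX Server actually tracks or corrects skew internally.

```python
from dataclasses import dataclass


@dataclass
class VCPU:
    name: str
    progress_ms: float  # hypothetical: guest execution time accumulated so far


def skew(vcpus):
    """Return how far each VCPU lags behind the most-advanced VCPU."""
    leader = max(v.progress_ms for v in vcpus)
    return {v.name: leader - v.progress_ms for v in vcpus}


def relaxed_costart_set(vcpus, chosen, threshold_ms):
    """VCPUs that must start together when `chosen` is scheduled.

    Relaxed coscheduling: only VCPUs whose skew exceeds the (hypothetical)
    threshold are co-started with `chosen`; the others run independently.
    """
    lagging = {name for name, lag in skew(vcpus).items() if lag > threshold_ms}
    return sorted(lagging | {chosen})


def strict_costart_set(vcpus):
    """Strict gang scheduling, for comparison: every VCPU starts together."""
    return sorted(v.name for v in vcpus)


vm = [VCPU("vcpu0", 100.0), VCPU("vcpu1", 98.0),
      VCPU("vcpu2", 60.0), VCPU("vcpu3", 95.0)]
print(skew(vm))
print(relaxed_costart_set(vm, "vcpu0", threshold_ms=10.0))
print(strict_costart_set(vm))
```

In this example only vcpu2 has fallen more than 10 ms behind. Scheduling vcpu0 therefore pulls vcpu2 along but leaves vcpu1 and vcpu3 free to run independently. Strict gang scheduling would need all four VCPUs, and so four physical CPUs, at once.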


VMware Performance Study (PDF)
VMware ESX Server 3 -- Ready Time Observations
     As overall CPU utilization and the number of virtual machines increase, the scheduler is more likely to require a virtual machine to wait for access to a CPU. Even when a guest operating system is not servicing load, there are maintenance activities that the operating system must perform (for example, it must service clock interrupts to maintain correct time). Thus, even idle guests must be scheduled, consume CPU resources (albeit small), and accumulate ready time. The fact that the scheduler is allocating CPU resources to operating systems — rather than to applications as a normal operating system does — can make the scheduling somewhat more complex than it is in normal operating systems...Ready time can be an indicator of saturation on a system.
Several factors affect the amount of ready time seen.
  • Overall CPU utilization: You are more likely to see ready time when utilization is high, because the CPU is more likely to be busy when another virtual machine becomes ready to run.
  • Number of resource consumers (in this case, guest operating systems): When a host is running a larger number of virtual machines, the scheduler is more likely to need to queue a virtual machine behind one or more that are already running or queued.
  • Load correlation: If loads are correlated — for example, if one load wakes another one when the first load has completed its task — ready times are unlikely. If a single event wakes multiple loads, high ready times are likely.
  • Number of virtual CPUs in a virtual machine: When co-scheduling for n-way Virtual SMP is required, the virtual CPUs can be scheduled only when n physical CPUs are available to be preempted.
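The interaction between the first and last factors can be illustrated with a toy probability model. Under the strict rule in the last bullet, an n-way virtual machine can start only when n physical CPUs are free at the same moment. The Python sketch below assumes that each physical CPU is busy independently with probability equal to the overall utilization. That is a simplifying assumption: it ignores relaxed coscheduling, load correlation, and queueing, and `prob_enough_idle` is a hypothetical helper.

```python
from math import comb


def prob_enough_idle(num_pcpus: int, vcpus: int, utilization: float) -> float:
    """Probability that at least `vcpus` of `num_pcpus` physical CPUs are idle.

    Toy model (an assumption, not how the ESX scheduler works): each physical
    CPU is busy independently with probability `utilization`.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    if vcpus > num_pcpus:
        return 0.0
    idle = 1.0 - utilization
    # Sum the binomial probabilities of exactly k idle CPUs for k >= vcpus.
    return sum(
        comb(num_pcpus, k) * idle**k * utilization**(num_pcpus - k)
        for k in range(vcpus, num_pcpus + 1)
    )


for u in (0.3, 0.6, 0.9):
    row = [round(prob_enough_idle(8, n, u), 3) for n in (1, 2, 4)]
    print(f"utilization {u:.0%}: 1/2/4-way -> {row}")
```

Even in this crude model, the chance of finding enough idle CPUs falls as utilization rises and falls faster for wider VMs. This is consistent with the bullets above: more utilization and more virtual CPUs both mean more time spent waiting.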
     In multiprocessor systems, an additional factor affects ready time. Virtual machines that have been scheduled on a particular CPU will be given a preference to run on the same CPU again, because of the performance advantage of finding their data still in that CPU's cache. In a multiprocessor system, therefore, the ESX Server scheduler may choose to let a few cycles on a CPU stay idle rather than aggressively move a ready virtual machine to another CPU that may be idle.
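The affinity trade-off described above can be sketched as a simple decision rule. This is an illustrative Python model under stated assumptions, not the ESX Server scheduler's actual algorithm. The function name, the expected-free-time input, and the `max_wait_us` threshold are all hypothetical.

```python
def pick_cpu(last_cpu, idle_cpus, last_cpu_free_in_us, max_wait_us):
    """Pick a physical CPU for a ready virtual machine (toy model).

    Hypothetical affinity rule: prefer the CPU the VM last ran on, whose
    cache may still hold the VM's data, if it is idle now or expected to be
    free within `max_wait_us`. Otherwise migrate to an idle CPU.
    Returns (cpu, expected_wait_us); any wait accrues as ready time.
    """
    if last_cpu in idle_cpus:
        return last_cpu, 0
    if last_cpu_free_in_us <= max_wait_us:
        # Leave other idle CPUs idle briefly rather than migrate.
        return last_cpu, last_cpu_free_in_us
    if idle_cpus:
        # Migrate to the lowest-numbered idle CPU (arbitrary choice);
        # the VM then starts with a cold cache.
        return min(idle_cpus), 0
    return last_cpu, last_cpu_free_in_us  # nothing idle: wait for home CPU


print(pick_cpu(2, {5}, last_cpu_free_in_us=30, max_wait_us=50))
print(pick_cpu(2, {5, 7}, last_cpu_free_in_us=500, max_wait_us=50))
```

In the first call, the VM waits 30 microseconds for CPU 2 even though CPU 5 is idle. That is the kind of short wait that shows up as ready time even on a host that is not saturated.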
    Ready time for a process in isolation cannot be identified as a problem. The best metrics for examining the health of a server continue to be:
  1. CPU utilization
  2. Response time, and
  3. Application queues