Jume - My Virtualization Blog

My personal and professional virtualization blog. Everything about VMware, PowerCLI, Powershell, Agile, Scrum, VSAN and Cloud related.

Troubleshooting disk latency

I was asked to troubleshoot a VM. SCOM was reporting the following message: "Logical disk transfer (reads and writes) latency is too high - The threshold for the Logical Disk\Avg.". Also, jobs (this is a SQL server) took longer to run (almost twice as long). At first glance this looks like a disk I/O issue: the path to the storage endpoint (SAN), an incorrect HBA configuration, or perhaps congestion on the storage array. None of those turned out to be true. Actually, it doesn't have anything to do with disk latency!

CPU|Contention (%)

So I opened up vROps and looked at CPU | Contention. These are high numbers. High contention suggests the VM is fighting for resources, cannot get them, and would perform better if it could. So I checked the configuration of the host and the VM. It's a large VM with 14 vCPUs on a 16-core host (32 logical CPUs, Hyper-Threading is enabled). BUT IT'S THE ONLY VM ON THIS HOST!!!
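The reasoning here boils down to a rough rule of thumb, sketched below in Python: other VMs are an unlikely source of CPU contention unless there are other VMs and, together, they request more vCPUs than the host has physical cores. The function names are hypothetical, Hyper-Threading siblings are deliberately not counted as full cores, and hypervisor overhead is ignored, so treat this as an illustration rather than a capacity-planning tool.

```python
def vcpu_to_core_ratio(vcpus_per_vm: list[int], physical_cores: int) -> float:
    """Powered-on vCPUs per physical core (1.0 means one core per vCPU)."""
    return sum(vcpus_per_vm) / physical_cores


def other_vms_can_contend(vcpus_per_vm: list[int], physical_cores: int) -> bool:
    """Rough rule of thumb (hypothetical helper): other VMs are an unlikely
    source of CPU contention unless there *are* other VMs and, together, they
    request more vCPUs than the host has physical cores. Hyper-Threading
    siblings are not counted as full cores; hypervisor overhead is ignored."""
    return len(vcpus_per_vm) > 1 and sum(vcpus_per_vm) > physical_cores


if __name__ == "__main__":
    # Host from this post: 16 physical cores (32 logical CPUs with
    # Hyper-Threading), running a single VM with 14 vCPUs.
    vms = [14]
    print(vcpu_to_core_ratio(vms, 16))     # 0.875
    print(other_vms_can_contend(vms, 16))  # False
```

With a single 14-vCPU VM on 16 physical cores, every vCPU can have a core of its own, so high contention has to come from somewhere other than neighbouring VMs.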

CPU|Demand (%)

Next, I took a look at CPU | Demand. When demand was high, contention was low! That confirmed there is no conflict with other VMs: contention caused by competing VMs would typically rise along with demand, not drop. Something else is going on here! I remembered an issue with C-state throttling on Intel CPUs. When a VM's CPU demand is low, the physical CPU doesn't need to run at full clock speed, and idle cores can drop into power-saving states. Once the VM demands more CPU, the CPU has to switch back to a faster state, and the time it takes to make that switch introduces latency. This shows up in ESXTOP as CPU latency (%LAT_C), and it is also taken into account when calculating CPU | Contention (%). So we need to check the performance setting in the BIOS of the host.
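The pattern described above can be checked with a few lines of Python. This is a minimal sketch, assuming you have paired (demand %, contention %) values per interval, for example read off or exported from the vROps charts; the function names and the 50% demand threshold are hypothetical choices. It compares average contention in busy intervals with average contention in quiet ones.

```python
from statistics import mean


def contention_by_demand(samples, demand_threshold=50.0):
    """Split (demand %, contention %) samples into high- and low-demand
    intervals and return (mean contention when busy, mean contention when idle)."""
    busy = [c for d, c in samples if d >= demand_threshold]
    idle = [c for d, c in samples if d < demand_threshold]
    return (mean(busy) if busy else 0.0, mean(idle) if idle else 0.0)


def points_at_power_management(samples, demand_threshold=50.0):
    """Heuristic: contention from competing VMs tends to rise with demand.
    If contention is higher while demand is *low*, CPU power management
    (time spent waking cores from power-saving states) is a better suspect."""
    busy, idle = contention_by_demand(samples, demand_threshold)
    return idle > busy


if __name__ == "__main__":
    # Made-up (demand %, contention %) pairs, NOT measurements from this host.
    samples = [(85.0, 2.0), (90.0, 1.5), (20.0, 12.0), (15.0, 14.0), (70.0, 3.0)]
    print(contention_by_demand(samples))
    print(points_at_power_management(samples))  # True
```

A simple split by threshold is used instead of a correlation coefficient because it maps directly onto the question asked here: is contention worse when the VM is busy or when it is idle?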


So I opened up OneView and checked the server profile: it was configured with the default, 'Balanced Power and Performance'. I switched it to 'Maximum Performance'. Have a look at the screenshots below and see what changes:

Here the profile is set to 'Balanced Power and Performance'. Both C6 State and Package C6 are enabled.
Here the profile is set to 'Maximum Performance'. There are 'No C-States' and 'No P-States', and DIMM Voltage Preference changes to 'Optimized for Performance'.

Great. I applied the profile and waited until the host had restarted. Note the difference in how ESXi displays the power management settings: both show 'High Performance' as the active policy, but look at the 'Technology' field (ACPI C-states).

This is the power setting when 'Balanced' is selected in the BIOS.
This is the power setting when 'Maximum Performance' is selected in the BIOS.
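ESXi exposes the same information through the vSphere API, so you don't have to click through every host to compare them. Below is a minimal Python sketch using pyVmomi (the vSphere SDK for Python, swapped in here; the post itself uses the vSphere Client). It reads each host's `hardware.cpuPowerManagementInfo`, whose `currentPolicy` and `hardwareSupport` properties presumably correspond to the 'Active policy' and 'Technology' fields shown in the client. The helper names, the vCenter name and the credentials are placeholders, the comma-separated format of the technology string is an assumption, and certificate validation is disabled for brevity.

```python
import ssl


def parse_power_technologies(hardware_support: str) -> set[str]:
    """Split the 'Technology' string (for example 'ACPI P-states, ACPI C-states')
    into individual entries. The comma-separated format is an assumption."""
    return {part.strip() for part in hardware_support.split(",") if part.strip()}


def report_host_power(vcenter: str, user: str, pwd: str) -> None:
    """Print the active power policy and the power-management technologies
    ESXi reports for every host. Requires the pyvmomi package."""
    # Imported here so the parsing helper above works without pyvmomi installed.
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    # Lab shortcut: skips certificate validation. Do not use in production.
    context = ssl._create_unverified_context()
    si = SmartConnect(host=vcenter, user=user, pwd=pwd, sslContext=context)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.HostSystem], True)
        for host in view.view:
            info = host.hardware.cpuPowerManagementInfo
            techs = parse_power_technologies(info.hardwareSupport or "")
            print(f"{host.name}: policy={info.currentPolicy}, "
                  f"technology={sorted(techs) or 'none reported'}")
        view.DestroyView()
    finally:
        Disconnect(si)
```

For example, `report_host_power("vcenter.example.local", "administrator@vsphere.local", "secret")` (hypothetical values) prints one line per host, which makes a host whose BIOS still exposes deep power-saving states easy to spot.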

Awesome... I moved the VM back to its original server, and now we should see improvements...


Comments 3

Guest - Interested party on Wednesday, 15 January 2020 17:25

You forgot to update the post with your results after a full day of data collection.

Bouke Groenescheij on Thursday, 16 January 2020 06:34

Hi,

Thanks for reading. Actually I did, but to make it clearer I added a link to that post: https://www.jume.nl/troubleshooting-disk-latency-cont.

Cheers!

Guest - Troy Hrehirchuk on Friday, 17 January 2020 15:36

Excellent write-up. I found your link on Twitter and the discussion with Duncan Epping.
