Monday, May 13, 2013

Guest VM crash due to vcpu-0:EPT misconfiguration


Guest VM powered off on ESXi 5.1 or 5.0 due to an EPT misconfiguration (reported in vmware.log)

Snippet of the log (vmware.log, found in the same directory as the .vmx file):
2013-05-03T17:27:43.262Z| vcpu-1| MONITOR PANIC: vcpu-0:EPT misconfiguration: PA b49b405b0
2013-05-03T17:27:43.262Z| vcpu-1| Core dump with build build-623860
2013-05-03T17:27:43.262Z| vcpu-1| Writing monitor corefile "/vmfs/volumes/51548019-3efd569e-d4d8-002590840e37/ServiceVM/vmmcores.gz"
2013-05-03T17:27:43.262Z| vcpu-6| Exiting vcpu-6
2013-05-03T17:27:43.262Z| vcpu-3| Exiting vcpu-3
2013-05-03T17:27:43.262Z| vcpu-7| Exiting vcpu-7
2013-05-03T17:27:43.262Z| vcpu-2| Exiting vcpu-2
2013-05-03T17:27:43.262Z| vcpu-0| Exiting vcpu-0
2013-05-03T17:27:43.262Z| vcpu-4| Exiting vcpu-4
2013-05-03T17:27:43.262Z| vcpu-5| Exiting vcpu-5
2013-05-03T17:27:43.268Z| vcpu-1| Saving anonymous memory
2013-05-03T17:27:43.280Z| vcpu-1| Dumping core for vcpu-0
2013-05-03T17:27:43.280Z| vcpu-1| CoreDump: dumping core with superuser privileges
2013-05-03T17:27:43.280Z| vcpu-1| VMK Stack for vcpu 0 is at 0x41223b241000
2013-05-03T17:27:43.280Z| vcpu-1| Beginning monitor coredump
2013-05-03T17:27:44.181Z| vcpu-1| End monitor coredump
2013-05-03T17:27:44.182Z| vcpu-1| Dumping core for vcpu-1
2013-05-03T17:27:44.182Z| vcpu-1| CoreDump: dumping core with superuser privileges
2013-05-03T17:27:44.182Z| vcpu-1| VMK Stack for vcpu 1 is at 0x41223b2c1000
2013-05-03T17:27:44.182Z| vcpu-1| Beginning monitor coredump
2013-05-03T17:27:45.062Z| vcpu-1| End monitor coredump
2013-05-03T17:27:45.063Z| vcpu-1| Dumping core for vcpu-2
2013-05-03T17:27:45.063Z| vcpu-1| CoreDump: dumping core with superuser privileges
2013-05-03T17:27:45.063Z| vcpu-1| VMK Stack for vcpu 2 is at 0x41223b301000
2013-05-03T17:27:45.063Z| vcpu-1| Beginning monitor coredump
2013-05-03T17:27:45.940Z| vcpu-1| End monitor coredump
2013-05-03T17:27:45.941Z| vcpu-1| Dumping core for vcpu-3
2013-05-03T17:27:45.941Z| vcpu-1| CoreDump: dumping core with superuser privileges
2013-05-03T17:27:45.941Z| vcpu-1| VMK Stack for vcpu 3 is at 0x41223b341000
2013-05-03T17:27:45.941Z| vcpu-1| Beginning monitor coredump
2013-05-03T17:27:46.813Z| vcpu-1| End monitor coredump
2013-05-03T17:27:46.814Z| vcpu-1| Dumping core for vcpu-4
2013-05-03T17:27:46.814Z| vcpu-1| CoreDump: dumping core with superuser privileges
2013-05-03T17:27:46.814Z| vcpu-1| VMK Stack for vcpu 4 is at 0x41223b381000
2013-05-03T17:27:46.814Z| vcpu-1| Beginning monitor coredump
2013-05-03T17:27:47.685Z| vcpu-1| End monitor coredump
2013-05-03T17:27:47.685Z| vcpu-1| Dumping core for vcpu-5
2013-05-03T17:27:47.685Z| vcpu-1| CoreDump: dumping core with superuser privileges
2013-05-03T17:27:47.685Z| vcpu-1| VMK Stack for vcpu 5 is at 0x41223b3c1000
2013-05-03T17:27:47.686Z| vcpu-1| Beginning monitor coredump
2013-05-03T17:27:48.559Z| vcpu-1| End monitor coredump
2013-05-03T17:27:48.559Z| vcpu-1| Dumping core for vcpu-6
2013-05-03T17:27:48.559Z| vcpu-1| CoreDump: dumping core with superuser privileges
2013-05-03T17:27:48.560Z| vcpu-1| VMK Stack for vcpu 6 is at 0x41223b401000
2013-05-03T17:27:48.560Z| vcpu-1| Beginning monitor coredump
2013-05-03T17:27:49.429Z| vcpu-1| End monitor coredump
2013-05-03T17:27:49.430Z| vcpu-1| Dumping core for vcpu-7
2013-05-03T17:27:49.430Z| vcpu-1| CoreDump: dumping core with superuser privileges
2013-05-03T17:27:49.430Z| vcpu-1| VMK Stack for vcpu 7 is at 0x41223b441000
2013-05-03T17:27:49.430Z| vcpu-1| Beginning monitor coredump
2013-05-03T17:27:50.305Z| vcpu-1| End monitor coredump
2013-05-03T17:27:50.306Z| vcpu-1| Dumping extended monitor data
2013-05-03T17:27:56.966Z| vcpu-1| Msg_Post: Error
2013-05-03T17:27:56.966Z| vcpu-1| [msg.log.monpanic] *** VMware ESX internal monitor error ***
2013-05-03T17:27:56.966Z| vcpu-1| --> vcpu-0:EPT misconfiguration: PA b49b405b0
2013-05-03T17:27:56.966Z| vcpu-1| [msg.log.monpanic.report] You can report this problem by selecting menu item Help > VMware on the Web > Request Support, or by going t
9%2d3efd569e%2dd4d8%2d002590840e37%2fServiceVM%2fvmmcores%2egz". Provide the log file (/vmfs/volumes/51548019-3efd569e-d4d8-002590840e37/ServiceVM/vmware.log) and the c
ore file(s) (/vmfs/volumes/51548019-3efd569e-d4d8-002590840e37/ServiceVM/vmmcores.gz, /vmfs/volumes/51548019-3efd569e-d4d8-002590840e37/ServiceVM/vmx-zdump.000).
2013-05-03T17:27:56.966Z| vcpu-1| [msg.log.monpanic.serverdebug] If the problem is repeatable, set 'Use Debug Monitor' to 'Yes' in the 'Misc' section of the Configure V
irtual Machine Web page. Then reproduce the incident and file it according to the instructions.
2013-05-03T17:27:56.966Z| vcpu-1| [msg.log.monpanic.vmSupport.vmx86] To collect data to submit to VMware support, run "vm-support".
2013-05-03T17:27:56.966Z| vcpu-1| [msg.log.monpanic.entitlement] We will respond on the basis of your support entitlement.
2013-05-03T17:27:56.966Z| vcpu-1| [msg.log.monpanic.finish] We appreciate your feedback,
2013-05-03T17:27:56.966Z| vcpu-1| -->   -- the VMware ESX team.
2013-05-03T17:27:56.966Z| vcpu-1| ----------------------------------------
2013-05-03T17:27:57.894Z| vmx| GuestRpcSendTimedOut: message to toolbox timed out.
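
When several VMs or rotated logs (vmware-1.log, vmware-2.log, ...) need checking, the panic line can be picked out with a small script. The following is only a sketch: the regular expression is inferred from the single log snippet above, and the function name find_ept_panics is made up for illustration. Other builds may format the line differently.

```python
import re

# Pattern inferred from the one panic line in the snippet above, e.g.
# "2013-05-03T17:27:43.262Z| vcpu-1| MONITOR PANIC: vcpu-0:EPT misconfiguration: PA b49b405b0"
# (assumption: other ESXi builds may word or space this line differently)
PANIC_RE = re.compile(
    r"^(?P<ts>\S+)\|\s*(?P<thread>[\w-]+)\|\s*MONITOR PANIC:\s*"
    r"(?P<vcpu>vcpu-\d+):EPT misconfiguration:\s*PA\s+(?P<pa>[0-9a-fA-F]+)"
)


def find_ept_panics(lines):
    """Return one dict per 'EPT misconfiguration' monitor panic found in lines."""
    hits = []
    for line in lines:
        m = PANIC_RE.match(line.strip())
        if m:
            hits.append({
                "timestamp": m.group("ts"),
                "logged_by": m.group("thread"),  # thread that wrote the log line
                "vcpu": m.group("vcpu"),         # vcpu that hit the misconfiguration
                "physical_address": int(m.group("pa"), 16),
            })
    return hits


# Two lines copied from the snippet above; only the first is a panic line.
sample = [
    "2013-05-03T17:27:43.262Z| vcpu-1| MONITOR PANIC: vcpu-0:EPT misconfiguration: PA b49b405b0",
    "2013-05-03T17:27:43.262Z| vcpu-1| Core dump with build build-623860",
]

for hit in find_ept_panics(sample):
    print(hit["timestamp"], hit["vcpu"], hex(hit["physical_address"]))
```

To scan a real file, pass an open file object instead of the sample list, for example find_ept_panics(open("vmware.log", errors="replace")).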



Related entries from vmkwarning.log and vobd.log on the host:
vmkwarning.log:2013-05-03T17:27:59.813Z cpu5:4151)WARNING: PFrame: vm 1543881: 1514: Deallocating pinned pgNum 0x0, pinCount 1 throttle 0.
vobd.log:2013-05-03T17:27:56.966Z: [UserWorldCorrelator] 1984725641586us: [vob.uw.core.dumped] /bin/vmx(1543893) /vmfs/volumes/51548019-3efd569e-d4d8-002590840e37/ServiceVM/vmx-zdump.000
vobd.log:2013-05-03T17:27:56.966Z: [UserWorldCorrelator] 1984705573722us: [esx.problem.application.core.dumped] An application (/bin/vmx) running on ESXi host has crashed (1 time(s) so far). A core file may have been created at /vmfs/volumes/51548019-3efd569e-d4d8-002590840e37/ServiceVM/vmx-zdump.000.


This appears to be a crash of the CVM (the ServiceVM in the paths above). Files left in the VM directory:

-rw-r--r--    1 root     root           16276242 May  3 17:27 vmmcores-1.gz
-r--------    1 root     root            5394432 May  3 17:27 vmx-zdump.000
-rw-r--r--    1 root     root             240217 May  3 17:28 vmware-5.log
-rw-------    1 root     root           54525952 May  4 02:34 vmx-ServiceVM-830969


Here is the workaround from VMware support:
1. Can you let me know what .vmx changes need to be made to reduce the
occurrence of the bug?
- change the MMU setting to software virtualization, as shown in KB article 1036775
- another possible workaround is to change (or add) the following line in the .vmx file:
    monitor_control.disable_mmu_largepages=TRUE

2. Can you give us the instructions to install the patch and also the
debug patch? We will test it internally before giving it to the customer.
- we have submitted your request to engineering
- engineering will investigate the issue in your case to see whether it exactly matches the particular issue being investigated
- they will let us know if your case qualifies for a debug patch

3. Is the fix in ESXi 5.0 U3 or 5.1 U3?
- actually, there is no fix at this time
- the issue is still under investigation

4. Does increasing the VM memory help reduce the occurrence of it? I see
from /proc/meminfo (grep AS) that about 9 GB of the 12 GB in the guest VM is
in use. Maybe increasing the VM memory to 16 GB would reduce the occurrence of it.
- No, decreasing the memory for this (and other) virtual machines might reduce the occurrence of the issue.
- see KB article 1021896 for details

5. However, the information requested is published in KB article 2040519.
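
The second workaround in item 1 (changing or adding the monitor_control.disable_mmu_largepages line) can be applied by hand in a text editor. If it has to be done on many VMs, something like the sketch below could help. Everything here other than the option name is an assumption: the helper set_vmx_option is hypothetical, the sample .vmx content is invented, and the quoted key = "value" form is the usual .vmx convention rather than the unquoted form quoted by support. A .vmx file is usually edited while the VM is powered off, and keeping a backup copy first is a good idea.

```python
KEY = "monitor_control.disable_mmu_largepages"  # option name from the support reply


def set_vmx_option(vmx_text, key, value):
    """Return vmx_text with key set to value, replacing an existing line or appending one."""
    new_line = f'{key} = "{value}"'  # assumed .vmx style: key = "value"
    out = []
    replaced = False
    for line in vmx_text.splitlines():
        name = line.split("=", 1)[0].strip()
        if "=" in line and name == key:
            # Keep a single entry for the key: rewrite the first, drop any repeats.
            if not replaced:
                out.append(new_line)
                replaced = True
        else:
            out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


# Invented sample content, not taken from the real ServiceVM .vmx file.
sample_vmx = 'displayName = "ServiceVM"\nmemsize = "12288"\n'

updated = set_vmx_option(sample_vmx, KEY, "TRUE")
print(updated)
```

The function works on text rather than on a file path so it can be tried on a copy of the .vmx contents before anything is written back to the datastore.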