nutanix: May 2013

Thursday, May 16, 2013

Create NutanixVswitch from scratch

In a few instances, migrating to or from a Distributed vSwitch has caused the Nutanix vSwitch to be deleted or left with the wrong configuration. When that happens, Genesis crashes and does not start.

From the CVM, run ping -I 192.168.5.254 192.168.5.1 to verify that the internal vSwitchNutanix is working.

Error Message in data/logs/genesis.out:
2012-11-06 13:57:42 ERROR node_manager.py:2378 Could not load the local ESX configuration

Sample of a correct vSwitchNutanix configuration:

~ # esxcfg-vswitch -l

Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitchNutanix   128         3           128               1500

  PortGroup Name        VLAN ID  Used Ports  Uplinks
  svm-iscsi-pg          0        1
  vmk-svm-iscsi-pg      0        1

Make sure the following match:
- No uplink interfaces
- Used ports: 3
- Port group names and the number of used ports for each
- Name of the vSwitch (vSwitchNutanix)
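
The checklist above can be automated against saved esxcfg-vswitch -l output. The following is a minimal Python sketch, not a Nutanix tool: the function name check_vswitch and the EXPECTED_PORTGROUPS table are made up for illustration, and it assumes the column order shown in the sample and that the vSwitch and port group names contain no spaces.

```python
# Hypothetical helper: compare `esxcfg-vswitch -l` text with the sample above.
EXPECTED_PORTGROUPS = {"svm-iscsi-pg": 1, "vmk-svm-iscsi-pg": 1}


def check_vswitch(listing, name="vSwitchNutanix", used_ports=3):
    """Return a list of mismatches; an empty list means the config matches."""
    problems, portgroups = [], {}
    section, found = None, False
    expect_switch_row = in_portgroups = False
    for line in listing.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[:2] == ["Switch", "Name"]:
            # Header of a new vSwitch section; the next row describes it.
            expect_switch_row, in_portgroups = True, False
        elif fields[:2] == ["PortGroup", "Name"]:
            in_portgroups = True
        elif expect_switch_row:
            expect_switch_row = False
            section = fields[0]
            if section == name:
                found = True
                # name, num ports, used ports, configured ports, MTU, [uplinks]
                if len(fields) < 5 or fields[2] != str(used_ports):
                    problems.append(f"used ports is not {used_ports}")
                if len(fields) > 5:
                    problems.append("uplinks present: " + " ".join(fields[5:]))
        elif in_portgroups and section == name and len(fields) >= 3:
            # name, VLAN ID, used ports, [uplinks]
            portgroups[fields[0]] = fields[2]
    if not found:
        return [f"{name} not found"]
    for pg, used in EXPECTED_PORTGROUPS.items():
        if pg not in portgroups:
            problems.append(f"port group {pg} missing")
        elif portgroups[pg] != str(used):
            problems.append(
                f"port group {pg} used ports is {portgroups[pg]}, expected {used}"
            )
    return problems
```

Paste the command output into a string and call check_vswitch(text); each returned entry names one item from the checklist that does not match.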

Sample of the vmknic (make sure the IP address and subnet mask match and that Enabled is true):

~ # esxcfg-vmknic -l

Interface Port Group/DVPort IP Family IP Address Netmask Broadcast \
MAC Address MTU TSO MSS Enabled Type

vmk1 vmk-svm-iscsi-pg IPv4 192.168.5.1 255.255.255.0 192.168.5.255 \
00:50:56:62:5a:e4 1500 65535 true STATIC
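
The same comparison can be done for the vmknic row. This is a rough Python sketch under the assumption that the columns appear in the order shown in the sample; check_vmknic is a hypothetical name, and the default values are the ones from the sample above.

```python
import re


def check_vmknic(listing, portgroup="vmk-svm-iscsi-pg",
                 ip="192.168.5.1", netmask="255.255.255.0"):
    """Return mismatches for the vmknic attached to `portgroup`."""
    # Rejoin rows that were wrapped with a trailing backslash.
    text = re.sub(r"\\\s*\n", " ", listing)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 11 and fields[1] == portgroup:
            # interface, port group, family, IP, netmask, broadcast,
            # MAC, MTU, TSO MSS, enabled, type
            problems = []
            if fields[3] != ip:
                problems.append(f"IP address is {fields[3]}, expected {ip}")
            if fields[4] != netmask:
                problems.append(f"netmask is {fields[4]}, expected {netmask}")
            if fields[9] != "true":
                problems.append(f"Enabled is {fields[9]}, expected true")
            return problems
    return [f"no vmknic found on port group {portgroup}"]
```

An empty list means the row matches; a "no vmknic found" entry corresponds to the case where the vmknic has to be created.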

Commands to recreate the Nutanix vSwitch:

1. esxcfg-vswitch -a vSwitchNutanix (this must be done from the CLI)

2. esxcfg-vswitch -A "svm-iscsi-pg" vSwitchNutanix

3. esxcfg-vswitch -A "vmk-svm-iscsi-pg" vSwitchNutanix

4. If the vmknic does not exist, create it:
esxcfg-vmknic -a -i 192.168.5.1 -n 255.255.255.0 -p vmk-svm-iscsi-pg

5. List the vmknics to confirm:
esxcfg-vmknic -l

6. Enable the 192.168.5.1 vmknic:
esxcfg-vmknic -e vmk1

7. Verify that eth1 of the CVM is connected to svm-iscsi-pg (Edit Settings on the CVM in vCenter).

8. On the CVM, run ifconfig eth1 and ifconfig eth1:1 to verify that the interfaces are up and the IP addresses are configured correctly (192.168.5.2 and 192.168.5.254 respectively).

Monday, May 13, 2013

Guest VM crash due to vcpu-0:EPT misconfiguration

A guest VM powered off on ESXi 5.1 or 5.0 due to an EPT misconfiguration (reported in vmware.log).

Snippet of the log (vmware.log, in the same directory as the .vmx file):

2013-05-03T17:27:43.262Z| vcpu-1| MONITOR PANIC: vcpu-0:EPT misconfiguration: PA b49b405b0

2013-05-03T17:27:43.262Z| vcpu-1| Core dump with build build-623860

2013-05-03T17:27:43.262Z| vcpu-1| Writing monitor corefile "/vmfs/volumes/51548019-3efd569e-d4d8-002590840e37/ServiceVM/vmmcores.gz"

2013-05-03T17:27:43.262Z| vcpu-6| Exiting vcpu-6

2013-05-03T17:27:43.262Z| vcpu-3| Exiting vcpu-3

2013-05-03T17:27:43.262Z| vcpu-7| Exiting vcpu-7

2013-05-03T17:27:43.262Z| vcpu-2| Exiting vcpu-2

2013-05-03T17:27:43.262Z| vcpu-0| Exiting vcpu-0

2013-05-03T17:27:43.262Z| vcpu-4| Exiting vcpu-4

2013-05-03T17:27:43.262Z| vcpu-5| Exiting vcpu-5

2013-05-03T17:27:43.268Z| vcpu-1| Saving anonymous memory

2013-05-03T17:27:43.280Z| vcpu-1| Dumping core for vcpu-0

2013-05-03T17:27:43.280Z| vcpu-1| CoreDump: dumping core with superuser privileges

2013-05-03T17:27:43.280Z| vcpu-1| VMK Stack for vcpu 0 is at 0x41223b241000

2013-05-03T17:27:43.280Z| vcpu-1| Beginning monitor coredump

2013-05-03T17:27:44.181Z| vcpu-1| End monitor coredump

2013-05-03T17:27:44.182Z| vcpu-1| Dumping core for vcpu-1

2013-05-03T17:27:44.182Z| vcpu-1| CoreDump: dumping core with superuser privileges

2013-05-03T17:27:44.182Z| vcpu-1| VMK Stack for vcpu 1 is at 0x41223b2c1000

2013-05-03T17:27:44.182Z| vcpu-1| Beginning monitor coredump

2013-05-03T17:27:45.062Z| vcpu-1| End monitor coredump

2013-05-03T17:27:45.063Z| vcpu-1| Dumping core for vcpu-2

2013-05-03T17:27:45.063Z| vcpu-1| CoreDump: dumping core with superuser privileges

2013-05-03T17:27:45.063Z| vcpu-1| VMK Stack for vcpu 2 is at 0x41223b301000

2013-05-03T17:27:45.063Z| vcpu-1| Beginning monitor coredump

2013-05-03T17:27:45.940Z| vcpu-1| End monitor coredump

2013-05-03T17:27:45.941Z| vcpu-1| Dumping core for vcpu-3

2013-05-03T17:27:45.941Z| vcpu-1| CoreDump: dumping core with superuser privileges

2013-05-03T17:27:45.941Z| vcpu-1| VMK Stack for vcpu 3 is at 0x41223b341000

2013-05-03T17:27:45.941Z| vcpu-1| Beginning monitor coredump

2013-05-03T17:27:46.813Z| vcpu-1| End monitor coredump

2013-05-03T17:27:46.814Z| vcpu-1| Dumping core for vcpu-4

2013-05-03T17:27:46.814Z| vcpu-1| CoreDump: dumping core with superuser privileges

2013-05-03T17:27:46.814Z| vcpu-1| VMK Stack for vcpu 4 is at 0x41223b381000

2013-05-03T17:27:46.814Z| vcpu-1| Beginning monitor coredump

2013-05-03T17:27:47.685Z| vcpu-1| End monitor coredump

2013-05-03T17:27:47.685Z| vcpu-1| Dumping core for vcpu-5

2013-05-03T17:27:47.685Z| vcpu-1| CoreDump: dumping core with superuser privileges

2013-05-03T17:27:47.685Z| vcpu-1| VMK Stack for vcpu 5 is at 0x41223b3c1000

2013-05-03T17:27:47.686Z| vcpu-1| Beginning monitor coredump

2013-05-03T17:27:48.559Z| vcpu-1| End monitor coredump

2013-05-03T17:27:48.559Z| vcpu-1| Dumping core for vcpu-6

2013-05-03T17:27:48.559Z| vcpu-1| CoreDump: dumping core with superuser privileges

2013-05-03T17:27:48.560Z| vcpu-1| VMK Stack for vcpu 6 is at 0x41223b401000

2013-05-03T17:27:48.560Z| vcpu-1| Beginning monitor coredump

2013-05-03T17:27:49.429Z| vcpu-1| End monitor coredump

2013-05-03T17:27:49.430Z| vcpu-1| Dumping core for vcpu-7

2013-05-03T17:27:49.430Z| vcpu-1| CoreDump: dumping core with superuser privileges

2013-05-03T17:27:49.430Z| vcpu-1| VMK Stack for vcpu 7 is at 0x41223b441000

2013-05-03T17:27:49.430Z| vcpu-1| Beginning monitor coredump

2013-05-03T17:27:50.305Z| vcpu-1| End monitor coredump

2013-05-03T17:27:50.306Z| vcpu-1| Dumping extended monitor data

2013-05-03T17:27:56.966Z| vcpu-1| Msg_Post: Error

2013-05-03T17:27:56.966Z| vcpu-1| [msg.log.monpanic] *** VMware ESX internal monitor error ***

2013-05-03T17:27:56.966Z| vcpu-1| --> vcpu-0:EPT misconfiguration: PA b49b405b0

2013-05-03T17:27:56.966Z| vcpu-1| [msg.log.monpanic.report] You can report this problem by selecting menu item Help > VMware on the Web > Request Support, or by going to "http://vmware.com/info?id=8&logFile=%2fvmfs%2fvolumes%2f51548019%2d3efd569e%2dd4d8%2d002590840e37%2fServiceVM%2fvmware%2elog&coreLocation=%2fvmfs%2fvolumes%2f51548019%2d3efd569e%2dd4d8%2d002590840e37%2fServiceVM%2fvmmcores%2egz". Provide the log file (/vmfs/volumes/51548019-3efd569e-d4d8-002590840e37/ServiceVM/vmware.log) and the core file(s) (/vmfs/volumes/51548019-3efd569e-d4d8-002590840e37/ServiceVM/vmmcores.gz, /vmfs/volumes/51548019-3efd569e-d4d8-002590840e37/ServiceVM/vmx-zdump.000).

2013-05-03T17:27:56.966Z| vcpu-1| [msg.log.monpanic.serverdebug] If the problem is repeatable, set 'Use Debug Monitor' to 'Yes' in the 'Misc' section of the Configure Virtual Machine Web page. Then reproduce the incident and file it according to the instructions.

2013-05-03T17:27:56.966Z| vcpu-1| [msg.log.monpanic.vmSupport.vmx86] To collect data to submit to VMware support, run "vm-support".

2013-05-03T17:27:56.966Z| vcpu-1| [msg.log.monpanic.entitlement] We will respond on the basis of your support entitlement.

2013-05-03T17:27:56.966Z| vcpu-1| [msg.log.monpanic.finish] We appreciate your feedback,

2013-05-03T17:27:56.966Z| vcpu-1| --> -- the VMware ESX team.

2013-05-03T17:27:56.966Z| vcpu-1| ----------------------------------------

2013-05-03T17:27:57.894Z| vmx| GuestRpcSendTimedOut: message to toolbox timed out.

vmkwarning.log:2013-05-03T17:27:59.813Z cpu5:4151)WARNING: PFrame: vm 1543881: 1514: Deallocating pinned pgNum 0x0, pinCount 1 throttle 0.

vobd.log:2013-05-03T17:27:56.966Z: [UserWorldCorrelator] 1984725641586us: [vob.uw.core.dumped] /bin/vmx(1543893) /vmfs/volumes/51548019-3efd569e-d4d8-002590840e37/ServiceVM/vmx-zdump.000

vobd.log:2013-05-03T17:27:56.966Z: [UserWorldCorrelator] 1984705573722us: [esx.problem.application.core.dumped] An application (/bin/vmx) running on ESXi host has crashed (1 time(s) so far). A core file may have been created at /vmfs/volumes/51548019-3efd569e-d4d8-002590840e37/ServiceVM/vmx-zdump.000.

This appears to be a CVM crash. Files in the ServiceVM directory:

-rw-r--r-- 1 root root 16276242 May 3 17:27 vmmcores-1.gz

-r-------- 1 root root 5394432 May 3 17:27 vmx-zdump.000

-rw-r--r-- 1 root root 240217 May 3 17:28 vmware-5.log

-rw------- 1 root root 54525952 May 4 02:34 vmx-ServiceVM-830969

Here is the workaround from VMware support:
1. What .vmx changes can be made to reduce the occurrence of the bug?
- Change the MMU setting to software virtualization, as shown in KB article 1036775.
- Another possible workaround is to change (or add) the following line in the .vmx file:
monitor_control.disable_mmu_largepages=TRUE

2. Can you give us instructions for installing the patch and the debug patch? We will test it internally before giving it to the customer.
- We have submitted your request to engineering.
- Engineering will investigate whether the issue in your case exactly matches the particular issue being investigated.
- They will let us know if your case qualifies for a debug patch.

3. Is the fix in ESXi 5.0 U3 or 5.1 U3?
- There is no fix at this time.
- The issue is still under investigation.

4. Does increasing the VM memory help reduce the occurrence? /proc/meminfo (grep AS) shows about 9 GB of the 12 GB in the guest VM in use, so increasing the VM memory to 16 GB might help.
- No. Decreasing the memory for this (and other) virtual machines might reduce the occurrence of the issue.
- See KB article 1021896 for details.

5. The information requested is published in KB article 2040519.

Thursday, May 2, 2013

Troubleshooting bad hard disk on Nutanix

Stargate marks a disk offline if any command to it takes more than 20 seconds.

Reasons that a disk could take more than 20 seconds to respond, and how to check for each:

1. Bad sectors - sudo smartctl -a /dev/sdX1 -T permissive
2. Bad adapter - sudo smartctl -l sataphy /dev/sdc1
3. Bad chassis
4. Slow disk (iostat output from sysstats, in data/logs/sysstats)
5. stargate.INFO shows the disk going offline; running gdb on the stargate core file shows threads stuck on fdatasync and pwrite
6. Mark the disk online again to see if it goes offline again.

The solution below walks through the commands.

Solution

Here is an example of the outputs

1. iostat (you can graph it using Nagios at zk_leader:7777, or read the sysstats directory in /home/nutanix/data/logs):

IO on this disk was stuck for more than 20s, which caused Stargate to mark it offline. A number of write ops in progress were stuck in the pwrite/fdatasync system calls on this disk.

#TIMESTAMP 1366215700 : 04/17/2013 09:21:40 AM

Linux 2.6.35-30-server (NTNX-Ctrl-VM-3-NTNX) 04/17/2013 _x86_64_ (8 CPU)

avg-cpu:   %user   %nice %system %iowait  %steal   %idle
            8.10    0.95    6.14   45.96    0.00   38.85

Device:  rrqm/s   wrqm/s    r/s    w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz     await   svctm   %util
scd0       0.00     0.00   0.00   0.00   0.00   0.00      0.00      0.00      0.00    0.00    0.00
sda        0.00     9.00   0.00   8.40   0.00   0.07     16.57      0.00      0.24    0.24    0.20
sdb        0.00   100.80   4.60  10.80   0.19   0.43     82.29      0.04      2.60    2.21    3.40
sdc        0.00     0.00   0.00   1.40   0.00   0.70   1024.00     13.82  10674.29  714.29  100.00
sdd        0.00  1618.60   4.60  15.20   0.16   6.38    676.77      3.04    153.43   10.91   21.60
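
When there are many sysstats samples, the stuck device can be picked out mechanically. Below is a small Python sketch that scans iostat text in the format shown above and reports devices with a very high await (milliseconds) or %util. The function name slow_devices and both thresholds are illustrative assumptions, not values used by Stargate; newer sysstat versions that print r_await/w_await instead of await would need a different column name.

```python
def slow_devices(iostat_text, await_ms=1000.0, util_pct=99.0):
    """Return (device, await, %util) for rows at or above either threshold."""
    header = None
    flagged = []
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Device:":
            header = fields          # column names for the rows that follow
            continue
        if header and len(fields) == len(header):
            try:
                row = dict(zip(header[1:], map(float, fields[1:])))
            except ValueError:
                continue             # not a numeric device row
            wait = row.get("await", 0.0)
            util = row.get("%util", 0.0)
            if wait >= await_ms or util >= util_pct:
                flagged.append((fields[0], wait, util))
    return flagged
```

Run against the sample above, only sdc is reported: its await is over ten seconds and it is 100% utilized while completing almost no IO.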

2. smartctl -l sataphy output: check whether the counters increment between runs.

ATA Phy Event Counters (GP Log 0x11)

ID Size Value Description

0x000a 2 83 Device-to-host register FISes sent due to a COMRESET

0x0001 2 38 Command failed due to ICRC error

0x0003 2 0 R_ERR response for device-to-host data FIS

0x0004 2 38 R_ERR response for host-to-device data FIS

0x0007 2 2 R_ERR response for host-to-device non-data FIS
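
To see whether the counters are incrementing, capture the sataphy output twice and compare. The Python sketch below assumes the four-column layout shown above (ID, size, value, description); the function names are hypothetical.

```python
import re

# ID, size, value (smartctl may append "+" on overflow), description
ROW = re.compile(r"^\s*(0x[0-9a-fA-F]{4})\s+(\d+)\s+(\d+)\+?\s+(.*\S)\s*$")


def parse_phy_counters(text):
    """Map counter ID -> (value, description) from `smartctl -l sataphy` text."""
    counters = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            counters[match.group(1)] = (int(match.group(3)), match.group(4))
    return counters


def incremented(before, after):
    """Return {ID: (old value, new value, description)} for counters that grew."""
    old = parse_phy_counters(before)
    new = parse_phy_counters(after)
    return {
        cid: (old[cid][0], value, desc)
        for cid, (value, desc) in new.items()
        if cid in old and value > old[cid][0]
    }
```

An empty result between two captures taken some time apart suggests the link errors are historical; growing ICRC or R_ERR counters point at the adapter, cable, or chassis path rather than the platters.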

3. lspci |grep -i sata: to find the type of controller

00:00:1f.2 Mass storage controller: Intel Corporation ICH10 6 port SATA AHCI Controller [vmhba0]

4. ncli disk ls - to see offline disks (Online : false)

Disk ID : 5946892

Storage Tier : DAS-SATA

Host Name : 10.30.1.212

Mount Path : /home/nutanix/data/stargate-storage/disks/9XG2WKG1

Online : false

Location : 6

5. stargate.ERROR:

E0417 09:21:49.258476 5468 extent_store.cc:546] notification=PathOffline mount_path=/home/nutanix/data/stargate-storage/disks/9XG2WKG1 ip_address=10.30.1.217 service_vm_id=17

F0417 09:21:49.263244 5468 extent_store.cc:667] Mount path /home/nutanix/data/stargate-storage/disks/9XG2WKG1 with disk id 5946892 marked offline

6. dmesg or sudo cat /var/log/messages on CVM

7. zeus_config_printer:

{

"cluster_id": 1081,

"disk_id": 5946892,

"disk_size": 925779999948,

"storage_tier": "DAS-SATA"

}

8. smartctl on disks

root@NTGTDC-Ctrl-VM-3:10.30.1.217:/var/log# smartctl -a /dev/sdc1 -T permissive

smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.35-30-server] (local build)

Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===

Serial Number: 9XG2WKG1

LU WWN Device Id: 5 000c50 04e7f347e

Firmware Version: SN02 <<<< SN03 is better.

User Capacity: 1,000,204,886,016 bytes [1.00 TB]

Sector Size: 512 bytes logical/physical

Device is: Not in smartctl database [for details use: -P showall]

=== START OF READ SMART DATA SECTION ===

SMART STATUS RETURN: incomplete response, ATA output registers missing

General SMART Values:

Offline data collection status: (0x82) Offline data collection activity

was completed without error.

Auto Offline Data Collection: Enabled.

Self-test execution status: ( 0) The previous self-test routine completed

without error or no self-test has ever

been run.

Total time to complete Offline

data collection: ( 642) seconds.

Offline data collection

capabilities: (0x7b) SMART execute Offline immediate.

Auto Offline data collection on/off support.

Suspend Offline collection upon new

command.

Offline surface scan supported.

Self-test supported.

Conveyance Self-test supported.

Selective Self-test supported.

SMART capabilities: (0x0003) Saves SMART data before entering

power-saving mode.

Supports SMART auto save timer.

Error logging capability: (0x01) Error logging supported.

General Purpose Logging supported.

Short self-test routine

recommended polling time: ( 1) minutes.

Extended self-test routine

recommended polling time: ( 201) minutes.

Conveyance self-test routine

recommended polling time: ( 2) minutes.

SCT capabilities: (0x10bd) SCT Status supported.

SCT Error Recovery Control supported.

SCT Feature Control supported.

SCT Data Table supported.

SMART Attributes Data Structure revision number: 10

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE

1 Raw_Read_Error_Rate 0x000f 079 069 044 Pre-fail Always - 98874921

3 Spin_Up_Time 0x0003 095 094 000 Pre-fail Always - 0

4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 25

5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0

7 Seek_Error_Rate 0x000f 100 253 030 Pre-fail Always - 420367

9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 516

10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0

12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 23

184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0

187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0

188 Command_Timeout 0x0032 098 076 000 Old_age Always - 26

189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0

190 Airflow_Temperature_Cel 0x0022 076 060 045 Old_age Always - 24 (Min/Max 20/29)

191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0

192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 19

193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 600

194 Temperature_Celsius 0x0022 024 040 000 Old_age Always - 24 (0 17 0 0 0)

195 Hardware_ECC_Recovered 0x001a 115 100 000 Old_age Always - 98874921

197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0

198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0

199 UDMA_CRC_Error_Count 0x003e 200 199 000 Old_age Always - 38

SMART Error Log Version: 1

No Errors Logged

SMART Self-test log structure revision number 1

No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1

SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS

1 0 0 Not_testing

2 0 0 Not_testing

3 0 0 Not_testing

4 0 0 Not_testing

5 0 0 Not_testing

Selective self-test flags (0x0):

After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.
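
The attribute table is long, so it can help to pull out only the raw values that relate to the failure causes listed earlier. This Python sketch assumes the ten-column attribute layout shown above; the watch list is an illustrative choice, not an official Nutanix or smartmontools list, and nonzero_watched is a hypothetical name.

```python
# Illustrative watch list: sector problems, command timeouts, link CRC errors.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "Command_Timeout",
    "UDMA_CRC_Error_Count",
)


def nonzero_watched(smart_text, watched=WATCHED):
    """Return {attribute name: raw value} for watched attributes above zero."""
    hits = {}
    for line in smart_text.splitlines():
        fields = line.split()
        # ID#, name, flag, value, worst, thresh, type, updated, when_failed, raw
        if len(fields) >= 10 and fields[0].isdigit() and fields[2].startswith("0x"):
            name, raw = fields[1], fields[9]
            if name in watched and raw.isdigit() and int(raw) > 0:
                hits[name] = int(raw)
    return hits
```

For the output above this returns Command_Timeout (26) and UDMA_CRC_Error_Count (38), with no reallocated or pending sectors. The CRC count equals the ICRC error count in the sataphy output, which points toward the link or adapter rather than bad sectors.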

9. gdb on the core file (stack trace output): vim stargate.core.5425.6.20130417-092157.stack_trace.txt.gz - threads are stuck on fdatasync or pwrite.

Thread 78 (Thread 5536):

#0 0x00007f8fd985a027 in fdatasync () from /home/nutanix/toolchain/x86_64-unknown-linux-gnu/1.3/lib/libc.so.6

Thread 77 (Thread 5558):

#0 0x00007f8fd9859868 in pwritev (fd=72, vector=0x3489f460, count=7, offset=15323) at ../sysdeps/unix/sysv/linux/pwritev.c:67

10. ncli disk mark-online id=5946892

11. Commands to check the status of the disks from ESXi (on NX2400): esxcfg-scsidevs -A; lsscsi (CentOS)

12. sudo hdparm --verbose -W0 /dev/sdf - if this results in errors, fix it by rebooting ESXi

13. sudo cluster/bin/repartition_disks -d /dev/sdX
sudo cluster/bin/clean_disks -p /dev/sdX1
sudo cluster/bin/mount_disks
genesis restart

14. udevadm info -q all -n /dev/sdf - verify that you get the same information as for a working disk

15. dd if=/dev/sdf of=/dev/null bs=1024 count=5 (read from disk)

IPMItool

To get the current IP address:

ESXi# /ipmitool lan print 1

~ # /ipmitool lan print 1| egrep "IP Address|Subnet|Gateway IP|VLAN"
IP Address Source : Static Address
IP Address : 112.18.12.167
Subnet Mask : 255.255.254.0
Default Gateway IP : 112.18.12.1
Backup Gateway IP : 0.0.0.0
802.1q VLAN ID : Disabled
802.1q VLAN Priority : 0

~ # /ipmitool lan set 1 ipsrc static

~ # /ipmitool lan set 1 netmask 255.255.248.0

Setting LAN Subnet Mask to 255.255.248.0

~ # /ipmitool lan set 1 ipaddr 101.23.40.47
~ # /ipmitool lan set 1 defgw ipaddr 101.23.40.1
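
Before changing the BMC address it is worth checking that the new gateway is actually inside the new subnet, since a mistake can leave the IPMI interface unreachable over the network. The Python sketch below only builds the command strings shown above and validates the addresses with the standard ipaddress module; it does not run ipmitool. The function name ipmi_lan_commands is hypothetical, and the /ipmitool path and channel 1 are taken from the examples above.

```python
import ipaddress


def ipmi_lan_commands(ip, netmask, gateway, channel=1, tool="/ipmitool"):
    """Return the `lan set` commands, or raise ValueError on a bad gateway."""
    # strict=False lets a host address stand in for its network.
    network = ipaddress.ip_network(f"{ip}/{netmask}", strict=False)
    if ipaddress.ip_address(gateway) not in network:
        raise ValueError(f"gateway {gateway} is not in {network}")
    prefix = f"{tool} lan set {channel}"
    return [
        f"{prefix} ipsrc static",
        f"{prefix} netmask {netmask}",
        f"{prefix} ipaddr {ip}",
        f"{prefix} defgw ipaddr {gateway}",
    ]
```

Calling ipmi_lan_commands("101.23.40.47", "255.255.248.0", "101.23.40.1") yields the four commands in the order used above; a gateway outside the subnet raises ValueError instead.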

Locator LED (force keeps it on until it is turned off):
/ipmitool chassis identify force

Turn the locator LED off:

/ipmitool chassis identify 0

Turn the locator LED on for up to 255 seconds:

/ipmitool chassis identify 255

SEL events:

/ipmitool sel list
/ipmitool -v sel list (more verbose)

Sensor readings:

/ipmitool sensor list

Reset the BMC:

/ipmitool mc reset cold

/ipmitool raw 0x06 0x02

To find the serial number of the system:

/ipmitool fru list

To reset or power-cycle the server:

/ipmitool chassis power on/off/cycle/reset
/ipmitool chassis power diag - sends an NMI to the server, which produces a purple screen of death on ESXi or a BSOD on Windows

IPMI policy when the power is restored:

~ # /ipmitool chassis policy

chassis policy <state>

list : return supported policies

always-on : turn on when power is restored

previous : return to previous state when power is restored

always-off : stay off after power is restored

IPMI network connectivity:

ipmitool -H <target node IPMI IP> -U admin -P admin raw 0x0c 0x01 0x01 0xff 0x01 : Dedicated NIC

ipmitool -H <target node IPMI IP> -U admin -P admin raw 0x0c 0x01 0x01 0xff 0x00 : Shared NIC

Command to change the IPMI port from "Dedicated port" to "Shared port", run remotely:

# ipmitool -H <target node IPMI IP> -U admin -P admin raw 0x0c 0x01 0x01 0xff 0x00

IPMI power supply commands:
/ipmitool -v sdr type "Power Supply"

~ # /ipmitool raw 0x06 0x52 0x07 0x70 0x01 0x0c   (top power supply; 01 = good)
01
~ # /ipmitool raw 0x06 0x52 0x07 0x72 0x01 0x0c   (bottom power supply; 00 = bad)
00