Thursday, May 16, 2013

Create NutanixVswitch from scratch


There have been a few instances where migrating to or from a Distributed vSwitch caused the internal Nutanix vSwitch to be deleted or misconfigured. This causes Genesis to crash and fail to start.

From the CVM, run ping -I 192.168.5.254 192.168.5.1 to verify that the internal Nutanix vSwitch is healthy.
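A quick triage sketch from the CVM (assumes the default internal addresses, eth1:1 = 192.168.5.254 and host vmk1 = 192.168.5.1):

# Verify the internal path across vSwitchNutanix from the CVM
if ping -c 3 -W 2 -I 192.168.5.254 192.168.5.1 > /dev/null 2>&1; then
    echo "internal vSwitchNutanix path is good"
else
    echo "internal path is down - check vSwitchNutanix on the ESXi host"
fi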

Error Message in data/logs/genesis.out:
2012-11-06 13:57:42 ERROR node_manager.py:2378 Could not load the local ESX configuration

Sample of a correct vSwitchNutanix configuration:

~ # esxcfg-vswitch -l
Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitchNutanix   128         3           128               1500
  PortGroup Name        VLAN ID  Used Ports  Uplinks
  svm-iscsi-pg          0        1
  vmk-svm-iscsi-pg      0        1

Make sure the following information matches:
- No uplink interfaces
- Used ports 3
- Portgroup and # of used ports
- Name of the vswitch

Sample vmknic output (make sure the subnet mask, IP address, and Enabled=true match):

~ # esxcfg-vmknic -l
Interface  Port Group/DVPort   IP Family IP Address    Netmask         Broadcast      \
 MAC Address       MTU     TSO MSS   Enabled Type               
vmk1       vmk-svm-iscsi-pg    IPv4      192.168.5.1   255.255.255.0   192.168.5.255   \
00:50:56:62:5a:e4 1500    65535     true    STATIC        

Commands to recreate the Nutanix vSwitch (a consolidated sketch follows the steps):


1. esxcfg-vswitch -a vSwitchNutanix (this must be done via the CLI)

2. esxcfg-vswitch -A "svm-iscsi-pg" vSwitchNutanix

3. esxcfg-vswitch -A "vmk-svm-iscsi-pg" vSwitchNutanix
4. If the vmknic does not exist, create it:
esxcfg-vmknic -a -i 192.168.5.1 -n 255.255.255.0 -p vmk-svm-iscsi-pg
5. esxcfg-vmknic -l

6. Enable the 192.168.5.1 vmknic:
esxcfg-vmknic -e vmk1

7. On the CVM, verify eth1 is part of svm-iscsi-pg (Edit Settings of the CVM in vCenter)

8. Run ifconfig on eth1 and eth1:1 to verify the interfaces are up and the IP addresses are
configured correctly (192.168.5.2 and 192.168.5.254, respectively)
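A consolidated sketch of the above, run on the ESXi host (assumes the default names and addresses shown in the samples):

# Recreate the internal Nutanix vSwitch from scratch
esxcfg-vswitch -a vSwitchNutanix                      # create the vSwitch (leave it with no uplinks)
esxcfg-vswitch -A "svm-iscsi-pg" vSwitchNutanix       # port group for the CVM's eth1
esxcfg-vswitch -A "vmk-svm-iscsi-pg" vSwitchNutanix   # port group for the host vmknic
esxcfg-vmknic -l | grep -q vmk-svm-iscsi-pg || \
  esxcfg-vmknic -a -i 192.168.5.1 -n 255.255.255.0 -p vmk-svm-iscsi-pg   # create the vmknic only if missing
esxcfg-vmknic -e vmk1                                 # enable the vmknic
esxcfg-vswitch -l                                     # confirm the layout matches the sample above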

Monday, May 13, 2013

Guest VM crash due to vcpu-0:EPT misconfiguration


A guest VM powered off on ESXi 5.1 or 5.0 due to an EPT misconfiguration (vmware.log)

Snippet of the log (vmware.log, in the same directory as the .vmx file):
2013-05-03T17:27:43.262Z| vcpu-1| MONITOR PANIC: vcpu-0:EPT misconfiguration: PA b49b405b0
2013-05-03T17:27:43.262Z| vcpu-1| Core dump with build build-623860
2013-05-03T17:27:43.262Z| vcpu-1| Writing monitor corefile "/vmfs/volumes/51548019-3efd569e-d4d8-002590840e37/ServiceVM/vmmcores.gz"
2013-05-03T17:27:43.262Z| vcpu-6| Exiting vcpu-6
2013-05-03T17:27:43.262Z| vcpu-3| Exiting vcpu-3
2013-05-03T17:27:43.262Z| vcpu-7| Exiting vcpu-7
2013-05-03T17:27:43.262Z| vcpu-2| Exiting vcpu-2
2013-05-03T17:27:43.262Z| vcpu-0| Exiting vcpu-0
2013-05-03T17:27:43.262Z| vcpu-4| Exiting vcpu-4
2013-05-03T17:27:43.262Z| vcpu-5| Exiting vcpu-5
2013-05-03T17:27:43.268Z| vcpu-1| Saving anonymous memory
2013-05-03T17:27:43.280Z| vcpu-1| Dumping core for vcpu-0
2013-05-03T17:27:43.280Z| vcpu-1| CoreDump: dumping core with superuser privileges
2013-05-03T17:27:43.280Z| vcpu-1| VMK Stack for vcpu 0 is at 0x41223b241000
2013-05-03T17:27:43.280Z| vcpu-1| Beginning monitor coredump
2013-05-03T17:27:44.181Z| vcpu-1| End monitor coredump
2013-05-03T17:27:44.182Z| vcpu-1| Dumping core for vcpu-1
2013-05-03T17:27:44.182Z| vcpu-1| CoreDump: dumping core with superuser privileges
2013-05-03T17:27:44.182Z| vcpu-1| VMK Stack for vcpu 1 is at 0x41223b2c1000
2013-05-03T17:27:44.182Z| vcpu-1| Beginning monitor coredump
2013-05-03T17:27:45.062Z| vcpu-1| End monitor coredump
2013-05-03T17:27:45.063Z| vcpu-1| Dumping core for vcpu-2
2013-05-03T17:27:45.063Z| vcpu-1| CoreDump: dumping core with superuser privileges
2013-05-03T17:27:45.063Z| vcpu-1| VMK Stack for vcpu 2 is at 0x41223b301000
2013-05-03T17:27:45.063Z| vcpu-1| Beginning monitor coredump
2013-05-03T17:27:45.940Z| vcpu-1| End monitor coredump
2013-05-03T17:27:45.941Z| vcpu-1| Dumping core for vcpu-3
2013-05-03T17:27:45.941Z| vcpu-1| CoreDump: dumping core with superuser privileges
2013-05-03T17:27:45.941Z| vcpu-1| VMK Stack for vcpu 3 is at 0x41223b341000
2013-05-03T17:27:45.941Z| vcpu-1| Beginning monitor coredump
2013-05-03T17:27:46.813Z| vcpu-1| End monitor coredump
2013-05-03T17:27:46.814Z| vcpu-1| Dumping core for vcpu-4
2013-05-03T17:27:46.814Z| vcpu-1| CoreDump: dumping core with superuser privileges
2013-05-03T17:27:46.814Z| vcpu-1| VMK Stack for vcpu 4 is at 0x41223b381000
2013-05-03T17:27:46.814Z| vcpu-1| Beginning monitor coredump
2013-05-03T17:27:47.685Z| vcpu-1| End monitor coredump
2013-05-03T17:27:47.685Z| vcpu-1| Dumping core for vcpu-5
2013-05-03T17:27:47.685Z| vcpu-1| CoreDump: dumping core with superuser privileges
2013-05-03T17:27:47.685Z| vcpu-1| VMK Stack for vcpu 5 is at 0x41223b3c1000
2013-05-03T17:27:47.686Z| vcpu-1| Beginning monitor coredump
2013-05-03T17:27:48.559Z| vcpu-1| End monitor coredump
2013-05-03T17:27:48.559Z| vcpu-1| Dumping core for vcpu-6
2013-05-03T17:27:48.559Z| vcpu-1| CoreDump: dumping core with superuser privileges
2013-05-03T17:27:48.560Z| vcpu-1| VMK Stack for vcpu 6 is at 0x41223b401000
2013-05-03T17:27:48.560Z| vcpu-1| Beginning monitor coredump
2013-05-03T17:27:49.429Z| vcpu-1| End monitor coredump
2013-05-03T17:27:49.430Z| vcpu-1| Dumping core for vcpu-7
2013-05-03T17:27:49.430Z| vcpu-1| CoreDump: dumping core with superuser privileges
2013-05-03T17:27:49.430Z| vcpu-1| VMK Stack for vcpu 7 is at 0x41223b441000
2013-05-03T17:27:49.430Z| vcpu-1| Beginning monitor coredump
2013-05-03T17:27:50.305Z| vcpu-1| End monitor coredump
2013-05-03T17:27:50.306Z| vcpu-1| Dumping extended monitor data
2013-05-03T17:27:56.966Z| vcpu-1| Msg_Post: Error
2013-05-03T17:27:56.966Z| vcpu-1| [msg.log.monpanic] *** VMware ESX internal monitor error ***
2013-05-03T17:27:56.966Z| vcpu-1| --> vcpu-0:EPT misconfiguration: PA b49b405b0
2013-05-03T17:27:56.966Z| vcpu-1| [msg.log.monpanic.report] You can report this problem by selecting menu item Help > VMware on the Web > Request Support, or by going t
9%2d3efd569e%2dd4d8%2d002590840e37%2fServiceVM%2fvmmcores%2egz". Provide the log file (/vmfs/volumes/51548019-3efd569e-d4d8-002590840e37/ServiceVM/vmware.log) and the c
ore file(s) (/vmfs/volumes/51548019-3efd569e-d4d8-002590840e37/ServiceVM/vmmcores.gz, /vmfs/volumes/51548019-3efd569e-d4d8-002590840e37/ServiceVM/vmx-zdump.000).
2013-05-03T17:27:56.966Z| vcpu-1| [msg.log.monpanic.serverdebug] If the problem is repeatable, set 'Use Debug Monitor' to 'Yes' in the 'Misc' section of the Configure V
irtual Machine Web page. Then reproduce the incident and file it according to the instructions.
2013-05-03T17:27:56.966Z| vcpu-1| [msg.log.monpanic.vmSupport.vmx86] To collect data to submit to VMware support, run "vm-support".
2013-05-03T17:27:56.966Z| vcpu-1| [msg.log.monpanic.entitlement] We will respond on the basis of your support entitlement.
2013-05-03T17:27:56.966Z| vcpu-1| [msg.log.monpanic.finish] We appreciate your feedback,
2013-05-03T17:27:56.966Z| vcpu-1| -->   -- the VMware ESX team.
2013-05-03T17:27:56.966Z| vcpu-1| ----------------------------------------
2013-05-03T17:27:57.894Z| vmx| GuestRpcSendTimedOut: message to toolbox timed out.



vmkwarning.log:2013-05-03T17:27:59.813Z cpu5:4151)WARNING: PFrame: vm 1543881: 1514: Deallocating pinned pgNum 0x0, pinCount 1 throttle 0.
vobd.log:2013-05-03T17:27:56.966Z: [UserWorldCorrelator] 1984725641586us: [vob.uw.core.dumped] /bin/vmx(1543893) /vmfs/volumes/51548019-3efd569e-d4d8-002590840e37/ServiceVM/vmx-zdump.000
vobd.log:2013-05-03T17:27:56.966Z: [UserWorldCorrelator] 1984705573722us: [esx.problem.application.core.dumped] An application (/bin/vmx) running on ESXi host has crashed (1 time(s) so far). A core file may have been created at /vmfs/volumes/51548019-3efd569e-d4d8-002590840e37/ServiceVM/vmx-zdump.000.


It appears the CVM crashed; the core files were left behind:

-rw-r--r--    1 root     root           16276242 May  3 17:27 vmmcores-1.gz
-r--------    1 root     root            5394432 May  3 17:27 vmx-zdump.000
-rw-r--r--    1 root     root             240217 May  3 17:28 vmware-5.log
-rw-------    1 root     root           54525952 May  4 02:34 vmx-ServiceVM-830969


Here is the workaround from VMware support:
1. What .vmx changes need to be made to reduce the
occurrence of the bug?
- Change the MMU setting to software virtualization, as shown in KB Article 1036775.
- Another possible workaround is to add (or change) the following line in the .vmx file (see the sketch below):
    monitor_control.disable_mmu_largepages=TRUE
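A sketch of applying the second workaround from the ESXi shell, with the VM powered off (the .vmx file name below is assumed from the ServiceVM directory in the log above):

# Append the workaround to the VM's .vmx (VM must be powered off)
echo 'monitor_control.disable_mmu_largepages = "TRUE"' >> /vmfs/volumes/51548019-3efd569e-d4d8-002590840e37/ServiceVM/ServiceVM.vmx
# Reload the VM's configuration so the change is picked up
vim-cmd vmsvc/getallvms | grep -i servicevm   # note the Vmid
vim-cmd vmsvc/reload <Vmid>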

2. Can you give us instructions to install the patch and the
debug patch? We will test it internally before giving it to the customer.
- We have submitted your request to engineering.
- Engineering will investigate whether the issue in your case exactly matches the particular issue being investigated.
- They will let us know if your case qualifies for a debug patch.

3. Is the fix in ESXi 5.0 U3 or 5.1 U3?
- Actually, there is no fix at this time.
- The issue is still under investigation.

4. Does increasing the VM memory help reduce the occurrence? From
/proc/meminfo (grep AS), about 9 GB of 12 GB is in use in the guest VM;
maybe increasing the VM memory to 16 GB might reduce the occurrence.
- No; decreasing the memory for this (and other) virtual machines might reduce the occurrence of the issue.
- See KB article 1021896 for details.

5. The information requested is published in KB Article 2040519.

Thursday, May 2, 2013

Troubleshooting bad hard disk on Nutanix

Stargate marks a disk offline if any I/O command to it takes more than 20 seconds.

Reasons a disk could take more than 20 seconds to respond:

1. Bad sectors - sudo smartctl -a /dev/sdX1 -T permissive
2. Bad adapter - sudo smartctl -l sataphy /dev/sdc1
3. Bad chassis
4. Slow disk (iostat from sysstats - /home/nutanix/data/logs/sysstats)
5. stargate.INFO will show the disk offline; gdb on the Stargate core file shows threads stuck on fdatasync and pwrite
6. Mark the disk back online to see if it goes offline again.

The following solution walks through the commands.
Solution
Here is an example of the outputs:

1. iostat (you can graph it using Nagios - zk_leader:7777 - or use the sysstats directory in /home/nutanix/data/logs):

I/O on this disk was stuck for more than 20s, which caused Stargate to mark it offline. A number of write ops in progress were stuck on the pwrite/fdatasync system calls against this disk; a search sketch follows the output below.

#TIMESTAMP 1366215700 : 04/17/2013 09:21:40 AM
Linux 2.6.35-30-server (NTNX-Ctrl-VM-3-NTNX)    04/17/2013      _x86_64_        (8 CPU)
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.10    0.95    6.14   45.96    0.00   38.85
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
scd0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda               0.00     9.00    0.00    8.40     0.00     0.07    16.57     0.00    0.24   0.24   0.20
sdb               0.00   100.80    4.60   10.80     0.19     0.43    82.29     0.04    2.60   2.21   3.40
sdc               0.00     0.00    0.00    1.40     0.00     0.70  1024.00    13.82 10674.29 714.29 100.00
sdd               0.00  1618.60    4.60   15.20     0.16     6.38   676.77     3.04  153.43  10.91  21.60
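To spot the offending disk in the collected sysstats, look for samples with very high await and %util; a sketch (the iostat log name under sysstats is an assumption):

# Flag device lines where %util (the last column) is above 95
awk '/^sd/ && $NF+0 > 95' /home/nutanix/data/logs/sysstats/iostat.INFO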
2. smartctl -l sataphy output: check whether the error counters increment (a sampling sketch follows the output).

ATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x000a  2           83  Device-to-host register FISes sent due to a COMRESET
0x0001  2           38  Command failed due to ICRC error
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2           38  R_ERR response for host-to-device data FIS
0x0007  2            2  R_ERR response for host-to-device non-data FIS
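A quick way to check whether the counters are incrementing (a sketch; device name as in the example above):

# Sample the PHY event counters twice and diff; growing ICRC/R_ERR counts point at the link/adapter
sudo smartctl -l sataphy /dev/sdc1 > /tmp/sataphy.1
sleep 300
sudo smartctl -l sataphy /dev/sdc1 > /tmp/sataphy.2
diff /tmp/sataphy.1 /tmp/sataphy.2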
3. lspci | grep -i sata: to find the type of controller
00:00:1f.2 Mass storage controller: Intel Corporation ICH10 6 port SATA AHCI Controller [vmhba0]

4. ncli disk ls - check for offline disks (Online : false):
Disk ID                   : 5946892
Storage Tier              : DAS-SATA
Host Name                 : 10.30.1.212
Mount Path                : /home/nutanix/data/stargate-storage/disks/9XG2WKG1
Online                    : false
Location                  : 6
5. stargate.ERROR:
E0417 09:21:49.258476  5468 extent_store.cc:546] notification=PathOffline mount_path=/home/nutanix/data/stargate-storage/disks/9XG2WKG1 ip_address=10.30.1.217 service_vm_id=17
F0417 09:21:49.263244  5468 extent_store.cc:667] Mount path /home/nutanix/data/stargate-storage/disks/9XG2WKG1 with disk id 5946892 marked offline
6. dmesg or sudo cat /var/log/messages on the CVM
7. zeus_config_printer:
{
  "cluster_id": 1081,
  "disk_id": 5946892,
  "disk_size": 925779999948,
  "storage_tier": "DAS-SATA"
}
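To pull just the stanza for the suspect disk out of the full config dump, a grep with context works (a sketch; the number of context lines is arbitrary):

# Show the config block around the suspect disk id
zeus_config_printer | grep -B 2 -A 4 'disk_id": 5946892'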


8. smartctl on disks
root@NTGTDC-Ctrl-VM-3:10.30.1.217:/var/log# smartctl  -a /dev/sdc1 -T permissive
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.35-30-server] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Serial Number:    9XG2WKG1
LU WWN Device Id: 5 000c50 04e7f347e
Firmware Version: SN02  <<<< SN03 is better.
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        Not in smartctl database [for details use: -P showall]
=== START OF READ SMART DATA SECTION ===
SMART STATUS RETURN: incomplete response, ATA output registers missing
General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  642) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 201) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x10bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   079   069   044    Pre-fail  Always       -       98874921
  3 Spin_Up_Time            0x0003   095   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       25
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       420367
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       516
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       23
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   098   076   000    Old_age   Always       -       26
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   076   060   045    Old_age   Always       -       24 (Min/Max 20/29)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       19
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       600
194 Temperature_Celsius     0x0022   024   040   000    Old_age   Always       -       24 (0 17 0 0 0)
195 Hardware_ECC_Recovered  0x001a   115   100   000    Old_age   Always       -       98874921
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   199   000    Old_age   Always       -       38
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
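Since no self-tests have been logged, it can help to run one manually (a sketch; the short test takes about a minute per the output above):

# Kick off a short self-test, then read the result
sudo smartctl -t short /dev/sdc1 -T permissive
sleep 120
sudo smartctl -l selftest /dev/sdc1 -T permissive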

9. gdb on the core file (stack trace output):
vim stargate.core.5425.6.20130417-092157.stack_trace.txt.gz - threads are stuck on fdatasync or pwrite.

Thread 78 (Thread 5536):
#0  0x00007f8fd985a027 in fdatasync () from /home/nutanix/toolchain/x86_64-unknown-linux-gnu/1.3/lib/libc.so.6
Thread 77 (Thread 5558):
#0  0x00007f8fd9859868 in pwritev (fd=72, vector=0x3489f460, count=7, offset=15323) at ../sysdeps/unix/sysv/linux/pwritev.c:67

10. ncli disk mark-online id=5946892
11. Commands to check the status of disks from ESXi (on NX2400): esxcfg-scsidevs -A; lsscsi (CentOS)

12. sudo hdparm --verbose -W0 /dev/sdf - if this results in errors, fix it by rebooting ESXi

13. Repartition, clean, and remount the disk, then restart Genesis:
sudo cluster/bin/repartition_disks -d /dev/sdX
sudo cluster/bin/clean_disks -p /dev/sdX1
sudo cluster/bin/mount_disks
genesis restart
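After the remount, a quick sanity check (a sketch; assumes the default stargate-storage mount paths shown earlier):

df -h | grep stargate-storage   # the repartitioned disk should be mounted again
ncli disk ls                    # confirm Online : true for the disk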

14. udevadm info -q all -n /dev/sdf - verify you get all the info, similar to a working disk

15. dd if=/dev/sdf of=/dev/null bs=1024 count=5 (read from the disk)
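The small dd above reads through the page cache; to force reads from the physical disk, direct I/O can be used (a sketch):

# Read 100 MB straight from the disk, bypassing the page cache
sudo dd if=/dev/sdf of=/dev/null bs=1M count=100 iflag=direct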

IPMItool

To get the current IP address:

ESXi# /ipmitool lan print 1
~ # /ipmitool lan print 1| egrep "IP Address|Subnet|Gateway IP|VLAN"
IP Address Source       : Static Address
IP Address              : 112.18.12.167
Subnet Mask             : 255.255.254.0
Default Gateway IP      : 112.18.12.1
Backup Gateway IP       : 0.0.0.0
802.1q VLAN ID          : Disabled
802.1q VLAN Priority    : 0


~ # /ipmitool  lan set 1 ipsrc static
~ # /ipmitool lan set 1 netmask 255.255.248.0
Setting LAN Subnet Mask to 255.255.248.0
~ # /ipmitool lan set 1 ipaddr 101.23.40.47 
~ # /ipmitool lan set 1 defgw ipaddr 101.23.40.1


Locator LED:

Turn the locator LED on (it stays on until turned off):
/ipmitool chassis identify force

Turn the locator LED off:
/ipmitool chassis identify 0

Turn the locator LED on for up to 255 seconds:
/ipmitool chassis identify 255


SEL events:

/ipmitool sel list
/ipmitool -v sel list (more verbose)

Sensor readings:

/ipmitool sensor list

Reset the BMC:

/ipmitool mc reset cold
/ipmitool  raw 0x06 0x02

To find the serial number of the system:

/ipmitool fru list
To reset the server:

/ipmitool chassis power on/off/cycle/reset
/ipmitool chassis power diag - sends an NMI to ESXi (the server) to create a purple screen of death (or a BSOD on Windows)


IPMI policy when the power is restored:


~ # /ipmitool chassis policy
chassis policy <state>
   list        : return supported policies
   always-on   : turn on when power is restored
   previous    : return to previous state when power is restored
   always-off  : stay off after power is restored
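For example, to have nodes power back on automatically after an outage:

/ipmitool chassis policy always-on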

IPMI network connectivity:
ipmitool -H <target node IPMI IP> -U admin -P admin raw 0x0c 0x01 0x01 0xff 0x01                        : Dedicated-NIC
ipmitool -H <target node IPMI IP> -U admin -P admin raw 0x0c 0x01 0x01 0xff 0x00                        : Shared-NIC

Command to change the IPMI port from "Dedicated port" to "Shared port":

Remote:
#ipmitool -H <target node ipmi IP> -U admin -P admin raw 0x0c 0x01 0x01 0xff 0 0 00


IPMI Power supply commands
/ipmitool -v sdr type "Power Supply"


~ # /ipmitool raw 0x06 0x52 0x07 0x70 0x01 0x0c    - top power supply (01 = good)
 01
~ # /ipmitool raw 0x06 0x52 0x07 0x72 0x01 0x0c    - bottom power supply (00 = bad)
 00
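A small wrapper combining the two reads (same raw bytes as above):

# Query both power supplies via BMC raw reads (01 = good, 00 = bad)
top=$(/ipmitool raw 0x06 0x52 0x07 0x70 0x01 0x0c)
bottom=$(/ipmitool raw 0x06 0x52 0x07 0x72 0x01 0x0c)
echo "top PSU: ${top} (01=good)  bottom PSU: ${bottom} (01=good)"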