Wednesday, June 26, 2013

How to create a VM on a Nutanix cluster running KVM?

I will go into more depth later on the configuration changes made to run a Nutanix cluster on KVM.
(If you are interested, I like this doc:

KVM Architecture Overview - Google Drive )

For now, let us take a Nutanix cluster running KVM and create a VM on it.


[root@NTNX-12AM2K480036-A Create]# lsb_release   ( vmware -v )
LSB Version:    :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch

[root@NTNX-12AM2K480036-A Create]# virsh nodeinfo
CPU model:           x86_64
CPU(s):              24
CPU frequency:       1600 MHz
CPU socket(s):       1
Core(s) per socket:  6
Thread(s) per core:  2
NUMA cell(s):        2
Memory size:         49486468 KiB

virsh sysinfo ( smbiosDump ) - dmidecode on Linux works as well.
<sysinfo type='smbios'>
  <bios>
    <entry name='vendor'>American Megatrends Inc.</entry>
    <entry name='version'>2.1b      </entry>
    <entry name='date'>10/28/2011</entry>
    <entry name='release'>8.16</entry>
  </bios>

Step 1.

- Log in to the Nutanix Controller VM and create an iSCSI disk on a Nutanix container ( ncli ctr ls lists the containers ).
a. ncli vdisk create name=kvm-training-disk9 ctr-name=xyz max-capacity=16

ncli> vdisk ls names=kvm-training-disk9
    Name                      : kvm-training-disk9
    Container ID              : 779
    Max Capacity              : 16 GB (17,179,869,184 bytes)
    ISCSI Target              : iqn.2010-06.com.nutanix:kvm-training-disk9-e3878625
    ISCSI LUN                 : 0









b. On the KVM host, verify that you are able to see the iSCSI targets.

[root@NTNX-12AM2K480036-A ~]# sudo  iscsiadm -m discovery -t sendtargets -p 192.168.5.2:3260|egrep "iso|disk9" ( esxcfg-scsidevs -m)
192.168.5.2:3260,1 iqn.2010-06.com.nutanix:gasmith-training-cdrom-centos-6.4-x86_64-bin-dvd1.iso-bca6c6aa
192.168.5.2:3260,1 iqn.2010-06.com.nutanix:CentOS-6.4-x86_64-bin-DVD1.iso-c0e9bd87
192.168.5.2:3260,1 iqn.2010-06.com.nutanix:kvm-training-disk9-e3878625


c. Define the pool - the equivalent of creating a datastore with vmkfstools -C.

 virsh pool-define-as --name kvm-training-disk9 --type iscsi --source-host 192.168.5.2 \
--source-dev iqn.2010-06.com.nutanix:kvm-training-disk9-e3878625 \
--target /dev/disk/by-path
Pool kvm-training-disk9 defined

where the pool name can be specific to the VM name, 192.168.5.2 is the internal CVM IP, --source-dev is the IQN
of the iSCSI LUN, and the target places the devices under /dev/disk/by-path.

[root@NTNX-12AM2K480036-A by-path]# cd /dev/disk/by-path
[root@NTNX-12AM2K480036-A by-path]# ls
ip-192.168.5.2:3260-iscsi-iqn.2010-06.com.nutanix:CentOS-6.4-x86_64-bin-DVD1.iso-c0e9bd87-lun-0
ip-192.168.5.2:3260-iscsi-iqn.2010-06.com.nutanix:kvm-training-disk9-e3878625-lun-0

d. Activate the pool

[root@NTNX-12AM2K480036-A ~]# virsh pool-list
Name                 State      Autostart
-----------------------------------------
CentOS-6.4.iso       active     no
default              active     yes
(it shows only active pools)


[root@NTNX-12AM2K480036-A ~]# virsh pool-list --all   -- shows all the pools ( esxcfg-scsidevs )
Name                 State      Autostart
-----------------------------------------
CentOS-6.4.iso       active     no
default              active     yes
kvm-training-disk9   inactive   no


Activate the pool
virsh # pool-start kvm-training-disk9
Pool kvm-training-disk9 started

Autostart the pool so it comes back after a reboot:
virsh # pool-autostart kvm-training-disk9
Pool kvm-training-disk9 marked as autostarted

virsh # pool-list ( esxcfg-scsidevs -m)
Name                 State      Autostart
-----------------------------------------
CentOS-6.4.iso       active     no
default              active     yes
kvm-training-disk9   active     yes

Verify the config
[root@NTNX-12AM2K480036-A ~]# virsh pool-dumpxml kvm-training-disk9
<pool type='iscsi'>
  <name>kvm-training-disk9</name>
  <uuid>3e42d29d-9037-1faa-12e1-af450904b5ab</uuid>
  <capacity unit='bytes'>17179869184</capacity>
  <allocation unit='bytes'>17179869184</allocation>
  <available unit='bytes'>0</available>
  <source>
    <host name='192.168.5.2'/>
    <device path='iqn.2010-06.com.nutanix:kvm-training-disk9-e3878625'/>
  </source>
  <target>
    <path>/dev/disk/by-path</path>
    <permissions>
      <mode>0755</mode>
      <owner>-1</owner>
      <group>-1</group>
    </permissions>
  </target>
</pool>

List the volume

[root@NTNX-12AM2K480036-A ~]# virsh vol-list --pool kvm-training-disk9 (esxcfg-scsidevs -m)

Name                 Path
-----------------------------------------
unit:0:0:0           /dev/disk/by-path/ip-192.168.5.2:3260-iscsi-iqn.2010-06.com.nutanix:kvm-training-disk9-e3878625-lun-0


[root@NTNX-12AM2K480036-A ~]# virsh vol-info --pool kvm-training-disk9 unit:0:0:0
Name:           unit:0:0:0
Type:           block
Capacity:       16.00 GiB
Allocation:     16.00 GiB
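If a newly created LUN does not show up in vol-list, the pool can be rescanned with virsh pool-refresh (a standard virsh command, not part of the original steps):

virsh pool-refresh kvm-training-disk9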


Create the VM with the following config.
cat ~/KVM/Create/disk9   ( chmod +x to make it executable )

#!/usr/bin/env bash

virt-install \
  --description "CentOS 6.4 - minimal desktop" \
  --connect qemu:///system \
  --name kvm-training9 \
  --disk vol=kvm-training-disk9/unit:0:0:0,format=raw,cache=none,io=native,bus=virtio \
  --ram 1024 \
  --vcpus 1 \
  --graphics vnc,port=5905,listen=0.0.0.0 \
  --os-type linux \
  --os-variant rhel6 \
  --disk vol=CentOS-6.4.iso/unit:0:0:0,format=raw,io=native,bus=ide,device=cdrom \
  --noautoconsole \
  --wait 0  --network network=VM-Network,model=virtio \
  --force


Run ~/KVM/Create/disk9:

[root@NTNX-12AM2K480036-A by-path]# virsh list   ( similar to vim-cmd vmsvc/getallvms, vm-support -V, or esxcli vm process list )
 Id    Name                           State
----------------------------------------------------
 1     NTNX-12AM2K480036-A-CVM        running
 52    kvm-training9                  running
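To check which VNC display the new domain actually got (standard virsh; for this domain the dumpxml below shows port 5909, i.e. display :9):

virsh # vncdisplay kvm-training9
:9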

virsh # dumpxml 52   ---- like the vmx file
<domain type='kvm' id='52'>
  <name>kvm-training9</name>
  <uuid>81f4f17f-b9e8-d533-1b89-6295c5ff6048</uuid>
  <description>CentOS 6.4 - minimal desktop</description>
  <memory unit='KiB'>1048576</memory>
  <currentMemory unit='KiB'>1048576</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <os>
    <type arch='x86_64' machine='rhel6.4.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/disk/by-path/ip-192.168.5.2:3260-iscsi-iqn.2010-06.com.nutanix:kvm-training-disk9-e3878625-lun-0'/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <disk type='block' device='cdrom'>
      <driver name='qemu' type='raw' io='native'/>
      <source dev='/dev/disk/by-path/ip-192.168.5.2:3260-iscsi-iqn.2010-06.com.nutanix:CentOS-6.4-x86_64-bin-DVD1.iso-c0e9bd87-lun-0'/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <alias name='ide0-1-0'/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>
    <controller type='usb' index='0'>
      <alias name='usb0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='ide' index='0'>
      <alias name='ide0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <interface type='network'>
      <mac address='52:54:00:3e:4c:f5'/>
      <source network='VM-Network'/>
      <target dev='vnet6'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/6'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/6'>
      <source path='/dev/pts/6'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='5909' autoport='no' listen='0.0.0.0'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='dynamic' model='selinux' relabel='yes'>
    <label>unconfined_u:system_r:svirt_t:s0:c399,c943</label>
    <imagelabel>unconfined_u:object_r:svirt_image_t:s0:c399,c943</imagelabel>
  </seclabel>
</domain>


[root@NTNX-12AM2K480036-A Create]# ps -ef | grep qemu | grep training9   ( similar to the vmx process in VMware )
qemu     14276     1  0 15:44 ?        00:00:42 /usr/libexec/qemu-kvm -name kvm-training9 -S -M rhel6.4.0 -enable-kvm -m 1024 -smp 1,sockets=1,cores=1,threads=1 -uuid 81f4f17f-b9e8-d533-1b89-6295c5ff6048 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/kvm-training9.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/dev/disk/by-path/ip-192.168.5.2:3260-iscsi-iqn.2010-06.com.nutanix:kvm-training-disk9-e3878625-lun-0,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/dev/disk/by-path/ip-192.168.5.2:3260-iscsi-iqn.2010-06.com.nutanix:CentOS-6.4-x86_64-bin-DVD1.iso-c0e9bd87-lun-0,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,aio=native -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=36,id=hostnet0,vhost=on,vhostfd=39 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:3e:4c:f5,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 0.0.0.0:9 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5


virt-top ( esxtop ) - keys 1, 2, 3 switch display modes
 virt-top 17:40:34 - x86_64 24/24CPU 1600MHz 48326MB
9 domains, 8 active, 8 running, 0 sleeping, 0 paused, 1 inactive D:0 O:0 X:0
CPU: 1.0%  Mem: 22008 MB (22008 MB by guests)

   ID S RXBY TXBY RXPK TXPK DOMAIN       INTERFACE
    1 R  23K  25K   96   89 NTNX-12AM2K4 vnet0
   43 R  723    0    9    0 gasmith-trai vnet2
    52 R    0    0    0    0 kvm-training vnet6

We have a virt_install wrapper on the Nutanix CVM that automates these steps (create the iSCSI disk, create the pool, and install the VM):


CVM:10.3.202.19:~/nutanix_kvm/bin$ ./virt_install --cdrom /ImageStore/win7.iso --disk 128 --nic VM-Network --vnc_port 5999 --os_type windows --os_variant win7 --name kvm-testing-win27


2013-06-27 11:21:43 INFO batch_worker.py:190 Preparing nutanix disks: 0%
2013-06-27 11:21:46 INFO batch_worker.py:190 Preparing nutanix disks: 50%
2013-06-27 11:21:46 INFO batch_worker.py:190 Preparing nutanix disks: 100%
2013-06-27 11:21:46 INFO batch_worker.py:190 Creating libvirt storage pools: 0%
2013-06-27 11:21:50 INFO batch_worker.py:190 Creating libvirt storage pools: 50%
2013-06-27 11:21:52 INFO batch_worker.py:190 Creating libvirt storage pools: 100%
2013-06-27 11:21:52 INFO kvm_domain_template.py:156 Running virt-install

( Connect a VNC viewer to display :99 for the VM above; disable "Adapt" and use max quality in the viewer. )
Connect to the console and install the guest OS (virt-manager works as well).

virsh # list --all (vmsvc/getallvms)
 Id    Name                           State
----------------------------------------------------
 1     NTNX-12AM2K480036-C-CVM        running
 40    kvm-training03                 running
 41    kvm-training6                  running
 42    kvm-training4                  running
 47    kvm-testing-win21              running
 48    kvm-testing-win24              running
 -     kvm-testing-win99              shut off



virsh # start kvm-testing-win99   ( vim-cmd vmsvc/power.on )
Domain kvm-testing-win99 started
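For completeness, the power-off counterparts (standard virsh commands, with rough vim-cmd analogs in the same style):

virsh # shutdown kvm-testing-win99   ( vim-cmd vmsvc/power.shutdown - graceful, needs guest ACPI support )
virsh # destroy kvm-testing-win99    ( vim-cmd vmsvc/power.off - hard power-off )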









Tuesday, June 18, 2013

Curator does the thankless job of keeping a Nutanix cluster Clean and Lean.

Curator uses map-reduce logic to clean up deleted vdisks and containers and to update reference counts.
It monitors under-replicated or over-replicated extent groups and redistributes extent groups for node and block awareness. Based on upper and lower thresholds, it migrates extent groups between tiers based on the "hotness" of the data. A partial scan is initiated every 30 minutes if there is work pending ("to remove", "ILM needed", or disk-space utilization); a full scan, initiated every 6 hours, additionally updates reference counts. For ILM and extent-group cleanup, Curator finds the relevant extent groups and informs Stargate to do the actual work, with Chronos acting as admission control on how many of these jobs are forwarded to Stargate.

BTW, Curator does all this in the background, and a Nutanix cluster ships with optimum gflags and
settings for Curator. With every release of the Nutanix software, more of these configs
will be tuned automatically based on workload.

Gflags for configuring Curator (if you need to change gflags, please contact Nutanix
support or the sales team):

1. Lower ILM threshold, configured via ncli sp edit ilm-thresh (default 70).
2. Upper ILM threshold: --curator_tier_usage_ilm_threshold_percent
3. How much to migrate between tiers, down to the lower threshold: --curator_tier_free_up_percent_by_ilm
4. How often Chronos asks Stargate to work on Curator jobs: --chronos_master_handshake_period_msecs
5. Migrate to the next tier only if the next tier's usage is below this threshold: --curator_next_tier_usage_ilm_threshold_percent=95 (default 90).
6. How often a full scan runs: --curator_full_scan_period_secs
7. Number of requests sent to Stargate at every handshake: --chronos_master_node_max_active_requests
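To see the values currently in effect, the Curator status port (2010, the same port used by the scan trigger below) exposes a live gflags page; the /h/gflags path here is from memory, so verify it on your release:

for svm in `svmips`; do wget -qO - "http://$svm:2010/h/gflags" | egrep "curator_tier|full_scan_period"; done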


How to manually run a full scan:

for svm in `svmips`; do wget -O - "http://$svm:2010/master/api/client/StartCuratorTasks?task_type=2"; done


Here is an example of hot-tier usage and ILM migration activity when these params are set aggressively; this
can cause unnecessary network traffic and I/O activity. As seen in the following figure, before Apr 17th
there were a lot of migrate activities from the SSD tier. The picture plots the usage of the SSD tier.


How to check how much of your data was accessed in the last 30 minutes in any tier?
heat-map-analysis

 
How to find when Curator full scans and partial scans were run?

Curator Jobs

Job id | Execution id | Job name | Status | Reasons | Zeus config valid | Start time | End time | Total time (secs)
1 | 65656 | Partial Scan | Succeeded | ILM | Yes | Tue Jun 18 10:34:18 2013 | Tue Jun 18 10:39:51 2013 | 333
1 | 65654 | Partial Scan | Succeeded | ILM | Yes | Tue Jun 18 10:03:48 2013 | Tue Jun 18 10:09:08 2013 | 320
1 | 65652 | Partial Scan | Succeeded | ILM | Yes | Tue Jun 18 09:33:18 2013 | Tue Jun 18 09:38:51 2013 | 333
1 | 65650 | Partial Scan | Succeeded | ILM | Yes | Tue Jun 18 09:03:17 2013 | Tue Jun 18 09:08:50 2013 | 333
1 | 65647 | Partial Scan | Succeeded | ILM | Yes | Tue Jun 18 08:32:47 2013 | Tue Jun 18 08:38:07 2013 | 320
0 | 65642 | Full Scan | Succeeded | ILM ToRemove | Yes | Tue Jun 18 08:02:17 2013 | Tue Jun 18 08:17:50 2013 | 933
1 | 65640 | Partial Scan | Succeeded | Periodic | Yes | Tue Jun 18 07:50:28 2013 | Tue Jun 18 07:55:49 2013 | 321

Tier Usage:

Storage Pool: NTNX-SP1 ILM Down Migrate threshold: 85

Tier Name | Tier Usage | Tier Size | Tier Usage Pct
SSD-PCIe | 1355.50 GB | 1481.57 GB | 91%
SSD-SATA | N/A | N/A | N/A
DAS-SATA | 15362.15 GB | 51371.99 GB | 29%

Is disk usage balanced across all nodes?

Storage Pool: NTNX-SP1 Tier: SSD-PCIe

Mean Usage Pct: 92%
Zone of Balance: 85% - 99%
Usage Spread Pct: 8%
Status: Balanced

Rack Id | Service VM | Disk Id | Disk Usage | Disk Size | Disk Usage Pct | Inside Zone of Balance
453898548 | 849049 | 3209 | 88.31 GB | 93.09 GB | 94% | Yes
453898548 | 849049 | 2235 | 88.48 GB | 93.09 GB | 95% | Yes
453898548 | 849049 | 2348 | 88.02 GB | 93.09 GB | 94% | Yes
453898548 | 849049 | 2859 | 87.77 GB | 93.09 GB | 94% | Yes
453898552 | 67395682 | 336979195 | 88.48 GB | 93.09 GB | 95% | Yes
453898552 | 67395684 | 336979174 | 88.61 GB | 93.09 GB | 95% | Yes
453898552 | 67395686 | 336979211 | 88.58 GB | 93.09 GB | 95% | Yes
453898552 | 67395688 | 336979184 | 88.41 GB | 93.09 GB | 94% | Yes
490725470 | 490725463 | 490725475 | 163.83 GB | 184.21 GB | 88% | Yes
490725470 | 490725465 | 490725476 | 160.78 GB | 184.21 GB | 87% | Yes
490725470 | 490725467 | 490725477 | 163.83 GB | 184.21 GB | 88% | Yes
490725470 | 490725471 | 490725478 | 160.41 GB | 184.21 GB | 87% | Yes

What activities were done during the last partial scan?
MapReduce job 65657
Job id: 65657
Job name: PartialScan MapReduce
Status: Succeeded
Map tasks done: 36/36
Reduce tasks done: 24/24
Start time: Tue Jun 18 10:35:18 2013
End time: Tue Jun 18 10:39:48 2013
Total time (secs): 270

Map Tasks

Task id | Task Type | Desired Status | Status | Node id | Start time | End time | Total time (secs)
0 | ExtentGroupIdMapTask | Succeeded | Succeeded | 472452227 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:37:31 2013 | 133
1 | ExtentGroupIdMapTask | Succeeded | Succeeded | 490493246 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:37:37 2013 | 139
2 | ExtentGroupIdMapTask | Succeeded | Succeeded | 490725549 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:36:59 2013 | 101
3 | ExtentGroupIdMapTask | Succeeded | Succeeded | 490725550 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:36:51 2013 | 93
4 | ExtentGroupIdMapTask | Succeeded | Succeeded | 472452000 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:36:17 2013 | 59
5 | ExtentGroupIdMapTask | Succeeded | Succeeded | 472581426 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:37:06 2013 | 108
6 | ExtentGroupIdMapTask | Succeeded | Succeeded | 472451186 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:37:48 2013 | 150
7 | ExtentGroupIdMapTask | Succeeded | Succeeded | 490725552 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:36:14 2013 | 56
8 | ExtentGroupIdMapTask | Succeeded | Succeeded | 490725511 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:36:34 2013 | 76
9 | ExtentGroupIdMapTask | Succeeded | Succeeded | 472451018 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:36:53 2013 | 95
10 | ExtentGroupIdMapTask | Succeeded | Succeeded | 472451324 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:36:57 2013 | 99
11 | ExtentGroupIdMapTask | Succeeded | Succeeded | 472337394 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:36:34 2013 | 76
12 | ExtentGroupAccessDataMapTask | Succeeded | Succeeded | 472452227 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:35:31 2013 | 13
13 | ExtentGroupAccessDataMapTask | Succeeded | Succeeded | 490725552 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:36:14 2013 | 56
14 | ExtentGroupAccessDataMapTask | Succeeded | Succeeded | 490725549 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:35:59 2013 | 41
15 | ExtentGroupAccessDataMapTask | Succeeded | Succeeded | 490725550 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:35:51 2013 | 33
16 | ExtentGroupAccessDataMapTask | Succeeded | Succeeded | 472452000 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:36:17 2013 | 59
17 | ExtentGroupAccessDataMapTask | Succeeded | Succeeded | 472581426 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:36:06 2013 | 48
18 | ExtentGroupAccessDataMapTask | Succeeded | Succeeded | 472451186 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:35:48 2013 | 30
19 | ExtentGroupAccessDataMapTask | Succeeded | Succeeded | 472452227 | Tue Jun 18 10:35:32 2013 | Tue Jun 18 10:36:31 2013 | 59
20 | ExtentGroupAccessDataMapTask | Succeeded | Succeeded | 490725511 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:35:34 2013 | 16
21 | ExtentGroupAccessDataMapTask | Succeeded | Succeeded | 472451018 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:35:53 2013 | 35
22 | ExtentGroupAccessDataMapTask | Succeeded | Succeeded | 490493246 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:35:37 2013 | 19
23 | ExtentGroupAccessDataMapTask | Succeeded | Succeeded | 472337394 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:35:34 2013 | 16
24 | VDiskOplogMapTask | Succeeded | Succeeded | 472451324 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:35:57 2013 | 39
25 | VDiskOplogMapTask | Succeeded | Succeeded | 472337394 | Tue Jun 18 10:35:34 2013 | Tue Jun 18 10:36:34 2013 | 60
26 | VDiskOplogMapTask | Succeeded | Succeeded | 490725511 | Tue Jun 18 10:35:34 2013 | Tue Jun 18 10:36:34 2013 | 60
27 | VDiskOplogMapTask | Succeeded | Succeeded | 490493246 | Tue Jun 18 10:35:37 2013 | Tue Jun 18 10:36:37 2013 | 60
28 | VDiskOplogMapTask | Succeeded | Succeeded | 472451186 | Tue Jun 18 10:35:48 2013 | Tue Jun 18 10:36:48 2013 | 60
29 | VDiskOplogMapTask | Succeeded | Succeeded | 490725550 | Tue Jun 18 10:35:51 2013 | Tue Jun 18 10:36:51 2013 | 60
30 | VDiskOplogMapTask | Succeeded | Succeeded | 472451018 | Tue Jun 18 10:35:53 2013 | Tue Jun 18 10:36:53 2013 | 60
31 | VDiskOplogMapTask | Succeeded | Succeeded | 472451324 | Tue Jun 18 10:35:57 2013 | Tue Jun 18 10:36:57 2013 | 60
32 | VDiskOplogMapTask | Succeeded | Succeeded | 490725549 | Tue Jun 18 10:35:59 2013 | Tue Jun 18 10:36:59 2013 | 60
33 | VDiskOplogMapTask | Succeeded | Succeeded | 472581426 | Tue Jun 18 10:36:06 2013 | Tue Jun 18 10:37:06 2013 | 60
34 | VDiskOplogMapTask | Succeeded | Succeeded | 490725552 | Tue Jun 18 10:36:14 2013 | Tue Jun 18 10:37:14 2013 | 60
35 | VDiskOplogMapTask | Succeeded | Succeeded | 490725552 | Tue Jun 18 10:36:14 2013 | Tue Jun 18 10:37:14 2013 | 60

Reduce Tasks

Task id | Task Type | Desired Status | Status | Node id | Start time | End time | Total time (secs)
0 | DiskIdReduceTask | Succeeded | Succeeded | 472452227 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:39:31 2013 | 253
1 | DiskIdReduceTask | Succeeded | Succeeded | 472452227 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:39:31 2013 | 253
2 | DiskIdReduceTask | Succeeded | Succeeded | 472452000 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:39:17 2013 | 239
3 | DiskIdReduceTask | Succeeded | Succeeded | 472452000 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:39:17 2013 | 239
4 | DiskIdReduceTask | Succeeded | Succeeded | 490493246 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:39:37 2013 | 259
5 | DiskIdReduceTask | Succeeded | Succeeded | 490493246 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:39:37 2013 | 259
6 | DiskIdReduceTask | Succeeded | Succeeded | 490725549 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:38:59 2013 | 221
7 | DiskIdReduceTask | Succeeded | Succeeded | 490725549 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:38:59 2013 | 221
8 | DiskIdReduceTask | Succeeded | Succeeded | 472337394 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:39:34 2013 | 256
9 | DiskIdReduceTask | Succeeded | Succeeded | 472337394 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:39:34 2013 | 256
10 | DiskIdReduceTask | Succeeded | Succeeded | 490725550 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:38:51 2013 | 213
11 | DiskIdReduceTask | Succeeded | Succeeded | 490725550 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:38:51 2013 | 213
12 | ExtentGroupIdReduceTask | Succeeded | Succeeded | 472581426 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:39:06 2013 | 228
13 | ExtentGroupIdReduceTask | Succeeded | Succeeded | 472581426 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:39:06 2013 | 228
14 | ExtentGroupIdReduceTask | Succeeded | Succeeded | 490725552 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:39:14 2013 | 236
15 | ExtentGroupIdReduceTask | Succeeded | Succeeded | 490725552 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:39:14 2013 | 236
16 | ExtentGroupIdReduceTask | Succeeded | Succeeded | 472451018 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:38:53 2013 | 215
17 | ExtentGroupIdReduceTask | Succeeded | Succeeded | 472451018 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:38:53 2013 | 215
18 | ExtentGroupIdReduceTask | Succeeded | Succeeded | 490725511 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:39:34 2013 | 256
19 | ExtentGroupIdReduceTask | Succeeded | Succeeded | 490725511 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:39:34 2013 | 256
20 | ExtentGroupIdReduceTask | Succeeded | Succeeded | 472451324 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:38:57 2013 | 219
21 | ExtentGroupIdReduceTask | Succeeded | Succeeded | 472451324 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:38:57 2013 | 219
22 | ExtentGroupIdReduceTask | Succeeded | Succeeded | 472451186 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:39:48 2013 | 270
23 | ExtentGroupIdReduceTask | Succeeded | Succeeded | 472451186 | Tue Jun 18 10:35:18 2013 | Tue Jun 18 10:39:48 2013 | 270

Job Counters

Name | Value
MapExtentGroupIdMap | 535252
ReduceDiskIdExtentGroupId | 1070510
MapExtentGroupAccessDataMap | 535213
NumExtentGroupsToMigrateForILM | 4740
NumExtentGroupsToMigrateForDiskBalancing | 0
MapVDiskOplogMap | 764
NumHostVDiskTasks | 3
FgHostVDiskTaskCount | 3
FgDeleteToRemoveOplogMapEntryTaskCount | 0
FgDeleteVDiskBlocksTaskCount | 0
MapVDiskBlockMap | 0
NumExtentGroupsWithReplicaOnSameNode | 0
NumExtentGroupsWithReplicaOnSameRack | 3275
NumFixExtentGroupsTasksReplicaOnSameRack | 23
FgDeleteExtentGroupsWithNonEidExtentsTaskCount | 0
NumExtentGroupsWithNonEidExtentsToDelete | 0
NumInvalidExtentGroupAccessDataMapEntries | 0
BgFixExtentGroupTaskCount | 5779
BgMergeExtentGroupsTaskCount | 0
BgCompressExtentsTaskCount | 0
BgDeduplicateExtentTaskCount | 0
BgMigrateExtentsTaskCount | 0
BgCopyBlockmapMetadataTaskCount | 0
BgUpdateRefcountsTaskCount | 0
InternalError | 0

What activities were done during the full scan?

MapReduce job 65643
Job id: 65643
Job name: FullScan MapReduce #1
Status: Succeeded
Map tasks done: 25/25
Reduce tasks done: 24/24
Start time: Tue Jun 18 08:03:32 2013
End time: Tue Jun 18 08:06:33 2013
Total time (secs): 181

Map Tasks

Task id | Task Type | Desired Status | Status | Node id | Start time | End time | Total time (secs)
0 | VDiskOplogMapTask | Succeeded | Succeeded | 490725550 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:03:50 2013 | 17
1 | VDiskOplogMapTask | Succeeded | Succeeded | 490725552 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:04:13 2013 | 40
2 | VDiskOplogMapTask | Succeeded | Succeeded | 490725552 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:04:13 2013 | 40
3 | VDiskOplogMapTask | Succeeded | Succeeded | 472451018 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:03:52 2013 | 19
4 | VDiskOplogMapTask | Succeeded | Succeeded | 472451018 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:03:52 2013 | 19
5 | VDiskOplogMapTask | Succeeded | Succeeded | 490725511 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:04:33 2013 | 60
6 | VDiskOplogMapTask | Succeeded | Succeeded | 490725511 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:04:33 2013 | 60
7 | VDiskOplogMapTask | Succeeded | Succeeded | 472451324 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:03:56 2013 | 23
8 | VDiskOplogMapTask | Succeeded | Succeeded | 472451324 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:03:56 2013 | 23
9 | VDiskOplogMapTask | Succeeded | Succeeded | 472451186 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:03:47 2013 | 14
10 | VDiskOplogMapTask | Succeeded | Succeeded | 472451186 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:03:47 2013 | 14
11 | VDiskOplogMapTask | Succeeded | Succeeded | 490493246 | Tue Jun 18 08:03:36 2013 | Tue Jun 18 08:04:36 2013 | 60
12 | NfsInodeMapTask | Succeeded | Succeeded | 472452227 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:04:30 2013 | 57
13 | NfsInodeMapTask | Succeeded | Succeeded | 490493246 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:03:36 2013 | 3
14 | NfsInodeMapTask | Succeeded | Succeeded | 472452227 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:04:30 2013 | 57
15 | NfsInodeMapTask | Succeeded | Succeeded | 490725549 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:03:58 2013 | 25
16 | NfsInodeMapTask | Succeeded | Succeeded | 472452000 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:04:16 2013 | 43
17 | NfsInodeMapTask | Succeeded | Succeeded | 490725549 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:03:58 2013 | 25
18 | NfsInodeMapTask | Succeeded | Succeeded | 472337394 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:04:33 2013 | 60
19 | NfsInodeMapTask | Succeeded | Succeeded | 472581426 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:04:04 2013 | 31
20 | NfsInodeMapTask | Succeeded | Succeeded | 472452000 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:04:16 2013 | 43
21 | NfsInodeMapTask | Succeeded | Succeeded | 490725550 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:03:50 2013 | 17
22 | NfsInodeMapTask | Succeeded | Succeeded | 490493246 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:03:36 2013 | 3
23 | NfsInodeMapTask | Succeeded | Succeeded | 472337394 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:04:33 2013 | 60
24 | NfsVDiskMapTask | Succeeded | Succeeded | 472581426 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:04:04 2013 | 31

Reduce Tasks

Task id | Task Type | Desired Status | Status | Node id | Start time | End time | Total time (secs)
0 | NfsInodeReduceTask | Succeeded | Succeeded | 472452227 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:06:30 2013 | 177
1 | NfsInodeReduceTask | Succeeded | Succeeded | 472452227 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:06:30 2013 | 177
2 | NfsInodeReduceTask | Succeeded | Succeeded | 472452000 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:06:16 2013 | 163
3 | NfsInodeReduceTask | Succeeded | Succeeded | 472452000 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:06:16 2013 | 163
4 | NfsInodeReduceTask | Succeeded | Succeeded | 490493246 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:05:36 2013 | 123
5 | NfsInodeReduceTask | Succeeded | Succeeded | 490493246 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:05:36 2013 | 123
6 | NfsInodeReduceTask | Succeeded | Succeeded | 490725549 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:05:58 2013 | 145
7 | NfsInodeReduceTask | Succeeded | Succeeded | 490725549 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:05:58 2013 | 145
8 | NfsInodeReduceTask | Succeeded | Succeeded | 472337394 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:06:33 2013 | 180
9 | NfsInodeReduceTask | Succeeded | Succeeded | 472337394 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:06:33 2013 | 180
10 | NfsInodeReduceTask | Succeeded | Succeeded | 490725550 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:05:50 2013 | 137
11 | NfsInodeReduceTask | Succeeded | Succeeded | 490725550 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:05:50 2013 | 137
12 | NfsDirectoryReduceTask | Succeeded | Succeeded | 472581426 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:06:04 2013 | 151
13 | NfsDirectoryReduceTask | Succeeded | Succeeded | 472581426 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:06:04 2013 | 151
14 | NfsDirectoryReduceTask | Succeeded | Succeeded | 490725552 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:06:13 2013 | 160
15 | NfsDirectoryReduceTask | Succeeded | Succeeded | 490725552 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:06:13 2013 | 160
16 | NfsDirectoryReduceTask | Succeeded | Succeeded | 472451018 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:05:52 2013 | 139
17 | NfsDirectoryReduceTask | Succeeded | Succeeded | 472451018 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:05:52 2013 | 139
18 | NfsDirectoryReduceTask | Succeeded | Succeeded | 490725511 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:06:33 2013 | 180
19 | NfsDirectoryReduceTask | Succeeded | Succeeded | 490725511 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:06:33 2013 | 180
20 | NfsDirectoryReduceTask | Succeeded | Succeeded | 472451324 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:05:56 2013 | 143
21 | NfsDirectoryReduceTask | Succeeded | Succeeded | 472451324 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:05:56 2013 | 143
22 | NfsDirectoryReduceTask | Succeeded | Succeeded | 472451186 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:05:47 2013 | 134
23 | NfsDirectoryReduceTask | Succeeded | Succeeded | 472451186 | Tue Jun 18 08:03:33 2013 | Tue Jun 18 08:05:47 2013 | 134

Job Counters

Name | Value
MapVDiskOplogMap | 764
FgHostVDiskTaskCount | 1
FgDeleteToRemoveOplogMapEntryTaskCount | 0
NumHostVDiskTasks | 1
FgAddNfsInodeContainerIdTaskCount | 0
NumNfsInodesUpdatedWithContainerId | 0
FgDeleteNfsInodesTaskCount | 0
NumNfsInodesDeleted | 0
NumNfsVDisksProcessed | 879
NfsReduceChildLinkCount | 5853
NfsReduceParentLinkCount | 5859
NfsReduceAttributeCount | 5859
NfsReduceVDiskCount | 879
FgFixNfsInodeLinksTaskCount | 0
FgFixNfsLinkAcrossContainersTaskCount | 0
FgFixNfsVDiskTaskCount | 0
FgDeleteNfsDirectoryCount | 0
BgFixExtentGroupTaskCount | 0
BgMergeExtentGroupsTaskCount | 0
BgCompressExtentsTaskCount | 0
BgDeduplicateExtentTaskCount | 0
BgMigrateExtentsTaskCount | 0
BgCopyBlockmapMetadataTaskCount | 0
BgUpdateRefcountsTaskCount | 0
InternalError | 0
Name | Id | Value
NumNfsDirectoryInodes | 1993 | 0
NumNfsDirectoryInodes | 1994 | 0
NumNfsDirectoryInodes | 2876 | 983
NumNfsDirectoryInodes | 3369989 | 193
NumNfsDirectoryInodes | 1995 | 0
NumNfsDirectoryInodes | 2877 | 211
NumNfsDirectoryInodes | 1996 | 0
NumNfsDirectoryInodes | 1933058 | 44
NumNfsDirectoryInodes | 1933059 | 261
NumNfsDirectoryInodes | 413365 | 0
NumNfsDirectoryInodes | 323487 | 3

Scripts to check Network Stats in a Nutanix Cluster.

A Nutanix cluster captures sysstats periodically, so you can graph them (e.g., with our Nagios tooling) and run scripts against them.
To check for network latency or unreachable hosts, use the following script against ping_hosts.INFO:


 for i in `svmips` ; do (echo ; echo "SVM: $i" ; ssh $i cat data/logs/sysstats/ping_hosts.INFO | egrep -v "IP : time" | \
awk '/^#TIMESTAMP/ || $3 > 13.00 || $3 == "unreachable"' | egrep -B1 " ms|unreachable" | egrep -v "\-\-" ); done

This prints any unreachable host and any ping response taking more than 13 ms.
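A variant of the same idea (it assumes the same ping_hosts.INFO layout as above, where $3 is the ping time in ms) prints only the worst ping seen per CVM:

for i in `svmips`; do echo -n "SVM $i max ping: "; ssh $i cat data/logs/sysstats/ping_hosts.INFO | \
egrep -v "IP : time|unreachable|#TIMESTAMP" | awk '$3 > max { max = $3 } END { print max " ms" }'; done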


Here is another script that prints network utilization above 1.2 Gbps (you can graph this with Nagios, but
Nagios does not combine the Rx and Tx Bps):


for i in `svmips`; do (echo CVM:$i; ssh $i cat data/logs/sysstats/sar.INFO | egrep "eth0" | awk '/^#TIMESTAMP/ || \
$6 > 30000 || $7 > 30000' | egrep -B1 " eth0" | awk '{print $1,$2,$6,$7,($6+$7)/1024}' | awk '$5 > 120'); done

Here is a modification of the above script to check the average bandwidth during a certain window - 6 PM to midnight:


for i in `svmips`; do (echo CVM:$i; ssh $i cat data/logs/sysstats/sar.INFO |egrep "eth0"| awk '/^#TIMESTAMP/ || \
$6 > 30000 || $7 > 30000' | egrep -B1 " eth0" | awk '{print $1,$2,$6,$7,($6+$7)/1024}');done |\
 egrep "^06|^07|^08|^09|^10|^11"|grep PM|awk '{sum+=$5} END { print "Average = ",sum/NR}'
Or find the total number of times network utilization crossed 2G during a certain window:

for i in `svmips`; do (echo CVM:$i; ssh $i cat data/logs/sysstats/sar.INFO | egrep "eth0" | awk '/^#TIMESTAMP/ || \
$6 > 30000 || $7 > 30000' | egrep -B1 " eth0" | awk '{print $1,$2,$6,$7,($6+$7)/1024}' | awk '$5 > 200'); done | \
 grep -v CVM | wc -l


I used this script to verify whether the customer's network usage dropped to 1G (between 2 PM and 3 PM):

for i in `svmips`; do (echo CVM:$i; ssh $i cat data/logs/sysstats/sar.INFO |egrep "eth0"| awk '/^#TIMESTAMP/ || \
$6 > 50000 || $7 > 50000' | egrep -B1 " eth0" | awk '{print $1,$2,$6,$7,($6+$7)/1024}');done | egrep "^02"|grep PM

Tuesday, June 11, 2013

Standby or unused uplink is used after rebooting an ESXi 5.0 Update 1 host

Versions Affected: ESXi 5.0; ESXi 5.0 Update 1
Description
Symptom:
Diagnostics.py sequential write performance is poor,
and esxtop with the 'n' switch shows that the 1 Gbps network is used instead of
10 Gbps.
Solution
It is due to VMware issues explained in these KBs for ESXi 5.0 Update 1:
kb2008144 
kb2030006
Workaround I: Remove the 1 Gbps NICs from the vSwitch configuration (validated)

esxcfg-nics -l     - find the 1 Gig link IDs (e.g., vmnic2 and vmnic3)
esxcfg-vswitch -l  - find the vSwitch port groups that use these links
esxcfg-vswitch -U vmnic2 vSwitch0
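To undo the change later, relink the uplink with the standard -L option:

esxcfg-vswitch -L vmnic2 vSwitch0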


Workaround II:
To work around this issue, try setting the NIC Failback option to Yes at the
vSwitch as well as the port group level (a CLI sketch follows).
Tags: Networking; VMware; Troubleshooting

Access Nutanix NFS from a different NFS client

Nutanix NFS can be exported to a non-Nutanix NFS client on a different subnet.

1. Whitelist the NFS client on Nutanix:

ncli> cluster add-to-nfs-whitelist  ip-subnet-masks=10.1.59.210/255.255.255.255

where 10.1.59.210 is the non-Nutanix NFS client.

2. Verify that the NFS datastore is exported correctly - run this command on the Nutanix Controller VM:

showmount -e
Export list for TEST-13SM35190018-1-CVM:
/TEST-CTR1 10.3.177.28,10.3.177.27,10.3.177.26,10.3.177.25,10.1.59.210/255.255.255.255,192.168.5.0/255.255.255.128


3. Nutanix's CentOS is STIG compliant; iptables prevents access to the Nutanix CVM from another subnet. Here are the iptables rules to allow NFS access. Run these commands on the Controller VM (this is needed only if the Nutanix CVM and the NFS client are in
different subnets).
Open Port mapper:
for i in `svmips`; do ssh $i "sudo iptables -t filter -A WORLDLIST -p tcp -m tcp --dport 111  -j ACCEPT"; done
Open NFS/Mountd port:
for i in `svmips`; do ssh $i "sudo iptables -t filter -A WORLDLIST -p tcp -m tcp --dport 2049 -j ACCEPT"; done
Save the rules:

sudo iptables-save
sudo /etc/init.d/iptables save
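To confirm the rules are in place on every CVM (standard iptables listing; WORLDLIST is the chain used above):

for i in `svmips`; do ssh $i "sudo iptables -L WORLDLIST -n" | egrep "111|2049"; done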

4. Mount it on the remote .210 client (the NFS client):

10.1.59.210:~$ sudo mount 10.3.177.29:/TEST-CTR1 /mnt
esxi: esxcfg-nas -a -o 10.3.177.29 -s /TEST-CTR1 NTNX-Datastore
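From the client, you can confirm the export is visible before mounting (standard showmount usage):

10.1.59.210:~$ showmount -e 10.3.177.29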

5. This KB might be useful as well

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1007352

CentOS guest VM hanging at eth0 on every alternate boot on ESXi 5.0

Description
Symptom:
Every alternate reboot, the CentOS VM hangs bringing up eth0.

Troubleshooting:
- Add set -x to /etc/sysconfig/network-scripts/ifup-eth to find exactly where it is hanging.
- In this case it hung at arping, trying to detect a duplicate IP:
 if ! /sbin/arping -q -c 2 -w 3 -D -I ${REALDEVICE} ${ipaddr[$idx]}


 
Solution
Root Cause:
arping uses real (wall-clock) time instead of relative time for its 3-second wait,
so if the wall clock goes back by an hour during those 3 seconds,
it will wait 1 hour and 3 seconds instead of 3 seconds. The
root cause was the time difference between the CentOS VM and ESXi.

Workaround:

- Add a 2-second sleep so there is no race condition between time changes and
the arping wait (see the sketch below),
or
- make sure the ESXi and CentOS VM clocks are correct (in one
customer case, the CentOS VM's clock was wrong
and off by 2 hours; even with NTP defined in the CentOS VM,
the time difference was too large for NTP to correct) - most preferable,
or
- if the CentOS VM has to keep a different time than ESXi, remove time sync
via VMware Tools.
   vmware KB
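A minimal sketch of the first workaround, assuming the stock RHEL/CentOS 6 ifup-eth duplicate-address check quoted above (the sleep is the only added line; the error branch is paraphrased):

# /etc/sysconfig/network-scripts/ifup-eth (excerpt)
sleep 2   # let the clock settle so a backward time jump cannot stretch arping's 3-second wait
if ! /sbin/arping -q -c 2 -w 3 -D -I ${REALDEVICE} ${ipaddr[$idx]}
then
    echo "Error, some other host already uses address ${ipaddr[$idx]}."
    exit 1
fi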
Tags: Troubleshooting