nutanix: 2012

Tuesday, December 11, 2012

ESXtop:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1008205

SIOC:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1019687

Dynamic Add node and IPV6

Nutanix IPv6 requirements

Nutanix software version 2.6 requires IPv6 link-local addressing to:
a. discover Nutanix nodes when configuring them for the first time,
b. discover nodes during dynamic add node,
c. reconfigure IP addresses.
Most switches support the IPv6 Neighbor Discovery Protocol over link-local addresses even if IPv6 is not enabled on the routers. This post explains link-local addressing and how to verify, from the Controller VM, that it works across the switch. Note that all Controller VMs must be connected to the same broadcast domain so that their IPv6 link-local addresses can reach one another.

Verify IPv6 connectivity from the Controller VM

nutanix@NTNX-Ctrl-VM-1-NTNX:172.16.8.84:~$ ifconfig eth0|grep inet6

inet6 addr: fe80::20c:29ff:fef2:cb25/64

nutanix@NTNX-Ctrl-VM-2-NTNX:172.16.8.85:~$ ifconfig eth0|grep inet6

inet6 addr: fe80::20c:29ff:feb0:3e61/64

nutanix@NTNX-Ctrl-VM-1-NTNX:172.16.8.84:~$ ping6 -I eth0 fe80::20c:29ff:feb0:3e61

PING fe80::20c:29ff:feb0:3e61(fe80::20c:29ff:feb0:3e61) from fe80::20c:29ff:fef2:cb25 eth0: 56 data bytes

64 bytes from fe80::20c:29ff:feb0:3e61: icmp_seq=1 ttl=64 time=18.0 ms

64 bytes from fe80::20c:29ff:feb0:3e61: icmp_seq=2 ttl=64 time=0.212 ms

64 bytes from fe80::20c:29ff:feb0:3e61: icmp_seq=3 ttl=64 time=0.180 ms
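
The link-local addresses above look like they were derived from the interface MAC address with the modified EUI-64 rule (flip the universal/local bit of the first octet and insert ff:fe in the middle). Here is a minimal Python sketch of that derivation. The MAC addresses are an assumption: they were worked out backwards from the addresses in the output and are not shown anywhere in this post.

```python
import ipaddress


def mac_to_link_local(mac):
    """Return the modified EUI-64 IPv6 link-local address for a 48-bit MAC."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a MAC such as 00:0c:29:f2:cb:25")
    octets[0] ^= 0x02  # flip the universal/local bit
    eui64 = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    prefix = bytes([0xFE, 0x80]) + bytes(6)  # fe80::/64
    return str(ipaddress.IPv6Address(prefix + eui64))


# Assumed MAC, inferred from the first CVM's address above
print(mac_to_link_local("00:0c:29:f2:cb:25"))
```

If the printed address matches what ifconfig reports, the CVM is using the standard auto-configured link-local address, and ping6 -I eth0 to a peer address should work as long as both CVMs share a broadcast domain.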

Dynamic Add Node:

Make sure the Rackable Unit Serial is different from that of the existing nodes. Newer factory-installed systems ship with distinct Rackable Unit Serial numbers.

If discovery shows fewer nodes than are ready to be added to the existing cluster, the missing nodes are probably not connected to the same switch.

From a CVM in the existing cluster:
ncli cluster discover-nodes
Cluster Id :
Hypervisor Address : 10.x.y.158
Ip : fe80::20c:29ff:feab:7822%eth0
Ipmi Address : 192.168.2.116
Node Position : A
Node Serial : 487c804a-dd23-49cc-bcc1-f8e7123dc0b3
Rackable Unit Model : NX-2000
Rackable Unit Serial : 2
Service Vm Address : 10.x.y.166
Svm Id :
Svm Version : ServiceVM-1.23_Ubuntu
Cluster Id :
Hypervisor Address : 10.14.23.161
Ip : fe80::20c:29ff:fe4d:5273%eth0
Ipmi Address : 192.x.y.119
Node Position : D
Node Serial : ac77f7cf-a248-46af-9d56-023392978bd9
Rackable Unit Model : NX-2000
Rackable Unit Serial : 2
Service Vm Address : 10.x.y.169
Svm Id :
Svm Version : ServiceVM-1.23_Ubuntu
Cluster Id :
Hypervisor Address : 10.x.y.160
Ip : fe80::20c:29ff:fea3:2642%eth0
Ipmi Address : 192.x.y.118
Node Position : C
Node Serial : b9237d59-4979-4829-9a83-dfaa64dd4b5c
Rackable Unit Model : NX-2000
Rackable Unit Serial : 2
Service Vm Address : 10.x.y.168
Svm Id :
Svm Version : ServiceVM-1.23_Ubuntu

Cluster Id :
Hypervisor Address : 10.x.y.159
Ip : fe80::20c:29ff:fee9:a13c%eth0
Ipmi Address : 192.x.y.117
Node Position : B
Node Serial : 6c31971d-fe7b-43e5-979f-2953a48a9d62
Rackable Unit Model : NX-2000
Rackable Unit Serial : 2
Service Vm Address : 10.x.y.167
Svm Id :
Svm Version : ServiceVM-1.23_Ubuntu
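
With more than a handful of nodes it is easy to mistype a serial, so the discovery output can be turned into add-node commands mechanically. This is only a sketch in Python: the function names are made up for illustration, and it assumes the output keeps the "Key : value" layout shown above with each record starting at "Cluster Id". The add-node syntax is the one used in the next step.

```python
def parse_discovered_nodes(text):
    """Split 'ncli cluster discover-nodes' output into one dict per node."""
    nodes, current = [], {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")  # split on the first colon only
        key, value = key.strip(), value.strip()
        if key == "Cluster Id" and current:  # a new record starts here
            nodes.append(current)
            current = {}
        current[key] = value
    if current:
        nodes.append(current)
    return nodes


def add_node_commands(nodes):
    """Build one add-node command per discovered node."""
    return ["ncli cluster add-node node-serial=" + n["Node Serial"] for n in nodes]


sample = """Cluster Id :
Ip : fe80::20c:29ff:feab:7822%eth0
Node Position : A
Node Serial : 487c804a-dd23-49cc-bcc1-f8e7123dc0b3
Rackable Unit Serial : 2
Cluster Id :
Ip : fe80::20c:29ff:fee9:a13c%eth0
Node Position : B
Node Serial : 6c31971d-fe7b-43e5-979f-2953a48a9d62
Rackable Unit Serial : 2
"""
for command in add_node_commands(parse_discovered_nodes(sample)):
    print(command)
```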

ncli cluster add-node node-serial=487c804a-dd23-49cc-bcc1-f8e7123dc0b3;ncli cluster add-node node-serial=6c31971d-fe7b-43e5-979f-2953a48a9d62;ncli cluster add-node node-serial=b9237d59-4979-4829-9a83-dfaa64dd4b5c;ncli cluster add-node node-serial=ac77f7cf-a248-46af-9d56-023392978bd9
Node added successfully
Node added successfully
Node added successfully
Node added successfully
ncli host list|grep "Service VM Address"
Service VM Address : 10.x.y.49
Service VM Address : 10.x.y.48
Service VM Address : 10.x.y.50
Service VM Address : 10.x.y.51
Service VM Address : 10.x.y.166
Service VM Address : 10.x.y.167
Service VM Address : 10.x.y.168
Service VM Address : 10.x.y.169
cluster status|grep CVM

CVM: 10.x.y.166 Up
CVM: 10.x.y.167 Up
CVM: 10.x.y.168 Up
CVM: 10.x.y.169 Up
CVM: 10.x.y.48 Up
CVM: 10.x.y.49 Up
CVM: 10.x.y.50 Up
CVM: 10.x.y.51 Up, ZeusLeader
2012-10-21 20:38:10 INFO cluster:906 Success!

nodetool -h localhost ring (nodes are added one by one in Limbo state and then become Normal; wait for all nodes to become Normal)
Address Status State Load Owns Token
pVOFDgRpkq7rwjiZf0A7PdlGDLSswKByL8RZOTKcrHowOfT5FYbhPvy7PJvJ
10.x.y.51 Up Normal 5.2 GB 20.00% 8zNLWFUeWeHJqvTxC9Fwc0CeIXGI5Xx7LnDjM2prxJR7YmfBrU1GnbaHPDnJ
10.x.y.49 Up Normal 4.02 GB 20.00% E1bIw6wcpRQ0XIGqRXkkN1Y5Af0b9ShinS36jxJxH9r56yZMqJxPztsE3Jiz
10.x.y.50 Up Normal 2.96 GB 20.00% TWCak3rlqTeO315iAG3asx0QNlPXfLkiqZswbC91t5TrLz1hsdBRDRCSR2OK
10.x.y.166 Up Limbo 774.37 MB 20.00% eZvBW6nzS9dTKtTMw6HVJ5RVNmeijP0UI2l8OyI76MYQLPsPcOjVoJzLcndo
10.x.y.48 Up Normal 4.35 GB 20.00% pVOFDgRpkq7rwjiZf0A7PdlGDLSswKByL8RZOTKcrHowOfT5FYbhPvy7PJvJ

nodetool -h localhost ring -- all 8 nodes should now be in Normal state
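
Rather than eyeballing the ring output until every node leaves Limbo, the check can be scripted. The Python sketch below only parses text shaped like the ring output above; the function names are made up for illustration, and running nodetool and polling in a loop are left out.

```python
def ring_states(ring_output):
    """Map each node address to its State column from 'nodetool ring' output."""
    states = {}
    for line in ring_output.splitlines():
        fields = line.split()
        # Data rows: <address> <Up|Down> <state> <load> <unit> <owns> <token>
        if len(fields) >= 7 and fields[1] in ("Up", "Down"):
            states[fields[0]] = fields[2]
    return states


def all_normal(ring_output, expected_nodes):
    """True only when the expected number of nodes are listed and all are Normal."""
    states = ring_states(ring_output)
    return len(states) == expected_nodes and all(
        state == "Normal" for state in states.values()
    )


sample = """Address Status State Load Owns Token
10.x.y.49 Up Normal 4.02 GB 20.00% E1bIw6wcpRQ0XIGqRXkkN1Y5Af0b9ShinS36jxJxH9r56yZMqJxPztsE3Jiz
10.x.y.166 Up Limbo 774.37 MB 20.00% eZvBW6nzS9dTKtTMw6HVJ5RVNmeijP0UI2l8OyI76MYQLPsPcOjVoJzLcndo
"""
print(ring_states(sample))
print(all_normal(sample, expected_nodes=2))
```

Checking the node count as well as the state matters here, because nodes join one at a time and a ring that is "all Normal" may simply not list the last nodes yet.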

Connect to the Nutanix console (https://CVM_IP).
Edit Storage Pool: add SATA drives (+ until 5) and PCIe (+ until 1) on the new nodes.
Edit the container and add the datastore on the 4 new nodes.

Add the 4 new nodes to vCenter.

Friday, June 1, 2012

Vcenter Disconnects due to APD or Firewall

When an ESXi 5.0 host temporarily disconnects from vCenter, more often than not the cause is a storage issue; at other times it can be the ESXi 5.0 firewall.

Storage Issues: (APD)

Symptoms:

Vcenter Disconnects and Errors in VMKernel Log:

2011-07-30T14:47:41.361Z cpu1:2642)WARNING: NMP: nmpDeviceAttemptFailover:658:Retry world failover device "naa.60a98000572d54724a34642d71325763" - failed to issue command due to Not found (APD), try again...

To prevent APD during storage array maintenance:

1. If you have a maintenance window, go to vCenter -> ESXi host -> Configuration -> Storage, unmount the datastore (Datastores tab) and then detach the device (Devices tab). (The VMs must be powered off, the datastore must not be a heartbeat datastore, and there are quite a few other prerequisites.)

2. If you have too many devices to unmount, the following setting can also be used to prevent APD. (Depending on the SCSI timeout values, step 1 is still preferable.)

To enable it without requiring downtime during storage array maintenance (this might save you from having to unmount/detach datastores/devices), execute:

# esxcfg-advcfg -s 1 /VMFS3/FailVolumeOpenIfAPD

Note: This setting might prevent new storage devices from being discovered, so use it only during maintenance.

To check the value of this option, execute:

# esxcfg-advcfg -g /VMFS3/FailVolumeOpenIfAPD

Revert it after the maintenance:

# esxcfg-advcfg -s 0 /VMFS3/FailVolumeOpenIfAPD

Firewall Issues Causing Vcenter Disconnects:

Symptoms (other than vCenter disconnects):

Vpxa.log (on ESXi host):

Stolen/full sync required message:
"2012-02-02T18:32:49.941Z [6101DB90 info 'Default'
opID=HB-host-56@2627-b61e8cd4-e4] [VpxaMoService::GetChangesInt]
Forcing a full host synclastSentMasterGen = 2627 MasterGenNo from vpxd
= 2618
2012-02-02T18:32:49.941Z [6101DB90 verbose 'Default'
opID=HB-host-56@2627-b61e8cd4-e4] [VpxaMoService::GetChangesInt] Vpxa
restarted or stolen by other server. Start a full sync"

Difficulty translating between host and vpxd:
2012-01-24T19:26:05.705Z [FFDACB90 warning 'Default']
[FetchQuickStats] GetTranslators -- host to vpxd translation is empty.
Dropping results
2012-01-24T19:26:05.706Z [FFDACB90 warning 'Default']
[AddEntityMetric] GetTranslators -- host to vpxd translation is empty.
Dropping results
2012-01-24T19:26:05.706Z [FFDACB90 warning 'Default']
[AddEntityMetric] GetTranslators -- host to vpxd translation is empty.
Dropping results
2012-01-24T19:26:05.706Z [FFDACB90 warning 'Default']
[AddEntityMetric] GetTranslators -- host to vpxd translation is empty.
Dropping results
2012-01-24T19:26:05.706Z [FFDACB90 warning 'Default']
[AddEntityMetric] GetTranslators -- host to vpxd translation is empty.

Vpxd.log (on vCenter Server):

Timeouts/Failed to respond and Host Sync failures:
2012-01-24T18:50:15.015-08:00 [00808 error 'HttpConnectionPool']
[ConnectComplete] Connect error A connection attempt failed because
the connected party did not properly respond after a period of time,
or established connection failed because connected host has failed to
respond.

After trying a few options, our team was able to avoid the vCenter disconnects with:

esxcli network firewall load     # if the firewall is unloaded you can't enable HA, so it has to be loaded

esxcli network firewall set --enabled false
esxcli network firewall set --default-action true

esxcli network firewall get

rc.local: (to persist between reboots)

1) echo -e "# disable firewall service\nesxcli network firewall load\nesxcli network firewall set --enabled false\nesxcli network firewall set --default-action true\n# disable firewall service" >> /etc/rc.local

2) auto-backup.sh

Addendum:

We had a few useful quick-and-dirty scripts to monitor the dropping, because the vCenter disconnects were intermittent and would recover after a couple of minutes, and we couldn't sit in front of the computer all day.

a. cat check_vcenter_disconnect_hep.sh ( ./check_vcenter_disconnect_hep.sh >> public_html/disconnect.txt )
#!/bin/sh
while true
do
dddd=`date -u +%Y-%m-%dT%H`
echo $dddd
for seq in `seq 1 4`
do
echo "hep$seq"
~/scripts/hep$seq |grep $dddd
done
sleep 1800
done

b. cat hep1 ( we had hep1-hep4)
#!/usr/bin/expect
spawn ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no root@172.x.y.z
expect "word:"
send "esxi12345\r"
expect "#"
#send "rm /var/run/log/vpxa.\[1-9\]\r"
#expect "#"
send "gunzip /var/run/log/vpxa*.gz\r"
expect "#"
send "egrep stolen /var/run/log/vpxa*\r"
expect "#"
send "egrep -i dropping /var/run/log/vpx*\r"
expect "#"
send "egrep -i performance /var/log/vmkernel.log\r"
expect "#"
send "exit\r"
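
The filtering that the wrapper script does with grep (keep only lines from the current UTC hour that mention "stolen" or "dropping") can also be written as a small function, which is handy if the log text is collected some other way. This is a rough Python sketch, not the script we ran; the function names are made up, and the sample lines are the vpxa.log messages quoted earlier, joined onto single lines.

```python
from datetime import datetime, timezone


def current_hour_prefix(now=None):
    """Same format as `date -u +%Y-%m-%dT%H` in the shell script."""
    now = now or datetime.now(timezone.utc)
    return now.strftime("%Y-%m-%dT%H")


def disconnect_hints(log_lines, hour_prefix):
    """Keep lines from the given hour that mention 'stolen' or 'dropping'."""
    hits = []
    for line in log_lines:
        if hour_prefix not in line:
            continue
        # 'stolen' is matched as-is, 'dropping' case-insensitively (egrep -i)
        if "stolen" in line or "dropping" in line.lower():
            hits.append(line)
    return hits


sample = [
    "2012-02-02T18:32:49.941Z [6101DB90 verbose 'Default' opID=HB-host-56@2627-b61e8cd4-e4] "
    "[VpxaMoService::GetChangesInt] Vpxa restarted or stolen by other server. Start a full sync",
    "2012-01-24T19:26:05.705Z [FFDACB90 warning 'Default'] [FetchQuickStats] GetTranslators -- "
    "host to vpxd translation is empty. Dropping results",
]
for hit in disconnect_hints(sample, "2012-02-02T18"):
    print(hit)
```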

Useful Webpages:
http://www.virtualizationpractice.com/all-paths-down-16250/
PDLs/APDs http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2004684

Monday, May 21, 2012

Creating Full clones on Nutanix via NFS VAAI

Aim: create 320 VMs on a Nutanix NFS datastore and power on all 320.
Guest VM: Windows 7, 20 GB HDD on NFS, 2 GB memory, with VMware Tools.
Number of ESX hosts: 4 (80 VMs per node).
Storage: the same ESX servers (no additional hardware other than the Arista switch interconnecting them), i.e. converged compute and storage.

Script help and source: http://www.vmdev.info/?p=202, Tabrez and Steve Poitras

vCenter before running the script (except for the clone master on the local datastore, there are no VMs other than the Nutanix Controller VMs):

On Nutanix, starting from a clean cluster, create the storage pool, container and NFS datastore:
a. Create Storage Pool

b. Create container

c. Create NFS datastore

ESXi now sees the datastore:

esxcfg-nas -l
NTNX-ctr1 is /ctr1 from 192.168.5.2 mounted available

Script to create full clones with thick vdisks from 1Win7-clone on the NFS datastore (1Win7-clone has VMware Tools installed and power saving disabled in Windows 7 so that it does not go into standby mode):

Connect-VIServer 10.2.8.59 -User administrator -Password ntnx
$vm = Get-VM 1Win7-clone | Get-View    # (1)
$cloneFolder = $vm.parent    # (2)
$cloneSpec = New-Object Vmware.Vim.VirtualMachineCloneSpec
$cloneSpec.Location = New-Object Vmware.Vim.VirtualMachineRelocateSpec
$cloneSpec.Location.DiskMoveType = [Vmware.Vim.VirtualMachineRelocateDiskMoveOptions]::moveAllDiskBackingsAndAllowSharing    # (3)
$cloneSpec.Location.Transform = [Vmware.Vim.VirtualMachineRelocateTransformation]::flat    # (4)

$global:testIterations = 320    # (5)
for($i=1; $i -le $global:testIterations; $i++){
    $cloneName = "Windows7-$i"
    $vm.CloneVM( $cloneFolder, $cloneName, $cloneSpec )
}

Explanation (the numbers refer to the markers in the script):

1. Get-View of our clone master.

2. Use the same folder as the clone master ($vm.parent is the VM's parent folder).

3. Allow sharing: creates a full clone by copying all disks (but not snapshot metadata), from the root to the child-most disk, except for non-child-most disks previously copied to the target.

4. flat: causes the disk to be created as a thick disk.

5. Loops from 1 to 320 and creates Windows7-$i with the clone spec defined above.

The following vCenter screenshot shows clone creation with NFS VAAI in progress and the 320 VMs being created.

Maintenance:
#To Remove VM
Remove-VM Windows7-*

# To Power on
Start-VM Windows7-*

#To start VMs on specific ESX server:

Get-VMHost ip | Get-VM Windows7-* | where {$_.'PowerState' -eq "PoweredOff"} | Start-VM -RunAsync -Confirm:$false

Get-VMHost ip | Get-VM Windows7-* | where {$_.'PowerState' -eq "Suspended"} | Start-VM

#Migrate VMs (DRS should do this when powering on):

$global:testIterations = 80
for($i=1; $i -le $global:testIterations; $i++){
    Get-VM -Name Windows7-$i | Move-VM -Destination (Get-VMHost 10.2.8.51) -RunAsync
}

$global:testIterations = 240
for($i=161; $i -le $global:testIterations; $i++){
    Get-VM -Name Windows7-$i | Move-VM -Destination (Get-VMHost 10.2.8.53) -RunAsync
}

$global:testIterations = 320
for($i=241; $i -le $global:testIterations; $i++){
    Get-VM -Name Windows7-$i | Move-VM -Destination (Get-VMHost 10.2.8.54) -RunAsync
}
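
The loop bounds above are just 320 VMs split evenly across 4 hosts (80 each). A small Python sketch of that arithmetic follows, in case the VM or host count changes. The function name and the host labels in the example are placeholders, not names from this environment.

```python
def assign_ranges(total_vms, hosts):
    """Split VM indices 1..total_vms into contiguous, near-equal ranges per host."""
    per_host, remainder = divmod(total_vms, len(hosts))
    plan, start = {}, 1
    for position, host in enumerate(hosts):
        # the first `remainder` hosts take one extra VM when the split is uneven
        count = per_host + (1 if position < remainder else 0)
        plan[host] = range(start, start + count)
        start += count
    return plan


# Placeholder host labels; substitute the real ESX host addresses
plan = assign_ranges(320, ["host-1", "host-2", "host-3", "host-4"])
for host, indices in plan.items():
    print(host, "Windows7-%d .. Windows7-%d" % (indices[0], indices[-1]))
```

With 320 VMs and 4 hosts the third and fourth ranges come out as 161 to 240 and 241 to 320, the same bounds used in the loops above.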

# Get IP from VM to see if it is booted (vmware tools need to be installed ) -
Get-VM NTNX* |where {$_.'PowerState' -eq "PoweredOn"}| %{
write-host $_.Guest.IPAddress[0]}
10.2.8.59
10.2.8.60
10.2.8.55
10.2.8.56
10.2.8.57
10.2.8.58

To get the count:

$global:count = 0
Get-VM Windows7-* | where {$_.'PowerState' -eq "PoweredOn"} | %{
    $ip = $_.Guest.IPAddress[0]
    # count only VMs that actually report an IP address
    if ($ip) {
        write-host $ip
        $global:count += 1
    }
}
write-host "Count of IPs is " $global:count

<snippet>

169.254.127.112

169.254.165.80
169.254.104.109
169.254.11.254
169.254.248.101
169.254.239.204
169.254.186.164
169.254.127.112
169.254.24.136
169.254.123.158
169.254.129.15
169.254.212.87
169.254.47.86
Count of VMs with ip is 320 (keep monitoring until all 320 are up)
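
Every address in the snippet above falls in 169.254.0.0/16, the IPv4 link-local range a guest assigns itself when it gets no DHCP lease. For a boot test that is enough to show the guest is up, but it is worth telling such addresses apart from leased ones. Here is a minimal Python sketch using the standard ipaddress module; the function name is made up for illustration.

```python
import ipaddress


def summarize_guest_ips(ips):
    """Count how many reported guest IPs are self-assigned link-local addresses."""
    link_local = [ip for ip in ips if ipaddress.ip_address(ip).is_link_local]
    return {
        "total": len(ips),
        "link_local": len(link_local),
        "other": len(ips) - len(link_local),
    }


reported = ["169.254.127.112", "169.254.165.80", "10.2.8.59"]
print(summarize_guest_ips(reported))
```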

Tuesday, May 8, 2012

Bringing Hadoop closer to live data!

Of late, I have been reading and listening to my colleagues talk about Hadoop, MapReduce and Twitter's spouts and bolts!

The most important part of Hadoop (HDFS and MapReduce) is already implemented in Nutanix. This allows us to bring live data to the Hadoop cluster in read-only mode by accessing the same vdisks, rather than waiting for a server to dispatch the data nightly. Instead of MapReduce, we could run spouts and bolts to map-reduce continuously.

If we plan to run Hadoop jobs nightly, we could even have adaptive chameleon compute clusters that run regular jobs (VDI, etc.) in the daytime and Hadoop at night. VMware has a lot of tools, and PowerCLI commands could achieve this by powering off Hadoop VMs or changing their resources.

I am just so excited to work with Nutanix; there is so much more we can do by moving from centralized storage to distributed storage.

We are just scratching the surface. My assignment is to read further and step into the future.

Friday, May 4, 2012

ESXi reboot reason:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1019238

Beyond SAN and FCOE

SAN Introduction:

When Ethernet ran at 10 Mb/s and 100 Mb/s and direct-attached storage was restricting the growth and clustering of servers (Veritas clustering/Sun clustering), a new technology, Fibre Channel, came into play to network storage arrays over a reliable transport layer. On top of reliable Fibre Channel with 8b/10b encoding (providing elasticity and additional characters for IDLE/K28.5), SCSI CDBs were carried in FC frames. Login mechanisms (FLOGI/PLOGI/PRLI/TPLS) together with FC zoning, port security and LUN masking provided access control, and FSPF/RCF/BF/DIA/principal switch selection were used to manage multiple SAN/FC switches.
Then came NPIV to reduce the pain of managing multiple domains, which brought its own pain: one flapping host could cause other hosts to fail (bad TCAM programming). Even after years of SAN administration, only a select group of engineers understands this complex technology.

FCOE Introduction:

While this revolution was going on, Ethernet went through its own metamorphosis into 1G/10G/40G/100G. Engineers began to question the relevance of FC, since Ethernet can be made reliable through lossless switches with much lower latency and PAUSE frames (to manage congestion). Multiple vendors came up with their own methods of encapsulating FC frames (CEE/DCE). Vendors started building CNAs with little thought given to whether FC or Ethernet should have MSI or MSI-X. The only adapter that has all the functionality (SR-IOV/VN-Tag/VIFs) is the Palo adapter (M81KR), which works only with UCS. Two technologies, VEPA and VN-Tag, compete on CNAs and are not interoperable; a specific FCoE switch is needed to support VEPA or VN-Tag.

http://infrastructureadventures.com/2010/12/08/io-virtualization-overview-cna-sr-iov-vn-tag-and-vepa/

I feel it is force-fitting FC into Ethernet to extend the life of FC, so that vendors can sell expensive storage arrays and expensive switches (FCoE/lossless/PFC).
But with the advent of powerful CPUs, larger hard drives, faster solid-state devices and PCI storage, we don't need storage arrays or SAN switches: storage can be brought back to the server.

Storage Array:
A storage array is a set of disks formatted in a vendor-specific RAID, with a front-end cache and NFS/FC/CNA adapters, connected to an expensive SAN switch through FC cables; the hosts also need expensive adapters (FC HBAs, GigE NICs or CNAs).
Most array vendors take six months to a year to certify the latest HDDs, SSDs, adapters and memory, and upgrading the storage array is a tedious process.

Nutanix Introduction:

Radical or not-so-radical approach: since we have powerful CPUs and bigger network pipes (10G/40G), along with faster spindles (local drives), solid-state devices and PCI storage, Nutanix makes it easier to adopt newer technologies in this area without having to replace your storage array.
Whenever a new CPU, an HDD with a faster spindle, or a new network adapter comes to market, we can easily swap it in and get better performance.

Nutanix depends on standard GigE infrastructure to make this all work, without having to spend on separate storage processors, storage arrays and FC/FCoE infrastructure on top of the GigE infrastructure.

Instead of the hodgepodge approach of migrating the SAN from FC to FCoE with multiple (non)standards, this approach is future-safe and leapfrogs the current one.

Conclusion:
The big established vendors don't want disruption; they would rather have a slow process of changing one component at a time, so that they have time to reinvent themselves and keep their customers buying over-provisioned switches/servers/storage arrays/vendor-specific HDDs that will be end-of-life in a few years. Customers end up stuck reprovisioning, upgrading in baby steps, migrating to new storage arrays/switches, learning new technologies and spending training dollars, instead of being productive with a few basic moving parts.
The best of the FCoE technology that should be carried forward:
VN-Tag to create VIFs (SR-IOV), lossless (or less-loss) Ethernet, PFC and SPMA/FPMA are awesome technologies that need to make it into next-generation Ethernet switches, but not VFC/FCoE/FIP.

It is time to build a futuristic datacenter with Nutanix Complete Cluster.

Tuesday, April 17, 2012

Changing router address in nutanix

On controller VMs:
1. nutanix@NTNX-Ctrl-VM-2-NTNX:/etc/network$ more interfaces (change it for startup)
gateway 10.50.80.253
2. route add default gw 10.50.80.253 ( to change it live without rebooting)
route delete default gw 10.50.80.254
3. cluster.cfg (change cluster.cfg on one and push it to others and vma)
grep gateway cluster.cfg
gateway = 10.50.80.253
4. Verify: netstat -r
5. You may have to use route delete to remove the old gateway address.
On VMA:
1.route add -net default gw 10.50.80.253
route delete -net default gw 10.50.80.254

2. vi-admin@pds-nxvma:/etc/sysconfig/network> cat routes
default 10.50.80.254 - -
3. Verify - netstat -r
4. cluster.cfg
5. You may have to delete the old route.
On ESXi:

/etc #

Verify:
esxcfg-route
VMkernel default gateway is 10.50.80.254
/etc # esxcfg-route -l
VMkernel Routes:
Network Netmask Gateway Interface
10.50.80.0 255.255.255.0 Local Subnet vmk0
192.168.5.0 255.255.255.0 Local Subnet vmk1
default 0.0.0.0 10.50.80.254 vmk0

VMWARE HA: isolation address
If the old route is still there, you may have to delete it. VMware HA may still be using the old router address as its isolation address; you can change it by disabling and re-enabling HA or by setting das.isolationaddress: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006421
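
When the change has to be verified on many hosts, the "default" row of the esxcfg-route -l output can be compared against the new gateway by script. The Python sketch below parses only text shaped like the listing above; the function name is made up, and collecting the output from each host is left out.

```python
def default_gateway(route_listing):
    """Return the gateway from the 'default' row of `esxcfg-route -l` output."""
    for line in route_listing.splitlines():
        fields = line.split()
        # Row layout: Network  Netmask  Gateway  Interface
        if len(fields) >= 4 and fields[0] == "default":
            return fields[2]
    return None


listing = """VMkernel Routes:
Network Netmask Gateway Interface
10.50.80.0 255.255.255.0 Local Subnet vmk0
192.168.5.0 255.255.255.0 Local Subnet vmk1
default 0.0.0.0 10.50.80.254 vmk0
"""
new_gateway = "10.50.80.253"
current = default_gateway(listing)
print(current, "OK" if current == new_gateway else "still needs changing")
```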

Saturday, April 14, 2012

VM manipulation

1. /vmfs/volumes/4f611c3f-ca118ee2-d237-0025904c4fce/ServiceVM-1.23_Ubuntu # vim-cmd vmsvc/getallvms

2. vim-cmd vmsvc/power.getstate VMID (vim-cmd vmsvc/power.off VMID, if needed; VMID is the Vmid column from getallvms)

3. /vmfs/volumes/4f611c3f-ca118ee2-d237-0025904c4fce/ServiceVM-1.23_Ubuntu # vim-cmd vmsvc/unregister 8

4. /vmfs/volumes/4f611c3f-ca118ee2-d237-0025904c4fce/ServiceVM-1.23_Ubuntu # sed -i "s/NTNX-1-local-ds-NTNX/LOCAL_USSEAVM013/g" ServiceVM-1.23_Ubuntu.vmx

5. vim-cmd solo/registervm /vmfs/volumes/4f611c3f-ca118ee2-d237-0025904c4fce/ServiceVM-1.23_Ubuntu/ServiceVM-1.23_Ubuntu.vmx

6. vim-cmd hostsvc/maintenance_mode_exit, if it is in maintenance mode.

Wednesday, April 4, 2012

vmware-esxi-configuration-backup

http://www.r71.nl/kb/technical/212-vmware-esxi-configuration-backup

The commands vicfg-cfgbackup.pl (esxcfg-cfgbackup.pl) allow you to backup and restore the configuration of your ESX 3i host. Install the RCLI on your pc, download it by clicking the link on the webpage on an ESXi host.

To backup the host you would run the command.

vicfg-cfgbackup.pl --server -s

To restore your backup configuration to your host you would run the following command. This will cause the host to reboot once the process is complete.

vicfg-cfgbackup.pl --server -l

NOTES

- While this command can be used to restore your configuration, it is not a substitute for backup of your virtual machines. At this time, a new install of ESX 3i Installable will wipe out any existing datastores that exist on the host when you perform a new install.

- You can use the -q switch to suppress the confirmation request that you will receive when you restore a configuration backup or restore a host back to factory defaults.

- You must have all VMs on the host stopped and have put the host in maintenance mode. Also the build of the install on the ESX host must match the build that is recorded in the backup file. If that is not the case, you can use the -f switch to force the restore. This document is based on the 0.26 version of the RCLI and at this point both the maintenance mode and build number requirements are not enforced.

vicfg-cfgbackup.pl --server -f -l

To reset the host back to factory defaults, use the command below. It will put the host in maintenance mode and it will not wipe out any existing datastores. Thus you will be able to add existing virtual machines back into inventory, either by manually adding the virtual machines or by restoring a configuration backup.

vicfg-cfgbackup.pl --server -r

Friday, March 9, 2012

Persist the firewall rules

http://www.virtuallyghetto.com/2011/08/how-to-persist-configuration-changes-in.html

1. esxcli network firewall set --default-action true
2. esxcli network firewall set --enabled false
3. echo "esxcli network firewall set --default-action true" >> /etc/rc.local
4. echo "esxcli network firewall set --enabled false" >>/etc/rc.local

5. /sbin/auto-backup.sh (preserves the changes right away)

Disabling the firewall this way might help if you are having persistent vCenter disconnects (vpxa-vpxd communication).

Monday, March 5, 2012

VMs with Red Exclamation and no Alerts listed

Fix:

- Restart the management agents, including vpxa, on ESXi (services.sh restart).

- Disconnect the host from vCenter and reconnect it after a couple of minutes.

- Use a refresh via PowerCLI.

Here is the PowerCLI snippet that reloads the VM views in vCenter without having to do either of the first two (a simple command):
Connect-VIServer <Vcenter_IP>
(Get-View -ViewType VirtualMachine) | %{$_.reload()}

Monday, February 27, 2012

Get all VMs and the MAC address, Duplicate IPs

1. ESXi host network adapters and virtual switches:
Get-VMHostNetworkAdapter | select VMhost, Name, Mac, Ip
Get-VMHost | Get-VMHostNetworkAdapter
Get-VMHost | Get-VirtualSwitch

2. Get-VM | Select-Object -Property Name,@{N="MacAddresses";E={$_.NetworkAdapters | ForEach-Object {$_.MacAddress}}},VMHost

3. Connect-VIServer -Server vSphereServer

$match_address = "172......3"

Get-VM | %{
$vmIPs = $_.Guest.IPAddress
foreach($ip in $vmIPs) {
if ($ip -eq $match_address) {
"Found VM with matching address: {0}" -f $_.Name
}
}
}

4. Get-VMHost |Get-VMHostNetworkAdapter | Where-Object {$_.Mac -eq "00:25:90:2a:6a:a2"}

5. Get-VM -Location Proteus-Cluster | where {$_.'PowerState' -eq "PoweredOn"} | Select-Object VMHost,Name

(Verified that the number of powered-on VMs is the same: 41)
172.16.13.1 Jenkins-Dogfood-Proteus
172.16.13.1 build-amazon-vm-0
172.16.13.1 steve-windows
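
Step 3 looks for one known address. To find every duplicate in one pass, the VM-to-IP pairs can be inverted into an IP-to-VMs map. This is a minimal Python sketch of that idea, assuming the pairs have already been exported (for example from Get-VM and Guest.IPAddress); the VM names and addresses in the example are invented.

```python
from collections import defaultdict


def find_duplicate_ips(vm_ips):
    """Return {ip: [vm names]} for every IP reported by more than one VM."""
    owners = defaultdict(set)
    for vm_name, ips in vm_ips.items():
        for ip in ips:
            owners[ip].add(vm_name)
    return {ip: sorted(vms) for ip, vms in owners.items() if len(vms) > 1}


# Invented inventory for illustration only
inventory = {
    "vm-a": ["172.16.0.10"],
    "vm-b": ["172.16.0.11", "172.16.0.10"],
    "vm-c": ["172.16.0.12"],
}
print(find_duplicate_ips(inventory))
```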

Thursday, February 23, 2012

Top in Batch Mode, and Expect Script to monitor ESX logs

Unix Top:

0,15,30,45 * * * * /home/nutanix/serviceability/top_pprof.sh > /dev/null 2>&1

nutanix@NTNX-Ctrl-VM-3-NTNX:~$ more serviceability/top_pprof.sh
#!/bin/sh
if [ ! -d /home/nutanix/data/logs/top ]; then
mkdir /home/nutanix/data/logs/top
fi
m=`date "+%d%m"`
dod=`date "+%H%M"`
if [ ! -d /home/nutanix/data/logs/top/$m ]; then
mkdir /home/nutanix/data/logs/top/$m
fi
top -b -n1 > /home/nutanix/data/logs/top/$m/$dod.top
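# top ran in batch mode above; take the highest stargate %MEM (column 10), integer part only
mem=`grep stargate /home/nutanix/data/logs/top/$m/$dod.top | awk '{print $10}' | sort -rn | head -1 | cut -d"." -f1`
# If stargate %MEM is above 55, collect the stargate page and pprof data (treated as 0 if stargate is not listed)
if [ "${mem:-0}" -gt 55 ]; then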
curl http://localhost:2009 2>/dev/null |html2text -ascii -width 600 >/home/nutanix/data/logs/top/$m/$dod.stargatepage
curl http://localhost:2009/h/pprof/growth >/home/nutanix/data/logs/top/$m/$dod.pprof 2>/dev/null
curl http://localhost:2009/h/pprof/heapstats >/home/nutanix/data/logs/top/$m/$dod.pprof.heap 2>/dev/null
/home/nutanix/toolchain/x86_64-unknown-linux-gnu/1.1/bin/pprof -pdf /home/nutanix/bin/stargate http://localhost:2009/h/pprof/growth > /home/nutanix/data/logs/top/$m/$dod.pprof.pdf 2>/dev/null
fi
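
The threshold check is the part of this script that is easiest to get subtly wrong, so here it is again as a small Python sketch that can be tried against a saved .top file. The function names and the sample top rows are made up for illustration; the column position (%MEM as the 10th field) and the threshold of 55 come from the script above.

```python
def max_process_mem(top_output, name="stargate"):
    """Highest %MEM (10th column of `top -b`) among lines mentioning `name`."""
    best = 0.0
    for line in top_output.splitlines():
        if name not in line:
            continue
        fields = line.split()
        if len(fields) < 10:
            continue
        try:
            best = max(best, float(fields[9]))
        except ValueError:
            continue  # not a process row
    return best


def should_collect_pprof(top_output, threshold=55):
    """Mirror the shell test: compare the integer part of %MEM to the threshold."""
    return int(max_process_mem(top_output)) > threshold


# Invented top rows for illustration
sample = """  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4321 nutanix   20   0 2100m 1.1g  20m S 12.0 57.3  10:01.22 stargate
 4400 nutanix   20   0  300m  50m  10m S  1.0  9.5   0:10.00 stargate
"""
print(max_process_mem(sample), should_collect_pprof(sample))
```

Taking the maximum numerically matters: compared as strings, "9.5" would sort above "57.3".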

Monitor ESX logs from webpage:
ns:~/public_html/ATLAS$ ~/scripts/check_vcenter_disconnect.sh >> ~/public_html/ATLAS/index.html &

nutanix@uranus:~/public_html/ATLAS$ cat /data/home/nutanix/scripts/check_vcenter_disconnect.sh
#!/bin/sh
while true
do
dddd=`date -u +%Y-%m-%dT%H`
echo $dddd
for seq in `seq 1 4`
do
echo "atlas$seq"
~/scripts/atlas$seq |grep $dddd
done
sleep 1800 # I did not use cron because we might forget to turn it off; this way, once uranus reboots, the monitor is gone.
done

I could have created a single expect script instead of one per host.
nutanix@uranus:~/public_html/ATLAS$ cat /data/home/nutanix/scripts/atlas*
#!/usr/bin/expect
spawn ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no root@172.
expect "word:"
send "a\r"
expect "#"
send "gunzip /var/run/log/vpxa*.gz\r"
expect "#"
send "egrep stolen /var/run/log/vpxa*\r"
expect "#"
send "egrep -i dropping /var/run/log/vpx*\r"
expect "#"
send "exit\r"

atlas2:
#!/usr/bin/expect
spawn ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no root@172
expect "word:"
send "\r"
expect "#"
send "gunzip /var/run/log/vpxa*.gz\r"
expect "#"
send "egrep stolen /var/run/log/vpxa*\r"
expect "#"
send "egrep -i dropping /var/run/log/vpx*\r"
expect "#"
send "exit\r"

atlas3:
#!/usr/bin/expect
spawn ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no root@172.
expect "word:"
send "a\r"
expect "#"
send "gunzip /var/run/log/vpxa*.gz\r"
expect "#"
send "egrep stolen /var/run/log/vpxa*\r"
expect "#"
send "egrep -i dropping /var/run/log/vpx*\r"
expect "#"
send "exit\r"

atlas4:
#!/usr/bin/expect
spawn ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no root@17
expect "word:"
send "apple4u2$\r"
expect "#"
send "gunzip /var/run/log/vpxa*.gz\r"
expect "#"
send "egrep stolen /var/run/log/vpxa*\r"
expect "#"
send "egrep -i dropping /var/run/log/vpx*\r"
expect "#"

Saturday, February 11, 2012

Create VMs from Template - PowerCLI

add-pssnapin VMware.VimAutomation.Core
Connect-VIServer 1751 -User administrator -Password Password
$global:location = "Atlas-Cluster"
$global:datastore = "Atlas-NTNX-datastore"
$global:sourceVM = @("Atlas_Windows_Template")
$global:testIterations = 10
for($i=1; $i -le $global:testIterations; $i++){
new-vm -Name WinDowsAtlas$i -VMhost 170 -Template Atlas_Windows_Template -Datastore Atlas-NTNX-datastore }
$global:testIterations = 20
for($i=11; $i -le $global:testIterations; $i++){
new-vm -Name WinDowsAtlas31$i -VMhost 171 -Template Atlas_Windows_Template -Datastore Atlas-NTNX-datastore }
$global:testIterations = 30
for($i=21; $i -le $global:testIterations; $i++){
new-vm -Name WinDowsAtlas31$i -VMhost 172 -Template Atlas_Windows_Template -Datastore Atlas-NTNX-datastore }
$global:testIterations = 40
for($i=31; $i -le $global:testIterations; $i++){
new-vm -Name WinDowsAtlas31$i -VMhost 173 -Template Atlas_Windows_Template -Datastore Atlas-NTNX-datastore }
get-VM WinDows*|Start-VM

get-VMHost ip_Addr| get-VM Windows7-*| where {$_.'PowerState' -eq "PoweredOff"} | Start-VM -RunAsync
get-VM Windows7-*| where {$_.'PowerState' -eq "Suspended"} |Start-VM -RunAsync -Confirm:$false

Wednesday, February 1, 2012

Troubleshooting storage issues with ESXi5.0

~ # vmkload_mod -l|grep vmfs3
vmfs3 2 316

~ # vmkiscsi-tool -D -l vmhba37

=========Discovery Properties for Adapter vmhba37=========
iSnsDiscoverySettable : 0
iSnsDiscoveryEnabled : 0
iSnsDiscoveryMethod : 0
iSnsHost.ipAddress : ::
staticDiscoverySettable : 0
staticDiscoveryEnabled : 1
sendTargetsDiscoverySettable : 0
sendTargetsDiscoveryEnabled : 1
slpDiscoverySettable : 0
DISCOVERY ADDRESS : 192.168.5.2
STATIC DISCOVERY TARGET
NAME : iqn.2010-06.com.nutanix:shared_data1-516ad8b0
ADDRESS : 192.168.5.2:62000
BOOT : No
LAST ERR : LOGIN: No Errors
STATIC DISCOVERY TARGET
NAME : iqn.2010-06.com.nutanix:shared_data2-3ae58737
ADDRESS : 192.168.5.2:62001
BOOT : No
LAST ERR : LOGIN: No Errors
STATIC DISCOVERY TARGET
NAME : iqn.2010-06.com.nutanix:shared_data4-8d6bc1cd
ADDRESS : 192.168.5.2:62003
BOOT : No
LAST ERR : LOGIN: No Errors
STATIC DISCOVERY TARGET
NAME : iqn.2010-06.com.nutanix:shared_data3-cbdfce84
ADDRESS : 192.168.5.2:62002
BOOT : No
LAST ERR : LOGIN: No Errors
/dev/disks # esxcli storage core path list | egrep "Target Ident|vmhba37"
Runtime Name: vmhba37:C0:T3:L0
Adapter: vmhba37
Target Identifier: 00023d000001,iqn.2010-06.com.nutanix:shared_data3-cbdfce84,t,1
Target Identifier: sata.0:0
Runtime Name: vmhba37:C0:T2:L0
Adapter: vmhba37
Target Identifier: 00023d000001,iqn.2010-06.com.nutanix:shared_data4-8d6bc1cd,t,1
Target Identifier: sata.0:0
Target Identifier: sata.0:0
Runtime Name: vmhba37:C0:T1:L0
Adapter: vmhba37
Target Identifier: 00023d000001,iqn.2010-06.com.nutanix:shared_data2-3ae58737,t,1
Target Identifier: sata.0:0
Runtime Name: vmhba37:C0:T0:L0
Adapter: vmhba37
Target Identifier: 00023d000001,iqn.2010-06.com.nutanix:shared_data1-516ad8b0,t,1
Target Identifier: sata.0:0
Target Identifier: sata.0:0

/dev/disks # esxcli storage core path list | egrep "Target Identifier|Runtime|naa|Device:"|grep -B1 -A1 NUTAN
Runtime Name: vmhba37:C0:T3:L0
Device: t10.NUTANIX_shared_data3_852409da74945ca63d601382a1db06b2c8f6de10
Target Identifier: 00023d000001,iqn.2010-06.com.nutanix:shared_data3-cbdfce84,t,1
--
Runtime Name: vmhba37:C0:T2:L0
Device: t10.NUTANIX_shared_data4_7ffbcad0adc23e030f69935fe1c0dd5c4a18d187
Target Identifier: 00023d000001,iqn.2010-06.com.nutanix:shared_data4-8d6bc1cd,t,1
--
Runtime Name: vmhba37:C0:T1:L0
Device: t10.NUTANIX_shared_data2_e6c1a2244603b85e94955787f6862d54a277290d
Target Identifier: 00023d000001,iqn.2010-06.com.nutanix:shared_data2-3ae58737,t,1
--
Runtime Name: vmhba37:C0:T0:L0
Device: t10.NUTANIX_shared_data1_f1f06a8844eaa030085a1ab5bbd6575daa0ea3b7
Target Identifier: 00023d000001,iqn.2010-06.com.nutanix:shared_data1-516ad8b0,t,1
/dev/disks # esxcli storage core path list -d t10.NUTANIX_shared_data1_f1f06a8844eaa030085a1ab5bbd6575daa0ea3b7
iqn.1998-01.com.vmware:ntnx-4-0eaa89c5-00023d000001,iqn.2010-06.com.nutanix:shared_data1-516ad8b0,t,1-t10.NUTANIX_shared_data1_f1f06a8844eaa030085a1ab5bbd6575daa0ea3b7
UID: iqn.1998-01.com.vmware:ntnx-4-0eaa89c5-00023d000001,iqn.2010-06.com.nutanix:shared_data1-516ad8b0,t,1-t10.NUTANIX_shared_data1_f1f06a8844eaa030085a1ab5bbd6575daa0ea3b7
Runtime Name: vmhba37:C0:T0:L0
Device: t10.NUTANIX_shared_data1_f1f06a8844eaa030085a1ab5bbd6575daa0ea3b7
Device Display Name: NUTANIX iSCSI Disk (t10.NUTANIX_shared_data1_f1f06a8844eaa030085a1ab5bbd6575daa0ea3b7)
Adapter: vmhba37
Channel: 0
Target: 0
LUN: 0
Plugin: NMP
State: active
Transport: iscsi
Adapter Identifier: iqn.1998-01.com.vmware:ntnx-4-0eaa89c5
Target Identifier: 00023d000001,iqn.2010-06.com.nutanix:shared_data1-516ad8b0,t,1
Adapter Transport Details: iqn.1998-01.com.vmware:ntnx-4-0eaa89c5
Target Transport Details: IQN=iqn.2010-06.com.nutanix:shared_data1-516ad8b0 Alias= Session=00023d000001 PortalTag=1
/dev/disks # esxcli storage core path list -d t10.NUTANIX_shared_data2_e6c1a2244603b85e94955787f6862d54a277290d
iqn.1998-01.com.vmware:ntnx-4-0eaa89c5-00023d000001,iqn.2010-06.com.nutanix:shared_data2-3ae58737,t,1-t10.NUTANIX_shared_data2_e6c1a2244603b85e94955787f6862d54a277290d
UID: iqn.1998-01.com.vmware:ntnx-4-0eaa89c5-00023d000001,iqn.2010-06.com.nutanix:shared_data2-3ae58737,t,1-t10.NUTANIX_shared_data2_e6c1a2244603b85e94955787f6862d54a277290d
Runtime Name: vmhba37:C0:T1:L0
Device: t10.NUTANIX_shared_data2_e6c1a2244603b85e94955787f6862d54a277290d
Device Display Name: NUTANIX iSCSI Disk (t10.NUTANIX_shared_data2_e6c1a2244603b85e94955787f6862d54a277290d)
Adapter: vmhba37
Channel: 0
Target: 1
LUN: 0
Plugin: NMP
State: active
Transport: iscsi
Adapter Identifier: iqn.1998-01.com.vmware:ntnx-4-0eaa89c5
Target Identifier: 00023d000001,iqn.2010-06.com.nutanix:shared_data2-3ae58737,t,1
Adapter Transport Details: iqn.1998-01.com.vmware:ntnx-4-0eaa89c5
Target Transport Details: IQN=iqn.2010-06.com.nutanix:shared_data2-3ae58737 Alias= Session=00023d000001 PortalTag=1
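
The path listings above tie each runtime name to a device and a target IQN. As a rough sketch (not a VMware tool), the filtered output can be turned into a lookup table off-host with a few lines of Python. The function name parse_paths and the idea of pasting the output into a string are my own assumptions; the sample is copied from the output above.

```python
import re

# Two entries copied from the filtered "esxcli storage core path list" output above.
SAMPLE = """\
Runtime Name: vmhba37:C0:T1:L0
Device: t10.NUTANIX_shared_data2_e6c1a2244603b85e94955787f6862d54a277290d
Target Identifier: 00023d000001,iqn.2010-06.com.nutanix:shared_data2-3ae58737,t,1
--
Runtime Name: vmhba37:C0:T0:L0
Device: t10.NUTANIX_shared_data1_f1f06a8844eaa030085a1ab5bbd6575daa0ea3b7
Target Identifier: 00023d000001,iqn.2010-06.com.nutanix:shared_data1-516ad8b0,t,1
"""


def parse_paths(text):
    """Hypothetical helper: map runtime name -> {"device": ..., "iqn": ...}."""
    paths = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Runtime Name:"):
            current = line.split(":", 1)[1].strip()
            paths[current] = {"device": None, "iqn": None}
        elif current and line.startswith("Device:"):
            paths[current]["device"] = line.split(":", 1)[1].strip()
        elif current and line.startswith("Target Identifier:"):
            # The IQN is the second comma-separated field of the identifier.
            match = re.search(r"(iqn\.[^,\s]+)", line)
            if match:
                paths[current]["iqn"] = match.group(1)
    return paths


paths = parse_paths(SAMPLE)
for runtime, info in sorted(paths.items()):
    print(runtime, info["device"], info["iqn"])
```

This only reads text that was already captured; it does not talk to the host.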

~ # esxcfg-scsidevs -A

vmhba37 t10.NUTANIX_shared_data4_7ffbcad0adc23e030f69935fe1c0dd5c4a18d187
vmhba37 t10.NUTANIX_shared_data3_852409da74945ca63d601382a1db06b2c8f6de10
vmhba37 t10.NUTANIX_shared_data2_e6c1a2244603b85e94955787f6862d54a277290d
vmhba37 t10.NUTANIX_shared_data1_f1f06a8844eaa030085a1ab5bbd6575daa0ea3b7

But only shared_data1 shows up with a VMFS mapping:
~ # esxcfg-scsidevs -m

t10.NUTANIX_shared_data1_f1f06a8844eaa030085a1ab5bbd6575daa0ea3b7:1 /vmfs/devices/disks/t10.NUTANIX_shared_data1_f1f06a8844eaa030085a1ab5bbd6575daa0ea3b7:1 4f

/dev/disks # esxcli storage vmfs extent list
Volume Name VMFS UUID Extent Number Device Name Partition
-------------------- ----------------------------------- ------------- ------------------------------------------------------------------------ ---------
NTNX-4-local-ds-NTNX 4f1748db-b24ddb90-00b1-0025904b0c90 0 t10.ATA_____INTEL_SSDSA2CW300G3_____________________CVPR133400KE300EGN__ 3
NTNX_datastore_1 4f1de5c4-8b7c3ecc-a9ed-0025904b0c98 0 t10.NUTANIX_shared_data1_f1f06a8844eaa030085a1ab5bbd6575daa0ea3b7 1

/dev/disks # ls
vml.0100000000435650523133333430304b4533303045474e2020494e54454c20:8
t10.NUTANIX_shared_data1_f1f06a8844eaa030085a1ab5bbd6575daa0ea3b7 vml.01000000007368617265645f64617461315f66316630366138383434656161303330303835613161623562564449534b20
t10.NUTANIX_shared_data1_f1f06a8844eaa030085a1ab5bbd6575daa0ea3b7:1 vml.01000000007368617265645f64617461315f66316630366138383434656161303330303835613161623562564449534b20:1

/dev/disks # esxcli storage filesystem list
Mount Point Volume Name UUID Mounted Type Size Free
------------------------------------------------- -------------------- ----------------------------------- ------- ------ ------------- -------------
/vmfs/volumes/4f1748db-b24ddb90-00b1-0025904b0c90 NTNX-4-local-ds-NTNX 4f1748db-b24ddb90-00b1-0025904b0c90 true VMFS-5 294742130688 54638149632
/vmfs/volumes/4f1de5c4-8b7c3ecc-a9ed-0025904b0c98 NTNX_datastore_1 4f1de5c4-8b7c3ecc-a9ed-0025904b0c98 true VMFS-3 2195533594624 1080925618176
/vmfs/volumes/4ee7c1ed-df35b248-89f3-0025902ef1c9 4ee7c1ed-df35b248-89f3-0025902ef1c9 true vfat 4293591040 4278583296
/vmfs/volumes/0663f37a-83c2d90f-7056-e4ffaac98b5b 0663f37a-83c2d90f-7056-e4ffaac98b5b true vfat 261853184 114044928
/vmfs/volumes/ec72867e-e39b1732-b736-44c76538fffd ec72867e-e39b1732-b736-44c76538fffd true vfat 261853184 114515968
/vmfs/volumes/4ee7c1e5-15813146-c506-0025902ef1c9 4ee7c1e5-15813146-c506-0025902ef1c9 true vfat 299712512 114843648

Resolution was:
Connect to vCenter, click Devices under the Storage configuration, right-click the unmounted disk
and attach it.
Then rescan. The VMFS volumes will show up in an unmounted state; right-click each one and mount it.
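
The check that pointed to this problem (four devices listed on vmhba37 by esxcfg-scsidevs -A, but only one in the esxcfg-scsidevs -m mapping) can be sketched as a set difference. This is a minimal illustration, assuming the two outputs were saved as text; the helper names are hypothetical, and a device missing from the -m list may simply carry no VMFS volume at all.

```python
# Output of "esxcfg-scsidevs -A" and "esxcfg-scsidevs -m", copied from above.
SCSIDEVS_A = """\
vmhba37 t10.NUTANIX_shared_data4_7ffbcad0adc23e030f69935fe1c0dd5c4a18d187
vmhba37 t10.NUTANIX_shared_data3_852409da74945ca63d601382a1db06b2c8f6de10
vmhba37 t10.NUTANIX_shared_data2_e6c1a2244603b85e94955787f6862d54a277290d
vmhba37 t10.NUTANIX_shared_data1_f1f06a8844eaa030085a1ab5bbd6575daa0ea3b7
"""

SCSIDEVS_M = """\
t10.NUTANIX_shared_data1_f1f06a8844eaa030085a1ab5bbd6575daa0ea3b7:1 /vmfs/devices/disks/t10.NUTANIX_shared_data1_f1f06a8844eaa030085a1ab5bbd6575daa0ea3b7:1 4f
"""


def devices_on_adapter(text, adapter):
    """Hypothetical helper: devices listed for one adapter in the -A output."""
    devices = set()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == adapter:
            devices.add(parts[1])
    return devices


def devices_with_vmfs(text):
    """Hypothetical helper: devices that appear in the -m (VMFS mapping) output."""
    devices = set()
    for line in text.splitlines():
        parts = line.split()
        if parts:
            # First column is "<device>:<partition>"; drop the partition number.
            devices.add(parts[0].rsplit(":", 1)[0])
    return devices


missing = sorted(devices_on_adapter(SCSIDEVS_A, "vmhba37") - devices_with_vmfs(SCSIDEVS_M))
for device in missing:
    print("no VMFS mapping:", device)
```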

VMNIC0 Failures.

tail vmkernel.log
2012-02-01T20:03:32.032Z cpu0:4780)<6>e1000e: vmnic0 NIC Link is Down

/var/log # esxcfg-vswitch -l
Switch Name Num Ports Used Ports Configured Ports MTU Uplinks
vSwitch0 128 7 128 1500 vmnic3.p1,vmnic0,vmnic1

PortGroup Name VLAN ID Used Ports Uplinks
VM Network 0 2 vmnic3.p1,vmnic0,vmnic1
Management Network 0 1 vmnic1,vmnic3.p1,vmnic0

Try removing vmnic0 from the vSwitch:
esxcfg-vswitch -U vmnic0 vSwitch0

Fix the link and add it again:
esxcfg-vswitch -L vmnic0 vSwitch0

/var/log # esxcli network vswitch standard list
vSwitch0
Name: vSwitch0
Class: etherswitch
Num Ports: 128
Used Ports: 7
Configured Ports: 128
MTU: 1500
CDP Status: listen
Beacon Enabled: false
Beacon Interval: 1
Beacon Threshold: 3
Beacon Required By:
Uplinks: vmnic0, vmnic3.p1, vmnic1
Portgroups: VM Network, Management Network

To check whether the port group overrides the vSwitch settings:
/var/log # esxcli network vswitch standard portgroup policy failover get -p "Management Network"
Load Balancing: srcport
Network Failure Detection: link
Notify Switches: true
Failback: true
Active Adapters: vmnic1
Standby Adapters: vmnic3.p1, vmnic0
Unused Adapters:
Override Vswitch Load Balancing: false
Override Vswitch Network Failure Detection: false
Override Vswitch Notify Switches: false
Override Vswitch Failback: false
Override Vswitch Uplinks: true
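
To read the override flags at a glance, the "key: value" output above can be parsed into a dictionary. This is only a sketch over the pasted text; parse_policy and overridden_settings are names I made up for illustration.

```python
# Failover policy output for "Management Network", copied from above.
POLICY_TEXT = """\
Load Balancing: srcport
Network Failure Detection: link
Notify Switches: true
Failback: true
Active Adapters: vmnic1
Standby Adapters: vmnic3.p1, vmnic0
Unused Adapters:
Override Vswitch Load Balancing: false
Override Vswitch Network Failure Detection: false
Override Vswitch Notify Switches: false
Override Vswitch Failback: false
Override Vswitch Uplinks: true
"""

PREFIX = "Override Vswitch "


def parse_policy(text):
    """Hypothetical helper: turn 'Key: value' lines into a dict."""
    policy = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        policy[key.strip()] = value.strip()
    return policy


def overridden_settings(policy):
    """Names of the vSwitch settings that this port group overrides."""
    return [key[len(PREFIX):] for key, value in policy.items()
            if key.startswith(PREFIX) and value == "true"]


policy = parse_policy(POLICY_TEXT)
overridden = overridden_settings(policy)
print("overridden:", overridden)
print("active:", policy["Active Adapters"], "standby:", policy["Standby Adapters"])
```

Here only the uplink order is overridden, so vmnic0 is a standby adapter for the Management Network port group and vmnic1 is the active one.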

Tuesday, January 31, 2012

MKS (Mouse Keyboard Screen) error when opening VM console

In the lab, it was resolved by restarting the management services on the ESXi host:

services.sh restart

Tuesday, January 24, 2012

Host profiles - how to remove irrelevant config checks

A host profile captures a sample configuration from one ESXi server, which you can then apply
to all nodes in that cluster.

http://pubs.vmware.com/vsphere-50/topic/com.vmware.ICbase/PDF/vsphere-esxi-vcenter-server-50-host-profiles-guide.pdf

http://www.vmware.com/files/pdf/techpaper/VMW-Host-Profiles-Tech-Overview.pdf

Here I can right-click the IQN under SWiscsi (software iSCSI) to remove the checks that are not relevant.

I remove it from the bottom. Then I apply this profile to the cluster.

For example:
Option Syslog.global.logDir doesn't match the specified criteria
Original config was:

Changed to the following to prevent error:

______________________________________________________________________________________
PS C:\Users\Nutanix01> Get-VMhostprofile

Name Description ReferenceHostId
---- ----------- ---------------
esx5_host_profile Base ESX5 Host Profile HostSystem-host-9
esxi3 HostSystem-host-107

PS C:\Users\Nutanix01> Get-VMhostprofile esxi3 |fl

ServerId : /VIServer=root@ip/
Server : ip
Description :
ReferenceHostId : HostSystem-host-107
ReferenceHost : ip
Name : esxi3
ExtensionData : VMware.Vim.HostProfile
Id : HostProfile-hostprofile-2001
Uid : /VIServer=root@ip:443/VMHostProfile=HostProfile-hostprofile-2001/

PS C:\Users\Nutanix01> Test-VMHostProfileCompliance -Profile esxi3

VMHostId   VMHostProfileId IncomplianceElementList
--------   --------------- -----------------------
HostSys... HostPro... {:Host is unavailable for checking compliance.}
HostSys... HostPro... {network.hostPortGroup["key-vim-profile-host-HostPortgrou...
HostSys... HostPro... {storage.psa_psaProfile_PluggableStorageArchitectureProfi...
HostSys... HostPro... {storage.psa_psaProfile_PluggableStorageArchitectureProfi...

Sample to create a hostprofile

PS C:\Users\Nutanix01> Get-VMHost Name |New-VMHostProfile -Name "Server4"

Name Description ReferenceHostId
---- ----------- ---------------
Server4 HostSystem-host-65

PS C:\Users\Nutanix01> Get-VMHostProfile -ReferenceHost ip

Name Description ReferenceHostId
---- ----------- ---------------
Server4 HostSystem-host-65

PS C:\Users\Nutanix01> Get-VMHost ip |Apply-VMHostProfile -AssociateOnly -Profile Server4

new-vm -Name WinAtlas2 -VMhost 172.ip -Template Atlas_Windows_Template -Datastore Atlas-NTNX-datastore

Friday, January 20, 2012

Poweroff Certain VMs

get-VM -Location *Proteus* | where { $_.name -ne "Proteus-C1" -And $_.name -ne "Proteus-C2" -And $_.name -ne "Proteus-C3" -And $_.name -ne "Proteus-C4" } | Stop-VM -Confirm:$false

Stop-VM (get-VM AlWinDows* |where {$_.'PowerState' -eq "PoweredOn"}) -RunAsync -Confirm:$false

$global:testIterations = 75
for($i=1; $i -le $global:testIterations; $i++){
Stop-VM (get-VM *Win* -Location Atlas-Cluster |where {$_.'PowerState' -eq "PoweredOn"}) -RunAsync -Confirm:$false
Start-Sleep -Seconds 100
Start-VM (get-VM *Win* -Location Atlas-Cluster |where {$_.'PowerState' -ne "PoweredOn"}) -RunAsync -Confirm:$false
}

Get-VMHost 10.2.8.51 | Get-VM Windows7-* | where {$_.'PowerState' -eq "PoweredOn"} | Stop-VM

Thursday, January 19, 2012

Collection of ESXi commands to manage VMs

1. esxcli vms vm list (esxcli vm process list on ESXi 5.x) - lists the VMs and their world IDs; with the world ID you can kill a VM (esxcli vm process kill) or send an NMI using the vm-support command.
2. vm-support -V, vm-support -x, vm-support -X - captures screenshot, etc.
vmdumper
vmdumper: [options]

3. vim-cmd /vmsvc/getallvms
4. vim-cmd /vmsvc/unregister
5. vim-cmd /solo/register /path/to/file.vmx
6. vim-cmd /vmsvc/power.getstate
7. vim-cmd /vmsvc/power.off
8. vim-cmd /vmsvc/power.on
9. vim-cmd vmsvc/device.getdevices $id|egrep -i "vmdk|compat"
10. vim-cmd vmsvc/get.config
11. vim-cmd vmsvc/get.datastores 35
12. vim-cmd vmsvc/device.diskaddexisting
13. vim-cmd hostsvc/datastore/summary datastore2
14. vim-cmd hostsvc/refresh_services
15. vim-cmd vmsvc/message
16. vim-cmd vmsvc/tools.install

To find the IP addresses allocated to the VMs:

for vm in `vim-cmd vmsvc/getallvms | awk '{print $1}'|grep -v -i vmid`; do vim-cmd vmsvc/get.summary $vm |grep -i ipAdd;done

I used this to find a duplicate IP.
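
The one-liner prints the addresses but leaves the duplicate hunt to the eye. A small sketch of the grouping step is below, assuming the get.summary text of each VM was collected per VM id. The VM ids, the addresses and the exact shape of the ipAddress line in the sample are made up for illustration, and find_duplicate_ips is a hypothetical helper.

```python
import re
from collections import defaultdict

# Assumed shape of the relevant line in "vim-cmd vmsvc/get.summary <vmid>" output.
IP_RE = re.compile(r'ipAddress\s*=\s*"([^"]+)"')

# Made-up sample: VM id -> fragment of its get.summary output.
SUMMARIES = {
    "12": 'guest = {\n   ipAddress = "10.0.0.5",\n}',
    "35": 'guest = {\n   ipAddress = "10.0.0.5",\n}',
    "40": 'guest = {\n   ipAddress = "10.0.0.9",\n}',
    "41": 'guest = {\n   ipAddress = <unset>,\n}',
}


def find_duplicate_ips(summaries):
    """Return {ip: [vmid, ...]} for every address reported by more than one VM."""
    by_ip = defaultdict(list)
    for vmid, text in summaries.items():
        match = IP_RE.search(text)
        if match:  # VMs without a reported address are skipped
            by_ip[match.group(1)].append(vmid)
    return {ip: sorted(ids) for ip, ids in by_ip.items() if len(ids) > 1}


duplicates = find_duplicate_ips(SUMMARIES)
for ip, vmids in duplicates.items():
    print(ip, "is reported by VM ids", ", ".join(vmids))
```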

Monday, January 16, 2012

Vcenter Appliance Troubleshooting

VMware vCenter Server Appliance 5.0 GA with embedded DB2 database fails

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2006812

Interacting with db2:
http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=2004506&sliceId=1&docTypeID=DT_KB_1_1&dialogID=278854278&stateId=1%200%20278856044
The vCenter Server service fails to start after the service was stopped or restarted when using DB2 database

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1021581

vcenter-2:/ # su - db2inst1
db2inst1@vcenter-2:~> db2start
SQL1026N The database manager is already active.
db2inst1@vcenter-2:~> db2ls

Install Path      Level    Fix Pack  Special Install Number  Install Date                  Installer UID
---------------------------------------------------------------------------------------------------------
/opt/db2/v9.7.2   9.7.0.2  2                                 Tue Aug 23 17:54:41 2011 UTC  0
db2inst1@vcenter-2:~> db2ilist
db2inst1
db2 => connect to VCDB
db2 => set schema vc

db2inst1@vcenter-2:~> db2 get database config for vcdb | grep LOG
Catalog cache size (4KB) (CATALOGCACHE_SZ) = 300
Log buffer size (4KB) (LOGBUFSZ) = 256
Log file size (4KB) (LOGFILSIZ) = 8192
Number of primary log files (LOGPRIMARY) = 16
Number of secondary log files (LOGSECOND) = 112
Changed path to log files (NEWLOGPATH) =
Path to log files = /storage/db/db2/home/db2inst1/db2inst1/NODE0000/SQL00001/SQLOGDIR/
Overflow log path (OVERFLOWLOGPATH) =
Mirror log path (MIRRORLOGPATH) =
Block log on disk full (BLK_LOG_DSK_FUL) = NO
Block non logged operations (BLOCKNONLOGGED) = NO
Percent max primary log space by transaction (MAX_LOG) = 0
Num. of active log files for 1 active UOW(NUM_LOG_SPAN) = 0
Log retain for recovery enabled (LOGRETAIN) = OFF
First log archive method (LOGARCHMETH1) = OFF
Options for logarchmeth1 (LOGARCHOPT1) =
Second log archive method (LOGARCHMETH2) = OFF
Options for logarchmeth2 (LOGARCHOPT2) =
Log pages during index build (LOGINDEXBUILD) = OFF
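
As a quick sanity check on the log settings above, the maximum active log space can be worked out from LOGFILSIZ (in 4 KB pages), LOGPRIMARY and LOGSECOND. The snippet below is a sketch over the pasted text; parse_db_cfg is a hypothetical helper, and the result is an upper bound that assumes every secondary log file gets allocated.

```python
import re

# Relevant lines copied from the "db2 get database config for vcdb | grep LOG" output.
DB_CFG = """\
Log buffer size (4KB) (LOGBUFSZ) = 256
Log file size (4KB) (LOGFILSIZ) = 8192
Number of primary log files (LOGPRIMARY) = 16
Number of secondary log files (LOGSECOND) = 112
"""

PAGE_BYTES = 4096  # the "(4KB)" unit shown next to LOGFILSIZ


def parse_db_cfg(text):
    """Hypothetical helper: map parameter name (the part in parentheses) to its value."""
    cfg = {}
    for line in text.splitlines():
        match = re.search(r"\((\w+)\)\s*=\s*(\S*)", line)
        if match:
            cfg[match.group(1)] = match.group(2)
    return cfg


cfg = parse_db_cfg(DB_CFG)
file_bytes = int(cfg["LOGFILSIZ"]) * PAGE_BYTES
total_files = int(cfg["LOGPRIMARY"]) + int(cfg["LOGSECOND"])
total_bytes = file_bytes * total_files

print("per log file:", file_bytes // 2**20, "MiB")
print("log files   :", total_files)
print("max log space:", total_bytes // 2**30, "GiB")
```

With these values each log file is 32 MiB, and 16 + 112 = 128 files come to 4 GiB, which is worth keeping in mind when sizing the /storage/db partition.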

To list all tables:
db2 => list tables for all
=====================================================================================
Error (vpxd.log):
2012-01-16T15:14:43.269-08:00 [7FFFF3B09700 error 'Default'] [Vdb::LockRepositoryHelper] SQLExecDirect failed: 57011:-438:ODBC error: (57011) - [IBM][CLI Driver][DB2/LINUXX8664] SQL0438N Application raised error or warning with diagnostic text: "Error deleting from VPX_SESSIONL". SQLSTATE=57011
-->
2012-01-16T15:14:43.271-08:00 [7FFFF3B09700 error 'Default'] Unable to get exclusive access to vCenter repository. Please check if another vCenter instance is running against the same database schema.
2012-01-16T15:14:43.275-08:00 [7FFFF3B09700 error 'Default'] Unhandled exception

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1021581

We followed the steps in the KB article above and vCenter seems to be working now, but we need to keep monitoring it.

db2 => select * from VPX_SESSIONLOCK

APPL_ID
--------------------------------------------------------------------------------------------------------------------------------
127.0.0.1.52263.120117000832

db2 => delete from VPX_SESSIONLOCK
SQL0100W No row was found for FETCH, UPDATE or DELETE; or the result of a
query is an empty table. SQLSTATE=02000

Verifying the group
db2inst1@vcenter-5:~> cat /etc/group |grep -i db
DBSYSMON:!:111:vc

db2inst1@vcenter-5:~> cat /etc/passwd|grep vc
vc:x:1004:100::/opt/db2/home//vc:/bin/false

Restarting the service.

vcenter-5:~ # service vmware-vpxd status

vmware-vpxd is stopped
tomcat is running

vcenter-5:~ # service vmware-vpxd start

Waiting for embedded DB2 database to startup: .success
Cleaning session lock table: success
Verifying EULA acceptance: success

Shutting down ldap-server..done
Starting ldap-server

Which controller VM is hosting the shared LUN?

for ips in $SVM_IPS; do echo "SVM:$ips"; echo "----------------"; curl http://$ips:2009 2>/dev/null | html2text -width 500 -ascii| sed -n "/Hosted VDisks/,/Extent Store/p" |egrep -v "Hosted VDisks|Extent Store|dedup usage|VDisk Id|Fragments"| awk '{print $2}'; echo; done

SVM:172.16.13.21
----------------
amazon-build-home
VS5TEST-SHARED-DS-01
PDS-TEST
shared_data7
SPTest.snapshot
NFS-laura
NFS-laura2
ubu-stress-fc-proteus4-3-vdisk-4f03e93f-0
laura-ubuntu-clone
ubu-stress-fc-proteus4-0-vdisk-4f03e93f-1
ubu-stress-fc-proteus4-1-vdisk-4f03e93f-1
ubu-nfs-disk1
nfs-laura
ubu-stress-fc-proteus4-2-vdisk-4f03e93f-1
ubu-stress-fc-proteus4-4-vdisk-4f03e93f-1
vdisk-jenkins-home
vdisk-jenkins-var
NFS-rdm-root
NFS-rdm-disk-3
NFS-rdm-disk-4
shared_datastore0
NFS-rdm-disk-2

SVM:172.16.13.22
----------------
VS5TEST-SHARED-DS-02
VS5TEST-SHARED-DS-03
laura-ubuntu
ubu-nfs-proteus1-0-vdisk-4ef58401-0
shared_datastore1

SVM:172.16.13.23
----------------
jenkins-home2
jenkins-dogfood
jenkins-var2
28
ubu-stress-test-0-vdisk-4eea1dcd-0

SVM:172.16.13.24
----------------
ubu-stress-fc-proteus4-2-vdisk-4f03e93f-0
VS5TEST-SHARED-DS-04
ubu-stress-fc-proteus4-3-vdisk-4f03e93f-1
shared_datastore2
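
Once the loop output is saved, finding the controller VM for one particular vdisk is a simple lookup. The sketch below parses the "SVM:<ip>" sections shown above (shortened here); parse_hosted_vdisks and find_host are hypothetical names, and it works on captured text rather than querying port 2009 itself.

```python
# Shortened copy of the loop output above.
OUTPUT = """\
SVM:172.16.13.21
----------------
shared_data7
shared_datastore0

SVM:172.16.13.22
----------------
VS5TEST-SHARED-DS-02
shared_datastore1

SVM:172.16.13.24
----------------
VS5TEST-SHARED-DS-04
shared_datastore2
"""


def parse_hosted_vdisks(text):
    """Hypothetical helper: map each SVM IP to the list of vdisk names under it."""
    hosted = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("SVM:"):
            current = line[len("SVM:"):]
            hosted[current] = []
        elif current and line and not set(line) <= {"-"}:
            # Skip blank lines and the dashed separator, keep vdisk names.
            hosted[current].append(line)
    return hosted


def find_host(hosted, vdisk):
    """Return the SVM IPs that list the given vdisk name."""
    return [svm for svm, names in hosted.items() if vdisk in names]


hosted = parse_hosted_vdisks(OUTPUT)
owners = find_host(hosted, "shared_datastore1")
print("shared_datastore1 is hosted by:", owners)
```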