Tuesday, August 20, 2013

Introducing NEDE (Nutanix Elastic Dedup Engine)

The Nutanix Elastic Dedup Engine is:
  • software-driven,
  • a scalable, distributed, inline data-reduction technology for the flash and cache tiers,
  • fingerprinting on sequential writes,
  • deduplicating in RAM/flash on reads.

Nutanix software is modular enough that the development team built this as a module and plugged it into the existing modules (Cassandra/Stargate).
NEDE indexes fingerprints in the NoSQL database using the already existing extentgroup map keyspace; we did not even have to create a new keyspace.

Next steps:
NEDE will utilize the scale-out MapReduce technology already present in NDFS for offline deduplication.

Nutanix DR will use NEDE to reduce the amount of data transferred across the WAN.

NEDE eliminates duplicate 4K blocks, so more space is available in the hot tier and in memory for unique blocks.

We have seen the need for this as more and more of our users migrate VMs from
other storage vendors to us (where VAAI plugin snapshots or Linked Clones cannot be taken advantage of), which drives up hot-tier usage. CBRC (http://myvirtualcloud.net/?p=3094) is of some help, but its 2 GB cache is not enough.

Convergence and dedup: the dedup engine is distributed, and the data and indexing are localized
because it is VM aware, so it uses less network bandwidth than a centralized storage/SAN/NAS solution.

Nutanix uses fixed-block dedup because a vmdk is a block device that is formatted by the guest OS with a specific block size. For example, Windows NTFS uses a 4K block size, so dedup uses a fixed 4K block size.


Content Cache (new with this feature): the <SHA1 fingerprint> to <chunk data> cache. It lives in RAM + flash.


Outline:
  • On sequential writes, compute SHA1 for each 4K block and store it in the keyspace.
  • On reads, if the block already has a SHA1 index, serve it from the Content Cache; otherwise read from the Extent Store and populate the cache.

Overhead due to dedup:

Computing the fingerprints adds about 10% CPU overhead. The additional storage for the index is
less than 1% of the table.
 
Write path overview:

Sequential writes are fingerprinted at a 4K chunk size with SHA1.
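The write-path fingerprinting can be sketched as follows. This is a minimal illustration of the technique, not Stargate's actual implementation; only the chunk size (4K) and hash (SHA1) come from the post.

```python
import hashlib

CHUNK_SIZE = 4096  # fixed 4K blocks, matching the guest filesystem block size


def fingerprint(data):
    """Split a sequential write into fixed 4K chunks and SHA1 each one."""
    fingerprints = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        fingerprints.append(hashlib.sha1(chunk).hexdigest())
    return fingerprints


# Two writes carrying the same payload produce identical fingerprints,
# so the second copy can be served from the content cache instead of flash.
payload = b"A" * 4096 + b"B" * 4096
assert fingerprint(payload) == fingerprint(payload)
```

Identical 4K chunks always hash to the same fingerprint, which is what lets the index deduplicate them regardless of which vdisk they came from.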


Read Path Overview

  • Check the extent cache for the data.
  • If it is not there and the data has fingerprints, check the Content Cache.
  • If it is not there either, read from the Extent Store.
  • If the data has fingerprints in the egroup map, populate the read into the Content Cache;
  • otherwise populate the read into the extent cache.
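The branching above can be sketched in Python. The cache and map objects here are hypothetical stand-ins (plain dicts); only the lookup order mirrors the post.

```python
def read_block(addr, extent_cache, content_cache, egroup_map, extent_store):
    """Sketch of the read path: extent cache -> content cache -> extent store."""
    data = extent_cache.get(addr)
    if data is not None:
        return data                    # fastest path: extent cache hit

    fp = egroup_map.get(addr)          # was a fingerprint recorded at write time?
    if fp is not None and fp in content_cache:
        return content_cache[fp]       # dedup hit: serve from the content cache

    data = extent_store[addr]          # miss everywhere: go to the extent store
    if fp is not None:
        content_cache[fp] = data       # fingerprinted data warms the content cache
    else:
        extent_cache[addr] = data      # non-fingerprinted data warms the extent cache
    return data
```

Note that a single fingerprint entry in the content cache can satisfy reads for every block that hashes to it, which is where the hot-tier savings come from.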

We also have hash tables and LRUs, with single-touch and multi-touch LRUs for the memory
and flash chunks. I will explain this in more detail later.
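The single-touch/multi-touch split can be modeled with two LRU queues. This is a simplified sketch of that general caching technique, not the actual Stargate data structures; the class name and capacities are made up for illustration.

```python
from collections import OrderedDict


class TwoTouchLRU:
    """Entries land in a single-touch LRU; a second hit promotes them to the
    multi-touch LRU, so a one-off scan cannot flush frequently used data."""

    def __init__(self, single_cap, multi_cap):
        self.single = OrderedDict()
        self.multi = OrderedDict()
        self.single_cap, self.multi_cap = single_cap, multi_cap

    def get(self, key):
        if key in self.multi:
            self.multi.move_to_end(key)           # refresh recency
            return self.multi[key]
        if key in self.single:
            value = self.single.pop(key)          # second touch: promote
            self.multi[key] = value
            if len(self.multi) > self.multi_cap:
                self.multi.popitem(last=False)    # evict LRU multi-touch entry
            return value
        return None

    def put(self, key, value):
        self.single[key] = value
        if len(self.single) > self.single_cap:
            self.single.popitem(last=False)       # evict LRU single-touch entry
```

With this shape, sequential churn only recycles the single-touch queue, while blocks that are actually re-read survive in the multi-touch queue.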

Gflags configurable:
  • stargate_content_cache_max_flash
  • stargate_content_cache_max_memory
  • stargate_content_cache_single_touch_memory_pct
  • stargate_content_cache_single_touch_flash_pct

Metrics

http://<CVM Name or IP>:2009/h/vars?regex=stargate.dedup








http://<CVM Name or IP>:2009/h/vars?regex=content_cache
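A quick way to watch these counters is to poll the vars pages above from a script. This is a sketch: the URL pattern is from the post, but the "name: value" line format of the page body is an assumption, so adjust the parser to what your CVM actually returns.

```python
import re
import urllib.request


def vars_url(cvm, regex):
    """Build the Stargate vars URL shown in the post."""
    return "http://%s:2009/h/vars?regex=%s" % (cvm, regex)


def parse_vars(text):
    """Pull 'name: 123' style counters out of the page body (assumed format)."""
    stats = {}
    for name, value in re.findall(r"([\w./]+)\s*:\s*(-?\d+)", text):
        stats[name] = int(value)
    return stats


def fetch_dedup_stats(cvm):
    """Fetch and parse the dedup counters from one CVM."""
    body = urllib.request.urlopen(vars_url(cvm, "stargate.dedup")).read().decode()
    return parse_vars(body)
```

Polling the same page twice and diffing the parsed dictionaries gives you hit/miss rates over an interval.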


 

I will expand this blog after VMworld 2013 with the various terminologies used, and also
with how this helps boot storms and virus scans, and what percentage of hot-tier usage is reduced.

Config: create a container with the fingerprint option enabled:
    ID                        : 38608
    Name                      : dedup-test-container
    VStore Name(s)            : dedup-test-container
    Random I/O Pri Order      : SSD-PCIe,SSD-SATA,DAS-SATA
    Sequential I/O Pri Order  : SSD-PCIe,SSD-SATA,DAS-SATA
    Oplog Configured          : true
    Oplog Highly Available    : true
    Fingerprint On Write      : on




Monday, August 5, 2013

Unable to enable HA on one ESXi host

Problem Description:
Host ABCD (x.y.z.150) is unable to start vSphere HA.  The current state is "vSphere HA Agent Unreachable".  I have tried to start HA twice, but this did not resolve the issue.
KBs to review:

Logs to look for in ESXi:  /var/log/vpxa.log and /var/log/fdm.log ( /var/run/log)

fdm.log snippet:

2013-08-04T21:01:18.968Z [FFDD3B90 error 'Cluster' opID=SWI-79b9207c] [ClusterDatastore::DoAcquireDatastoreWork] open(/vmfs/volumes/9e9989cf-f687e31c/.vSphere-HA/FDM-F78AC28A-8862-48C5-BC1C-F369CCABE58E-1480-9c9b8fc-ANTHMSASVC5/protectedlist) failed: Device or resource busy
2013-08-04T21:01:44.224Z [38498B90 error 'Default' opID=SWI-4593d696] SSLStreamImpl::BIOWrite (0d3fe098) Write failed: Broken pipe
2013-08-04T21:01:44.224Z [38498B90 error 'Default' opID=SWI-4593d696] SSLStreamImpl::DoClientHandshake (0d3fe098) SSL_connect failed with BIO Error
2013-08-04T21:01:44.224Z [38498B90 error 'Message' opID=SWI-4593d696] [MsgConnectionImpl::FinishSSLConnect] Error N7Vmacore3Ssl12SSLExceptionE(SSL Exception: BIO Error) on handshake

Workaround: 

- Check whether there are high latencies on the storage.
- Restart the host agents with services.sh restart.
- Re-enable/refresh HA in vCenter.

Errors on FDM.log:

2013-08-04T23:23:33.577Z [FFF18B90 verbose 'Cluster' opID=SWI-5a8f10c4] [ClusterManagerImpl::IsBadIP] x.y.z.199 is bad ip
2013-08-04T23:23:34.578Z [FFF18B90 verbose 'Cluster' opID=SWI-5a8f10c4] [ClusterManagerImpl::IsBadIP] x.y.z.199 is bad ip

Workaround:

On x.y.z.199, review the fdm.log and run services.sh restart. (You could also disconnect and reconnect the host, which restarts the services, but I find that services.sh restart fixes more issues.)





Friday, August 2, 2013

How much detail can you get about a VM from your storage?

Most storage vendors say that they are VM aware, but it is difficult for centralized storage vendors to identify VMs; even getting host-level statistics is a pain, because you need to map a LUN to a WWNN and then the WWNN to a host.

With Nutanix it is a breeze, because it is a converged platform and Nutanix is VM aware; in addition to providing statistics, Nutanix localizes data based on where the VM is accessing it from. This is a quick
overview and is in no way complete in terms of the other stats we can get.

From CLI:

ncli vm list                            (lists the running VMs with their configured CPU, memory, and vdisks)
ncli vdisk ls vm-name="name of the VM"
ncli vm ls-stats name="name of the VM"

Example:
Snippet of ncli vm ls
    ID                        : 50160a6e-d5c2-041d-7a2d-541530f8c86b
    Name                      : nfs-ubu-stress-Colossus09-1-4
    VM IP Addresses           :
    Hypervisor Host ID        : 3
    Hypervisor Host Name      : 10.3.177.183
    Memory (MB)               : 4096
    Virtual CPUs              : 2
    VDisk Count               : 1
    VDisks                    : NFS:19812

ncli vm ls-stats name=nfs-ubu-stress-Colossus09-1-20
    Name                      : nfs-ubu-stress-Colossus09-1-20
    VM IP Addresses           : 10.3.58.235
    Hypervisor Host ID        : 746301033
    Memory (MB)               : 4096
    Virtual CPUs              : 2
    Disk Bandwidth (Kbps)     : 25230
    Network Bandwidth (Kbps)  : 0
    Latency (micro secs)      : 2215
    CPU Usage Percent         : 100%
    Memory Usage              : 1.02 GB (1,090,516,000 bytes)


GUI:




From REST API:


 nutanix@NTNX-450-A-CVM:10.1.59.66:~$ cat test_resp.py
#!/usr/bin/python
import json as json
import requests

def main():
  base_url = "https://colossus09-c1.corp.nutanix.com:9440/PrismGateway/services/rest/v1/"
  s = requests.Session()
  s.auth = ('admin', 'admin')
  s.headers.update({'Content-Type': 'application/json; charset=utf-8'})

  # A GET on 'vms' alone returns all VMs; you can then fetch a specific VM by its ID.
  print(s.get(base_url + 'vms/50169534-35e1-a1de-c23e-1d1135151293', verify=False).json())
if __name__ == "__main__":
  main()

Run it with: python test_resp.py


Output for one VM:

 {
      "vmId": "50169534-35e1-a1de-c23e-1d1135151293",
      "powerState": "on",
      "vmName": "nfs-ubu-stress-Colossus09-1-4",
      "guestOperatingSystem": "Ubuntu Linux (64-bit)",
      "ipAddresses": [],
      "hostName": "10.3.177.183",
      "hostId": 3,
      "memoryCapacityInMB": 4096,
      "memoryReservedCapacityInMB": 0,
      "numVCpus": 2,
      "cpuReservedInHz": 0,
      "numNetworkAdapters": 1,
      "nutanixVirtualDisks": [
        "/ctr1/nfs-ubu-stress-Colossus09-1-4/nfs-ubu-stress-Colossus09-1-4.vmdk"
      ],
      "vdiskNames": [
        "NFS:18594"
      ],
      "vdiskFilePaths": [
        "/ctr1/nfs-ubu-stress-Colossus09-1-4/nfs-ubu-stress-Colossus09-1-4-flat.vmdk"
      ],
      "diskCapacityInBytes": 53687091200,
      "timeStampInUsec": 1375472003986000,
      "protectionDomianName": null,
      "consistencyGroupName": null,
      "stats": {
        "hypervisor_memory_usage_ppm": "330000",
        "avg_io_latency_usecs": "218757",
        "write_io_ppm": "1000000",
        "seq_io_ppm": "411998",
        "read_io_ppm": "0",
        "hypervisor_num_transmitted_bytes": "-1",
        "hypervisor_num_received_bytes": "-1",
        "total_transformed_usage_bytes": "0",
        "hypervisor_avg_read_io_latency_usecs": "0",
        "hypervisor_num_write_io": "15760",
        "num_iops": "113",
        "random_io_ppm": "588001",
        "total_untransformed_usage_bytes": "-1",
        "avg_read_io_latency_usecs": "-1",
        "io_bandwidth_kBps": "18807",
        "hypervisor_avg_io_latency_usecs": "6000",
        "hypervisor_num_iops": "788",
        "hypervisor_cpu_usage_ppm": "460000",
        "hypervisor_io_bandwidth_kBps": "31636"
      }
    }