Tuesday, August 20, 2013

Introducing NEDE (Nutanix Elastic Dedup Engine)

The Nutanix Elastic Dedup Engine is:
  • software-driven,
  • a scalable, distributed, inline data-reduction technology for the flash and cache tiers,
  • fingerprinting on sequential writes,
  • deduplicating in RAM/flash on reads.

Nutanix software is modular enough that the development team built this as a module and plugged it into the existing modules (Cassandra/Stargate).
NEDE indexes fingerprints in the NoSQL database using the already existing extentgroup map keyspace; we did not even have to create a new keyspace.

Next steps:
NEDE will utilize the scale-out MapReduce technology already present in NDFS for offline deduplication.

Nutanix DR will use NEDE to reduce the amount of data transferred across the WAN.

NEDE eliminates duplicate 4K blocks, so more space is available in the hot tier and in memory for unique blocks.

We have seen the need for this as more and more of our users migrate VMs from
other storage vendors to us (where VAAI plugin snapshots or Linked Clones cannot be taken advantage of), which drives up hot-tier usage. CBRC (http://myvirtualcloud.net/?p=3094) is of some help, but its 2 GB cache is not enough.

Convergence and dedup: the dedup engine is distributed, and the data and indexing are localized
because it is VM aware, so it uses less network bandwidth than a centralized storage/SAN/NAS solution.

Nutanix uses fixed-block dedup because a vmdk is a block device that is formatted by the guest OS with a specific block size. For example, Windows NTFS uses a 4K block size, so dedup uses a fixed 4K block size.


Content Cache (new with this feature): the <SHA1 fingerprint> to <chunk data> cache. It lives in RAM + flash.


Outline:
  • On sequential writes, compute SHA1 for each 4K block and store it in the keyspace.
  • On reads, if the block already has a SHA1 index, serve it from the Content Cache; otherwise read from the Extent Store and populate the cache.

Overhead due to dedup:

Computing the fingerprints adds about 10% CPU overhead. The additional storage for the index is
less than 1% of the table.
 
Write path overview:

Sequential writes are fingerprinted at a 4K chunk size with SHA1.
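The write-path fingerprinting can be sketched as follows. This is a minimal illustration of the technique, not Stargate's actual implementation; only the chunk size (4K) and hash (SHA1) come from the post.

```python
import hashlib

CHUNK_SIZE = 4096  # fixed 4K blocks, matching the guest filesystem block size


def fingerprint(data):
    """Split a sequential write into fixed 4K chunks and SHA1 each one."""
    fingerprints = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        fingerprints.append(hashlib.sha1(chunk).hexdigest())
    return fingerprints


# Two writes carrying the same payload produce identical fingerprints,
# so the second copy can be served from the content cache instead of flash.
payload = b"A" * 4096 + b"B" * 4096
assert fingerprint(payload) == fingerprint(payload)
```

Identical 4K chunks always hash to the same fingerprint, which is what lets the index deduplicate them regardless of which vdisk they came from.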


Read Path Overview

  • Check the extent cache for the data.
  • If it is not there and the data has fingerprints, check the Content Cache.
  • If it is not there either, read from the Extent Store.
  • If the data has fingerprints in the egroup map, populate the read into the Content Cache;
  • otherwise populate the read into the extent cache.
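The branching above can be sketched in Python. The cache and map objects here are hypothetical stand-ins (plain dicts); only the lookup order mirrors the post.

```python
def read_block(addr, extent_cache, content_cache, egroup_map, extent_store):
    """Sketch of the read path: extent cache -> content cache -> extent store."""
    data = extent_cache.get(addr)
    if data is not None:
        return data                    # fastest path: extent cache hit

    fp = egroup_map.get(addr)          # was a fingerprint recorded at write time?
    if fp is not None and fp in content_cache:
        return content_cache[fp]       # dedup hit: serve from the content cache

    data = extent_store[addr]          # miss everywhere: go to the extent store
    if fp is not None:
        content_cache[fp] = data       # fingerprinted data warms the content cache
    else:
        extent_cache[addr] = data      # non-fingerprinted data warms the extent cache
    return data
```

Note that a single fingerprint entry in the content cache can satisfy reads for every block that hashes to it, which is where the hot-tier savings come from.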

We also have hash tables and LRUs, with single-touch and multi-touch LRUs for the memory
and flash chunks. I will explain this in more detail later.
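The single-touch/multi-touch split can be modeled with two LRU queues. This is a simplified sketch of that general caching technique, not the actual Stargate data structures; the class name and capacities are made up for illustration.

```python
from collections import OrderedDict


class TwoTouchLRU:
    """Entries land in a single-touch LRU; a second hit promotes them to the
    multi-touch LRU, so a one-off scan cannot flush frequently used data."""

    def __init__(self, single_cap, multi_cap):
        self.single = OrderedDict()
        self.multi = OrderedDict()
        self.single_cap, self.multi_cap = single_cap, multi_cap

    def get(self, key):
        if key in self.multi:
            self.multi.move_to_end(key)           # refresh recency
            return self.multi[key]
        if key in self.single:
            value = self.single.pop(key)          # second touch: promote
            self.multi[key] = value
            if len(self.multi) > self.multi_cap:
                self.multi.popitem(last=False)    # evict LRU multi-touch entry
            return value
        return None

    def put(self, key, value):
        self.single[key] = value
        if len(self.single) > self.single_cap:
            self.single.popitem(last=False)       # evict LRU single-touch entry
```

With this shape, sequential churn only recycles the single-touch queue, while blocks that are actually re-read survive in the multi-touch queue.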

Gflags configurable:
  • stargate_content_cache_max_flash
  • stargate_content_cache_max_memory
  • stargate_content_cache_single_touch_memory_pct
  • stargate_content_cache_single_touch_flash_pct

Metrics

http://<CVM Name or IP>:2009/h/vars?regex=stargate.dedup








http://<CVM Name or IP>:2009/h/vars?regex=content_cache
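A quick way to watch these counters is to poll the vars pages above from a script. This is a sketch: the URL pattern is from the post, but the "name: value" line format of the page body is an assumption, so adjust the parser to what your CVM actually returns.

```python
import re
import urllib.request


def vars_url(cvm, regex):
    """Build the Stargate vars URL shown in the post."""
    return "http://%s:2009/h/vars?regex=%s" % (cvm, regex)


def parse_vars(text):
    """Pull 'name: 123' style counters out of the page body (assumed format)."""
    stats = {}
    for name, value in re.findall(r"([\w./]+)\s*:\s*(-?\d+)", text):
        stats[name] = int(value)
    return stats


def fetch_dedup_stats(cvm):
    """Fetch and parse the dedup counters from one CVM."""
    body = urllib.request.urlopen(vars_url(cvm, "stargate.dedup")).read().decode()
    return parse_vars(body)
```

Polling the same page twice and diffing the parsed dictionaries gives you hit/miss rates over an interval.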


 

I will expand this blog after VMworld 2013 with the various terminologies used, and also
with how this helps boot storms and virus scans, and what percentage of hot-tier usage is reduced.

Config: create a container with the fingerprint option enabled:
    ID                        : 38608
    Name                      : dedup-test-container
    VStore Name(s)            : dedup-test-container
    Random I/O Pri Order      : SSD-PCIe,SSD-SATA,DAS-SATA
    Sequential I/O Pri Order  : SSD-PCIe,SSD-SATA,DAS-SATA
    Oplog Configured          : true
    Oplog Highly Available    : true
    Fingerprint On Write      : on




Monday, August 5, 2013

Unable to enable HA on one ESXi host

Problem Description:
Host ABCD (x.y.z.150) is unable to start vSphere HA.  The current state is "vSphere HA Agent Unreachable".  I have tried to start HA twice, but this did not resolve the issue.
KBs to review:

Logs to look for in ESXi:  /var/log/vpxa.log and /var/log/fdm.log ( /var/run/log)

fdm.log snippet:

2013-08-04T21:01:18.968Z [FFDD3B90 error 'Cluster' opID=SWI-79b9207c] [ClusterDatastore::DoAcquireDatastoreWork] open(/vmfs/volumes/9e9989cf-f687e31c/.vSphere-HA/FDM-F78AC28A-8862-48C5-BC1C-F369CCABE58E-1480-9c9b8fc-ANTHMSASVC5/protectedlist) failed: Device or resource busy
2013-08-04T21:01:44.224Z [38498B90 error 'Default' opID=SWI-4593d696] SSLStreamImpl::BIOWrite (0d3fe098) Write failed: Broken pipe
2013-08-04T21:01:44.224Z [38498B90 error 'Default' opID=SWI-4593d696] SSLStreamImpl::DoClientHandshake (0d3fe098) SSL_connect failed with BIO Error
2013-08-04T21:01:44.224Z [38498B90 error 'Message' opID=SWI-4593d696] [MsgConnectionImpl::FinishSSLConnect] Error N7Vmacore3Ssl12SSLExceptionE(SSL Exception: BIO Error) on handshake

Workaround: 

- Check whether there are high latencies on the storage.
- Restart the host agents with services.sh restart.
- Re-enable/refresh HA in vCenter.

Errors on FDM.log:

2013-08-04T23:23:33.577Z [FFF18B90 verbose 'Cluster' opID=SWI-5a8f10c4] [ClusterManagerImpl::IsBadIP] x.y.z.199 is bad ip
2013-08-04T23:23:34.578Z [FFF18B90 verbose 'Cluster' opID=SWI-5a8f10c4] [ClusterManagerImpl::IsBadIP] x.y.z.199 is bad ip

Workaround:

On x.y.z.199, review the fdm.log and run services.sh restart. (You could also disconnect and reconnect the host, which restarts the services, but I find that services.sh restart fixes more issues.)





Friday, August 2, 2013

How much detail can you get about a VM from your storage?

Most storage vendors say that they are VM aware, but it is difficult for centralized storage vendors to identify VMs; even getting host-level statistics is a pain, because you need to map a LUN to a WWNN and then the WWNN to a host.

With Nutanix it is a breeze, because it is a converged platform and Nutanix is VM aware; in addition to providing statistics, Nutanix localizes data based on where the VM is accessing it from. This is a quick
overview and is in no way complete in terms of the other stats we can get.

From CLI:

ncli vm list                            (lists the running VMs with their configured CPU, memory, and vdisks)
ncli vdisk ls vm-name="name of the VM"
ncli vm ls-stats name="name of the VM"

Example:
Snippet of ncli vm ls
    ID                        : 50160a6e-d5c2-041d-7a2d-541530f8c86b
    Name                      : nfs-ubu-stress-Colossus09-1-4
    VM IP Addresses           :
    Hypervisor Host ID        : 3
    Hypervisor Host Name      : 10.3.177.183
    Memory (MB)               : 4096
    Virtual CPUs              : 2
    VDisk Count               : 1
    VDisks                    : NFS:19812

ncli vm ls-stats name=nfs-ubu-stress-Colossus09-1-20
    Name                      : nfs-ubu-stress-Colossus09-1-20
    VM IP Addresses           : 10.3.58.235
    Hypervisor Host ID        : 746301033
    Memory (MB)               : 4096
    Virtual CPUs              : 2
    Disk Bandwidth (Kbps)     : 25230
    Network Bandwidth (Kbps)  : 0
    Latency (micro secs)      : 2215
    CPU Usage Percent         : 100%
    Memory Usage              : 1.02 GB (1,090,516,000 bytes)


GUI:




From REST API:


 nutanix@NTNX-450-A-CVM:10.1.59.66:~$ cat test_resp.py
#!/usr/bin/python
import json as json
import requests

def main():
  base_url = "https://colossus09-c1.corp.nutanix.com:9440/PrismGateway/services/rest/v1/"
  s = requests.Session()
  s.auth = ('admin', 'admin')
  s.headers.update({'Content-Type': 'application/json; charset=utf-8'})

  # A GET on 'vms' alone returns all VMs; you can then fetch a specific VM by its ID.
  print(s.get(base_url + 'vms/50169534-35e1-a1de-c23e-1d1135151293', verify=False).json())
if __name__ == "__main__":
  main()

Run it with: python test_resp.py


Output for one VM:

 {
      "vmId": "50169534-35e1-a1de-c23e-1d1135151293",
      "powerState": "on",
      "vmName": "nfs-ubu-stress-Colossus09-1-4",
      "guestOperatingSystem": "Ubuntu Linux (64-bit)",
      "ipAddresses": [],
      "hostName": "10.3.177.183",
      "hostId": 3,
      "memoryCapacityInMB": 4096,
      "memoryReservedCapacityInMB": 0,
      "numVCpus": 2,
      "cpuReservedInHz": 0,
      "numNetworkAdapters": 1,
      "nutanixVirtualDisks": [
        "/ctr1/nfs-ubu-stress-Colossus09-1-4/nfs-ubu-stress-Colossus09-1-4.vmdk"
      ],
      "vdiskNames": [
        "NFS:18594"
      ],
      "vdiskFilePaths": [
        "/ctr1/nfs-ubu-stress-Colossus09-1-4/nfs-ubu-stress-Colossus09-1-4-flat.vmdk"
      ],
      "diskCapacityInBytes": 53687091200,
      "timeStampInUsec": 1375472003986000,
      "protectionDomianName": null,
      "consistencyGroupName": null,
      "stats": {
        "hypervisor_memory_usage_ppm": "330000",
        "avg_io_latency_usecs": "218757",
        "write_io_ppm": "1000000",
        "seq_io_ppm": "411998",
        "read_io_ppm": "0",
        "hypervisor_num_transmitted_bytes": "-1",
        "hypervisor_num_received_bytes": "-1",
        "total_transformed_usage_bytes": "0",
        "hypervisor_avg_read_io_latency_usecs": "0",
        "hypervisor_num_write_io": "15760",
        "num_iops": "113",
        "random_io_ppm": "588001",
        "total_untransformed_usage_bytes": "-1",
        "avg_read_io_latency_usecs": "-1",
        "io_bandwidth_kBps": "18807",
        "hypervisor_avg_io_latency_usecs": "6000",
        "hypervisor_num_iops": "788",
        "hypervisor_cpu_usage_ppm": "460000",
        "hypervisor_io_bandwidth_kBps": "31636"
      }
    }