Tuesday, August 20, 2013

Introducing NEDE (Nutanix Elastic Dedup Engine)

The Nutanix Elastic Dedup Engine is:
  • software driven,
  • a scalable, distributed, inline data-reduction technology for the flash and cache tiers,
  • fingerprinting on sequential writes,
  • deduping in RAM/flash on reads.

Nutanix software is modular enough that the development team built this as a module and plugged it into the existing modules (Cassandra/Stargate).
NEDE indexes fingerprints in the NoSQL database using the already existing extentgroup map keyspace. We did not even have to create a new keyspace.
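Conceptually, that index is just a fingerprint-to-location map. A minimal sketch in Python (the names here are hypothetical, with a plain dict standing in for the Cassandra extentgroup map keyspace):

```python
# Hypothetical sketch: SHA1 fingerprint -> extent-group location.
# A dict stands in for the extentgroup map keyspace in Cassandra.
dedup_index = {}

def record_fingerprint(sha1_hex, egroup_id, offset):
    """Remember where a fingerprinted 4K block lives."""
    dedup_index[sha1_hex] = (egroup_id, offset)

def lookup_fingerprint(sha1_hex):
    """Return the (egroup_id, offset) for a fingerprint, or None."""
    return dedup_index.get(sha1_hex)
```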

Next steps:
NEDE will utilize the scale-out MapReduce technology already present in NDFS for offline deduplication.

Nutanix DR will be using NEDE to reduce the amount of data transferred across the WAN.

NEDE eliminates duplicate 4K blocks, so more space is available in the hot tier and memory for unique blocks.

We have seen the need for this as more and more of our users migrate VMs to us from
other storage vendors (where VAAI plugin snapshots or Linked Clones can't be taken advantage of) and want to reduce hot-tier usage. CBRC (http://myvirtualcloud.net/?p=3094) is of some help, but a 2 GB cache is not enough.

Convergence and dedup: the dedup engine is distributed, and its data and indexing are localized
because it is VM aware, so it uses less network bandwidth than a centralized storage/SAN/NAS solution.

Nutanix uses fixed-block dedup because a vmdk is a block device that is formatted by the guest OS with a specific block size. For example, Windows NTFS uses a 4K block size, so dedup uses a fixed 4K block size.
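Fixed-block fingerprinting is easy to sketch: split the payload on 4K boundaries and hash each block. A minimal illustration in Python (function name and structure are my own, not Stargate code):

```python
import hashlib

def fingerprint_blocks(data: bytes, block_size: int = 4096):
    """Split a payload into fixed-size blocks and compute a SHA1
    fingerprint per block, mirroring fixed-block dedup."""
    fingerprints = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprints.append(hashlib.sha1(block).hexdigest())
    return fingerprints

# Two writes that contain the same 4K block produce the same
# fingerprint, so the block only needs to be stored once.
a = fingerprint_blocks(b"x" * 4096)
b = fingerprint_blocks(b"x" * 4096 + b"y" * 4096)
assert a[0] == b[0]
```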


Content Cache (new with this feature): the <SHA1 fingerprint> → <chunk data> cache. This lives in RAM + flash.


Outline:
  • On sequential writes, compute the SHA1 of each 4K block and store it in the keyspace.
  • On reads, if the block already has a SHA1 index, serve it from the Content Cache; otherwise read from the extent store and populate the cache.

Overhead due to dedup:

Computing the fingerprints adds about 10% CPU overhead. The additional storage consumed by the
index table is less than 1%.
 
Write path overview:

Sequential writes will be fingerprinted at a 4K chunk size with SHA1.


Read path overview:

  • Check the extent cache for the data.
  • If it's not there and the block has fingerprints, check the Content Cache.
  • If it's not there either, read from the extent store.
  • If the block has fingerprints in the egroup map, populate the read into the Content Cache;
  • otherwise, populate the read into the extent cache.
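The steps above can be sketched as a single lookup function. This is illustrative only; the parameter names and dict-based caches are assumptions standing in for the real Stargate structures:

```python
def read_block(addr, extent_cache, content_cache, egroup_map, extent_store):
    """Hypothetical sketch of the read path; plain dicts stand in
    for the extent cache, Content Cache, egroup map, and extent store."""
    # 1. Check the extent cache for the data.
    if addr in extent_cache:
        return extent_cache[addr]
    # 2. If the block is fingerprinted, check the Content Cache.
    fp = egroup_map.get(addr)          # SHA1 fingerprint, if any
    if fp is not None and fp in content_cache:
        return content_cache[fp]
    # 3. Otherwise read from the extent store.
    data = extent_store[addr]
    # 4. Populate the appropriate cache for subsequent reads.
    if fp is not None:
        content_cache[fp] = data       # dedup-aware cache (RAM + flash)
    else:
        extent_cache[addr] = data
    return data
```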

We also have LRUs and hash tables: single-touch and multiple-touch LRUs for both the memory
and flash chunks. I will explain this in more detail later.
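The single-touch/multi-touch idea can be sketched in a few lines: blocks seen once sit in a single-touch pool, and a second access promotes them to the multi-touch pool so a one-off scan cannot evict hot data. This toy sketch is my own illustration (class name, capacities, and eviction details are assumptions, not the actual content-cache implementation):

```python
from collections import OrderedDict

class TwoTouchLRU:
    """Toy two-pool LRU: single-touch entries are promoted to the
    multi-touch pool on their second access."""
    def __init__(self, single_cap, multi_cap):
        self.single = OrderedDict()
        self.multi = OrderedDict()
        self.single_cap = single_cap
        self.multi_cap = multi_cap

    def get(self, key):
        if key in self.multi:
            self.multi.move_to_end(key)    # refresh recency
            return self.multi[key]
        if key in self.single:
            value = self.single.pop(key)   # second touch: promote
            self.multi[key] = value
            if len(self.multi) > self.multi_cap:
                self.multi.popitem(last=False)
            return value
        return None

    def put(self, key, value):
        if self.get(key) is not None:      # already cached
            return
        self.single[key] = value
        if len(self.single) > self.single_cap:
            self.single.popitem(last=False)
```

A sequential scan only ever fills the single-touch pool, so repeatedly accessed blocks in the multi-touch pool survive it.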

Configurable gflags:
  • stargate_content_cache_max_flash
  • stargate_content_cache_max_memory
  • stargate_content_cache_single_touch_memory_pct
  • stargate_content_cache_single_touch_flash_pct

Metrics

http://<CVM Name or IP>:2009/h/vars?regex=stargate.dedup

http://<CVM Name or IP>:2009/h/vars?regex=content_cache

I will expand this blog after VMworld 2013 with the various terminologies used, how this helps
boot storms and virus scans, and what percentage of hot-tier usage is reduced.

Config: create a container with the fingerprint option enabled:
    ID                        : 38608
    Name                      : dedup-test-container
    VStore Name(s)            : dedup-test-container
    Random I/O Pri Order      : SSD-PCIe,SSD-SATA,DAS-SATA
    Sequential I/O Pri Order  : SSD-PCIe,SSD-SATA,DAS-SATA
    Oplog Configured          : true
    Oplog Highly Available    : true
    Fingerprint On Write      : on