Nutanix Elastic Dedup Engine (NEDE) is:
- software driven
- a scalable, distributed, inline data reduction technology for the flash and cache tiers
- fingerprinting on sequential writes
- dedup in RAM/flash on reads
Nutanix software is modular enough that the Nutanix software development team built this module and plugged it into the existing modules (Cassandra/Stargate).
NEDE uses the NoSQL database for indexing, reusing the already existing extentgroup map keyspace. We did not even have to create a new keyspace.
Next steps:
- NEDE will utilize the scale-out Map-Reduce technology that already exists in NDFS for offline deduplication.
- Nutanix DR will use NEDE to reduce the amount of data transferred across the WAN.
NEDE eliminates duplicate 4K blocks, so more space is available in the hot tier and in memory for unique blocks.
We have seen the need for it as more and more of our users migrate VMs to us from other storage vendors (where VAAI plugin snapshots or Linked Clones can't be taken advantage of) and want to reduce hot tier usage. CBRC (http://myvirtualcloud.net/?p=3094) is of some help, but its 2 GB cache is not enough.
Convergence and Dedup: The dedup engine is distributed, and the data and indexing are localized because the engine is VM aware, so it uses less network than a centralized storage/SAN/NAS solution.
Nutanix uses fixed-block dedup because a vmdk is a block device that the guest OS formats with a specific block size. For example, Windows NTFS uses a 4K block size, so dedup uses a fixed 4K block size.
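To make the fixed-block reasoning concrete, here is a minimal sketch in Python (not NEDE code). It fingerprints two toy disk images on a 4K grid and counts the blocks they share. The `fingerprints` and `block` helpers and the toy images are hypothetical names made up for this illustration. The last part shows why alignment matters: the same content shifted by one byte no longer matches on the 4K grid.

```python
import hashlib

BLOCK_SIZE = 4096  # 4K, matching the guest file system block size in the text


def fingerprints(disk: bytes) -> list:
    """SHA-1 hex digest of each fixed, 4K-aligned block of a disk image."""
    return [
        hashlib.sha1(disk[off:off + BLOCK_SIZE]).hexdigest()
        for off in range(0, len(disk), BLOCK_SIZE)
    ]


def block(fill: int) -> bytes:
    """A hypothetical 4K block filled with one byte value (illustrative data)."""
    return bytes([fill]) * BLOCK_SIZE


# Two toy "vmdk" images holding the same blocks at different block numbers.
vm_a = block(1) + block(2) + block(3)
vm_b = block(9) + block(1) + block(2)

shared = set(fingerprints(vm_a)) & set(fingerprints(vm_b))
print(len(shared))  # 2 blocks only need to be kept once

# The same content shifted by one byte no longer lines up with the 4K grid.
vm_shifted = b"\x00" + vm_a
shared_shifted = set(fingerprints(vm_a)) & set(fingerprints(vm_shifted))
print(len(shared_shifted))  # 0
```

Because the guest writes in 4K-aligned units, the aligned case is the common one for a vmdk, which is why a fixed block size is enough here.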
Content Cache (new with this feature) refers to the <SHA1 Fingerprint> to <chunkData> cache. It will live on RAM and flash.
Outline:
- On sequential writes, compute the SHA1 of each 4K block and store it in the keyspace.
- On reads, if the data already has a SHA1 index, serve it from the Content Cache; otherwise read it from the Extent store and populate the cache.
Overhead due to dedup:
Computing the index adds 10% CPU overhead. The additional storage used for indexing is less than 1% in the table.
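As a rough sanity check on the storage figure, a back-of-the-envelope calculation lands in the same range. This sketch assumes each index entry costs only the raw 20-byte SHA-1 digest per 4K chunk; any per-entry metadata in the real table is ignored, so treat the result as a lower bound and not as a measured number.

```python
SHA1_DIGEST_BYTES = 20  # size of a raw SHA-1 digest
CHUNK_BYTES = 4096      # 4K chunk size from the text


def index_overhead_pct(digest_bytes: int = SHA1_DIGEST_BYTES,
                       chunk_bytes: int = CHUNK_BYTES) -> float:
    """Index bytes as a percentage of the data bytes they describe."""
    return 100.0 * digest_bytes / chunk_bytes


print(f"{index_overhead_pct():.2f}%")  # 0.49%
```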
Write path overview:
Sequential writes are fingerprinted with SHA1 at a 4K chunk size.
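A minimal sketch of that write-path rule, under two assumptions of mine: the caller already knows whether a write is sequential (the hypothetical `is_sequential` flag), and non-sequential writes are simply left without fingerprints. How Stargate classifies writes is not covered in this post.

```python
import hashlib

CHUNK_SIZE = 4096  # 4K chunk size from the text


def fingerprint_write(data: bytes, is_sequential: bool) -> list:
    """Return one SHA-1 hex digest per 4K chunk, or [] for non-sequential writes.

    `is_sequential` is a hypothetical flag for this sketch.
    """
    if not is_sequential:
        return []
    return [
        hashlib.sha1(data[off:off + CHUNK_SIZE]).hexdigest()
        for off in range(0, len(data), CHUNK_SIZE)
    ]


seq = fingerprint_write(b"a" * 8192, is_sequential=True)
rand = fingerprint_write(b"a" * 4096, is_sequential=False)
print(len(seq), len(rand))  # 2 0
print(seq[0] == seq[1])     # True: identical chunks share one fingerprint
```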
Read path overview:
- Check the Extent cache for the data.
- If it is not there and the data has fingerprints, check the Content Cache.
- If it is not there either, read from the Extent store.
- If the data has fingerprints in the egroup map, populate the read into the Content Cache.
- Otherwise, populate the read into the Extent cache.
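The steps above can be sketched with plain dictionaries standing in for the caches, the Extent store, and the egroup map. The `ReadPath` class and the `vm1:0`-style keys are hypothetical and only follow the order of the checks described here; they are not the Stargate implementation.

```python
import hashlib


class ReadPath:
    """Toy model of the read path; every structure here is a stand-in."""

    def __init__(self, extent_store: dict, egroup_fingerprints: dict):
        self.extent_store = extent_store                # key -> chunk bytes
        self.egroup_fingerprints = egroup_fingerprints  # key -> SHA-1 hex digest
        self.extent_cache = {}                          # key -> chunk bytes
        self.content_cache = {}                         # fingerprint -> chunk bytes

    def read(self, key: str):
        """Return (data, source) where source names the tier that served it."""
        if key in self.extent_cache:
            return self.extent_cache[key], "extent cache"
        fp = self.egroup_fingerprints.get(key)
        if fp is not None and fp in self.content_cache:
            return self.content_cache[fp], "content cache"
        data = self.extent_store[key]
        if fp is not None:
            self.content_cache[fp] = data  # fingerprinted: cache by content
        else:
            self.extent_cache[key] = data  # not fingerprinted: cache by location
        return data, "extent store"


chunk = b"x" * 4096
fp = hashlib.sha1(chunk).hexdigest()
store = {"vm1:0": chunk, "vm2:0": chunk, "vm3:0": b"y" * 4096}
rp = ReadPath(store, {"vm1:0": fp, "vm2:0": fp})

print(rp.read("vm1:0")[1])  # extent store
print(rp.read("vm2:0")[1])  # content cache (same content as vm1:0)
print(rp.read("vm3:0")[1])  # extent store
print(rp.read("vm3:0")[1])  # extent cache
```

The second read is the interesting one: a different VM asks for a chunk with the same fingerprint and is served from the Content Cache without touching the Extent store.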
We also have LRUs and hash tables, including single-touch and multi-touch LRUs for memory and flash chunks. I will explain these in more detail later.
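Until that follow-up, here is one common way a single-touch/multi-touch LRU pair is built. This is a generic scan-resistant design and my assumption about the idea, not a description of the Content Cache internals. New entries land in a single-touch pool, and only a second access promotes them to the multi-touch pool. `TwoPoolLRU` and its capacities are hypothetical.

```python
from collections import OrderedDict


class TwoPoolLRU:
    """Generic two-pool LRU: first touch -> single pool, second touch -> multi pool."""

    def __init__(self, single_capacity: int, multi_capacity: int):
        self.single = OrderedDict()
        self.multi = OrderedDict()
        self.single_capacity = single_capacity
        self.multi_capacity = multi_capacity

    @staticmethod
    def _insert(pool: OrderedDict, capacity: int, key, value) -> None:
        pool[key] = value
        pool.move_to_end(key)          # mark as most recently used
        while len(pool) > capacity:
            pool.popitem(last=False)   # evict the least recently used entry

    def put(self, key, value) -> None:
        if key in self.multi:
            self._insert(self.multi, self.multi_capacity, key, value)
        else:
            self._insert(self.single, self.single_capacity, key, value)

    def get(self, key):
        if key in self.multi:
            self.multi.move_to_end(key)
            return self.multi[key]
        if key in self.single:
            value = self.single.pop(key)  # second touch: promote
            self._insert(self.multi, self.multi_capacity, key, value)
            return value
        return None


cache = TwoPoolLRU(single_capacity=2, multi_capacity=2)
cache.put("hot", b"h")
cache.get("hot")                  # second touch promotes "hot"
for key in ("s1", "s2", "s3"):    # a one-pass scan, such as a virus scan
    cache.put(key, b"s")

print("hot" in cache.multi)       # True: the scan did not evict it
print(cache.get("s1"))            # None: pushed out of the single-touch pool
```

The point of the split is that data read once, for example during a scan, can only evict other read-once data.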
Configurable gflags:
- stargate_content_cache_max_flash
- stargate_content_cache_max_memory
- stargate_content_cache_single_touch_memory_pct
- stargate_content_cache_single_touch_flash_pct
Metrics:
http://<CVM Name or IP>:2009/h/vars?regex=stargate.dedup
http://<CVM Name or IP>:2009/h/vars?regex=content_cache
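If you would rather pull these pages from a script than a browser, a small sketch using only the Python standard library could look like this. The helper names `vars_url` and `fetch_vars` and the address `10.0.0.5` are hypothetical, and since the response format is not described in this post, the body is returned as raw text.

```python
from urllib.parse import quote
from urllib.request import urlopen

STARGATE_PORT = 2009  # port shown in the metrics URLs


def vars_url(cvm: str, regex: str) -> str:
    """Build the metrics URL for a CVM name or IP and a regex filter."""
    return f"http://{cvm}:{STARGATE_PORT}/h/vars?regex={quote(regex)}"


def fetch_vars(cvm: str, regex: str, timeout: float = 5.0) -> str:
    """Fetch the page and return its body as text, without parsing it."""
    with urlopen(vars_url(cvm, regex), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# 10.0.0.5 is a placeholder CVM address; nothing is fetched on this line.
print(vars_url("10.0.0.5", "stargate.dedup"))
# http://10.0.0.5:2009/h/vars?regex=stargate.dedup
```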
I will expand this blog post after VMworld 2013 with the various terminologies used, how this helps with boot storms and virus scans, and what percentage of hot tier usage is reduced.
Config: Create a container with the fingerprint option
ID : 38608
Name : dedup-test-container
VStore Name(s) : dedup-test-container
Random I/O Pri Order : SSD-PCIe,SSD-SATA,DAS-SATA
Sequential I/O Pri Order : SSD-PCIe,SSD-SATA,DAS-SATA
Oplog Configured : true
Oplog Highly Available : true
Fingerprint On Write : on
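To check the setting from a script, the `key : value` listing above is easy to parse. This is a small sketch with a hypothetical `parse_container` helper; it assumes the listing is already available as text, however it was produced.

```python
def parse_container(text: str) -> dict:
    """Turn 'key : value' lines into a dict, skipping lines without ' : '."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


sample = """ID : 38608
Name : dedup-test-container
Fingerprint On Write : on"""

info = parse_container(sample)
print(info["Fingerprint On Write"] == "on")  # True
```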