Monday, January 12, 2015

Nutanix Shadow Clones



Introduction:

Nutanix shadow clones were introduced in NOS 3.2 to speed up reads of read-only, multi-reader vmdks that are accessed from multiple hosts. From NOS 3.5.5 and 4.0.2 onwards, the shadow clones feature is enabled by default.
Examples of multi-reader vmdks are linked-clone replicas in VMware View and the MCS master image in XenDesktop.


Design Info:

Each VM comprises one or more vdisks (vmdks). The Controller VM (vdisk_controller) of the host where a vdisk was first powered on or accessed manages that vdisk. This provides data locality, reduced network traffic, and other benefits.

With multi-reader vmdks, multiple hosts can access (read) a vdisk. Therefore, the read requests from multiple nodes have to be managed by that one Controller VM. The NFS server/vdisk controller process (stargate) is capable of handling multiple read requests from different nodes.

However, it is preferable for stargate to detect multi-reader vmdks automatically and create "shadow" clones on demand.

Before the shadow clone feature is enabled:

  • The master replica is owned by one CVM.
  • All reads are redirected to that CVM.
  • The extent cache of that CVM is populated.
  • No data is localized on the remote CVM where the read originated.

After the shadow clone feature is enabled:

  • The master replica is owned by one CVM.
  • When a remote linked-clone VM accesses the replica, a shadow clone of that replica vdisk is created on demand via a zero-copy snapshot and is owned by the remote CVM.
  • New reads populate the extent cache, which is shared by all vdisks on that CVM. Least recently used (LRU) data is evicted from the extent cache if needed.
  • If there is a write to the master replica, the shadow clones are destroyed.
  • If VMs migrate to another node, either a new shadow clone is created or the existing shadow clone owned by the vdisk_controller (CVM) of that node is used.
  • Logic: a shadow clone is created if a vdisk has reads from 2 or more nodes AND
    "no writes for 300 seconds OR number of reads exceeds 100".
    These thresholds can be changed via gflags (Note: do not change gflags without consulting Nutanix support):
    --stargate_nfs_adapter_read_shadow_remote_read_threshold=100
    --stargate_nfs_adapter_read_shadow_threshold_secs=300
    --stargate_nfs_adapter_read_shadow_min_remote_nodes=2
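
The trigger rule above can be sketched as a small predicate. This is only an illustration of the rule as described in this post, not stargate's actual implementation: the class, field, and function names are hypothetical, and the default values simply mirror the gflag values listed above.

```python
from dataclasses import dataclass


@dataclass
class VdiskReadStats:
    """Hypothetical per-vdisk counters; not a real stargate structure."""
    reader_nodes: int              # distinct nodes that issued reads
    reads: int                     # number of reads observed
    secs_since_last_write: float   # seconds since the last write


def should_create_shadow_clone(stats, min_remote_nodes=2,
                               read_threshold=100, threshold_secs=300):
    """Apply the rule: multi-reader AND (no recent writes OR many reads)."""
    multi_reader = stats.reader_nodes >= min_remote_nodes
    no_recent_writes = stats.secs_since_last_write >= threshold_secs
    many_reads = stats.reads > read_threshold
    return multi_reader and (no_recent_writes or many_reads)


# Two reader nodes and no writes for 400 seconds: the rule is satisfied.
print(should_create_shadow_clone(VdiskReadStats(2, 5, 400)))
# A single reader node never qualifies, however many reads it issues.
print(should_create_shadow_clone(VdiskReadStats(1, 500, 400)))
```

Keeping the three thresholds as parameters makes it easy to see how a changed gflag value would shift the decision.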

 
Configuration:
  1. Verify whether the shadow clone feature is enabled:

ncli cluster get-params|grep -i shadow

Shadow Clones Status      : Enabled
Another way to verify:
zeus_config_printer |grep shad
shadow_clones_enabled: true

  2. If it is disabled, enable it with: ncli cluster edit-params enable-shadow-clones=true
  3. If you have a linked-clone environment, find the master replica disk's vdisk_id using the following command:

vdisk_config_printer (to narrow the output: vdisk_config_printer | grep -B 12 shadow | grep -B 10 -A 3 replica)

          vdisk_id: 58810154  <<<<<<
          vdisk_name: "NFS:58810154"
          vdisk_size: 4398046511104
          container_id: 1519
          creation_time_usecs: 1406123698498177
          vdisk_creator_loc: 4
          vdisk_creator_loc: 58557595
          vdisk_creator_loc: 34032844
          nfs_file_name: "replica-4e5bf2ad-c5d2-4e9d-8a91-f7b17389872e_7-flat.vmdk"  <<<<<<
          may_be_parent: true
          shadow_read_requests: true  <<<<<<
  4. Verify that shadow clones were created for vdisk id 58810154. In the output, the parent vdisk id is the replica vdisk (58810154), and the shadow clone vdisks are created on different CVMs (@10 and @6 are CVM ids).
vdisk_config_printer |grep 58810154
vdisk_name: "NFS:58810154#58810154@10"
parent_vdisk_id: 58810154
vdisk_name: "NFS:58810154#58810154@6"
parent_vdisk_id: 58810154
vdisk_name: "NFS:58810154#58810154@25525172"
parent_vdisk_id: 58810154
vdisk_name: "NFS:58810154#58810154@25525291"
parent_vdisk_id: 58810154
  5. Inspect the properties of a shadow clone (it is created as an immutable shadow). Here I selected the one on CVM id 10.
vdisk_config_printer |grep NFS:58810154#58810154@10 -B 2 -A 12
vdisk_id: 58825863
vdisk_name: "NFS:58810154#58810154@10"
parent_vdisk_id: 58810154
vdisk_size: 4398046511104
container_id: 1519
creation_time_usecs: 1406125098160169
mutability_state: kImmutableShadow  <<<<<<<
vdisk_creator_loc: 4
vdisk_creator_loc: 58557595
vdisk_creator_loc: 34032844
nfs_file_name: "replica-4e5bf2ad-c5d2-4e9d-8a91-f7b17389872e_7-flat.vmdk"
generate_vblock_copy: true
parent_nfs_file_name_hint: "replica-4e5bf2ad-c5d2-4e9d-8a91-f7b17389872e_7-flat.vmdk"
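
On a cluster with many vdisks, the grep output from the verification steps can be tedious to read. The sketch below parses text in the shape shown in this post and lists the "@" suffix of each shadow clone of a given parent vdisk. It assumes the vdisk_name format "NFS:<parent>#<parent>@<id>" seen in the sample output; the function name is hypothetical, and the code only processes text handed to it rather than calling any Nutanix tool or API.

```python
import re

# Matches names such as "NFS:58810154#58810154@10" from the sample output.
SHADOW_NAME = re.compile(r'vdisk_name:\s*"NFS:(\d+)#(\d+)@(\d+)"')


def shadow_clone_owners(printer_output, parent_vdisk_id):
    """Return the '@' suffixes of shadow clones whose parent id matches."""
    owners = []
    for line in printer_output.splitlines():
        match = SHADOW_NAME.search(line)
        if match and int(match.group(1)) == parent_vdisk_id:
            owners.append(int(match.group(3)))
    return owners


# Sample text copied from the output shown in step 4.
sample = '''\
vdisk_name: "NFS:58810154#58810154@10"
parent_vdisk_id: 58810154
vdisk_name: "NFS:58810154#58810154@6"
parent_vdisk_id: 58810154
'''
print(shadow_clone_owners(sample, 58810154))  # prints [10, 6]
```

The same function could be fed text captured from vdisk_config_printer instead of the inline sample.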

Performance Gains:
Shadow clones improve boot storm performance as well as reads from linked-clone VMs.
Additional info regarding performance gains:
Andre’s Blog - http://myvirtualcloud.net/?p=5979
Kees Baggerman's Blog - AppVolume Performance improvements