Nutanix cluster health is designed to detect and analyze cluster-related failures and to give customers more visibility into issues by reporting the cause, impact and resolution of each one. Nutanix cluster health uses plugins from the NCC (Nutanix Cluster Check) utility.
NCC is developed by Nutanix Engineering from input provided by support engineers, customers, on-call engineers and solution architects. Nutanix Engineering productised their troubleshooting scripts into NCC. Nutanix cluster health runs NCC plugins at various intervals and provides easy access to the results through the UI. Customers can use it to identify and troubleshoot cluster issues themselves, which leads to faster resolution. It also provides uniform troubleshooting tools across different hypervisors.
Nutanix cluster health can be accessed from Prism Element (URL: the CVM's IP address).
Prism Element has a cluster health walk-through that shows an example of troubleshooting disk health.
Prism Element First Page - Cluster Health Access
Here is the list of cluster health checks:
List of Health Checks
CVM:
- CPU utilization / load average
- Disk: metadata usage (inode), HDD disk usage (df), HDD latency (sar/iostat), smartctl status, SSD latency
- Memory committed (/proc/meminfo)
- Network: CVM-to-CVM connectivity (external vSwitch), CVM-to-host connectivity (vSwitchNutanix), gateway configuration, subnet configuration (verifies the Nutanix HA network configuration)
- Time drift between CVM and host
Host:
- CPU utilization
- Memory swap rate
- Network: 10 GbE connectivity (vSwitch and vmknic), NIC error rate, receive and transmit packet loss (ethtool)
VM:
- CPU utilization
- I/O latency (vdisk)
- Memory (swap rate/usage)
- Network Rx/Tx packet loss
nutanix@NTNX-13SM35300008-B-CVM:10.1.60.110:~$ ncli health-check ls | grep Name
Name : I/O Latency
Name : CPU Utilization
Name : Disk Metadata Usage
Name : Transmit Packet Loss
Name : CVM to CVM Connectivity
Name : Receive Packet Loss
Name : CPU Utilization
Name : CPU Utilization
Name : Transmit Packet Loss
Name : Memory Usage
Name : Receive Packet Loss
Name : HDD I/O Latency
Name : Gateway Configuration
Name : HDD S.M.A.R.T Health Status
Name : Load Level
Name : Memory Usage
Name : CVM to Host Connectivity
Name : 10 GbE Compliance
Name : Time Drift
Name : HDD Disk Usage
Name : SSD I/O Latency
Name : Memory Pressure
Name : Memory Swap Rate
Name : Subnet Configuration
Name : Memory Swap Rate
Name : Node Nic Error Rate High
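Several names in the listing repeat (CPU Utilization appears three times, for example), presumably because a check with the same name exists for more than one entity type. The following is a minimal Python sketch for tallying the names, assuming the command output has already been captured as text in the `Name : value` form shown above. The function `count_check_names` is a hypothetical helper for illustration and is not part of ncli or NCC.

```python
from collections import Counter


def count_check_names(ncli_output: str) -> Counter:
    """Count how often each check name appears in 'Name : <value>' lines.

    Hypothetical helper: it only parses text shaped like the listing above.
    """
    names = []
    for line in ncli_output.splitlines():
        # Split on the first colon only, so values containing ':' stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Name":
            names.append(value.strip())
    return Counter(names)


# A shortened copy of the listing above, used as sample input.
sample = """\
Name : I/O Latency
Name : CPU Utilization
Name : CPU Utilization
Name : CPU Utilization
Name : Time Drift
"""

counts = count_check_names(sample)
for name, n in counts.most_common():
    print(f"{n} x {name}")
```

Lines that do not start with the `Name` key, such as the shell prompt, are ignored.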
Configuration of Health Checks:
- Turn a check off.
- Set the parameters for the Critical/Warning thresholds (if applicable).
- Change the schedule.
- The edit option is available from the CLI as well: ncli health-check edit (interval/enable/parameter-thresholds)
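To illustrate how a pair of Critical/Warning thresholds splits a metric into states, here is a minimal Python sketch. It is not NCC's implementation: the `Threshold` class, the `classify` function and the 75/90 values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Hypothetical warning/critical pair for one metric (not NCC defaults)."""
    warning: float
    critical: float


def classify(value: float, t: Threshold) -> str:
    """Return OK, WARNING or CRITICAL; the critical bound is tested first."""
    if value >= t.critical:
        return "CRITICAL"
    if value >= t.warning:
        return "WARNING"
    return "OK"


# Assumed example values for a CPU utilization check, in percent.
cpu = Threshold(warning=75.0, critical=90.0)
for reading in (60.0, 80.0, 95.0):
    print(reading, classify(reading, cpu))
```

Testing the critical bound first means a reading above both thresholds is reported at the more severe level.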
Each health check provides:
- Cause: why the check failed. Example: a disk is running out of space.
- Resolution: how to fix the issue. Example: add storage capacity or delete data.
- Impact: what the impact on the user or cluster will be.
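The three fields above can be pictured as a small record attached to each check result. The sketch below is a hypothetical Python data structure, not the format Prism or NCC actually uses. The `CheckResult` class, the `summarize` function, the status value and the impact text are all assumptions made for illustration. Only the cause and resolution examples come from the text above.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Hypothetical container for one health check result."""
    name: str
    status: str
    cause: str
    resolution: str
    impact: str


def summarize(result: CheckResult) -> str:
    """Render a result as a short multi-line report."""
    return (
        f"[{result.status}] {result.name}\n"
        f"  Cause: {result.cause}\n"
        f"  Resolution: {result.resolution}\n"
        f"  Impact: {result.impact}"
    )


result = CheckResult(
    name="HDD Disk Usage",
    status="FAIL",  # assumed status label
    cause="Disk is running out of space",
    resolution="Add storage capacity or delete data",
    impact="Illustrative placeholder: describe the user/cluster impact here",
)
print(summarize(result))
```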
Running the plugin manually:
NCC is a superset of the health checks and has more plugins that can be run. NCC can be updated independently of NOS versions (note that certain newer NCC plugins may apply only to certain NOS versions). Here is a sample of a few NCC options.
ncc health check
List of network checks that can be run (intrusive checks will affect the performance of the cluster):
ncc network_checks
Sample of running a network_check.
Future Developments:
- Unification of Cluster health, Alerts and Events.
- Impact, Cause and Resolution updated via NCC updates independent of NOS upgrade.
- Provide finer Root Cause analysis and more insights into cluster health.
With a vmkernel port for management and one for vMotion in the same VLAN/network, and NIOC limiting the vMotion network, the Nutanix health check "Verify whether the Controller VM is uplinked to the 10 GbE NIC" triggered, showing the vMotion IPs and high latency. Did autopath try to use the vMotion vmkernel port instead of the management one?
One of the vmkernel ports should be on the same subnet as the CVM; then autopath will work.
I understand that for autopath the CVM and a vmkernel port have to be in the same network. During a CVM failure, scripts on the host add a route (esxcfg-route) to another CVM's external interface, and in autopath 2.0 I can use different networks. My question was why Nutanix chose a different vmkernel port. With NIOC I limit the vMotion vmkernel speed...
Can you send me the output of esxcfg-vmknic -l (from ESXi) and svmips (from the CVM)?