Tuesday, January 27, 2015

NCC - The Swiss Army Knife of Nutanix Troubleshooting Tools.



The Swiss Army knife is a pocket-sized multi-tool that equips you for everyday challenges. NCC (Nutanix Cluster Check) equips you with multiple Nutanix troubleshooting tools in one package.

NCC provides multiple utilities (plugins) that let the Nutanix infrastructure administrator:
  • check the health of the hypervisor, Nutanix cluster components, network, and hardware
  • identify misconfigurations that can cause performance issues
  • collect logs for a specific time period and specific components
  • optionally run NCC automatically and email the results at a configurable interval
NCC is developed by Nutanix Engineering based on input from support engineers, customers, on-call engineers, and solution architects. It helps Nutanix customers identify a problem and either fix it themselves or report it to Nutanix Support, and it speeds up problem resolution by reducing the time needed to triage an issue.

When should you run NCC?
  • After a new install.
  • Before and after any cluster activity: adding a node, removing a node, reconfiguration, or an upgrade.
  • Any time you are troubleshooting an issue.


As mentioned in the cluster health blog, NCC is the collector agent for cluster health.

Roadmap for NCC 1.4 (Feb 2015) and beyond:
  • Infrastructure for alerts (ability to configure email and the frequency of the checks). (Update: NOS 4.1.3 / NCC 2.0, Jun 2015)
  • Ability to dynamically add new checks to the cluster health or alert framework after an NCC upgrade.
  • Run NCC from the Prism UI and upload the log collector output to the Nutanix FTP site.
  • Link to the KB article that provides details for each specific failure (fix-it-yourself). (Update: NOS 4.1.x / NCC 1.4.1)
  • Re-run only the failed tests.
  • Link pre-upgrade checks to the latest NCC checks.


a. Download and upgrade NCC to 1.3.1 
http://portal.nutanix.com -> Downloads -> Tools and Firmware

NCC 1.3.1 (the latest version as of Jan 27, 2015) will be the default version bundled with NOS 4.1.1.
1. Upgrading NCC from the CLI:
  • Copy the installer to a CVM: Jeromes-Macbook:~ jerome$ scp Downloads/nutanix-ncc-1.3.1-latest-installer.sh nutanix@10.1.65.100:
  • Log in to the CVM and run "chmod u+x nutanix-ncc-1.3.1-latest-installer.sh; ./nutanix-ncc-1.3.1-latest-installer.sh"
2. NCC upgrade UI screenshot: (NCC 1.4 can be upgraded via the UI from NOS 4.1.1.)
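
If you upgrade NCC from the CLI often, the two manual steps can be wrapped in a small helper. This is only a sketch: the function names (ncc_installer_name, upgrade_ncc) are made up for this post, and it assumes the installer keeps the nutanix-ncc-<version>-latest-installer.sh naming shown above and that you can ssh to the CVM as the nutanix user.

```shell
#!/bin/sh
# Sketch only: copy an NCC installer to a CVM and run it there.
# Function names are hypothetical; the file naming follows the 1.3.1 example.

# Build the installer file name for a given NCC version.
ncc_installer_name() {
    printf 'nutanix-ncc-%s-latest-installer.sh\n' "$1"
}

# upgrade_ncc <version> <cvm_ip> [download_dir]
upgrade_ncc() {
    version=$1
    cvm=$2
    dir=${3:-Downloads}
    installer=$(ncc_installer_name "$version")

    # Copy the installer to the nutanix user's home directory on the CVM.
    scp "$dir/$installer" "nutanix@$cvm:" || return 1

    # Make it executable and run it, same as the manual steps.
    ssh "nutanix@$cvm" "chmod u+x ./$installer && ./$installer"
}

# Example (not executed here): upgrade_ncc 1.3.1 10.1.65.100
```

Nothing runs until you call upgrade_ncc yourself, so you can source the file first and check the installer name it builds.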

b. Executing NCC health checks:

1. Run "ncc health_checks" to show the list of health checks.



2. Execute "ncc health_checks run_all" and monitor for messages other than PASS.

3. List of NCC statuses


4. Results of an NCC check run on a lab cluster


5. Displaying and analyzing the failed tests.



The FAILUREs in this run are due to sub-optimal CVM memory and network errors. To fix them:
- Increase CVM memory to 16 GB or more. (KB 1513: https://portal.nutanix.com/#/page/kbs/details?targetId=kA0600000008djKCAQ)
- Check the network. For rx_missed_errors, look for network port flaps and network driver issues (KB 1679 and KB 1381).
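
Watching a long run_all for anything other than PASS (step 2) is easier if you save the output and filter it afterwards. The sketch below assumes each result line carries its status in square brackets, for example "[ PASS ]" or "[ FAIL ]"; that layout is my assumption about the output, so adjust the pattern if your NCC version prints something different. The function name non_pass_lines and the file name ncc_run.txt are made up for illustration.

```shell
#!/bin/sh
# Sketch only: print NCC result lines whose bracketed status is not PASS.
# Assumes a status such as "[ PASS ]" or "[ FAIL ]" appears on each result line.

non_pass_lines() {
    # Keep lines that contain a bracketed status word, then drop the PASS ones.
    grep -E '\[ *[A-Za-z]+ *\]' | grep -v -E '\[ *PASS *\]'
}

# Example (not executed here):
#   ncc health_checks run_all | tee ncc_run.txt
#   non_pass_lines < ncc_run.txt
```

Saving one such file before a cluster activity and another after it also gives you two short lists to compare.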

c. Log collector feature of NCC (similar to Cisco's "show tech-support" or VMware's vm-support):

The NCC log collector collects the logs from all the CVMs in parallel.

1. Run "ncc log_collector" to see the list of logs that will be collected.



2. To collect all the logs for the last 4 hours, run "ncc log_collector run_all".
Each collected log, for example stargate.INFO, is stamped with the time period it covers:
****Log Collector Start time = 2015/01/27-09:19:15 End time = 2015/01/27-13:19:15 ****

3. To anonymize the logs (IP addresses, cluster name, usernames), run
"ncc log_collector --anonymize_output=true run_all"
The directory listing within the resulting tar bundle:
nutanix@NTNX-13SM15010001-C-CVM::~/data/log_collector/NCC-logs-2015-01-27-640-1422393524$ ls
cluster_config  xx.yy.64.165-logs  xx.yy.65.100-logs  xx.yy.65.98-logs
command.txt     xx.yy.64.166-logs  xx.yy.65.101-logs  xx.yy.65.99-logs

4. To collect logs only for the stargate component, run "ncc log_collector cvm_logs --component_list=stargate".

Additional Options:




5. To collect logs from only one CVM, run "ncc log_collector --cvm_list=10.1.65.100 alerts".

6. More options and filters:
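
Going back to step 2: if you want to confirm which time window a collected log actually covers, the start/end banner can be parsed with standard tools. This is a rough sketch, assuming the banner looks exactly like the stargate.INFO example above and that GNU date is available; log_window_hours is a made-up name.

```shell
#!/bin/sh
# Sketch only: read a collected log on stdin, find the log collector banner,
# and print the covered window in whole hours. Requires GNU date (-d).

log_window_hours() {
    # Use the first banner line only.
    line=$(grep 'Log Collector Start time' | head -n 1)

    # Extract "YYYY/MM/DD-HH:MM:SS" and rewrite it as "YYYY-MM-DD HH:MM:SS".
    start=$(printf '%s\n' "$line" |
        sed -n 's/.*Start time = \([0-9/]*\)-\([0-9:]*\).*/\1 \2/p' | tr '/' '-')
    end=$(printf '%s\n' "$line" |
        sed -n 's/.*End time = \([0-9/]*\)-\([0-9:]*\).*/\1 \2/p' | tr '/' '-')
    [ -n "$start" ] && [ -n "$end" ] || return 1

    # Convert both to epoch seconds; -u sidesteps daylight-saving shifts.
    s=$(date -u -d "$start" +%s) || return 1
    e=$(date -u -d "$end" +%s) || return 1
    echo $(( ($e - $s) / 3600 ))
}

# Example (not executed here): log_window_hours < stargate.INFO
```

For the banner shown in step 2 (09:19:15 to 13:19:15 on the same day) this prints 4, matching the default 4-hour window.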




d. Auto-run of NCC health checks and email of the results

Verify that email alerts are enabled:
nutanix@NTNX-13SM15010001-C-CVM:~$ ncli alert get-alert-config
    Alert Email Status        : Enabled
    Send Email Digest         : true
    Enable Default Nutanix... : true
    Default Nutanix Email     : al@nutanix.com
    Email Contacts            : j@nutanix.com
    SMTP Tunnel Status        : success
    Service Center            : n.nutanix.net
    Tunnel Connected Since    : Wed Jan 21 13:58:03 PST 2015
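
If you want a script to check this instead of reading the output by eye, the status field can be picked out of the ncli output. A minimal sketch, assuming the "Alert Email Status : Enabled" line appears as shown above; alert_email_enabled is a made-up name.

```shell
#!/bin/sh
# Sketch only: read `ncli alert get-alert-config` output on stdin and
# succeed (exit 0) only if the Alert Email Status field is "Enabled".

alert_email_enabled() {
    # Strip the field label and any trailing spaces, keep the value.
    status=$(sed -n 's/^ *Alert Email Status *: *\(.*[^ ]\) *$/\1/p' | head -n 1)
    [ "$status" = "Enabled" ]
}

# Example (not executed here):
#   ncli alert get-alert-config | alert_email_enabled && echo "email alerts on"
```
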
Enable auto-run of NCC (the frequency is in hours):
ncc --set_email_frequency=24
Verify the config:

ncc --show_email_config     
[ info ] NCC is set to send email every 24 hrs.
[ info ] NCC email has not been sent after last configuration.
[ info ] NCC email configuration was last set at 2015-01-27 13:47:32.956196.
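
The same check can be scripted. This sketch pulls the interval out of the "ncc --show_email_config" output, assuming the "NCC is set to send email every N hrs." wording shown above; ncc_email_hours is a made-up name.

```shell
#!/bin/sh
# Sketch only: read `ncc --show_email_config` output on stdin and print the
# configured email interval in hours (prints nothing if the line is absent).

ncc_email_hours() {
    sed -n 's/.*NCC is set to send email every \([0-9][0-9]*\) hrs.*/\1/p'
}

# Example (not executed here): ncc --show_email_config | ncc_email_hours
```

With the output above it prints 24.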


Sample Email: