1. Cluster create -
cluster discover-nodes
- Network connectivity - all nodes must be in the same broadcast domain
- /etc/nutanix/factory_config.json - must have the right config (node location, serial number)
- avahi-browse (/var/log/messages)
- genesis running
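The factory_config.json check in the list above can be partly automated. The sketch below only verifies that the file is readable, is valid JSON, and has non-empty values for a set of keys. The key names used in the example run (`node_position`, `node_serial`) are placeholders chosen for illustration, not names confirmed by these notes; substitute whatever keys the file on your node actually uses.

```python
import json
from pathlib import Path

# Path taken from the checklist above.
FACTORY_CONFIG = Path("/etc/nutanix/factory_config.json")


def load_factory_config(path=FACTORY_CONFIG):
    """Return (config, None) on success or (None, reason) on failure."""
    try:
        text = Path(path).read_text()
    except OSError as exc:
        return None, f"cannot read {path}: {exc}"
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return None, f"{path} is not valid JSON: {exc}"
    if not isinstance(config, dict):
        return None, f"{path} does not contain a JSON object"
    return config, None


def missing_keys(config, required):
    """List the required keys that are absent, null, or an empty string."""
    return [key for key in required if config.get(key) in (None, "")]


if __name__ == "__main__":
    # ASSUMPTION: placeholder key names, for illustration only.
    required = ["node_position", "node_serial"]
    config, error = load_factory_config()
    if error:
        print(error)
    else:
        print("missing or empty:", missing_keys(config, required))
```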
2. Cluster start - three main processes, plus svm_boot, need to work. The rest of the processes (prism, pithos,
even stargate) will start up if these work.
a. Genesis - Logs in data/logs/genesis.out
b. Zookeeper - data/logs/zookeeper.out
c. cassandra - data/logs/cassandra/system.log
d. svm_boot - /usr/local/nutanix/bootstrap.log
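A quick way to look at all four logs at once is to print the tail of each. This is a minimal sketch, assuming the relative `data/logs/...` paths above live under `/home/nutanix` (inferred from the lock path mentioned later in these notes, so adjust `BASE` if that is wrong on your CVM).

```python
from collections import deque
from pathlib import Path

# ASSUMPTION: the relative paths in the list are under /home/nutanix.
BASE = Path("/home/nutanix")
LOGS = {
    "genesis": BASE / "data/logs/genesis.out",
    "zookeeper": BASE / "data/logs/zookeeper.out",
    "cassandra": BASE / "data/logs/cassandra/system.log",
    "svm_boot": Path("/usr/local/nutanix/bootstrap.log"),
}


def tail(path, count=5):
    """Return the last `count` lines of a file, or None if it cannot be read."""
    try:
        with open(path, errors="replace") as handle:
            return [line.rstrip("\n") for line in deque(handle, maxlen=count)]
    except OSError:
        return None


if __name__ == "__main__":
    for name, path in LOGS.items():
        print(f"== {name}: {path}")
        lines = tail(path)
        if lines is None:
            print("   (missing or unreadable)")
        else:
            for line in lines:
                print("   " + line)
```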
3. Genesis issues:
a. ESXi password
b. ESXi network /internal vswitch
c. Genesis was started as root
Fix: pkill genesis, rm /home/nutanix/data/locks/genesis, then chown genesis.out to nutanix
d. Wrong zookeeper config
E0725 13:15:55.644642 14286 configuration_validator.cc:532] Zeus config check failed: invalid management_server_name 192.168.5.1
F0725 13:15:55.646738 14286 zeus.cc:698] Check failed: validator->config_valid() logical_timestamp: 44
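The two log lines above use the glog prefix: a severity letter (I, W, E, F), month and day, time, thread id, then `file:line]` and the message. A small sketch for pulling only the E and F lines out of a log such as genesis.out follows; the function names are made up for this example.

```python
import re

# glog prefix: <severity><mmdd> <hh:mm:ss.uuuuuu> <thread> <file>:<line>] <message>
GLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<thread>\d+) "
    r"(?P<source>[^:\s]+):(?P<line>\d+)\] (?P<message>.*)$"
)


def parse_glog_line(line):
    """Return the fields of a glog-style line as a dict, or None if it does not match."""
    match = GLOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def errors_and_fatals(lines):
    """Keep only the parsed entries whose severity is E (error) or F (fatal)."""
    parsed = (parse_glog_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry["severity"] in ("E", "F")]


if __name__ == "__main__":
    sample = [
        "E0725 13:15:55.644642 14286 configuration_validator.cc:532] "
        "Zeus config check failed: invalid management_server_name 192.168.5.1",
        "F0725 13:15:55.646738 14286 zeus.cc:698] "
        "Check failed: validator->config_valid() logical_timestamp: 44",
    ]
    for entry in errors_and_fatals(sample):
        print(entry["severity"], entry["source"], entry["line"], entry["message"])
```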
4. zookeeper fails to start
a. /etc/hosts does not list the same zookeeper nodes (zk1, zk2 and zk3) on all the hosts
https://nutanix.atlassian.net/browse/ENG-9697
b. snapshot corruption:
c. edit_zeus was used and there was a misconfiguration or a zeus_config validation bug
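For case (a), the /etc/hosts comparison can be scripted once the file contents have been gathered from each host. The sketch below assumes the zookeeper aliases are literally named zk1, zk2, zk3 as in the note above, and it leaves out how the files are collected (ssh, copy and paste, etc.). It flags any host whose zk mapping differs from the most common one.

```python
import re
from collections import Counter

# ASSUMPTION: zookeeper aliases look like zk1, zk2, zk3.
ZK_NAME = re.compile(r"^zk\d+$")


def zookeeper_entries(hosts_text):
    """Map each zkN alias found in /etc/hosts-style text to its IP address."""
    entries = {}
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            if ZK_NAME.match(name):
                entries[name] = ip
    return entries


def mismatched_nodes(hosts_by_node):
    """Return the nodes whose zk entries differ from the most common mapping."""
    parsed = {
        node: tuple(sorted(zookeeper_entries(text).items()))
        for node, text in hosts_by_node.items()
    }
    if not parsed:
        return []
    majority, _ = Counter(parsed.values()).most_common(1)[0]
    return sorted(node for node, entries in parsed.items() if entries != majority)


if __name__ == "__main__":
    good = "127.0.0.1 localhost\n10.0.0.1 zk1\n10.0.0.2 zk2\n10.0.0.3 zk3\n"
    bad = "127.0.0.1 localhost\n10.0.0.1 zk1\n10.0.0.9 zk2\n"
    print(mismatched_nodes({"node-a": good, "node-b": good, "node-c": bad}))
```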
5. cassandra failures
- SSD tier full / SSD tier not accessible
- cassandra configs were not reset during cluster destroy:
ERROR [main] 2012-06-19 14:21:03,868 AbstractCassandraDaemon.java (line 154) Fatal exception during initialization
org.apache.cassandra.config.ConfigurationException: enable_cluster_name_change is true but saved cluster name '1131' does not match the allowed prior name 'Test Cluster' and the configured name '1132'
- timestamp in future.
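To spot the stale-config case quickly, the three cluster names can be pulled out of the ConfigurationException message shown above. This sketch matches only that exact message wording, taken from the pasted log line; other Cassandra versions may phrase it differently.

```python
import re

# Pattern copied from the wording of the exception message in the log above.
CLUSTER_NAME_ERROR = re.compile(
    r"saved cluster name '(?P<saved>[^']*)' does not match "
    r"the allowed prior name '(?P<prior>[^']*)' "
    r"and the configured name '(?P<configured>[^']*)'"
)


def cluster_name_mismatch(log_line):
    """Return the saved, prior, and configured names, or None if the line does not match."""
    match = CLUSTER_NAME_ERROR.search(log_line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    line = (
        "org.apache.cassandra.config.ConfigurationException: "
        "enable_cluster_name_change is true but saved cluster name '1131' "
        "does not match the allowed prior name 'Test Cluster' "
        "and the configured name '1132'"
    )
    print(cluster_name_mismatch(line))
```

A saved name that differs from the configured name, as in this line, is consistent with Cassandra data left over from a previous cluster.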
6. CVM boot failures
svm_boot - disk inventory, formatting and mounting disks, vmx configs (repopulating vmx disk entries)
Marker : /tmp/svm_boot_succeeded
/usr/local/nutanix/bootstrap/log/svm_boot.log/gen2_svm_boot.log
/usr/local/nutanix/bootstrap/bin/svm_boot_reboot.dat (this prevents svm_boot from running)
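The two files above give a quick read on where svm_boot stands. The sketch below only checks whether each file exists and prints a summary. The summary wording is this example's interpretation of the notes, not output from any Nutanix tool.

```python
from pathlib import Path

# Paths taken from the notes above.
SUCCESS_MARKER = Path("/tmp/svm_boot_succeeded")
REBOOT_BLOCKER = Path("/usr/local/nutanix/bootstrap/bin/svm_boot_reboot.dat")


def svm_boot_state(marker=SUCCESS_MARKER, blocker=REBOOT_BLOCKER):
    """Report which of the two svm_boot files exist, with a one-line summary."""
    marker_present = Path(marker).exists()
    blocker_present = Path(blocker).exists()
    if marker_present:
        summary = "success marker present"
    elif blocker_present:
        summary = "no success marker; svm_boot_reboot.dat is blocking svm_boot"
    else:
        summary = "no success marker; check the svm_boot logs"
    return {
        "marker_present": marker_present,
        "blocker_present": blocker_present,
        "summary": summary,
    }


if __name__ == "__main__":
    state = svm_boot_state()
    print(state["summary"])
```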