Symptoms
When a previous Update operation has failed:
- You cannot start a new Update operation.
- Triggering a new Update operation fails with the following error:
Operation not allowed for NSX Edge cluster of Cluster 'domain' in 'UPDATE_FAILED' state.
Purpose
This article provides information on how to clean up NSX Edge VMs when the Update operation fails for the vCenter NSX Edge Cluster.
Resolution
This is a known issue.
Currently, there is no resolution.
Workaround
Cleaning up after Update failures involves these steps:
- Cleanup in the NSX Manager
- Cleanup in the vCenter Server
- NSXD Database Cleanup
Cleanup in the NSX Manager
Using NSX Manager API
- Retrieve the list of edge transport-nodes with this API call:
API - GET https://<NSX-Manager-IP-Address>/api/v1/transport-nodes?node_types=EdgeNode
- Using the output, select the edge transport-nodes whose names match the Edge VM names, and retrieve the respective transport-node IDs:
{
"results": [
{
"node_id": "cfb25f8a-bfed-418b-aed2-e69b306d8673", >>>>>>>>>>> retrieve node-id
"host_switch_spec": {
"host_switches": [
{
"host_switch_name": "overlaySw",
...
"node_settings": {
"hostname": "edge1.domain.com",
"dns_servers": [
"10.162.204.1",
"10.166.1.1"
],
"enable_ssh": true,
"allow_ssh_root_login": true
},
"resource_type": "EdgeNode",
"id": "cfb25f8a-bfed-418b-aed2-e69b306d8673",
"display_name": "edge1", <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< match edge VM name
"external_id": "cfb25f8a-bfed-418b-aed2-e69b306d8673",
"ip_addresses": [
"10.160.141.28"
],
...
"results": [
{
"node_id": "cfb25f8a-bfed-418b-aed2-e69b306d8673", >>>>>>>>>>> retrieve node-id
"host_switch_spec": {
"host_switches": [
{
"host_switch_name": "overlaySw",
...
"node_settings": {
"hostname": "edge1.domain.com",
"dns_servers": [
"10.162.204.1",
"10.166.1.1"
],
"enable_ssh": true,
"allow_ssh_root_login": true
},
"resource_type": "EdgeNode",
"id": "cfb25f8a-bfed-418b-aed2-e69b306d8673",
"display_name": "edge1", <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< match edge VM name
"external_id": "cfb25f8a-bfed-418b-aed2-e69b306d8673",
"ip_addresses": [
"10.160.141.28"
],
...
- Delete each of the edge transport-nodes using the node-ids:
API - DELETE https://<NSX-Manager-IP-Address>/api/v1/transport-nodes/<edge-transport-node-id>
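For example, the two API calls above can be issued with curl. The following is a minimal sketch, assuming an admin account (curl prompts for the password), a self-signed certificate (hence -k), and the jq utility; the node id shown is the sample value from the output above:
# List edge transport-nodes and print "id  display_name" pairs
curl -k -u admin 'https://<NSX-Manager-IP-Address>/api/v1/transport-nodes?node_types=EdgeNode' \
  | jq -r '.results[] | "\(.id)  \(.display_name)"'
# Delete one edge transport-node by its id (repeat for each failed edge VM)
curl -k -u admin -X DELETE 'https://<NSX-Manager-IP-Address>/api/v1/transport-nodes/cfb25f8a-bfed-418b-aed2-e69b306d8673'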
Using NSX Manager UI
Note: The NSX Manager UI can be accessed at https://<NSX-Manager-IP-Address> or via the NSX plugin in the vSphere Client UI.
- Go to NSX H5 plugin > System > Fabric > Transport nodes > Edge Transport Nodes.
- Match the Edge VM names with the Edge field in the Edge Transport nodes page.
- Verify that the Edge VMs to delete are not already part of the Edge Cluster by checking that the Edge Cluster field for each Edge VM is blank.
- Delete the two Edge transport nodes corresponding to the VMs specified in the failed update by using the DELETE option.
- The above step is expected to delete the corresponding VMs from vCenter Server as well.
Cleanup in the vCenter Server
If the failed VMs still show up in the vSphere Client UI even after being deleted from the NSX Manager, manually delete the two VMs using the vSphere Client UI.
NSXD Database Cleanup
- Identify the cluster-id corresponding to the given cluster. This can be obtained by either of these options:
- Browsing to the given vCenter Cluster in the vCenter Server MOB at https://<VC-IP>/mob.
- From the vSphere Client UI, navigating to the given cluster and noting down the part of the URL string with the pattern domain-X123 (see the optional command-line sketch below).
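Optionally, and not part of the original procedure, the same cluster-id can be retrieved from the command line with the open-source govc CLI, if it is installed; MyCluster is a placeholder name:
# Assumes GOVC_URL, GOVC_USERNAME, GOVC_PASSWORD (and GOVC_INSECURE=1 for
# self-signed certificates) point at the vCenter Server.
# -i prints the managed object reference, e.g. ClusterComputeResource:domain-c123;
# the part after the colon is the cluster-id.
govc find -i -type c -name 'MyCluster' /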
- Connect to the VCSA via SSH as the root user.
- Stop WCP service:
vmon-cli --stop wcp
- Access the NSX database (VCDB):
/opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres -w
- After entering the VCDB=# prompt:
- Find the nsxd.edge_configuration table entry corresponding to the given cluster:
select * from nsxd.edge_configuration where cluster_id='domain-X123';
- Verify that the inprogress_update_spec column is non-empty, and make note of the Edge VM names used in the failed update.
- Set the inprogress_update_spec field to NULL:
UPDATE nsxd.EDGE_CONFIGURATION SET INPROGRESS_UPDATE_SPEC=NULL WHERE CLUSTER_ID='domain-X123';
- Check whether there is a corresponding entry in the nsxd.EDGE_VM table for the given cluster:
SELECT * FROM nsxd.EDGE_VM where cluster_id='domain-X123';
- If entries corresponding to the failed VMs exist in the nsxd.EDGE_VM table, selectively delete them (make sure to delete ONLY the entries corresponding to the failed VM names):
DELETE FROM nsxd.EDGE_VM where cluster_id='domain-X123' and name_in_spec='VM-NAME-TO-DELETE';
- Verify that only the Edge VMs that have previously been successfully deployed and configured have entries in the nsxd.EDGE_VM table (a consolidated sketch of these database edits appears after this list).
- Restart WCP service:
vmon-cli --start wcp
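As a consolidated reference, the database edits from the steps above can also be run as a single transaction inside the VCDB psql session, before restarting the WCP service. This is a sketch rather than part of the original procedure; the transaction wrapper is an optional safety measure, and 'domain-X123' and 'VM-NAME-TO-DELETE' are placeholders:
BEGIN;
-- Clear the stuck in-progress update spec for the cluster.
UPDATE nsxd.edge_configuration SET inprogress_update_spec = NULL
    WHERE cluster_id = 'domain-X123';
-- Delete ONLY the rows for the failed Edge VM names (repeat per VM name).
DELETE FROM nsxd.edge_vm
    WHERE cluster_id = 'domain-X123' AND name_in_spec = 'VM-NAME-TO-DELETE';
-- Re-check both tables before committing; run ROLLBACK instead to abort.
SELECT * FROM nsxd.edge_configuration WHERE cluster_id = 'domain-X123';
SELECT * FROM nsxd.edge_vm WHERE cluster_id = 'domain-X123';
COMMIT;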
Once the above steps are complete, you should be able to update the Edge Cluster to add Edge Nodes using the same configuration as before or a new configuration.