
Clean up NSX Edge VMs after an Update failure of a vCenter NSX Edge Cluster

  Symptoms

When a previous Update operation has failed:
  • You cannot start a new Update operation.
  • Triggering a new Update operation fails with this error:
Operation not allowed for NSX Edge cluster of Cluster 'domain' in 'UPDATE_FAILED' state.
Purpose
This article explains how to clean up NSX Edge VMs when an Update operation fails for a vCenter NSX Edge Cluster.
Resolution
This is a known issue.

Currently, there is no resolution.
Workaround
Cleaning up after Update failures involves two steps:
  • Cleanup in the NSX Manager
  • Cleanup in the vCenter Server

Cleanup in the NSX Manager

Using NSX Manager API
  1. Retrieve the list of edge transport nodes with this request:
API - GET https://<NSX-Manager-IP-Address>/api/v1/transport-nodes?node_types=EdgeNode
  2. In the output, find the edge transport nodes whose display_name matches the Edge VM names used in the failed update, and note their node IDs (the node_id field):
{
 "results": [
 {
 "node_id": "cfb25f8a-bfed-418b-aed2-e69b306d8673", >>>>>>>>>>> retrieve node-id
 "host_switch_spec": {
 "host_switches": [
 {
 "host_switch_name": "overlaySw",
...
 "node_settings": {
 "hostname": "edge1.domain.com",
 "dns_servers": [
 "10.162.204.1",
 "10.166.1.1"
 ],
 "enable_ssh": true,
 "allow_ssh_root_login": true
 },
 "resource_type": "EdgeNode",
 "id": "cfb25f8a-bfed-418b-aed2-e69b306d8673",
 "display_name": "edge1",    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< match edge VM name
 "external_id": "cfb25f8a-bfed-418b-aed2-e69b306d8673",
 "ip_addresses": [
 "10.160.141.28"
 ],
...
  3. Delete each of these edge transport nodes by its node ID:
API - DELETE https://<NSX-Manager-IP-Address>/api/v1/transport-nodes/<edge-transport-node-id>
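The API steps above can be scripted. The following Python sketch uses only the standard library and makes a few assumptions: HTTP Basic authentication with an NSX admin account, the display_name and node_id fields shown in the sample output, and hypothetical helper names. It builds the requests without sending them, so you can review the node IDs before anything is deleted.

```python
import base64
import urllib.request

# Hypothetical helpers (not part of any VMware SDK). The response shape
# ("results" -> "display_name" / "node_id") follows the sample output above.


def _basic_auth(user: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"


def list_edge_nodes_request(manager: str, user: str, password: str) -> urllib.request.Request:
    """GET request that lists the edge transport nodes."""
    url = f"https://{manager}/api/v1/transport-nodes?node_types=EdgeNode"
    req = urllib.request.Request(url, method="GET")
    req.add_header("Authorization", _basic_auth(user, password))
    return req


def find_edge_node_ids(response: dict, failed_vm_names: set[str]) -> dict[str, str]:
    """Map each failed Edge VM name to the node_id of its transport node."""
    return {
        node["display_name"]: node["node_id"]
        for node in response.get("results", [])
        if node.get("display_name") in failed_vm_names
    }


def delete_requests(manager: str, user: str, password: str, node_ids) -> list[urllib.request.Request]:
    """Build one DELETE request per transport node ID; nothing is sent here."""
    reqs = []
    for node_id in node_ids:
        req = urllib.request.Request(
            f"https://{manager}/api/v1/transport-nodes/{node_id}", method="DELETE"
        )
        req.add_header("Authorization", _basic_auth(user, password))
        reqs.append(req)
    return reqs
```

To actually send a request, pass it to urllib.request.urlopen. For example, json.load(urllib.request.urlopen(list_req)) parses the listing. NSX Manager often presents a self-signed certificate, so you may need an ssl.SSLContext that trusts it.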

Using NSX Manager UI

Note: The NSX Manager UI can be accessed at https://<NSX-Manager-IP-Address> or through the NSX plugin in the vSphere Client.
  1. Go to NSX H5 plugin > System > Fabric > Transport Nodes > Edge Transport Nodes.
  2. Match the Edge VM names against the Edge field on the Edge Transport Nodes page.
  3. Verify that the Edge VMs to delete are not already part of the Edge Cluster by checking that the Edge Cluster field for each of them is blank.
  4. Select the two Edge transport nodes that correspond to the VMs specified in the failed update, and delete them by using the DELETE option.
  5. Deleting these transport nodes is expected to delete the corresponding VMs from vCenter Server as well.
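The membership check in the UI steps above can also be done against the NSX Manager API. This Python sketch makes three assumptions. First, GET https://<NSX-Manager-IP-Address>/api/v1/edge-clusters returns a results list. Second, each entry carries display_name and a members list with transport_node_id fields. Third, the function name is hypothetical. Verify the response shape against the API reference for your NSX version.

```python
def edge_cluster_memberships(edge_clusters: dict, node_ids) -> dict[str, str]:
    """Return {node_id: edge cluster name} for candidate nodes still in a cluster.

    An empty result means none of the candidate transport nodes is an Edge
    Cluster member, which corresponds to a blank Edge Cluster field in the UI.
    """
    candidates = set(node_ids)
    found = {}
    for cluster in edge_clusters.get("results", []):
        name = cluster.get("display_name", cluster.get("id", "<unnamed>"))
        for member in cluster.get("members", []):
            node_id = member.get("transport_node_id")
            if node_id in candidates:
                found[node_id] = name
    return found


if __name__ == "__main__":
    # Trimmed, hypothetical edge-clusters response.
    sample = {
        "results": [
            {"display_name": "edge-cluster-1",
             "members": [{"member_index": 0, "transport_node_id": "node-in-use"}]},
        ]
    }
    print(edge_cluster_memberships(sample, ["cfb25f8a-bfed-418b-aed2-e69b306d8673"]))  # prints {}
```

Delete a transport node only when it does not appear in the returned mapping.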

Cleanup in the vCenter Server

If the failed VMs still appear in the vSphere Client after their transport nodes are deleted from NSX Manager, delete the two VMs manually in the vSphere Client.
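If you prefer to script the manual deletion, the vSphere Automation REST API offers equivalent calls. This Python sketch rests on several assumptions:

- vCenter Server 7.0 Update 2 or later, which provides the /api endpoints. Earlier releases expose similar calls under /rest.
- A session ID obtained from POST /api/session.
- A VM listing in the shape returned by GET /api/vcenter/vm, with the fields vm, name and power_state.

The helper names are hypothetical, and the requests are built, not sent.

```python
import base64
import urllib.request


def session_request(vc: str, user: str, password: str) -> urllib.request.Request:
    """POST /api/session with Basic auth; the response body is the session ID."""
    req = urllib.request.Request(f"https://{vc}/api/session", method="POST")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return req


def vm_delete_requests(vc: str, session_id: str, vms: list[dict],
                       names: set[str]) -> list[urllib.request.Request]:
    """Power off (if running) and delete each listed VM whose name is in `names`."""
    reqs = []
    for vm in vms:
        if vm.get("name") not in names:
            continue  # never touch VMs that were not part of the failed update
        base = f"https://{vc}/api/vcenter/vm/{vm['vm']}"
        if vm.get("power_state") == "POWERED_ON":
            reqs.append(urllib.request.Request(f"{base}/power?action=stop", method="POST"))
        reqs.append(urllib.request.Request(base, method="DELETE"))
    for req in reqs:
        req.add_header("vmware-api-session-id", session_id)
    return reqs
```

The power-off request is added only for running VMs, because a VM must be powered off before it can be deleted.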

NSXD Database Cleanup
  1. Identify the cluster ID of the affected cluster by using either of these options:
    • Browse to the cluster in the vCenter Server MOB at https://VC-IP/mob.
    • In the vSphere Client, navigate to the cluster and note the part of the URL string that matches the pattern domain-X123.
  2. Connect to the vCenter Server Appliance (VCSA) over SSH as the root user.
  3. Stop the WCP service:
vmon-cli --stop wcp
  4. Connect to the vCenter Server database (VCDB), which contains the nsxd schema:
/opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres -w
  5. At the VCDB=# prompt:
    1. Find the nsxd.edge_configuration table entry for the cluster:
select * from nsxd.edge_configuration where cluster_id='domain-X123';
    2. Verify that the inprogress_update_spec column is non-empty.
    3. Set the inprogress_update_spec field to NULL:
UPDATE nsxd.EDGE_CONFIGURATION SET INPROGRESS_UPDATE_SPEC=NULL WHERE CLUSTER_ID='domain-X123';
    4. Make a note of the Edge VM names used in the failed update.
    5. Check whether the nsxd.EDGE_VM table has entries for the cluster:
SELECT * FROM nsxd.EDGE_VM where cluster_id='domain-X123';
    6. If entries for the failed VMs exist in the nsxd.EDGE_VM table, delete them selectively. Make sure to delete ONLY the entries that correspond to the failed VM names:
DELETE FROM nsxd.EDGE_VM where cluster_id='domain-X123' and name_in_spec='VM-NAME-TO-DELETE';
    7. Verify that the nsxd.EDGE_VM table now contains entries only for Edge VMs that were previously deployed and configured successfully.
  6. Exit psql (\q) and start the WCP service:
vmon-cli --start wcp
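The database statements must use the exact cluster ID and must delete only the failed VMs, so it can help to generate them for review rather than type them by hand. This Python sketch is a hypothetical helper, not a VMware tool. It makes three assumptions:

- Cluster IDs have the form domain-c<number>, which is the domain-X123 pattern used above.
- name_in_spec holds the Edge VM name as used in the failed update.
- Each statement is run through psql's -c option.

It uses lowercase table names. PostgreSQL folds unquoted identifiers to lowercase, so nsxd.EDGE_VM and nsxd.edge_vm name the same table.

```python
import re

# Assumption: cluster MoRef IDs look like "domain-c8" (the domain-X123 pattern above).
CLUSTER_ID_RE = re.compile(r"domain-[a-z]\d+")

# psql invocation from the steps above; "-c" runs a single statement and exits.
PSQL = ["/opt/vmware/vpostgres/current/bin/psql", "-d", "VCDB", "-U", "postgres", "-w"]


def extract_cluster_id(text: str) -> str:
    """Return the first cluster ID found in a vSphere Client URL or MOB text."""
    match = CLUSTER_ID_RE.search(text)
    if match is None:
        raise ValueError(f"no cluster ID found in {text!r}")
    return match.group(0)


def _quote(value: str) -> str:
    """Quote a SQL string literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def cleanup_statements(cluster_id: str, failed_vm_names: list[str]) -> list[str]:
    """Build the SELECT/UPDATE/DELETE statements for one cluster, in order."""
    if not CLUSTER_ID_RE.fullmatch(cluster_id):
        raise ValueError(f"unexpected cluster ID {cluster_id!r}")
    if not failed_vm_names:
        # Never emit a DELETE that is not restricted to explicitly named VMs.
        raise ValueError("list the failed Edge VM names explicitly")
    cid = _quote(cluster_id)
    statements = [
        f"SELECT * FROM nsxd.edge_configuration WHERE cluster_id={cid};",
        f"UPDATE nsxd.edge_configuration SET inprogress_update_spec=NULL WHERE cluster_id={cid};",
        f"SELECT * FROM nsxd.edge_vm WHERE cluster_id={cid};",
    ]
    for name in failed_vm_names:
        statements.append(
            f"DELETE FROM nsxd.edge_vm WHERE cluster_id={cid} AND name_in_spec={_quote(name)};"
        )
    return statements


def psql_argv(statement: str) -> list[str]:
    """Argument list for running one statement with psql (e.g. via subprocess.run)."""
    return PSQL + ["-c", statement]


if __name__ == "__main__":
    # Hypothetical URL and VM names, for illustration only.
    url = ("https://vc.example.com/ui/app/cluster;nav=h/"
           "urn:vmomi:ClusterComputeResource:domain-c8:0a1b2c3d/summary")
    for stmt in cleanup_statements(extract_cluster_id(url), ["edge3", "edge4"]):
        print(stmt)
```

Run the SELECT statements first and compare their output with the expected state before running the UPDATE and DELETE statements. Keep WCP stopped while the statements run.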

After you complete the steps above, you should be able to update the Edge Cluster to add Edge Nodes, using either the same configuration as before or a new one.
