
Investigating virtual machine file locks on ESXi

 


 

  Details

  • Adding an existing virtual machine disk (VMDK) to a virtual machine that is already powered on fails.
                Failed to add disk scsi0:1. Failed to power on scsi0:1
 
  • Powering on the virtual machine results in the power on task remaining at 95% indefinitely.
  • Cannot power on the virtual machine after deploying it from a template.
  • Powering on a virtual machine fails with an error:
    • Unable to open Swap File
    • Unable to access a file since it is locked
    • Unable to access a file <filename> since it is locked
    • Unable to access Virtual machine configuration
  • In the /var/log/vmkernel log file, you see entries similar to:

    WARNING: World: VM xxxx: xxx: Failed to open swap file <path>: Lock was not free
    WARNING: World: VM xxxx: xxx: Failed to initialize swap file <path>
     
  • When opening a console to the virtual machine, you may receive the error:

    Error connecting to <path><virtual machine>.vmx because the VMX is not started
     
  • The virtual machine reports conflicting power states between vCenter Server and the ESXi host UI.
  • Attempting to view or open the .vmx file with a command such as cat or vi reports an error similar to:

    cat: can't open '[name of vm].vmx': Invalid argument
Solution
The Purpose of File Locking
 
To prevent concurrent changes to critical virtual machine files and file systems, ESXi hosts establish locks on these files. In certain circumstances, these locks may not be released when the virtual machine is powered off. As such, the files cannot then be accessed by other ESXi hosts while locked and the virtual machine fails to power on.

Virtual machine files locked during runtime include:
  • VMNAME.vswp
  • DISKNAME-flat.vmdk
  • DISKNAME-ITERATION-delta.vmdk
  • VMNAME.vmx
  • VMNAME.vmxf
  • vmware.log
Initial Quick Test
  1. Set DRS to manual mode so that you can choose the host when you attempt to power on. If DRS is not in use, migrate the VM to another host
  2. If unsuccessful, continue to attempt a power on of the virtual machine on other hosts within the cluster
  3. When the VM resides on the host holding the file locks, the virtual machine should power on
  4. If you still cannot power on the virtual machine, continue with the steps below to investigate in more detail
 

ESXi Troubleshooting Steps (Identifying the Locked File)

 
To identify the locked file, attempt to power on the virtual machine. During the power on process, an error may display or be written to the virtual machine's logs. The error and the log entry identify the virtual machine and files:
  1. To find the locking host, run vmfsfilelockinfo from the host experiencing difficulties with one or more locked files.

     To find the IP address of the host holding the lock, run vmfsfilelockinfo on the VMDK flat, delta, or sesparse file for VMFS, or on the .UUID.lck file for vSAN. vmfsfilelockinfo takes these parameters:

       • File being tested
       • Username and password for accessing VMware vCenter Server (when tracing the MAC address to an ESX host)

     For example, run this command:


~ # vmfsfilelockinfo -p /vmfs/volumes/iscsi-lefthand-2/VM1/VM1_1-000001-delta.vmdk -v 192.168.1.10 -u administrator@vsphere.local

You see output similar to:

vmfsfilelockinfo Version 1.0
Looking for lock owners on "VM1_1-000001-delta.vmdk"
"VM1_1-000001-delta.vmdk" is locked in Exclusive mode by host having mac address ['xx:xx:xx:xx:xx:xx']
Trying to make use of Fault Domain Manager
----------------------------------------------------------------------
Found 0 ESX hosts using Fault Domain Manager.
----------------------------------------------------------------------
Could not get information from Fault domain manager
Connecting to 192.168.1.10 with user administrator@vsphere.local
Password: xXxXxXxXxXx
----------------------------------------------------------------------
Found 3 ESX hosts from Virtual Center Server.
----------------------------------------------------------------------
Searching on Host 192.168.1.178
Searching on Host 192.168.1.179
Searching on Host 192.168.1.180
MAC Address : xx:xx:xx:xx:xx:xx

Host owning the lock on the vmdk is 192.168.1.180, lockMode : Exclusive

Total time taken : 0.27 seconds.
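
When the same check has to be repeated for several files, the owning host can be pulled out of saved output with a small filter. The following is a minimal sketch, assuming the output keeps a "Host owning the lock on ... is <address>, lockMode : <mode>" line as in the sample above. The function name lock_owner is hypothetical and is not part of vmfsfilelockinfo.

```shell
#!/bin/sh
# lock_owner (hypothetical helper): read vmfsfilelockinfo output on stdin
# and print "<host> <lockMode>" for each "Host owning the lock" line.
# Assumes the line format shown in the sample output above.
lock_owner() {
    sed -n 's/^Host owning the lock on .* is \([^, ]*\), *lockMode *: *\(.*\)$/\1 \2/p'
}

# Example (the file name is an assumption): output saved earlier to a file
#   lock_owner < /tmp/lockinfo.txt
```

Lines that do not match the expected format are ignored, so an empty result means that the tool did not report an owning host.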

Note: During the life cycle of a powered-on virtual machine, several of its files transition between various legitimate lock states. The lock mode indicates the type of lock that is on the file. The lock modes are:

  • mode 0 = no lock
  • mode 1 = exclusive lock (the .vmx file of a powered-on virtual machine, the currently used disk (flat or delta), *vswp, and so on)
  • mode 2 = read-only lock (for example, on the ..-flat.vmdk of a running virtual machine with snapshots)
  • mode 3 = multi-writer lock (for example, used for MSCS cluster disks or FT VMs)
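
The mode numbers can be translated into their descriptions with a short lookup. This is a minimal sketch, assuming the numeric mode has already been read from the lock information for the file. The function name describe_lock_mode is hypothetical.

```shell
#!/bin/sh
# describe_lock_mode (hypothetical helper): translate a numeric lock mode
# into the description given in the list of lock modes.
describe_lock_mode() {
    case "$1" in
        0) echo "no lock" ;;
        1) echo "exclusive lock" ;;
        2) echo "read-only lock" ;;
        3) echo "multi-writer lock" ;;
        *) echo "unknown lock mode: $1" ;;
    esac
}

# Example:
#   describe_lock_mode 2
```

Any value outside 0 to 3 is reported as unknown rather than guessed, because only these four modes are described here.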

 

  2. To get the name of the process holding the lock, run the lsof command on the host holding the lock and filter the output for the file name in question:

~ # lsof | egrep 'Cartel|VM1_1-000001-delta.vmdk'

You see output similar to:


Cartel | World name | Type | fd | Description
36202 vmx FILE 80 /vmfs/volumes/556ce175-7f7bed3f-eb72-000c2998c47d/VM1/VM1_1-000001-delta.vmdk


This shows that the file is locked by a virtual machine having Cartel ID 36202. Now display the list of active Cartel IDs by executing this command:

~ # esxcli vm process list

This displays information for active virtual machines grouped by virtual machine name and having a format similar to:

Alternate_VM27
World ID: 36205
Process ID: 0
VMX Cartel ID: 36202
UUID: 56 4d bd a1 1d 10 98 0f-c1 41 85 ea a9 dc 9f bf
Display Name: Alternate_VM27
Config File: /vmfs/volumes/556ce175-7f7bed3f-eb72-000c2998c47d/Alternate_VM27/Alternate_VM27.vmx
………


The virtual machine entry having VMX Cartel ID 36202 shows the display name of the virtual machine holding the lock on file VM1_1-000001-delta.vmdk, which in this example, is Alternate_VM27.
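
The two lookups above (file name to Cartel ID, then Cartel ID to virtual machine) can be chained with two small filters. This is a minimal sketch, assuming the lsof and esxcli vm process list output formats shown in the examples, in particular that the Cartel ID is the first column of lsof output and that the VMX Cartel ID line precedes the Display Name line within each entry. The function names cartel_for_file and vm_for_cartel are hypothetical, and file names containing spaces are not handled.

```shell
#!/bin/sh
# cartel_for_file (hypothetical helper): read lsof output on stdin and print
# the Cartel ID (first column) of every line whose path ends with the given
# file name.
cartel_for_file() {
    awk -v name="$1" '
        {
            path = $NF
            start = length(path) - length(name) + 1
            # The last field must end with the requested file name.
            if (start >= 1 && substr(path, start) == name) print $1
        }'
}

# vm_for_cartel (hypothetical helper): read "esxcli vm process list" output
# on stdin and print the Display Name of the entry whose VMX Cartel ID
# matches the argument.
vm_for_cartel() {
    awk -v id="$1" '
        /VMX Cartel ID:/ { wanted = ($NF == id) }
        /Display Name:/ && wanted {
            sub(/^[ \t]*Display Name:[ \t]*/, "")
            print
            wanted = 0
        }'
}

# Example, run on the host holding the lock:
#   id=$(lsof | cartel_for_file VM1_1-000001-delta.vmdk)
#   esxcli vm process list | vm_for_cartel "$id"
```

If the first command prints nothing, no process on that host has the file open.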
 

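If no processes are shown, the following script can search for registered VMs that have the VMDK attached. Replace VMDKS_TO_LOOK_FOR with the name of the VMDK in question. Because find searches from the current directory, run the script from the datastore directory that contains the virtual machine folders. The script lists the configuration file of every registered VM, and any matching VMDK entry is displayed below the offending VM:

for i in $(vim-cmd vmsvc/getallvms | grep -v Vmid | awk -F "/"  '{print $2}' | awk '{print $1}'); do echo "$i" && find ./ -iname "$i" | xargs grep vmdk | grep -Ei VMDKS_TO_LOOK_FOR ; done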


Removing the Lock
  1. Migrate the affected VM to the identified ESXi host holding the lock and attempt to power on.
  2. Remove the VMDK from the virtual machine holding the lock, or shut down that virtual machine, to release the lock.
  3. Reboot the offending host.
  4. Engage the VMware storage team, vSAN team, or NFS vendor for further assistance, as metadata issues may be preventing locks from being properly handled.


Removing the .lck file (NFS only)

The files on the virtual machine may be locked via NFS storage. You can identify this by files denoted with .lck-#### (where #### is the value of the fileid field returned from a GETATTR request for the file being locked) at the end of the file name.

Caution: These can be removed safely only if the virtual machine is not running.

Note: VMFS volumes do not have .lck files. The locking mechanism for VMFS volumes is handled within VMFS metadata on the volume.
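
Before removing anything, it can help to list the lock files that exist in the virtual machine directory. This is a minimal sketch, assuming the lock file names contain ".lck-" as described above. The function name list_nfs_lock_files is hypothetical, and it only lists files without deleting them.

```shell
#!/bin/sh
# list_nfs_lock_files (hypothetical helper): print every file whose name
# contains ".lck-" under the given virtual machine directory
# (default: current directory). Nothing is removed.
list_nfs_lock_files() {
    find "${1:-.}" -type f -name '*.lck-*'
}

# Example (the path is an assumption):
#   list_nfs_lock_files /vmfs/volumes/nfs-datastore/VM1
```

Review the list and confirm that the virtual machine is not running before removing any of these files.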
 
Check the integrity of the virtual machine configuration file (.vmx)

For more information on checking the integrity of the virtual machine configuration file, see Verifying ESX/ESXi virtual machine file integrity (1003743).

Note: If a virtual machine does not power on, it may be pointing to two disks in the .vmx file. Remove one of the disks from the virtual machine and attempt to power on again.
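
To see which disks a .vmx file points to, its disk entries can be listed. This is a minimal sketch, assuming disk entries use the form controllerN:N.fileName = "file" (for example scsi0:1.fileName). The function name list_vmx_disks is hypothetical, and CD-ROM devices that use the same key may also appear in the output.

```shell
#!/bin/sh
# list_vmx_disks (hypothetical helper): print the fileName entries of the
# scsi, sata, ide, and nvme devices defined in the given .vmx file.
# Assumes the "<controller><n>:<n>.fileName" key format.
list_vmx_disks() {
    grep -Ei '^(scsi|sata|ide|nvme)[0-9]+:[0-9]+\.fileName' "$1"
}

# Example (the path is an assumption):
#   list_vmx_disks /vmfs/volumes/datastore1/VM1/VM1.vmx
```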

Note: For related information, see Cannot power on a virtual machine because the virtual disk cannot be opened (1004232).

Note: Running the command touch * to troubleshoot locked VMDK snapshot files overwrites the last modification time of each file. Stop using it for this purpose, and instead use vmkfstools -D, chmod -s *, or vmfsfilelockinfo to identify the lock.
