Hung VM – Find the pid
I ran into this situation the other day and thought I would pass it on. It happens occasionally in a VI2 environment. Maybe this will save someone a support call. This information applies to ESX 2.x.
I don’t know what caused this situation, but VC1.x showed the VM turned off. An attempt to turn it on immediately failed. The vmware.log file showed nothing. Last thing in there was a couple weeks ago. The VMDK file’s modify time was a couple weeks ago. Nothing in /var/log/vmkwarning or /var/log/vmkernel that looked relevant.
So, try to see what ESX thinks the power state is.
vmware-cmd /home/vmware/VMNAME/VMNAME.vmx getstate
This took a while to come back, and when it did it returned a message about being unable to communicate with the vmx process.
Sounds like ESX thinks the VM is running, but it really isn't = hosed process. We need to kill the pid of the VM, and then fire it back up. Now, normally I would look up the pid using vmware-cmd. Even if it did return a pid, I would question it given the confused state of the VM.
vmware-cmd /home/vmware/VMNAME/VMNAME.vmx getpid
In this case, it is busted so we need to find another way.
grep VMNAME /proc/vmware/vm/*/*
You will get something back like this:
/proc/vmware/vm/214/names:vmid=214 pid=12083 cfgFile="/home/vmware/VMNAME/VMNAME.vmx" uuid="50 2c 9d fc 5a c3 9d a8-c1 d9 9b b3 b9 e5 0e 89" displayName="VMNAME"
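If you would rather pull the pid out of that line in a script than read it by eye, something like the sketch below should do it. The function name pid_from_names is made up for this example and is not part of ESX. It reads lines in the format shown above on stdin, and it refuses anything that is not a positive pid (a reader in the comments reports seeing pid=-1 on 3.x, which is not something you can kill).

```shell
# Hypothetical helper (not part of ESX): read lines in the
# /proc/vmware/vm/*/names format on stdin and print the pid field.
pid_from_names() {
    pid=$(sed -n 's/.* pid=\(-\{0,1\}[0-9][0-9]*\) .*/\1/p' | head -n 1)
    # No pid field, or a non-positive one such as -1, is not usable.
    case "$pid" in
        ''|-*|0) return 1 ;;
    esac
    echo "$pid"
}

# Usage on the ESX 2.x service console (VMNAME is a placeholder):
#   grep VMNAME /proc/vmware/vm/*/names | pid_from_names
```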
Here pid 12083 is the process in question. It needs to be killed. But first, double check it, given the broken state of the process.
[root@myhost01 vm]# ps -ef | grep 12083
root 12083 1 0 09:51 ? 00:00:00 /usr/lib/vmware/bin/vmware-vmx -
root 12084 12083 0 09:51 ? 00:00:00 vmware-mks -A 11 -D 13 -S -L /tm
root 12090 12083 0 09:51 ? 00:00:00 vmware [Floppy]
root 12091 12083 0 09:51 ? 00:00:00 vmware [ide0:0]
root 12092 12083 0 09:51 ? 00:00:00 /usr/lib/vmware/bin/vmware-vmx -
root 14291 10267 0 10:02 pts/0 00:00:00 grep 12083
It’s a vmware-vmx process – meaning it really is a VM. But we don’t know what vm – we really do but let’s check it anyway.
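Note that grepping the ps output for 12083 also matches the child processes (12083 is in their PPID column) and the grep itself. A stricter check, sketched below, only looks at the row whose PID column is exactly the number in question. The name is_vmx_pid is made up for this example; it assumes the ps -ef column layout shown above, with the PID in the second column.

```shell
# Hypothetical helper: read `ps -ef` output on stdin and succeed only if
# the row whose PID (column 2) equals $1 is a vmware-vmx process.
is_vmx_pid() {
    awk -v pid="$1" '
        $2 == pid && /vmware-vmx/ { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Usage:
#   ps -ef | is_vmx_pid 12083 && echo "12083 is a vmware-vmx process"
```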
grep 12083 /proc/vmware/vm/*/*
The only one back is our broken VM. I am now 99% sure that is the correct one.
So, kill the pid and go restart your VM.
kill 12083
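A plain kill sends SIGTERM, which a badly hung process may ignore. Here is a sketch of a more patient version: ask nicely, wait a bit, and only fall back to kill -9 if the process is still there. The function name stop_pid and the 10 second default are assumptions for illustration, not anything ESX provides.

```shell
# Hypothetical helper: SIGTERM first, SIGKILL only if the process survives.
stop_pid() {
    pid=$1
    wait_secs=${2:-10}                      # assumed grace period in seconds
    kill "$pid" 2>/dev/null || return 1     # plain kill sends SIGTERM
    while [ "$wait_secs" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # process is gone
        sleep 1
        wait_secs=$((wait_secs - 1))
    done
    kill -9 "$pid" 2>/dev/null              # last resort
}

# Usage:
#   stop_pid 12083 10
```

Only reach for this after you have confirmed the pid really belongs to the broken VM.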
FYI, this works on and applies perfectly to 3.x. However, it does NOT apply to 3i, which does not have a service console. It also probably will not apply to VI4 because I think they’ve changed some things but I don’t know yet.
Any idea why I get a pid of -1 for all VMs?
(Note the extra ‘*’ in the first example; the 2nd example doesn’t return a PID at all.)
[root@svr-vmh-crcla06 1236]# grep SVR-IAS-CRS01 /proc/vmware/*/*/*
/proc/vmware/vm/1342/names:vmid=1342 pid=-1 cfgFile="/vmfs/volumes/47fabc0b-cb57433f-2597-0019bb4fcf66/SVR-IAS-CRS01/SVR-IAS-CRS01.vmx" uuid="50 0c ec b5 52 d8 26 2c-5e ff 72 45 4f 83 7c 28" displayName="SVR-IAS-CRS01"
[root@svr-vmh-crcla06 1236]# grep SVR-IAS-CRS01 /proc/vmware/*/*
/proc/vmware/sched/drm-stats: 111 1000 0 -1 1 2 1 0 2 6 2 0 2 6 2 0 4294967293 92 604 512 44 0 0 207 24 133 vm.1341 (/vmfs/volumes/47fabc0b-cb57433f-2597-0019bb4fcf66/SVR-IAS-CRS01/SVR-IAS-CRS01.vmx)
Paul,
The instructions were written for ESX 2.x and do not work in 3.x.
You have a number of options with 3.x. Safest options listed first…
1. ‘vmware-cmd /path/to/vm.vmx stop hard’ — may or may not work depending how screwed up the vm is.
2. ‘vm-support -x’ to list the vmids (small x). Then ‘vm-support -X vmid’ (big x) to generate logs and kill it. This way leaves you info for vmware support, but it takes forever to actually kill the vm.
3. ps -auxfww | grep VMNAME. Then kill the pid or kill -9 it.
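For option 3, a rough sketch of picking the pid out of the ps output instead of eyeballing it. It assumes the VM's line in the ps output contains both the string vmware-vmx and the VM name (for example as part of the .vmx path); check that against your own host first. The function name vmx_pids_for is made up for this example.

```shell
# Hypothetical helper: read `ps auxww` output on stdin and print the PID
# (column 2) of every vmware-vmx line that mentions the given VM name.
# The [v] keeps this awk process's own command line from matching.
vmx_pids_for() {
    awk -v name="$1" '/vmware-[v]mx/ && index($0, name) { print $2 }'
}

# Usage (VMNAME is a placeholder):
#   ps auxww | vmx_pids_for VMNAME
```

If this prints more than one pid, look at the full ps lines before killing anything.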
More info is here. http://virtrix.blogspot.com/2007/09/vmware-stopping-virtual-machine-gone.html
Good luck
Jeremy
ps -ef | grep vmware
look for name of vmachine…
kill -9 pid of hung vmachine