Kevin Vance - Windows automation with Python

09:33 am

Windows automation with Python

Friday, July 6th, 2007
I had to automate a GUI task in Windows the other day. Since all of the Windows "macro tools" required programming anyway, I decided that I might as well use a language I already know. I opened up my Python shell, and 20 Google queries later I knew everything I needed to. For anyone else stuck doing this, here are some helpful recipes:

Setting the window focus

import win32com.client
shell = win32com.client.Dispatch("WScript.Shell")
shell.AppActivate('Some Application Title')
We use a Windows Script Host COM interface to access the AppActivate() function. Pass in the app's title, or its pid.

Sending keyboard commands

shell.SendKeys('%fo')              # Alt+F, O
shell.SendKeys(path + '{ENTER}')   # pathname, ENTER

This code will open the file located at path using the standard open dialog keyboard commands: File, Open, pathname, ENTER. SendKeys() simply sends an escaped key sequence to the currently focused window.
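One thing to watch when sending a pathname: SendKeys() gives special meaning to the characters + ^ % ~ ( ) { } [ ], so a path containing any of them has to be escaped by wrapping each one in braces. Here is a minimal sketch of such a helper. The name escape_sendkeys and the example path are made up for illustration; they are not part of Windows Script Host.

```python
# Characters that SendKeys treats specially; each one must be wrapped
# in braces to be typed literally.
SENDKEYS_SPECIAL = set('+^%~(){}[]')

def escape_sendkeys(text):
    """Return text with every SendKeys metacharacter wrapped in braces."""
    return ''.join('{' + ch + '}' if ch in SENDKEYS_SPECIAL else ch
                   for ch in text)

path = r'C:\Reports (2007)\summary+notes.txt'   # hypothetical path
print(escape_sendkeys(path))
# C:\Reports {(}2007{)}\summary{+}notes.txt

# With the shell object from the first recipe (Windows only):
# shell.SendKeys(escape_sendkeys(path) + '{ENTER}')
```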

The whole program needs to be sprinkled with time.sleep() calls to allow for new windows to load and other operations to complete.
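A fixed sleep is the simplest thing that works, but polling for a condition copes better with a slow machine. The following is only a sketch: wait_until is a hypothetical helper, not something from the win32 libraries, and it simply calls a function repeatedly until it returns something truthy or a timeout expires.

```python
import time

def wait_until(predicate, timeout=10.0, interval=0.1):
    """Poll predicate() until it returns a truthy value.

    Returns that value, or None if timeout seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

Assuming AppActivate() reports success as a true value, something like wait_until(lambda: shell.AppActivate('Some Application Title')) could stand in for a sleep placed before the first SendKeys() call.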

Positioning the mouse cursor relative to a window

import win32ui

from ctypes import *
user32 = windll.user32

x, y = win32ui.FindWindow(None, "Some Dialog Box").GetWindowRect()[0:2]
user32.SetCursorPos(x + 100, y + 150)
This code will move the mouse cursor 100 pixels right and 150 pixels down from the top-left corner of "Some Dialog Box". I couldn't find a SendKeys() equivalent for the mouse, so we're going to have to use USER32.DLL.

Since SetCursorPos() takes screen coordinates measured from the top-left corner of the screen, we need to find the position of the window first, using Python's win32ui library. FindWindow() can search by class name or by title, and GetWindowRect() returns the window's coordinates as a (left, top, right, bottom) tuple. We take only the first pair, the top-left corner.
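The coordinate arithmetic can be pulled into a small function that also checks the target actually falls inside the window. This is a sketch: window_point is a made-up name, and the rectangle in the example is invented rather than read from a real window.

```python
def window_point(rect, dx, dy):
    """Convert an offset from a window's top-left corner to screen coordinates.

    rect is (left, top, right, bottom), the shape GetWindowRect() returns.
    Raises ValueError if the offset lands outside the window.
    """
    left, top, right, bottom = rect
    x, y = left + dx, top + dy
    if not (left <= x < right and top <= y < bottom):
        raise ValueError('offset (%d, %d) is outside the window' % (dx, dy))
    return x, y

# Invented rectangle for a window at (200, 120) that is 640x480 in size.
print(window_point((200, 120, 840, 600), 100, 150))   # (300, 270)
```

The returned pair can be passed straight to user32.SetCursorPos().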

Sending a mouse click

I could find nothing in Windows Script Host, and nothing in the Python win32 libraries. We are going to have to use ctypes to pass our input data to the SendInput() function in USER32.DLL. Credit goes to Case Nelson for figuring this out:
PUL = POINTER(c_ulong)
class KeyBdInput(Structure):
    _fields_ = [("wVk", c_ushort),
             ("wScan", c_ushort),
             ("dwFlags", c_ulong),
             ("time", c_ulong),
             ("dwExtraInfo", PUL)]

class HardwareInput(Structure):
    _fields_ = [("uMsg", c_ulong),
             ("wParamL", c_short),
             ("wParamH", c_ushort)]

class MouseInput(Structure):
    _fields_ = [("dx", c_long),
             ("dy", c_long),
             ("mouseData", c_ulong),
             ("dwFlags", c_ulong),
             ("time", c_ulong),
             ("dwExtraInfo", PUL)]

class Input_I(Union):
    _fields_ = [("ki", KeyBdInput),
              ("mi", MouseInput),
              ("hi", HardwareInput)]

class Input(Structure):
    _fields_ = [("type", c_ulong),
             ("ii", Input_I)]

class POINT(Structure):
    _fields_ = [("x", c_ulong),
             ("y", c_ulong)]

FInputs = Input * 2    # array type holding two input events
extra = c_ulong(0)

# dwFlags 2 = MOUSEEVENTF_LEFTDOWN, 4 = MOUSEEVENTF_LEFTUP
click = Input_I()
click.mi = MouseInput(0, 0, 0, 2, 0, pointer(extra))
release = Input_I()
release.mi = MouseInput(0, 0, 0, 4, 0, pointer(extra))

# type 0 = INPUT_MOUSE
x = FInputs( (0, click), (0, release) )
user32.SendInput(2, pointer(x), sizeof(x[0]))
Yikes. After defining a bunch of C data types, we create two input events to send to the current application: a left mouse button click immediately followed by a left mouse button release. It's not pretty, but it works.
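If only mouse events are needed, the same idea can be written more compactly. The sketch below is a simplified variant, not Case Nelson's code. It declares only the mouse member of the INPUT union, which is enough because MOUSEINPUT is the largest member, and it uses fixed-width ctypes integers so the structures have the Windows layout on any platform. The helper names (mouse_event, click_events, drag_events, send) are made up. The drag sequence uses a relative MOUSEEVENTF_MOVE between press and release; whether a given application treats that as a drag is something to test.

```python
import ctypes
import sys

INPUT_MOUSE = 0
MOUSEEVENTF_MOVE = 0x0001
MOUSEEVENTF_LEFTDOWN = 0x0002
MOUSEEVENTF_LEFTUP = 0x0004

class MouseInput(ctypes.Structure):
    _fields_ = [("dx", ctypes.c_int32),
                ("dy", ctypes.c_int32),
                ("mouseData", ctypes.c_uint32),
                ("dwFlags", ctypes.c_uint32),
                ("time", ctypes.c_uint32),
                ("dwExtraInfo", ctypes.c_size_t)]   # ULONG_PTR

class Input(ctypes.Structure):
    # Mouse-only layout; same size as the full INPUT structure because
    # MOUSEINPUT is the largest member of its union.
    _fields_ = [("type", ctypes.c_uint32),
                ("mi", MouseInput)]

def mouse_event(flags, dx=0, dy=0):
    """Build one mouse INPUT record with the given dwFlags and movement."""
    return Input(INPUT_MOUSE, MouseInput(dx, dy, 0, flags, 0, 0))

def click_events():
    """Left button press followed by release."""
    return [mouse_event(MOUSEEVENTF_LEFTDOWN),
            mouse_event(MOUSEEVENTF_LEFTUP)]

def drag_events(dx, dy):
    """Press, move by (dx, dy) relative to the cursor, then release."""
    return [mouse_event(MOUSEEVENTF_LEFTDOWN),
            mouse_event(MOUSEEVENTF_MOVE, dx, dy),
            mouse_event(MOUSEEVENTF_LEFTUP)]

def send(events):
    """Pass the events to SendInput(); does nothing and returns 0 off Windows."""
    array = (Input * len(events))(*events)
    if sys.platform != 'win32':
        return 0
    return ctypes.windll.user32.SendInput(len(events), array,
                                          ctypes.sizeof(Input))

# Usage on Windows, after positioning the cursor:
# send(click_events())
# send(drag_events(40, -10))
```

Keeping the event lists separate from send() makes it easy to inspect a sequence before anything is injected into the running application.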


With these functions, it should be possible to completely automate most Windows keyboard/mouse input tasks. No $400 macro designer required.

From: zztzed
2007-07-06 06:51 pm (UTC)
When I worked in CSD at USCA and I needed to automate some menial task I used AutoIt. It's basically a scripting language as well, but it also came with a recorder application so I could do whatever menial task once to generate a script of it instead of having to actually think about how it was done and translate that into the scripting language. And it was (and, as far as I know, still is) free, albeit not with a capital F like Python.
From: kvance
2007-07-06 07:01 pm (UTC)
That looks really good.
From: zixyer
2007-07-06 10:41 pm (UTC)
All the Office applications and (surprisingly) Internet Explorer also have really nice automation interfaces, so you can get at their internals without doing any screen-scraping or generating pseudo mouse clicks.

It kind of sucks that more Windows programs don't have Automation interfaces. It would be a requirement, if I were in charge of these things!
From: ext_105271
2008-06-17 05:09 pm (UTC)

the application

I still think the Office interface on the Mac somehow looks a lot prettier than on Windows.
From: (Anonymous)
2008-07-30 05:45 am (UTC)

Re: the application

Wow, Macs sound really good, tell me more.
From: (Anonymous)
2008-12-21 11:26 am (UTC)
Hi, I like the post very much; it's very useful.

import win32com.client

This doesn't work for me. Is it only for Win XP? I have Win 2000.

ImportError: No module named win32com.client

And you said you made some Google queries. Why didn't you post some of them, or some sources? That would be very helpful.
From: (Anonymous)
2009-06-11 09:48 am (UTC)

How to Implement Mouse Drag

Thanks for the code. It is simple and it works.
Is there a function to implement Mouse Drag, like Mouse Click (NOT Drag Detect)?

I am trying to emulate Manual ZOOM (LMB + RMB + Drag) within a Python program.


From: kvance
2009-06-12 03:17 am (UTC)

Re: How to Implement Mouse Drag

Interesting question. You may be able to create an input event for mouse motion, in between the click and release events. The dx and dy MouseInput fields look promising. See
From: (Anonymous)
2009-06-12 09:36 am (UTC)

Re: How to Implement Mouse Drag


I have found the complete code table for mouse events at the following address: (Must be documented somewhere else as well)

For drag, the hex code is 1 (MOVE).
With FInputs = Input * 5 (LMClick + RMClick + Move + LMRelease + RMRelease) your solution works perfectly.

Thanks again,

From: (Anonymous)
2009-06-12 10:39 am (UTC)

Re: How to Implement Mouse Drag

check out pywinauto.controls.HwndWrapper.DragMouse(button='left', pressed='', press_coords=(0, 0), release_coords=(0, 0))
From: (Anonymous)
2011-10-16 06:43 pm (UTC)

Sleep vs events

Note that sleeping is the wrong way to go about it. If you're for instance opening Notepad, it's better to wait for the Notepad window to exist, rather than waiting 2 seconds. On a really slow computer (e.g., VM on a VM server that's heavily loaded), the sleeping strategy will fail -- it will move on sooner than the UI is ready.

Event-driven UI automation is slightly more difficult, but much more deterministic.
From: kvance
2011-10-17 12:18 am (UTC)

Re: Sleep vs events

Great point. Checking for the right state instead of sleeping is always a better idea. The project I was working on was a one-off where checking the output was trivial, so I wasn't too concerned about those kinds of failures so long as they were infrequent.
From: Vasil Lefterov
2012-06-02 09:32 am (UTC)

Clear the send text from a field. How?



shell.AppActivate('Some Application Title')

will send some text to a field in my application.

Is there a way to clear the field and resend something new after that?

From: kvance
2012-06-02 02:25 pm (UTC)

Re: Clear the send text from a field. How?

You might be able to use the keyboard to select the contents of the field and delete it. Maybe something like SendKeys('^+{HOME}{DEL}') assuming the cursor is already at the end of the field.