I had to automate a GUI task in Windows the other day. Since all of the Windows "macro tools" required programming anyway, I decided that I might as well use a language I already know. I opened up my Python shell, and 20 Google queries later I knew everything I needed. For anyone else stuck doing this, here are some helpful recipes:
Setting the window focus
import win32com.client
shell = win32com.client.Dispatch("WScript.Shell")
shell.AppActivate('Some Application Title')
We use a Windows Script Host COM interface to access the AppActivate() function. Pass in the app's title, or its pid.
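AppActivate() returns a boolean telling you whether it found the window, so it can be worth retrying until the application has actually started. Here is a minimal sketch of that idea. The helper name activate_window and its timeout and poll parameters are my own invention, not part of Windows Script Host; the shell argument is the WScript.Shell object created above.

```python
import time


def activate_window(shell, title, timeout=5.0, poll=0.25):
    """Retry shell.AppActivate(title) until it succeeds or `timeout` seconds pass.

    `shell` is assumed to be a WScript.Shell COM object (or anything else
    with an AppActivate(title) method that returns a truthy value on success).
    """
    deadline = time.monotonic() + timeout
    while True:
        if shell.AppActivate(title):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)  # give the application time to create its window


# Usage on Windows (requires pywin32):
#   import win32com.client
#   shell = win32com.client.Dispatch("WScript.Shell")
#   if not activate_window(shell, "Some Application Title"):
#       raise SystemExit("window never appeared")
```

Taking the shell object as a parameter keeps the helper easy to try out with a stand-in object on a machine that has no COM support.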
Sending keyboard commands
shell.SendKeys('%fo') # Alt+F, O
shell.SendKeys(path + '{ENTER}') # pathname, ENTER (sleep first if the dialog is slow to appear)
This code will open the file located at path using the standard Open dialog keyboard commands: Alt+F, O (File, Open), the pathname, then ENTER. SendKeys() simply sends an escaped key sequence to the currently focused window.
The whole program needs to be sprinkled with time.sleep() calls to allow new windows to load and other operations to complete.
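One thing to watch for: SendKeys() treats the characters + ^ % ~ ( ) { } [ ] as special, so a pathname containing any of them has to be escaped by wrapping each one in braces. Below is a rough sketch that combines the escaping with the sleeps. The names escape_sendkeys and open_file, and the one-second default delay, are assumptions of mine; tune the delay to your application.

```python
import time

# Characters that SendKeys interprets as modifiers or grouping symbols.
SENDKEYS_SPECIAL = set("+^%~(){}[]")


def escape_sendkeys(text):
    """Wrap each special character in braces so SendKeys types it literally."""
    return "".join("{" + ch + "}" if ch in SENDKEYS_SPECIAL else ch for ch in text)


def open_file(shell, path, delay=1.0):
    """Drive File > Open from the keyboard.

    `shell` is assumed to be a WScript.Shell COM object whose target
    application is already focused and uses Alt+F, O for File > Open.
    """
    shell.SendKeys("%fo")                  # Alt+F, O
    time.sleep(delay)                      # wait for the Open dialog to appear
    shell.SendKeys(escape_sendkeys(path))  # type the pathname literally
    shell.SendKeys("{ENTER}")              # confirm the dialog
    time.sleep(delay)                      # wait for the file to load
```

For example, escape_sendkeys("report(1).txt") gives "report{(}1{)}.txt", which SendKeys types as the original name.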
Positioning the mouse cursor relative to a window
import win32ui
from ctypes import *
user32 = windll.user32
x, y = win32ui.FindWindow(None, "Some Dialog Box").GetWindowRect()[0:2]
user32.SetCursorPos(x + 100, y + 150)
This code will move the mouse cursor 100 pixels right and 150 pixels down from the top-left corner of "Some Dialog Box". I couldn't find a SendKeys() equivalent for the mouse, so we're going to have to use USER32.DLL. Since SetCursorPos() takes coordinates relative to the top-left corner of the screen, we need to find the position of the window first, using Python's win32ui library. FindWindow() can search by class name or by title, and GetWindowRect() returns the window's coordinates as (left, top, right, bottom). We take only the first pair, the top-left corner.
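Since GetWindowRect() hands back all four edges, it is easy to check that the offset actually lands inside the window before moving the cursor. This is only a sketch of the arithmetic; the function name point_in_window is hypothetical, and it assumes the rectangle is a (left, top, right, bottom) tuple in screen coordinates.

```python
def point_in_window(rect, dx, dy):
    """Convert an offset from a window's top-left corner to screen coordinates.

    `rect` is (left, top, right, bottom) in screen pixels. Raises ValueError
    if the resulting point falls outside the window.
    """
    left, top, right, bottom = rect
    x, y = left + dx, top + dy
    if not (left <= x < right and top <= y < bottom):
        raise ValueError("offset (%d, %d) is outside the window" % (dx, dy))
    return x, y


# Usage on Windows, with win32ui and user32 set up as in the recipe above:
#   rect = win32ui.FindWindow(None, "Some Dialog Box").GetWindowRect()
#   user32.SetCursorPos(*point_in_window(rect, 100, 150))
```

With a window whose rectangle is (200, 100, 800, 600), an offset of (100, 150) maps to screen position (300, 250).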
Sending a mouse click
I could find nothing in Windows Script Host, and nothing in the Python win32 libraries. We are going to have to use ctypes to pass our input data to the SendInput() function in USER32.DLL. Credit goes to Case Nelson for figuring this out:
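# START SENDINPUT TYPE DECLARATIONS
PUL = POINTER(c_ulong)
class KeyBdInput(Structure):
    _fields_ = [("wVk", c_ushort), ("wScan", c_ushort), ("dwFlags", c_ulong),
                ("time", c_ulong), ("dwExtraInfo", PUL)]
class HardwareInput(Structure):
    _fields_ = [("uMsg", c_ulong), ("wParamL", c_short), ("wParamH", c_ushort)]
class MouseInput(Structure):
    _fields_ = [("dx", c_long), ("dy", c_long), ("mouseData", c_ulong),
                ("dwFlags", c_ulong), ("time", c_ulong), ("dwExtraInfo", PUL)]
class Input_I(Union):
    _fields_ = [("ki", KeyBdInput), ("mi", MouseInput), ("hi", HardwareInput)]
class Input(Structure):
    _fields_ = [("type", c_ulong), ("ii", Input_I)]
class POINT(Structure):
    _fields_ = [("x", c_ulong), ("y", c_ulong)]
# END SENDINPUT TYPE DECLARATIONS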
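FInputs = Input * 2  # array type holding two INPUT records
extra = c_ulong(0)
click = Input_I()
click.mi = MouseInput(0, 0, 0, 2, 0, pointer(extra))  # 2 = MOUSEEVENTF_LEFTDOWN
release = Input_I()
release.mi = MouseInput(0, 0, 0, 4, 0, pointer(extra))  # 4 = MOUSEEVENTF_LEFTUP
x = FInputs( (0, click), (0, release) )  # type 0 = INPUT_MOUSE
# The third argument is the size of one INPUT structure, not of the whole array.
user32.SendInput(2, pointer(x), sizeof(x[0]))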
Yikes. After defining a bunch of C data types, we create two input events to send to the current application: a left mouse button click immediately followed by a left mouse button release. It's not pretty, but it works.
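If you need this more than once, the magic numbers can be given their Win32 names and the whole thing wrapped in a helper. What follows is my own condensed sketch, not Case Nelson's code: build_click and send_click are hypothetical names, and the union declares only the mouse member, which I believe is safe because MOUSEINPUT is the largest of the three members and so determines the size of INPUT.

```python
import ctypes
import sys

INPUT_MOUSE = 0
# (button down flag, button up flag) from the Win32 MOUSEEVENTF_* constants
BUTTON_FLAGS = {
    "left": (0x0002, 0x0004),   # MOUSEEVENTF_LEFTDOWN, MOUSEEVENTF_LEFTUP
    "right": (0x0008, 0x0010),  # MOUSEEVENTF_RIGHTDOWN, MOUSEEVENTF_RIGHTUP
}


class MouseInput(ctypes.Structure):
    _fields_ = [("dx", ctypes.c_long), ("dy", ctypes.c_long),
                ("mouseData", ctypes.c_ulong), ("dwFlags", ctypes.c_ulong),
                ("time", ctypes.c_ulong),
                ("dwExtraInfo", ctypes.POINTER(ctypes.c_ulong))]


class Input_I(ctypes.Union):
    # Mouse member only; see the assumption in the text above.
    _fields_ = [("mi", MouseInput)]


class Input(ctypes.Structure):
    _fields_ = [("type", ctypes.c_ulong), ("ii", Input_I)]


def build_click(button="left"):
    """Return a two-element Input array: button down, then button up."""
    events = (Input * 2)()  # zero-initialised, so dwExtraInfo is NULL
    for i, flag in enumerate(BUTTON_FLAGS[button]):
        events[i].type = INPUT_MOUSE
        events[i].ii.mi.dwFlags = flag
    return events


def send_click(button="left"):
    """Send the click at the current cursor position (Windows only)."""
    if sys.platform != "win32":
        raise OSError("SendInput is only available on Windows")
    events = build_click(button)
    # SendInput returns the number of events it managed to insert.
    sent = ctypes.windll.user32.SendInput(
        len(events), ctypes.byref(events), ctypes.sizeof(Input))
    return sent == len(events)
```

After positioning the cursor with SetCursorPos(), a call such as send_click("right") should open a context menu at that spot.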
With these functions, it should be possible to completely automate most Windows keyboard/mouse input tasks. No $400 macro designer required.