I had to automate a GUI task in Windows the other day. Since all of the Windows "macro tools" required programming anyway, I decided that I might as well use a language I already know. I opened up my Python shell, and 20 Google queries later I knew everything I needed to. For anyone else stuck doing this, here are some helpful recipes:
Setting the window focus
import win32com.client
shell = win32com.client.Dispatch("WScript.Shell")
shell.AppActivate('Some Application Title')
We use a Windows Script Host COM interface to access the AppActivate() function. Pass in the app's title, or its pid.
Sending keyboard commands
import time
shell.SendKeys('%fo')  # Alt+F, O
time.sleep(0.1)  # give the Open dialog a moment to appear
shell.SendKeys(path)  # path holds the name of the file to open
shell.SendKeys('{ENTER}')
This code will open the file located at path using the standard open dialog keyboard commands: File, Open, pathname, ENTER.
SendKeys() simply sends an escaped key sequence to the currently focused window.
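Because the key sequence is escaped, a path containing characters such as parentheses, +, ^, %, or ~ will not be typed literally unless each one is wrapped in braces. Here is a minimal sketch of a helper for that; the name escape_sendkeys is my own, not part of Windows Script Host:

```python
# Characters that SendKeys interprets as modifiers, grouping, or key-name delimiters.
SENDKEYS_SPECIAL = set("+^%~(){}[]")


def escape_sendkeys(text):
    """Wrap each special character in braces so SendKeys types it literally.

    Hypothetical helper for illustration; not part of any library.
    """
    return "".join("{" + ch + "}" if ch in SENDKEYS_SPECIAL else ch for ch in text)
```

With this in place, shell.SendKeys(escape_sendkeys(path)) should type the path as written, even when it contains something like "(2024)" or "100%".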
The whole program needs to be sprinkled with time.sleep() calls to allow new windows to load and other operations to complete.
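Fixed sleeps are either too short on a slow machine or wastefully long on a fast one. One possible alternative is to poll for a condition with a timeout. This is only a sketch: wait_until and its parameters are names I made up for illustration.

```python
import time


def wait_until(condition, timeout=5.0, interval=0.1):
    """Call condition() repeatedly until it returns a truthy value.

    Returns True as soon as the condition holds, or False once
    `timeout` seconds have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Since AppActivate() returns a boolean saying whether the window was activated, something like wait_until(lambda: shell.AppActivate('Open')) could stand in for a blind sleep, assuming the dialog's title really is "Open".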
Positioning the mouse cursor relative to a window
import win32ui
from ctypes import *
user32 = windll.user32
x, y = win32ui.FindWindow(None, "Some Dialog Box").GetWindowRect()[0:2]
user32.SetCursorPos(x + 100, y + 150)
This code will move the mouse cursor 100 pixels right and 150 pixels down from the top-left corner of "Some Dialog Box". I couldn't find a SendKeys() equivalent for the mouse, so we're going to have to use USER32.DLL.
Since SetCursorPos() takes coordinates relative to the top-left corner of the screen, we need to find the position of the window first, using the win32ui module from the Python win32 extensions. FindWindow() can search by class name or by title, and GetWindowRect() returns the window's coordinates as (left, top, right, bottom). We take only the first pair, the top-left corner.
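The coordinate arithmetic is easy to get wrong once there are several dialogs involved, so it can help to pull it into small functions. The sketch below assumes a rect in the (left, top, right, bottom) order that GetWindowRect() returns; both function names are hypothetical.

```python
def point_in_window(rect, dx, dy):
    """Return the screen coordinates dx pixels right and dy pixels down
    from the top-left corner of a window.

    rect is (left, top, right, bottom) in screen coordinates.
    """
    left, top, _right, _bottom = rect
    return left + dx, top + dy


def window_contains(rect, x, y):
    """Return True if the screen point (x, y) lies inside the window rect."""
    left, top, right, bottom = rect
    return left <= x < right and top <= y < bottom
```

Checking window_contains() before calling SetCursorPos() is a cheap way to notice that a dialog turned out smaller than expected, instead of clicking on whatever happens to be behind it.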
Sending a mouse click
I could find nothing in Windows Script Host, and nothing in the python win32 libraries. We are going to have to use ctypes to pass our input data to the SendInput() function in USER32.DLL. Credit goes to Case Nelson for figuring this out:
# START SENDINPUT TYPE DECLARATIONS
# Relies on "from ctypes import *" and user32 from the previous recipe.
PUL = POINTER(c_ulong)

class KeyBdInput(Structure):
    _fields_ = [("wVk", c_ushort),
                ("wScan", c_ushort),
                ("dwFlags", c_ulong),
                ("time", c_ulong),
                ("dwExtraInfo", PUL)]

class HardwareInput(Structure):
    _fields_ = [("uMsg", c_ulong),
                ("wParamL", c_ushort),
                ("wParamH", c_ushort)]

class MouseInput(Structure):
    _fields_ = [("dx", c_long),
                ("dy", c_long),
                ("mouseData", c_ulong),
                ("dwFlags", c_ulong),
                ("time", c_ulong),
                ("dwExtraInfo", PUL)]

class Input_I(Union):
    _fields_ = [("ki", KeyBdInput),
                ("mi", MouseInput),
                ("hi", HardwareInput)]

class Input(Structure):
    _fields_ = [("type", c_ulong),
                ("ii", Input_I)]

class POINT(Structure):
    _fields_ = [("x", c_long),
                ("y", c_long)]
# END SENDINPUT TYPE DECLARATIONS

FInputs = Input * 2
extra = c_ulong(0)
click = Input_I()
click.mi = MouseInput(0, 0, 0, 2, 0, pointer(extra))    # 2 = MOUSEEVENTF_LEFTDOWN
release = Input_I()
release.mi = MouseInput(0, 0, 0, 4, 0, pointer(extra))  # 4 = MOUSEEVENTF_LEFTUP
events = FInputs((0, click), (0, release))              # type 0 = INPUT_MOUSE
user32.SendInput(2, pointer(events), sizeof(events[0]))
Yikes. After defining a bunch of C data types, we create two input events to send to the current application: a left mouse button press immediately followed by a left mouse button release. It's not pretty, but it works.
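If the bare numbers bother you, the same idea can be written with named constants and only the mouse member of the union. This is a sketch under a few assumptions of mine, not Case Nelson's code: the fixed-width ctypes types stand in for the Windows DWORD, LONG, and ULONG_PTR types, the union holds only the mouse structure (the largest of the three members, so the overall size should still match), and build_left_click and send_events are made-up names.

```python
import ctypes
import sys

INPUT_MOUSE = 0                 # value of the "type" field for mouse events
MOUSEEVENTF_LEFTDOWN = 0x0002   # left button pressed
MOUSEEVENTF_LEFTUP = 0x0004     # left button released


class MouseInput(ctypes.Structure):
    _fields_ = [("dx", ctypes.c_int32),
                ("dy", ctypes.c_int32),
                ("mouseData", ctypes.c_uint32),
                ("dwFlags", ctypes.c_uint32),
                ("time", ctypes.c_uint32),
                ("dwExtraInfo", ctypes.c_size_t)]  # pointer-sized integer


class InputUnion(ctypes.Union):
    # Only the mouse member is declared in this sketch.
    _fields_ = [("mi", MouseInput)]


class Input(ctypes.Structure):
    _fields_ = [("type", ctypes.c_uint32),
                ("u", InputUnion)]


def mouse_event(flags):
    """Build one zero-initialized mouse input event with the given flags."""
    event = Input()
    event.type = INPUT_MOUSE
    event.u.mi.dwFlags = flags
    return event


def build_left_click():
    """Return a two-element array: left button down, then left button up."""
    return (Input * 2)(mouse_event(MOUSEEVENTF_LEFTDOWN),
                       mouse_event(MOUSEEVENTF_LEFTUP))


def send_events(events):
    """Pass the events to SendInput(); only meaningful on Windows."""
    if sys.platform != "win32":
        raise OSError("SendInput is only available on Windows")
    return ctypes.windll.user32.SendInput(len(events), events,
                                          ctypes.sizeof(Input))
```

On Windows, send_events(build_left_click()) should click wherever the cursor currently is, so it pairs naturally with the SetCursorPos() recipe above.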
Conclusion
With these functions, it should be possible to completely automate most Windows keyboard/mouse input tasks. No $400 macro designer required.