xdotool: Script your mouse
Back in the day, we were all told that computers would make our lives easier. They’d automate everything, and we’d have to give them only the minimum of input to get extraordinary results. Well, the calendar on the wall of LXF Towers says that it’s officially The Future(™) now, so how far have we come? Oh wait, we’re still spending ages pushing a plastic box around to move a cursor on the screen to point at fiddly little icons. For all the advancements we’ve seen in GUI technologies over the years, we still spend far too long clicking bits and bobs to get work done.
But there’s a solution, and it’s going to improve your productivity massively by the end of this tutorial. The solution is called xdotool, and according to the author’s description, it “simulates keyboard input and mouse activity”. Sure, that doesn’t sound very exciting on its own, but think of it this way: it’s a command-line tool that acts as a virtual keyboard and mouse, letting you script together commands to control your GUI apps. It can move the mouse cursor around, send clicks to the display, select specific windows and even move windows between desktops. Ultimately, it lets you bundle tasks that would take loads of faffing around with the mouse and keyboard into a single script.
Chances are that xdotool is available in your distribution’s package repositories – if not, you can grab it from www.semicomplete.com/projects/xdotool. The dependencies are just the usual X libraries, so it shouldn’t be difficult to install. Once you have it, open up a terminal window and then hover the mouse pointer over an element in another application, such as the Edit menu. Don’t click just yet though! Instead, enter this in the terminal:
xdotool click 1
And voila, the Edit menu opens up, as if you’d pressed the real mouse button. Here, xdotool is sending a click event to the X Window System (the core graphical layer of the Linux desktop), and this is possible because X was designed with great flexibility in mind. It isn’t narrow-minded, and doesn’t think that clicks can come only from mice; it knows that such events can be generated by touchpads, touchscreens, graphics tablets and other devices. And it’s even flexible enough to accept synthetic clicks generated by a program such as xdotool.
So, this is a really simple example, but it already has some uses. For instance, many people suffer from RSI, especially with the action of clicking mouse buttons. Using xdotool, you could configure your desktop environment or window manager so that a lesser-used key (eg backtick) runs the command above, thereby letting you move the mouse with your right hand and click with your left, sharing the load. Note that the number 1 in the command refers to the mouse button here – 1 is the left, 2 is the middle and 3 is the right. You can also use 4 as a virtual mouse wheel up movement, and 5 for down.
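The button numbers are easy to forget, so a small wrapper can give them names. This is only a sketch: the function click_button and the names it accepts are our own invention rather than anything xdotool provides.

```shell
# click_button: hypothetical helper that maps a readable name to the
# button number xdotool expects (1 left, 2 middle, 3 right, 4/5 wheel).
click_button() {
    local button
    case "$1" in
        left)       button=1 ;;
        middle)     button=2 ;;
        right)      button=3 ;;
        wheel-up)   button=4 ;;
        wheel-down) button=5 ;;
        *) echo "unknown button: $1" >&2; return 1 ;;
    esac
    xdotool click "$button"
}
```

Put the function in a script that ends with click_button "$1", and your window manager can bind one key to a left click and another to a right click.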
Let’s try something a bit more complex. xdotool lets you move the mouse pointer around the screen, using the mousemove command. In X Window System parlance, the start of the screen is the top-left corner, which has the X and Y (horizontal and vertical) co-ordinates of 0 and 0. If your screen resolution is 1024x768, then the co-ordinates for the top right location are 1023 (X) and 0 (Y). The bottom-right is 1023 (X) and 767 (Y), and so forth.
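Because the co-ordinates start at 0, the last pixel is always one less than the resolution. The sketch below does that sum for the four corners; corner and its corner names are hypothetical, made up for illustration.

```shell
# corner: print the X and Y co-ordinates of a screen corner.
# Arguments: width height corner-name (the names are our own).
corner() {
    local width=$1 height=$2
    # Co-ordinates count from 0, so the far edge is size minus one.
    local max_x=$((width - 1))
    local max_y=$((height - 1))
    case "$3" in
        top-left)     echo "0 0" ;;
        top-right)    echo "$max_x 0" ;;
        bottom-left)  echo "0 $max_y" ;;
        bottom-right) echo "$max_x $max_y" ;;
        *) echo "unknown corner: $3" >&2; return 1 ;;
    esac
}
```

You could then feed the result straight to the pointer with xdotool mousemove $(corner 1024 768 bottom-right). xdotool also has a getdisplaygeometry command that should print the current width and height, so corner $(xdotool getdisplaygeometry) top-right ought to save you typing the resolution by hand.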
We’re going to move the mouse pointer to the main menu button on our desktop and click to open it. Your desktop layout may vary, so you’ll need to change the co-ordinates to fit, but in our case we have Xubuntu with the main menu in the top-left of the screen. Try this command:
xdotool mousemove 0 0 click 1
Here, you can see that we’re stringing commands together – we have a mousemove followed by the X and Y locations, and then a left-button click. In our case, this opens up the main menu. We’re on our way to automating our desktop! However, there’s something we have to bear in mind. xdotool just performs actions in a sequence blindly, and doesn’t know if it’s clicking on something that exists. You know when you fire up your Linux box, and the main menu or program launcher can take a couple of seconds to appear as it loads the icons and resources? Well, these delays can have an impact on xdotool commands. It’s tempting to start writing awesome combinations of movements and clicks, but if something on your system is being slow for whatever reason, it can get out of sync.
For instance, on our machine we want to automate the process of getting the power status. Normally, that involves moving the mouse to the battery icon in the system tray, right clicking it, and choosing the Power Information entry in the Context menu. We could bunch together various xdotool commands in one line to achieve this, but what happens if the system is busy and the Context menu doesn’t come up immediately? Well, we need to add sleep commands in between the calls to xdotool. Here’s our script, which also includes some other new features:
#!/bin/bash
xdotool mousemove --sync 1000 10
xdotool click 3
sleep 1
xdotool mousemove_relative --sync 0 80
xdotool click 1
We’ve saved this as batinfo.sh in /usr/bin, so that we can use Xfce’s keyboard settings to map a key combination (Ctrl+Alt+B) to run this script. Most desktops and window managers let you associate key combos with commands, eg for launching terminal windows, so you should be able to do something similar in your setup.
So, let’s see how this works. Our Power Management icon is in the top panel, towards the right of the screen – with a bit of testing, we’ve determined that the mouse should be at 1000 on the X axis and 10 on the Y axis to hover over it. Note here the --sync option for the mousemove command, which says that we don’t just want to tell the X Window System to move the mouse, but wait until it has sent back a confirmation message that the move was successful. So it’s generally a good idea to use --sync.
At its most basic level, xdotool lets you perform single actions, such as virtual mouse clicks.
We then send a right-click to open the Context menu, but then we wait for one second using the sleep 1 command. This gives us a bit of legroom if the system is busy – for instance, if you’re doing a load of package updates in the background, the Context menu might not appear immediately, as the processor and hard drive are busy. So we wait for a second, to give Xfce a chance to do its business. (On your specific setup, you might want to change that to two or three seconds if your machine is often swamped.)
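A fixed sleep is always a guess. It can’t be avoided for something like a context menu, but where the thing you’re waiting for can be tested from the command line, a polling loop is a different approach worth knowing. This is a sketch rather than part of our script, and wait_for is a hypothetical name.

```shell
# wait_for: hypothetical helper that re-runs a command every half second
# until it succeeds or the given number of attempts runs out.
# Usage: wait_for ATTEMPTS COMMAND [ARGS...]
wait_for() {
    local attempts=$1
    shift
    while [ "$attempts" -gt 0 ]; do
        # Stop as soon as the command reports success.
        "$@" >/dev/null 2>&1 && return 0
        attempts=$((attempts - 1))
        sleep 0.5
    done
    return 1   # gave up: the command never succeeded
}
```

For instance, after launching a program you might run wait_for 20 xdotool search "Parole" before sending it any clicks. That relies on xdotool search reporting failure when nothing matches, much as grep does, so check that this holds for your version before trusting it.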
Next, there’s a new command: mousemove_relative. As the name suggests, this just moves the pointer relative to the current position, and not to an absolute location on the screen. In our case, the Power Information entry in the Context menu is 80 pixels down from where we activated the menu, so we move to that point. This gives us extra flexibility: if we move the Power icon to somewhere else in the panel one day, we only have to change the first mousemove command as everything else stays relative to it.
Now, xdotool’s ability to click around at pre-programmed co-ordinates is cool, but it’s not that useful if we have to manually activate a window before we perform our actions. As an example, you might have a really cool music player that you always run in the background, and occasionally bring up from your window selector to perform an operation (eg switch to a different playlist).
Some media apps enable you to control them from the CLI, thereby allowing you to map key combos in your window manager to commands, but not all have this facility. Instead of bringing up your media player each time manually and then running your xdotool script (at which point you might as well just click manually), you can make xdotool automatically switch to the appropriate window using the search and windowactivate commands.
For instance, we’ve just opened up the Parole media player on our desktop, and left it running in the background while we type commands into a terminal. We enter:
xdotool search "Parole" windowactivate
This makes Parole jump to the foreground. Try it on your machine, replacing Parole with anything else you have running. Here, xdotool is working with your window manager to find specific windows – it looks for the first window with the word “Parole” in the title. You can replace that with anything else, and if there’s no match, no actions will take place.
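Since nothing happens when there’s no match, one natural extension is to start the program instead. The sketch below assumes that the xdotool chain returns a failure status when the search comes up empty; raise_or_run is a hypothetical helper of our own.

```shell
# raise_or_run: hypothetical helper. Activate the first window matching
# the pattern, or start the given command if the xdotool chain fails.
# Usage: raise_or_run PATTERN COMMAND [ARGS...]
raise_or_run() {
    local pattern=$1
    shift
    if ! xdotool search "$pattern" windowactivate >/dev/null 2>&1; then
        # No window was activated, so launch the program in the background.
        "$@" &
    fi
}
```

Assuming the player’s command is parole, raise_or_run "Parole" parole would bring the window forward if it’s already running and launch it otherwise.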
Of course, Parole’s window could be anywhere on the screen, and so we can’t start our series of click actions at a fixed location. We need to move the mouse cursor to where Parole is and start working from there.
5 great uses for xdotool
A script to click all the options we need on any fresh LibreOffice installation?
With the power to control your mouse pointer through raw commands, anything is possible (well, perhaps apart from reducing the latency in PulseAudio, but that’s a different story). Here’s some inspiration:
Create a series of clicks to set up LibreOffice exactly as you want it, so that you’re able to run it on any new Linux installation.
Remote newbie training
Write a script for a fresh Ubuntu 11.10 Unity installation that walks through opening apps, trying out features, changing settings etc. You could even record an Ogg file of you talking to go along with it!
Drowning in windows? One quick script could let you move all terminal windows scattered across multiple desktops to a single desktop. Hurrah.
Ever seen those tool-assisted speed-runs on YouTube, where people appear to play games at light-speed using pre-programmed joypad presses? You could head over to www.happypenguin.org and do the same (see especially xdotool’s keydown and keyup commands).
With some clicks and the type command (see the virtual keyboard box), you can quickly send off a flame mail to the Gnome Shell developers every time you find something completely unintuitive.
Got any more? Once you’ve worked through this tutorial and you’ve built up your own collection of time-savers, you can share them with other readers. Email us at email@example.com with your script, and we’ll put a selection of them on a forthcoming LXFDVD.
As you’d expect by now, xdotool’s authors have thought of this, so…
xdotool search "Parole" windowactivate --sync mousemove --window %1 0 0
After locating and activating the Parole window, we perform a mousemove command, but this time to a location in window number 1 of the search results (ie the first window that had Parole in the title – you’ll probably never need any others).
With just a few more mousemove_relative commands, you could create a robot that draws the Mona Lisa.
We move the mouse to location 0 0, which is the X and Y co-ordinates for the top-left of the window. Excellent! We can now write a script using offset co-ordinates from the top-left, clicking buttons, opening menus and checking boxes at will, regardless of where the program’s window was originally set.
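Clicking at an offset from a window’s top-left corner can be wrapped up so that each step of a script is a single line. This is a minimal sketch: click_in_window is a hypothetical name, and the offsets you pass in will depend entirely on the application.

```shell
# click_in_window: hypothetical helper. Activate the first window matching
# a pattern, then left-click at an offset from its top-left corner.
# Usage: click_in_window PATTERN X Y
click_in_window() {
    local pattern=$1 x=$2 y=$3
    xdotool search "$pattern" windowactivate --sync \
        mousemove --window %1 "$x" "$y" click 1
}
```

A call such as click_in_window "Parole" 40 60 would then click 40 pixels across and 60 down inside the player, wherever its window happens to be.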
If it pains you that your windows could be in arbitrary places, however, xdotool can move them for you. Try this:
xdotool search "Parole" windowactivate --sync windowmove 100 200
This locates Parole and moves its window to 100 pixels across the screen, and 200 down. You can take out the windowactivate --sync part and do the moves in the background if you like. Another option is resizing windows:
xdotool search "Parole" windowsize 500 200
This finds Parole and resizes it to 500 pixels wide, 200 high. You can even move the window to another workspace/virtual desktop like this:
xdotool search "Parole" set_desktop_for_window 0
Just change the number at the end to whichever desktop you require, and note that in typical programming fashion, 0 is the first desktop, 1 is the second, and so forth.
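The same command can sweep up every matching window rather than just the first, which is handy if terminals are scattered across your desktops. The sketch below takes a desktop number counted from 1, as a person would, and converts it to xdotool’s count from 0. The name gather_windows is hypothetical.

```shell
# gather_windows: hypothetical helper. Move every window matching a
# pattern to one desktop, numbered from 1 for human convenience.
# Usage: gather_windows PATTERN DESKTOP
gather_windows() {
    local pattern=$1 human_desktop=$2
    local desktop=$((human_desktop - 1))   # xdotool counts desktops from 0
    local id
    # xdotool search prints one window ID per line when run on its own.
    for id in $(xdotool search "$pattern"); do
        xdotool set_desktop_for_window "$id" "$desktop"
    done
}
```

So gather_windows "Terminal" 2 should send everything matching “Terminal” to the second desktop, assuming your window manager supports moving windows this way.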
Like xdotool, but for the command line
All this talk of GUI automation is lovely, but it doesn’t work with command-line apps. However, there’s a fantastic little tool that lets you automate actions at the CLI called expect. The name here comes from the program’s main feature: it expects a user-defined string to appear on the screen, and then sends out the appropriate text in response. expect is installed by default in many distros, or will almost certainly be available in your distro’s default package repositories. Here’s an example of how it works:
expect "What is your name?"
send "Rembrandt Q Einstein\n"
Here, we wait for a program to give us the What is your name? prompt and then send a response. Note the \n bit at the end of the send command – that says that we want to output a newline character, ie simulating that the user has hit Enter.
The expect program is an interpreter, and you normally use it by stringing together a series of commands. Here’s an example of using expect to automate logging into an SFTP server and retrieving a file:
#!/usr/bin/expect -f
spawn sftp firstname.lastname@example.org
expect "Password:"
send "mypasshere\n";
expect "sftp>"
send "get myfile.txt\n";
expect "sftp>"
send "exit\n";
interact
In the first line, we supply the full path to the expect binary, and use the -f flag to say that we want it to read the following commands, rather than start in interactive mode.
Then we use the vitally important spawn function, which lets us run a program and work with the output that it generates. Here we’re spawning an SFTP process to log into a remote server.
We expect the server to respond with a Password: prompt, so when that happens we send our password with a newline character as described above. Once the login is successful, we wait for an sftp> prompt and then perform our get command to retrieve a file. We wait for the prompt to come back and then log out with exit.
Note the interact line at the end – it hands control of the spawned program back to you and keeps expect running until that program finishes. Leave it out and expect can quit straight after the final send, taking the SFTP session down with it before exit has been processed. expect is an amazingly versatile tool and can save you a huge amount of time, so have a skim through the manual page to see what else it’s capable of.
What a drag, dude
Clicking is all good and well, but what if we want to perform an operation where the mouse button is always held down, such as a click-and-drag?
Have a peek at the manual page (man xdotool) to see a full list of available options.
Fortunately, xdotool caters for this by providing a more versatile alternative to the click command: mousedown and mouseup. Let’s try this out by firing up Gimp, going to File > New, and creating a new canvas of the default size (its window title should have “Untitled” in it).
Select the Pencil tool from the Gimp Toolbox window, then open up a terminal and enter the following into a file called test.sh:
xdotool search "Untitled" windowactivate --sync mousemove --window %1 100 100
sleep 0.5
xdotool mousedown 1
sleep 0.5
xdotool mousemove_relative --sync 200 200
sleep 0.5
xdotool mouseup 1
Here, we activate the Gimp canvas window, and move to location 100, 100 (X, Y) inside it. We then set the left mouse button to down, move the mouse diagonally down-right by 200 pixels in each direction, and then set the mouse button to up. In other words, we’ve created a robot that draws a diagonal line – see the screenshot. As with the earlier examples, we’ve included a few sleep commands here to make sure that we don’t work too quickly for the window manager or Gimp to process. (If your box is particularly snappy, you could change those sleep periods from half a second to 0.1.)
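The same mousedown, move, mouseup pattern stretches to more than one stroke. The sketch below traces a square from wherever the pointer already is; draw_square is a hypothetical helper. Note the -- before the negative offsets, which the xdotool manual page asks for so that they aren’t mistaken for options.

```shell
# draw_square: hypothetical helper. With the pointer already over the
# canvas, hold the left button and trace a square of the given side.
# Usage: draw_square SIDE_IN_PIXELS
draw_square() {
    local side=$1
    xdotool mousedown 1
    sleep 0.5
    xdotool mousemove_relative --sync "$side" 0          # right
    xdotool mousemove_relative --sync 0 "$side"          # down
    xdotool mousemove_relative --sync -- "-$side" 0      # left
    xdotool mousemove_relative --sync -- 0 "-$side"      # up
    sleep 0.5
    xdotool mouseup 1
}
```

Move the pointer onto the canvas first, as in the script above, then call draw_square 100. Whether Gimp joins the corners with straight lines may depend on the tool selected, so treat it as a starting point.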
To wrap up, in this tutorial we’ve explored various different ways to interact with the GUI: moving the mouse, sending virtual clicks and drags, and finding specific windows. Putting these actions together, and exploring more possibilities with the xdotool manual page, you can automate pretty much everything on your desktop, leaving you with more time to spend on important things. Fancy a game of Frozen Bubble, anyone?
While we’ve focused on mouse events in this tutorial, xdotool can send virtual keystrokes as well. For example:
xdotool key F1
This simply simulates pressing the F1 key. You can create combination keypresses with the plus sign like this:
xdotool key ctrl+c
Given that you’re using xdotool to get away from the mouse, however, it’s unlikely that you’re going to spend much time with the key command. But there’s another option called type, which sends a stream of characters, and which can be useful for working with text editors. For instance, launch AbiWord and then enter this in a terminal:
xdotool search "AbiWord" windowactivate --sync type "Hello world"
You’ll see that the focus switches to the AbiWord window, and then the text is entered into the document, as if by magic. There’s even a tiny pause between the virtual keystrokes, as if someone were typing it. You can change this with the --delay option followed by a number of milliseconds, but the default is fine (and makes sure the target application doesn’t get too confused by receiving an inhumanly rapid stream of keypresses). As an example, here’s a script called mikesig.sh that we’ve put in /usr/bin:
#!/bin/sh
xdotool type "This post brought to you by Mike Saunders
Email me at email@example.com for more great wit"
Note that the carriage return in the script is also produced by xdotool, so you can insert new lines. We’ve used our window manager to make the Ctrl+Alt+S combo run mikesig.sh, so whatever program we’re in, we can press Ctrl+Alt+S to get an instant signature generator. If you have a more serious workload, you’ll find this useful, as it works across all sorts of applications and window managers.
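If a particular application does struggle to keep up, the --delay option mentioned earlier can be combined with the window search. This is a minimal sketch with a hypothetical helper, type_into, and the right delay is something you’d find by experiment.

```shell
# type_into: hypothetical helper. Activate the first window matching a
# pattern and type text with a chosen pause (in milliseconds) between
# keystrokes.
# Usage: type_into PATTERN DELAY_MS TEXT
type_into() {
    local pattern=$1 delay_ms=$2 text=$3
    xdotool search "$pattern" windowactivate --sync \
        type --delay "$delay_ms" "$text"
}
```

For example, type_into "AbiWord" 100 "Hello world" would type the greeting at a leisurely ten characters a second.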