Get Started with Python Watchdog


The Watchdog library in Python is a fantastic open-source tool for writing programs that monitor for and respond to changes in a filesystem. This short guide gets you up and running with a Watcher object to use Watchdog's powerful capabilities in your own projects. Then, we cover important parts of the Watchdog API to adapt the example for your own needs.

Requirements

This tutorial is written for Python 3.6+ and Watchdog 2.0.z. I've run this code on macOS 11 and Linux (Ubuntu) without issue; I haven't tested it on Windows.

Before we begin, run pip install watchdog.

Copy the following into your project (don't worry, I'll break down this code in the next section).

import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class Watcher:

    def __init__(self, directory=".", handler=FileSystemEventHandler()):
        self.observer = Observer()
        self.handler = handler
        self.directory = directory

    def run(self):
        self.observer.schedule(
            self.handler, self.directory, recursive=True)
        self.observer.start()
        print("\nWatcher Running in {}/\n".format(self.directory))
        try:
            while True:
                time.sleep(1)
        except KeyboardInterrupt:  # user pressed Control-C
            self.observer.stop()
        self.observer.join()
        print("\nWatcher Terminated\n")


class MyHandler(FileSystemEventHandler):

    def on_any_event(self, event):
        print(event) # Your code here

if __name__ == "__main__":
    w = Watcher(".", MyHandler())
    w.run()

Example Program

This code is boilerplate for any program that needs to monitor a folder and react when files are changed. Here are some example use cases:

  • A file management program for ingesting and processing files such as images from an SD card.
  • A static site generator development environment that watches for changes to the source code and rebuilds the site on save.
  • Various home and workplace automation tasks.

Essentially, your Watchdog is two parts: a Watcher and a Handler. The Watcher object initializes and manages monitoring the specified folder (and its subfolders). The Handler object applies your code to determine what to do in response to events. Let's take a closer look at each.

class Watcher:

    def __init__(self, directory=".", handler=FileSystemEventHandler()):
        self.observer = Observer()
        self.handler = handler
        self.directory = directory

The Observer() object is doing most of the work here, but it will need to be configured with a directory to watch (it also keeps an eye on all of the subdirectories) and a handler, which you'll write yourself.

def run(self):
    self.observer.schedule(
        self.handler, self.directory, recursive=True)
    self.observer.start()
    print("\nWatcher Running in {}/\n".format(self.directory))
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:  # user pressed Control-C
        self.observer.stop()
    self.observer.join()
    print("\nWatcher Terminated\n")

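First, we schedule the handler on the observer for the target directory; recursive=True extends the watch to every subdirectory. After starting the observer, which runs in its own thread, the main thread sleeps in a loop until the program is quit, then stops and joins the observer to clean up.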

The loop could also end on a different condition. You could have the watcher run for a certain period of time or until a function returns some value.
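As a rough sketch of that idea, the helper below waits until a condition becomes true or a time limit passes, instead of looping forever. The name wait_until and its parameters are made up for this illustration; they are not part of Watchdog.

```python
import time


def wait_until(should_stop, timeout=None, poll_interval=1.0):
    """Block until should_stop() returns True or `timeout` seconds elapse.

    Returns True if the condition was met, False if the time limit was hit.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while not should_stop():
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True
```

Inside Watcher.run(), something like wait_until(lambda: False, timeout=60) could stand in for the while True loop to watch for one minute. In that case self.observer.stop() has to be called once the helper returns, because the original code only stops the observer when the loop is interrupted.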

class MyHandler(FileSystemEventHandler):

    def on_any_event(self, event):
        print(event) # Your code here

This is where your unique code goes! Check out the API reference in the next section to understand what to put here.

w = Watcher(".", MyHandler())
w.run()

Initialize the Watcher object with a target directory and a Handler object, then run it. w.run() blocks until the program is quit, so anything that should happen in response to an event must be called from the handler. Lines after w.run() will not execute until the user hits Control-C, which makes them a useful place for cleanup such as killing subprocesses.

Partial API Reference

Watchdog has plenty of capabilities, but the code you write with the above example will mostly interface with the FileSystemEvent object and its derivatives, so I've documented a selection of its attributes here. You can see the source code on this page on GitHub.

event.event_type | str

The event_type attribute is one of the five strings listed below:

types = ['created', 'deleted', 'modified', 'moved', 'closed']

To be honest, I've never had cause to use moved or closed, but I assume they're useful; otherwise the library wouldn't have them. The modified event only triggers when the file is saved (written to), so if you're using a program with autosave it might fire often.

class MyHandler(FileSystemEventHandler):

    def on_any_event(self, event):
        if event.event_type == "deleted":
            print("Oh no! It's gone!")

This attribute is quite useful for flow control: you may want to take different actions when a file is created than when it is deleted.

Depending on your use case, you might be better served by swapping on_any_event for on_created, on_deleted, and so on for the five event types.
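A minimal sketch of that approach follows, with one method per event type of interest. PerEventHandler and its log list are hypothetical names for illustration. The try/except around the import is only there so the sketch can be loaded without Watchdog installed; a real watcher needs the actual base class.

```python
try:
    from watchdog.events import FileSystemEventHandler
except ImportError:
    # Fallback for experimenting without Watchdog installed (assumption:
    # you only call the methods by hand in that case).
    FileSystemEventHandler = object


class PerEventHandler(FileSystemEventHandler):
    """Hypothetical handler that records one message per event type."""

    def __init__(self):
        super().__init__()
        self.log = []

    def on_created(self, event):
        self.log.append("created: " + event.src_path)

    def on_deleted(self, event):
        self.log.append("deleted: " + event.src_path)

    def on_moved(self, event):
        # Moved events also carry the new location in dest_path.
        self.log.append(
            "moved: {} -> {}".format(event.src_path, event.dest_path))
```

Event types without a matching method (here, modified and closed) fall through to the base class, which does nothing with them. Pass PerEventHandler() to the Watcher in place of MyHandler().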

event.src_path | str

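The src_path attribute is a string with the path to the file or folder that has experienced the event. If you watch a relative directory such as ".", the path may be reported relative to it, so pass it through os.path.abspath() when you need an absolute path.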

class MyHandler(FileSystemEventHandler):

    def on_any_event(self, event):
        if event.src_path.endswith(".docx"):
            print("Microsoft Word documents not supported.")

I use this attribute for filtering; I often need to ignore events from certain directories or file types.
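Here is one way such a filter might look. The function name should_ignore and the default directory and extension lists are assumptions chosen for the example; substitute whatever your project needs to skip.

```python
import os


def should_ignore(path,
                  ignored_dirs=(".git", "__pycache__"),
                  ignored_exts=(".tmp", ".swp")):
    """Return True if `path` lies in an ignored directory or has an
    ignored file extension."""
    parts = os.path.normpath(path).split(os.sep)
    # Any path component matching an ignored directory name disqualifies it.
    if any(part in ignored_dirs for part in parts):
        return True
    _, ext = os.path.splitext(parts[-1])
    return ext.lower() in ignored_exts
```

In a handler, the first line of on_any_event could then be: if should_ignore(event.src_path): return.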

event.is_directory | bool

The is_directory attribute is a boolean that tells you whether the event came from a folder (True) or a file (False).

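import os
import subprocess

class MyHandler(FileSystemEventHandler):

    def on_any_event(self, event):
        # Passing the arguments as a list avoids shell quoting problems
        # with paths that contain spaces or special characters.
        if event.is_directory:
            dest = os.path.expanduser("~/folders")
            subprocess.run(["cp", "-r", event.src_path, dest])
        else:
            dest = os.path.expanduser("~/files")
            subprocess.run(["cp", event.src_path, dest])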

This information can be useful for flow control or to avoid taking an incorrect action on a folder.

More often, I use this attribute to avoid duplicating event handling. Many events that affect a file also affect the directory that it is in (for example, creating a file creates a "file created" event and a "directory modified" event). Checking is_directory is helpful for deduplicating these events.
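A small sketch of that check is below. The helper name is_directory_echo is hypothetical, and the sketch assumes that the only directory events you want to drop are the "modified" ones that accompany a file change.

```python
def is_directory_echo(event):
    """Return True for a "directory modified" event, which usually just
    echoes a change to a file inside that directory."""
    return bool(event.is_directory and event.event_type == "modified")
```

Returning early from on_any_event when is_directory_echo(event) is true leaves only the file-level event to act on.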

On a similar note, if handling an event within a target folder triggers another event somewhere else within the watched file tree, it is easy to accidentally write infinite loops. Use these three attributes (event_type, src_path, and is_directory) to guard against that.

class MyHandler(FileSystemEventHandler):

    def on_any_event(self, event):
        if not event.is_directory:
            f = open(event.src_path, "a")
            f.write("\nInfinite Loop\n")
            f.close()

In this example, as soon as a file is created or changed, the handler appends "Infinite Loop" to it. That write fires a new "modified" event, which triggers the handler again, and so on without end.
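One possible guard, sketched below, is to write only in response to "created" events, so the "modified" events caused by the handler's own write are ignored. The function name append_once and its default text are made up for this example.

```python
def append_once(event, text="\nStamped\n"):
    """Append `text` to a newly created file and ignore every other event.

    Returns True if the file was written to.
    """
    if event.is_directory or event.event_type != "created":
        return False
    with open(event.src_path, "a") as f:
        f.write(text)
    return True
```

Calling append_once(event) from on_any_event writes to each new file a single time. This assumes the program that creates the file reports it as "created"; tools that save by writing a temporary file and renaming it may produce a different sequence of events.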

This is just what I consider the important parts of the library to know; there are plenty of other features that you can learn about by reading the code on GitHub.