Tuesday, 13 December 2011

Improved flexibility for log file rotation

The RotatingFileHandler and TimedRotatingFileHandler classes provide basic log file rotation functionality, but there are times when more functionality is required than is provided in these classes. One requirement is to compress log files when they are rotated. Currently, if you want this kind of functionality, you have to subclass the relevant handler and override the doRollover method. This works, but requires you to copy and paste the entire method, and then adapt it to your needs; clearly, this is not ideal. An example of this can be found in this ActiveState recipe for a TimedCompressedRotatingFileHandler. This uses the .zip format to compress log files.

In order to provide improved flexibility for log file processing (which will generally be compression, but in practice any transformation could be performed), I propose to add two new methods to these handlers (in their common base class, BaseRotatingHandler).

The first method is named rotate. It takes two arguments, source and dest. The source argument is the name of the log file about to be rotated (e.g. test.log), and the dest argument names the target of the rotation (e.g test.log.1). Under normal circumstances, the file named by dest should not exist, having been explicitly removed by the rollover logic. the default behaviour is what happens currently – a call to os.rename(source, dest).

Since we are potentially transforming log files, we might also want to change their extension, so that they are more usable with e.g. file system viewers like Nautilus or Windows Explorer. To facilitate this, a second method is provided, called filename. This takes a single argument – the filename of a destination file in a rotation (for example, test.log.1)  – and returns a modified version (for example, adding an extension – test.log.1.gz). The default behaviour is to return the name unchanged.

The definition of the new methods is as follows.

    def filename(self, name):

        """
        Modify the filename of a log file when rotating.

        This is provided so that a custom filename can be provided,
        say by adding an extension.

        The default implementation returns the name unchanged.

        :param name: The default name for the destination log file.
        """

    def rotate(self, source, dest):

        """
        When rotating, rotate the current log in the file named by
        source to the file named by dest.

        On entry, the file named by source should exist and the file
        named by dest should not exist.

        On exit, the file named by source should not exist and the

        file named by dest should exist.

        The default implementation simply renames source to dest.

        :param source: The source filename. This is normally the base
                       filename, e.g. 'test.log'
        :param dest:   The destination filename. This is normally
                       what the source is rotated to, e.g. 'test.log.1'.
        """

This will allow a custom rotation strategy to be implemented relatively easily: for example, if you want to gzip rotated log files, you can ensure that filename returns test.log.1.gz when passed test.log.1, and that rotate gzips test.log to test.log.1.gz.

The ActiveState example I cited earlier used the .zip format, but what if you want to use .gz, .bz2, .7z or some other compression algorithm? You would need to support all of them in a single handler, or write multiple handler classes for each format.

To avoid the need for users to subclass the rotating handlers, I further propose to allow configurability of the rotate and filename methods by providing two new handler attributes (both, by default, set to None in BaseRotatingHandler).

The first handler attribute is named rotator. You can set it to a callable that takes two arguments, source and dest. The meanings of these arguments are the same as for the rotate method. If a callable is provided in the rotator attribute, it is called by the rotate method to perform the rotation.

The second handler attribute is called namer. You can set this to a callable which takes a single argument – the filename of a destination file in a rotation (for example, test.log.1)  – and modifies it to, for example, add an extension. If a callable is set, then it is called by the filename method to return the new name.

An example of how you can define a namer and rotator is given in the following snippet, which shows zlib-based compression of the log file.

def namer(name):
    return name + ".gz"

def rotator(source, dest):
    with open(source, "rb") as sf:
        data = sf.read()
        compressed = zlib.compress(data, 9)
        with open(dest, "wb") as df:
            df.write(compressed)
    os.remove(source)

rh = logging.handlers.RotatingFileHandler(...)
rh.rotator = rotator
rh.namer = namer

I’m aware that these are not “true” .gz files, as they are bare compressed data, with no “container” such as you’d find in an actual gzip file. This snippet is just for illustration purposes.

Note: The namer function is called quite a few times during rollover, so it should be as simple and as fast as possible. It should also return the same output every time for a given input, otherwise the rollover behaviour may not work as expected. If either the namer or rotator function raises an exception, this will be handled in the same way as any other exception during an emit() call, i.e. via the handleError method of the handler.

The internal logic for determining which files to delete (when a backupCount is specified on a rotating file handler) uses, in the case of  TimedRotatingFileHandler, regular expressions for the date/time-based suffix. The regular expressions used have been updated to allow an optional file extension consisting of a period followed by one or more letters and digits.

These changes will soon appear in Python 3.3’s development repository. Your comments are welcome.

Thursday, 9 June 2011

Configurability tweaks for SysLogHandler and Formatter

There have been some recent changes to SysLogHandler and Formatter which will appear in Python releases.

  1. SysLogHandler currently appends a NUL byte to messages it sends to syslog daemons. This is because earlier versions of these daemons expected the NUL terminator to be present, even though it’s not part of the relevant specification (RFC 5424). More recent releases of these daemons don’t expect the NUL byte to be there, but strip it off if it’s present. And some even more recent daemons follow the specification more strictly, and pass the NUL byte on as part of the message. In order to facilitate dealing with all the various possible destinations for SysLogHandler messages, a new class-level attribute, append_nul, has been added to SysLogHandler. The default value is True, thus preserving current behaviour; however, you can set it to False for specific handler instances where you don’t want the NUL byte to be appended. This change is expected to appear in Python 3.2.1, which should be released soon.
  2. In the same vein, the date formatting logic used by Formatter has been changed to make it more configurable. Previously, the default ISO 8601 format was hard-coded as in this example: 2010-09-06 22:38:15,292 where the part before the comma is handled by a strptime format string ('%Y-%m-%d %H:%M:%S'), and the part after the comma is a millisecond value. Because strptime does not have a format placeholder for milliseconds, the millisecond value is appended using another format string, '%s,%03d' – both of these format strings have been hardcoded into the Formatter.formatTime method. That’s set to change with Python 3.3 – these strings will be defined as class-level attributes which can be overridden at the instance level when desired. The names of the attributes are default_time_format (for the strptime format string) and default_msec_format (for appending the millisecond value).

Monday, 11 April 2011

Added functionality for basicConfig() in Python 3.3

The logging.basicConfig() function allows users to easily configure either a console handler or file handler for the root logger, and thereby have them usable by all loggers in their application. While this functionality meets the commonest use cases, there are some slightly less common use cases which can be accommodated without too much work. For example:

  • Configuring a FileHandler with a specific encoding, or with a delay parameter which defers opening the file until the first time it’s actually used.
  • Configuring a SysLogHandler. This could be useful, say, for cron jobs or other admin scripts.
  • Configuring multiple handlers with minimal work.

Accordingly, basicConfig() in Python 3.3 will have added functionality in the form of a new handlers keyword argument. This defaults to None, but when specified it is expected to be an iterable of already-created handler instances, which are to be added to the root logger.

Any handler in the iterable which does not have a formatter assigned will be assigned the formatter created by basicConfig(). This allows users to specify formatters if they need to, but defaults to a common formatter for all handlers which have no formatter specified.

This change to basicConfig() also includes improved checking for parameter compatibility: it doesn’t make sense to specify handlers together with stream or filename parameters, or to specify stream and filename parameters together. In Python 3.3, checks have been added so that these invalid combinations will raise a ValueError.

Examples of usage:

logging.basicConfig(handlers=[logging.FileHandler('test.log', 'w', 'utf-8')])

logging.basicConfig(handlers=[logging.SysLogHandler()])

logging.basicConfig(handlers=[logging.StreamHandler(), logging.FileHandler('test.log')])

I’d welcome feedback on this new functionality, which has already been checked into Python’s Mercurial repository.

Logging documentation for Python 2.7 reorganised

The logging documentation for Python 2.7 has been reorganised into the same structure as that for Python 3.x. The library reference is supposed to be just that – a reference – so the tutorials have been moved to the HOWTO section. The configuration and handler documentation has also been moved into separate pages. The organisation is now as follows:

There’s a prominent sidebar at the start of the reference documentation pages pointing to the HOWTO sections.

I hope the results are easier to navigate and to understand. Please feel free to give your feedback.

Friday, 11 February 2011

Desktop log viewer released

A Qt-based desktop application for viewing logs received across the network has been released. It incorporates some of the feedback gratefully received following an earlier post with screencast.

The viewer has mandatory dependencies on Qt4 and PyQt4, and optional dependencies on ZeroMQ and pyzmq. Because these packages have non-trivial installation procedures (as they use C extensions) and because there may be an old (conflicting) version of Qt installed, the packaging of the application is in binary forms for Windows, Mac OS X (Intel only) and Linux.

A source distribution is also provided; this assumes that you have PyQt4 and optionally pyzmq already installed.

Downloads are available from the project site, where you can also report issues.

Screenshot:

The Linux binary was smoke-tested on: Fedora 14, Ubuntu (Jaunty, Karmic, Maverick), Linux Mint 10 (Ubuntu and Debian variants), Debian Lenny and OpenSUSE 11.3. It looks better on some than on others, but seems to work on all these platforms.

The Windows binary was smoke-tested on Windows XP and Windows 7.

The Mac OS X binary was smoke-tested on Leopard (Python 2.5).

All feedback gratefully received. Please post issues on the project site (requires a Google account), and also feel free to use the mailing list.

Monday, 17 January 2011

A desktop log viewer

I’ve created a simple desktop application which is a viewer for viewing logging records sent from applications via the SocketHandler and DatagramHandler. It’s early days and the viewer is fairly simple, so I’ve created a screencast so that I can get some feedback on the user interface before spending too much more time on it. If people think it will be useful, I can release it under a 3-clause BSD license.

I realize that web interfaces are all the rage, but it took me less time to produce this desktop app using PyQt than an equivalent web application would have.

I developed this on Ubuntu Maverick using the bundled versions of Qt and PyQt, and it seems pretty stable. With older versions of Qt/PyQt (e.g. the versions bundled with Ubuntu Karmic and earlier) I have had segfaults with the identical code - I'm not sure where the problem is.

The code also works with a recent version of PyQt on Windows 7.

The screencast is around 3 minutes long.

The video content presented here requires JavaScript to be enabled and the latest version of the Macromedia Flash Player. If you are you using a browser with JavaScript disabled please enable it now. Otherwise, please update your version of the free Flash Player by downloading here.

I’d be grateful for any feedback about the user interface. I know it’s not especially innovative but then it’s doing a fairly simple job :-)