Tuesday, 13 December 2011

Improved flexibility for log file rotation

The RotatingFileHandler and TimedRotatingFileHandler classes provide basic log file rotation functionality, but there are times when more functionality is required than is provided in these classes. One requirement is to compress log files when they are rotated. Currently, if you want this kind of functionality, you have to subclass the relevant handler and override the doRollover method. This works, but requires you to copy and paste the entire method, and then adapt it to your needs; clearly, this is not ideal. An example of this can be found in this ActiveState recipe for a TimedCompressedRotatingFileHandler. This uses the .zip format to compress log files.

In order to provide improved flexibility for log file processing (which will generally be compression, but in practice any transformation could be performed), I propose to add two new methods to these handlers (in their common base class, BaseRotatingHandler).

The first method is named rotate. It takes two arguments, source and dest. The source argument is the name of the log file about to be rotated (e.g. test.log), and the dest argument names the target of the rotation (e.g test.log.1). Under normal circumstances, the file named by dest should not exist, having been explicitly removed by the rollover logic. the default behaviour is what happens currently – a call to os.rename(source, dest).

Since we are potentially transforming log files, we might also want to change their extension, so that they are more usable with e.g. file system viewers like Nautilus or Windows Explorer. To facilitate this, a second method is provided, called filename. This takes a single argument – the filename of a destination file in a rotation (for example, test.log.1)  – and returns a modified version (for example, adding an extension – test.log.1.gz). The default behaviour is to return the name unchanged.

The definition of the new methods is as follows.

    def filename(self, name):

        """
        Modify the filename of a log file when rotating.

        This is provided so that a custom filename can be provided,
        say by adding an extension.

        The default implementation returns the name unchanged.

        :param name: The default name for the destination log file.
        """

    def rotate(self, source, dest):

        """
        When rotating, rotate the current log in the file named by
        source to the file named by dest.

        On entry, the file named by source should exist and the file
        named by dest should not exist.

        On exit, the file named by source should not exist and the

        file named by dest should exist.

        The default implementation simply renames source to dest.

        :param source: The source filename. This is normally the base
                       filename, e.g. 'test.log'
        :param dest:   The destination filename. This is normally
                       what the source is rotated to, e.g. 'test.log.1'.
        """

This will allow a custom rotation strategy to be implemented relatively easily: for example, if you want to gzip rotated log files, you can ensure that filename returns test.log.1.gz when passed test.log.1, and that rotate gzips test.log to test.log.1.gz.

The ActiveState example I cited earlier used the .zip format, but what if you want to use .gz, .bz2, .7z or some other compression algorithm? You would need to support all of them in a single handler, or write multiple handler classes for each format.

To avoid the need for users to subclass the rotating handlers, I further propose to allow configurability of the rotate and filename methods by providing two new handler attributes (both, by default, set to None in BaseRotatingHandler).

The first handler attribute is named rotator. You can set it to a callable that takes two arguments, source and dest. The meanings of these arguments are the same as for the rotate method. If a callable is provided in the rotator attribute, it is called by the rotate method to perform the rotation.

The second handler attribute is called namer. You can set this to a callable which takes a single argument – the filename of a destination file in a rotation (for example, test.log.1)  – and modifies it to, for example, add an extension. If a callable is set, then it is called by the filename method to return the new name.

An example of how you can define a namer and rotator is given in the following snippet, which shows zlib-based compression of the log file.

def namer(name):
    return name + ".gz"

def rotator(source, dest):
    with open(source, "rb") as sf:
        data = sf.read()
        compressed = zlib.compress(data, 9)
        with open(dest, "wb") as df:
            df.write(compressed)
    os.remove(source)

rh = logging.handlers.RotatingFileHandler(...)
rh.rotator = rotator
rh.namer = namer

I’m aware that these are not “true” .gz files, as they are bare compressed data, with no “container” such as you’d find in an actual gzip file. This snippet is just for illustration purposes.

Note: The namer function is called quite a few times during rollover, so it should be as simple and as fast as possible. It should also return the same output every time for a given input, otherwise the rollover behaviour may not work as expected. If either the namer or rotator function raises an exception, this will be handled in the same way as any other exception during an emit() call, i.e. via the handleError method of the handler.

The internal logic for determining which files to delete (when a backupCount is specified on a rotating file handler) uses, in the case of  TimedRotatingFileHandler, regular expressions for the date/time-based suffix. The regular expressions used have been updated to allow an optional file extension consisting of a period followed by one or more letters and digits.

These changes will soon appear in Python 3.3’s development repository. Your comments are welcome.

9 comments:

  1. Idle thought: from the "batteries included" perspective it'd be nice to have an implementation of the traditional Unix/Linux gzipped file rotation in the stdlib, not just hooks that enable one to implement it.

    ReplyDelete
  2. But don't forget that Python, and the stdlib, are not supposed to be Unix/Linux centric. Of course many of the core developers work primarily on Posix platforms, but you'll note that ActiveState recipe was focusing on the zip format more commonly encountered on Windows. That is why I have opted for the provision of hooks; it's one less bikeshed to worry about ;-)

    ReplyDelete
  3. Great!

    But I agree with Marius, we need gzipped logs in the stdlib. Maybe we need both, gzipped and zipped logs.

    Why not implement a mix-in class? Just like socketserver module! E.g. GzipMixIn and ZipMixIn classes:

    class ZippedRotatingFileHandler(ZipMixIn, RotatingFileHandler): pass

    or

    class GzippedTimedRotatingFileHandler(GzipMixIn, TimedRotatingFileHandler): pass

    or whatever combination you want.

    ReplyDelete
  4. http://packages.python.org/Logbook/api/handlers.html#logbook.FingersCrossedHandler - that would be much more useful IMO. =) And some network handler as well.

    ReplyDelete
  5. @anatoly: There is MemoryHandler in the stdlib which can provide similar functionality. The same functionality as the handler you mention can be obtained by subclassing MemoryHandler.

    I don't know what you mean by "network handler", there are stdlib handlers for UDP, TCP, SMTP, HTTP servers and syslog accessed via UDP or domain sockets.

    ReplyDelete
  6. How about making it even more flexible by also adding a pre and post rotation method/hook/property.

    The idea is that the rotation action can signal user supplied code to be executed before and after the file is rotated. Not sure what the best method for implementing this functionality, but as a user I think it would probably be easiest to be able to add a callable as pre-/post-hook when initializing the handler instead of sub-classing and then overriding methods.

    I'm sure there are many situations where these hooks could be usable - for compressing on post rotate with any custom file compressor for example :-)

    Arguments to these hooks should include the original file name and the rotated file name and probably a copy of the handler instance (self) to get to other handler specific settings.

    ReplyDelete
  7. @fitzterra: I think your case is less common, so you can address it by subclassing. The existing default implementations are pretty much one-liners anyway; you can make your subclass override the new methods and do whatever you want.

    ReplyDelete
  8. Just a heads up, the 'filename' method is now 'rotation_filename'.
    For true gzip files, simply replace 'zlib' with 'gzip' in the snippet. As of 3.1, gzip supports the 'with' statement:

    import os
    import gzip
    from logging.handlers import RotatingFileHandler

    class RotatingGzipFileHandler(RotatingFileHandler):
    def rotation_filename(self, name):
    return name + ".gz"

    def rotate(self, source, dest):
    with open(source, "rb") as sf:
    with gzip.open(dest, "wb") as df:
    df.writelines(sf)
    os.remove(source)

    ReplyDelete
  9. This is highly informatics, crisp and clear. I think that everything has been described in systematic manner so that reader could get maximum information and learn many things.
    http://anixusa.com/

    ReplyDelete