Plumber Jack: October 2010

Sunday 31 October 2010

logutils: Using recent logging features with older Python versions

Python’s development is a process of continuous improvement, and the standard library of course undergoes changes as new versions of Python are released. In the logging package, there have been numerous improvements in recent months:

Dictionary-based configuration (PEP 391)
QueueHandler for working with multiprocessing and in-process queues

In addition, there have been a few bits of utility code which have been posted here:

Classes which facilitate use of logging in particular scenarios, such as unit testing
Classes which facilitate building logging messages using str.format or string.Template

Many of the enhancements have been added as new features which will emerge in Python 3.2, but cannot be added to Python 2.X because there are no Python 2.X branches where new features can be added. Python 2.7, the last 2.X release, is in maintenance mode and will only accept bug-fixes which maintain backwards compatibility.

Although Python 3.2+ is the way forward for Python development, many users will be constrained to use 2.X for some considerable time yet, mostly because libraries they depend on have not yet been ported to 3.X, but sometimes also because their codebase is too large to consider porting to 3.X, when viewed from a cost/benefit viewpoint, or perhaps because their code needs to remain workable under 2.X and they cannot maintain two separate codebases.

For those users who are forced to stick with Python 2.X but who want to use some of the features added recently to logging, the logutils project offers one way in which they can access these features. This is available via PyPI so it can be installed using pip or easy_install. The package is currently at release 0.2 and contains the following:

Dictionary-based configuration (PEP 391) which is broadly compatible with Python 2.7 and 3.2.
The LoggerAdapter class is easier to subclass.
The QueueHandler and QueueListener classes are available for use with in-process, multiprocessing and other types of queue, as described in this post.
The TestHandler and Matcher classes are available for use in unit tests, as described in this post.
An enhanced HTTPHandler allows the use of secure connections and credentials for logging in to websites.

The package needs Python 2.4 or later. Some features are not available in Python 2.4, due to limitations of that Python version. For example, str.format is only available from Python 2.6 onwards; LoggerAdapter relies on a feature added in Python 2.5, namely the extra parameter for adding additional context to logs; and named handlers are not available for Python 2.X versions < 2.7 and Python 3.X versions < 3.2. (As a consequence of this, incremental handler configuration is not available on the versions where named handlers are not available.)

Project downloads are here. You can post issues here (needs a Google account). Documentation is available here.

Thursday 28 October 2010

Supporting alternative formatting styles in logging

When logging was added to the Python standard library, the only way of formatting messages with variable content was to use the %-formatting method. Since then, Python has gained two new formatting approaches: string.Template (added in Python 2.4) and str.format (added in Python 2.6).

Now, Python’s logging package will provide improved support for these two additional formatting styles. The Formatter class in logging has been enhanced for Python 3.2 to take an additional, optional keyword parameter named style. This defaults to '%', but other possible values are '{' and '$', which correspond to the other two formatting styles. Backwards compatibility is maintained by default (as you would expect), but by explicitly specifying a style parameter , you get the ability to specify format strings which work with str.format or string.Template. Here’s an example console session to show the possibilities:

vinay@eta-jaunty:~$ python3.2
Python 3.2a3+ (py3k:85857, Oct 27 2010, 18:25:50) 
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> import logging
>>> root = logging.getLogger()
>>> root.setLevel(logging.DEBUG)
>>> handler = logging.StreamHandler()
>>> bf = logging.Formatter('{asctime} {name} {levelname:8s} {message}', style='{')
>>> handler.setFormatter(bf)
>>> root.addHandler(handler)
>>> logger = logging.getLogger('foo.bar')
>>> logger.debug('This is a DEBUG message')
2010-10-28 15:11:55,341 foo.bar DEBUG    This is a DEBUG message
>>> logger.critical('This is a CRITICAL message')
2010-10-28 15:12:11,526 foo.bar CRITICAL This is a CRITICAL message
>>> df = logging.Formatter('$asctime $name ${levelname} $message', style='$')
>>> handler.setFormatter(df)
>>> logger.debug('This is a DEBUG message')
2010-10-28 15:13:06,924 foo.bar DEBUG This is a DEBUG message
>>> logger.critical('This is a CRITICAL message')
2010-10-28 15:13:11,494 foo.bar CRITICAL This is a CRITICAL message
>>>

Note that the formatting of logging messages for final output to logs is completely independent of how an individual logging message is constructed. That can still use %-formatting, as shown here:

>>> logger.error('This is an%s %s %s', 'other,', 'ERROR,', 'message')
2010-10-28 15:19:29,833 foo.bar ERROR This is another, ERROR, message
>>>

Logging calls (logger.debug(), logger.info() etc.) only take positional parameters for the actual logging message itself, with keyword parameters used only for determining options for how to handle the actual logging call (e.g. the exc_info keyword parameter to indicate that traceback information should be logged, or the extra keyword parameter to indicate additional contextual information to be added to the log). So you cannot directly make logging calls using str.format or string.Template syntax, because internally the logging package uses %-formatting to merge the format string and the variable arguments. There would no changing this while preserving backward compatibility, since all logging calls which are out there in existing code will be using %-format strings.

There is, however, a way that you can use {}- and $- formatting to construct your individual log messages. Recall that for a message you can use an arbitrary object as a message format string, and that the logging package will call str() on that object to get the actual format string. Consider the following two classes:

class BraceMessage(object):
    def __init__(self, fmt, *args, **kwargs):
        self.fmt = fmt
        self.args = args
        self.kwargs = kwargs

    def __str__(self):
        return self.fmt.format(*self.args, **self.kwargs)

class DollarMessage(object):
    def __init__(self, fmt, **kwargs):
        self.fmt = fmt
        self.kwargs = kwargs
        
    def __str__(self):
        from string import Template
        return Template(self.fmt).substitute(**self.kwargs)

Either of these can be used in place of a format string, to allow {}- or $-formatting to be used to build the actual “message” part which appears in the formatted log output in place of “%(message)s” or “{message}” or “$message”. It’s a little unwieldy to use the class names whenever you want to log something, but it’s quite palatable if you use an alias such as __ (double underscore – not to be confused with _, the single underscore used as a synonym/alias for gettext.gettext or its brethren).

The above classes, and recent additions to the logging package, will be published in a package called logutils, intended for use with Python 2.4 or later. This is already available for download via PyPI (and will be the subject of a future blog post). They can be used as follows:

>>> from logutils import BraceMessage as __

>>> print(__('Message with {0} {1}', 2, 'placeholders'))
Message with 2 placeholders
>>> class Point: pass
... 
>>> p = Point()
>>> p.x = 0.5
>>> p.y = 0.5
>>> print(__('Message with coordinates: ({point.x:.2f}, {point.y:.2f})', point=p))
Message with coordinates: (0.50, 0.50)
>>> from logutils import DollarMessage as __
>>> print(__('Message with $num $what', num=2, what='placeholders'))
Message with 2 placeholders
>>>

One thing to note is that you pay no significant performance penalty with this approach: the actual formatting happens not when you make the logging call, but when (and if) the logged message is actually about to be output to a log by a handler. So the only slightly unusual thing which might trip you up is that the parentheses go around the format string and the arguments, not just the format string. That’s because the __ notation is just syntax sugar for a constructor call to one of the XXXMessage classes.

Your comments are, as always, welcome. Thanks for reading.

Tuesday 26 October 2010

Shared state in logging: why?

It’s generally recognised that undisciplined use of global variables is a warning sign about problems with a design. In fact, the paper “Global Variables Considered Harmful” by Wulf and Shaw dates back to 1973. The word “global” and the phrase “global state” have therefore acquired bad connotations in many people’s minds, and that can lead to situations where you “throw the baby out with the bath-water”.

The concept of state shared between multiple software components doesn’t have to be treated in the same way as naïve usage of global variables. There are legitimate situations where state needs to be shared – you will almost certainly be using a computer with a display, a mouse and a keyboard, all of which are essentially shared state (meaning data, and operations on that data, too) for the windowing system your computer is using.

Python’s logging package uses some shared state; Logger objects are shared by all Python code in a single process, in that a call to logging.getLogger('foo.bar') will always return the same object, no matter where it’s called from. The fact that Loggers are singletons is for a specific, considered reason, and not just because I was taking a design short-cut or didn’t know how to do it any other way.

There are alternative logging libraries which make a point of the fact that their loggers are constructed anew each time, so that a call

logger = Logger('foo.bar')

will return a new object every time. These systems perhaps use loggers just as a vehicle for delivering events via their debug(), info() etc. methods; for stdlib logging, Loggers are something more. They play a part in how an application developer, system administrator or support team member controls the logging verbosity of an application. This is done through configuration, which needs to consider the verbosity of all components used in an application - including third-party components whose source you are not free to modify.

Logging verbosity is primarily controlled by configuring level, propagation behaviour and filters for a logger. You can, of course, also control it via configuring levels and filters on handlers; but the best way is to configure loggers where possible, because that avoids the overhead of dispatching events to handlers only to have them be thrown away due to the handler’s configuration.

The logging package relies on loggers being shared across modules, in order to offer this configurability. If my application is using a third-party library which logs some events to a logger named foo.bar, and if I want to configure the verbosity of this logger - by setting its level, say – I can only do this if I can be sure that

logger = logging.getLogger('foo.bar')

will return the same object to the library code, which logs to logger foo.bar, and to my configuration code, which sets that logger’s level to what I want. If the two calls to get that named logger were to return two entirely different objects, then the setLevel() call made by my configuration code would not affect the object used internally by the library. This means that I would essentially lose some level of control over logging verbosity.

When developers are essentially logging just for themselves – i.e. they are the primary, perhaps only, audience for their logging output – then these considerations may not matter too much. If they are not using third-party libraries which use logging, or using libraries whose logging verbosity they don’t care about, that may be another reason for not worrying about this loss of configurability. Since Python logging is intended for the whole Python developer community, some of whom will definitely need and want this level of configurability, the design supports it, using shared state. I know I use this feature myself regularly – say, when examining the SQL output from ORMs such as are provided by Django and SQLAlchemy. I don’t generally log the SQL output (which can be quite voluminous), but might do so as a result of coming across some problem, whether functional or performance-related. When the problem is identified, I can turn the logging of SQL statements off again. In larger application scenarios, this functionality is also available to those who have no means of changing the source code - application support teams or system administrators – assuming that the application developers have provided for this via the mechanisms offered by the logging package.

Another important (but related) design feature of Python logging is the use of hierarchical logger names. In stdlib logging, logger names are not just things which turn up in logging output. When you create a logger named foo.bar, you also conceptually create a logger named foo. This latter logger is not actually created unless and until a getLogger('foo') call is made, but the hierarchical namespace feature (and the way in which handler inheritance works with the logger hierarchy) allows you to turn logging verbosity up and down at multiple levels of a package hierarchy quite easily – for example, turn on all logging in the django logging namespace, or just turn it on just for django.request, or perhaps just for django.db.backends – it puts the application developer/support team/sys admin in control, rather than just the library developer. And while the hierarchical feature might seem overkill for small projects and simple scripts, it doesn’t get in the way, and pays dividends when logging is used in larger systems.

Wednesday 20 October 2010

StreamHandler’s newline terminator now configurable

When StreamHandler writes a formatted log message to its stream, it adds a newline terminator. This behaviour is inherited by FileHandler and the other classes which derive from it (such as the rotating file handlers).

For most people, that’s what they want, since they get log messages on separate lines without having to explicitly code newlines in their log messages. However, some people want the flexibility to not have this newline automatically appended to logged messages.

Starting with Python 3.2, the message terminator will be configurable. This has been done by adding a terminator attribute to StreamHandler, which when emitting an event now writes the formatted message to its stream first, and then writes the terminator. If you don’t want newline termination for a handler, just set the handler instance’s terminator attribute to the empty string.

There’s a small chance that this approach will cause unexpected behaviour – in the unlikely instance that someone is setting a terminator attribute on an a StreamHandler or one of its subclasses for their own, unrelated purposes. Other than that, the new behaviour should be backwards compatible.

If this change is likely to adversely affect you for any reason, please let me know, by replying to this post. The change has already been checked into the py3k branch, and you are welcome to try it out.

Implementing filters using callables

It’s been pointed out that for implementing logging filters, you shouldn’t have to go to the lengths of creating a Filter class just to define its one filter method. Of course, there are cases where you will need to use Filter classes, but there are also cases where it would be beneficial to just use a function. To cater for this, changes have been made to Python 3.2. Once this enters beta, you will be able to pass a function or other callable to the addFilter and removeFilter methods of Handler and Logger, and this will be called with the LogRecord as its single argument to decide whether the event is to be logged.

The previous functionality did not actually require you to create Filter subclasses – you could pass any instance that had a filter method with the appropriate semantics. The Filter class was an illustration of how to do filtering as well as providing a not uncommon use case – filtering out everything other than a specific sub-tree in the logger namespace.

However, with this change, you’re not forced to call your filtering method filter, so you could e.g. use different filtering methods in a single class, or different functions in a module.

The development documentation has been updated to reflect this change.

Plumber Jack