Wednesday 29 December 2010

Colourising logging output in terminals

A nice-to-have feature when logging to terminals is to have the output colourised, typically according to the severity of individual messages. This is fairly straightforward to achieve, but the following points have to be borne in mind:

  • On Unix, Linux and OS X, terminals support ANSI escape codes for colourisation, and so a single solution should be able to support all of these platforms.
  • Windows does not support ANSI escape codes (though MS-DOS used to, using the ANSI.SYS driver), but does provide an API which can be used to achieve a similar effect.
  • When the terminal output is redirected to a file, colourisation should be disabled.

In this post, I’ll present a fairly simple handler which considers each of the above points in its design and implementation. I’ll be adding this handler to the logutils package soon.

We’ll assume that an entire formatted message will be colourised. If you need to colourise only certain parts of the message, you can adapt the solution presented here fairly easily – perhaps by adding a specialised formatter which is aware of ANSI escape codes.

We’ll also assume that we only want to be able to set background colour, foreground colour and foreground intensity, and that we’ll only want to set eight colours – red, green, blue, cyan, magenta, yellow, black and white. That’s more or less the lowest common denominator afforded us by ANSI escape sequences. If you have more specialised needs which are supported by your environment, you should be able to adapt the solution presented here, most likely just by subclassing.

It seems appropriate to base the handler on logging.StreamHandler. In addition to importing logging, we’ll need to import os (so that we can take appropriate action on Windows) and ctypes (so we can access the Windows console API needed to colourise terminal output).

This is how we start the handler implementation:

import ctypes
import logging
import os

class ColorizingStreamHandler(logging.StreamHandler):

    @property
    def is_tty(self):
        isatty = getattr(self.stream, 'isatty', None)
        return isatty and isatty()

    def emit(self, record):
        try:
            message = self.format(record)
            stream = self.stream
            if not self.is_tty:
                stream.write(message)
            else:
                self.output_colorized(message)
            stream.write(getattr(self, 'terminator', '\n'))
            self.flush()
        except (KeyboardInterrupt, SystemExit):
            raise
        except:
            self.handleError(record)

Since we need to detect when the handler’s output has been redirected to a file, we provide a property called is_tty on the handler which says whether the handler’s stream is a terminal. If it is, we use a method called output_colorized (whose implementation is platform dependent, and discussed shortly) to do the colourised output; otherwise, we just write the message to the stream.

In order to add colour to the message, we’ll override the format method of the handler:

    def format(self, record):
        message = logging.StreamHandler.format(self, record)
        if self.is_tty:
            # Don't colorize any traceback
            parts = message.split('\n', 1)
            parts[0] = self.colorize(parts[0], record)
            message = '\n'.join(parts)
        return message

We first call the base class format method to do the actual formatting, according to whatever formatter is set for the handler. If we are writing to a terminal, we split off the first line of the formatted text (any remaining lines will typically be an exception traceback), colourise that first line and append the rest without any colouring.

I did consider putting the colourising logic into a specialised Formatter subclass, but decided against it, because:

  • Colourising is only done when writing to a terminal; a formatter would be unaware of specific output destinations.
  • Placing the colourising logic in a handler would allow a formatter to be shared with other handlers.
  • It’s simpler to keep everything in one place.

To do the grunt work, we’ll define the colorize helper method together with some data which drives it. We’ll make it the method’s job to add ANSI escape sequences to colour the message, even though Windows does not support ANSI escape sequences natively – we’ll just have to deal with that in the implementation of output_colorized, which we’ll come to a little later.

Let’s discuss ANSI escape sequences briefly now – for more information you can consult this article. Basically, colourising is done by placing a special character sequence in the character stream sent to the terminal. The sequence always starts with an ESC (ASCII 27) followed by a left bracket character ([). The sequence always ends with an ‘m’. In between is one or more integer parameters (as printed characters in decimal format, separated by semicolons). These parameters have the following meanings:

  • 0 – reset background and foreground colours and intensity to the defaults.
  • 1 – set the foreground intensity on.
  • between 30 and 37 – set the foreground colour to one of the values in this table, after subtracting 30 from the parameter.
  • between 40 and 47 - set the background colour to one of the values in this table, after subtracting 40 from the parameter.

There are other parameters with other meanings in the ANSI specification, but we won’t concern ourselves with those as they aren’t relevant to our simple colourisation scheme.
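For example, here’s a minimal, handler-independent illustration of these sequences in action (assuming an ANSI-capable terminal):

import sys

# 31 = red foreground, 1 = intensity on; 0 = reset everything
if sys.stdout.isatty():
    sys.stdout.write('\x1b[31;1mhello\x1b[0m\n')
else:
    sys.stdout.write('hello\n')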

So, we use a simple color_map dictionary that maps colour names to integer colour values. We also use a level_map dictionary which maps logging levels to a tuple encoding the background colour, foreground colour (both colour names) and intensity (a Boolean). If no background colour is specified (None), the default is used. Because Windows terminals usually have a dark background and Linux terminals a light one, the level_map is defined slightly differently on Windows. We also define the escape sequence beginning (called the CSI, for Control Sequence Introducer/Initiator) and the reset sequence.

    # color names to indices
    color_map = {
        'black': 0,
        'red': 1,
        'green': 2,
        'yellow': 3,
        'blue': 4,
        'magenta': 5,
        'cyan': 6,
        'white': 7,
    }

    # levels to (background, foreground, bold/intense)
    if os.name == 'nt':
        level_map = {
            logging.DEBUG: (None, 'blue', True),
            logging.INFO: (None, 'white', False),
            logging.WARNING: (None, 'yellow', True),
            logging.ERROR: (None, 'red', True),
            logging.CRITICAL: ('red', 'white', True),
        }
    else:
        level_map = {
            logging.DEBUG: (None, 'blue', False),
            logging.INFO: (None, 'black', False),
            logging.WARNING: (None, 'yellow', False),
            logging.ERROR: (None, 'red', False),
            logging.CRITICAL: ('red', 'white', True),
        }
    csi = '\x1b['
    reset = '\x1b[0m'

    def colorize(self, message, record):
        if record.levelno in self.level_map:
            bg, fg, bold = self.level_map[record.levelno]
            params = []
            if bg in self.color_map:
                params.append(str(self.color_map[bg] + 40))
            if fg in self.color_map:
                params.append(str(self.color_map[fg] + 30))
            if bold:
                params.append('1')
            if params:
                message = ''.join((self.csi, ';'.join(params),
                                   'm', message, self.reset))
        return message

Once the mappings and constants have been defined, the colorize method is very simple. If the record’s level is in the level map, we get the tuple for background and foreground colours and intensity, get the appropriate commands and construct the escape sequence. We return the passed in message, bracketed by the initial escape sequence which encodes the appropriate colours and intensity for the record’s level, and the final escape sequence which resets the terminal to default colours and intensity.
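To see the method in action, here’s roughly what it produces for a WARNING record on a non-Windows system (an interactive sketch, using the level_map defined above):

>>> handler = ColorizingStreamHandler()
>>> record = logging.LogRecord('demo', logging.WARNING, 'demo.py', 1,
...                            'a warning message', None, None)
>>> handler.colorize('a warning message', record)
'\x1b[33ma warning message\x1b[0m'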

The implementation thus far adds ANSI escape sequences to messages to colourise them. These are natively supported on Unix, Linux and OS X, but not on Windows. We therefore define the output_colorized method, which is called from emit, differently on NT and non-NT platforms. The non-NT implementation is very simple: since ANSI support is native, we just write the colourised message to the output stream. For Windows, we need to interpret the ANSI escape sequences and implement their effects with Win32 API calls. Here are the implementations:

    if os.name != 'nt':
        def output_colorized(self, message):
            self.stream.write(message)
    else:
        import re
        ansi_esc = re.compile(r'\x1b\[((\d+)(;(\d+))*)m')

        nt_color_map = {
            0: 0x00,    # black
            1: 0x04,    # red
            2: 0x02,    # green
            3: 0x06,    # yellow
            4: 0x01,    # blue
            5: 0x05,    # magenta
            6: 0x03,    # cyan
            7: 0x07,    # white
        }

        def output_colorized(self, message):
            parts = self.ansi_esc.split(message)
            write = self.stream.write
            h = None
            fd = getattr(self.stream, 'fileno', None)
            if fd is not None:
                fd = fd()
                if fd in (1, 2): # stdout or stderr
                    h = ctypes.windll.kernel32.GetStdHandle(-10 - fd)
            while parts:
                text = parts.pop(0)
                if text:
                    write(text)
                if len(parts) > 4:
                    params = parts[0]
                    parts = parts[4:]
                    if h is not None:
                        params = [int(p) for p in params.split(';')]
                        color = 0
                        for p in params:
                            if 40 <= p <= 47:
                                color |= self.nt_color_map[p - 40] << 4
                            elif 30 <= p <= 37:
                                color |= self.nt_color_map[p - 30]
                            elif p == 1:
                                color |= 0x08 # foreground intensity on
                            elif p == 0: # reset to default color
                                color = 0x07
                            else:
                                pass # error condition ignored
                        ctypes.windll.kernel32.SetConsoleTextAttribute(h, color)

Because the integer values for ANSI colour codes differ slightly from those for Windows, we use an nt_color_map dictionary to map between the two. We use a regular expression to split the colourised message into escape sequences and literal text to be output. (Note that because there are four groups in the regular expression, the list will contain four entries for each ANSI sequence found.)

In order to colourise the text in the terminal, we need to obtain a Windows console handle for the output stream, using the GetStdHandle Win32 API. If this handle is not obtainable, the literal text is output but the ANSI escape sequences are ignored. If the handle is obtained, then the ANSI sequences are converted to the corresponding Win32 attribute values and applied to the console via the SetConsoleTextAttribute API.

The whole thing is available in ready-to-run form in this gist, which should run on all Pythons >= 2.5 (and on Python 2.4, too, if you have ctypes installed). Here are some screenshots of running the script on Ubuntu, Windows 7, Windows 2008 and Mac OS X, showing the expected results for both terminal output and output piped to a file.

Ubuntu Jaunty:

ansistrm-jaunty

Windows 7:

ansistrm-win7

Windows 2008 (Python 2.4):

ansistrm-win2k8

Mac OS X:

ansistrm-macosx

If you want different colour settings, you can probably achieve this just by changing the level_map contents, as shown in the sketch below. If you want to colourise just part of your messages, you could change the logic in the colorize and/or format methods. Of course, you can also override output_colorized and colorize if you need to support additional ANSI escape sequences.
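For instance, to show DEBUG messages in green rather than blue, something like the following should do (copying the map first, so that the class-level default isn’t mutated):

handler = ColorizingStreamHandler()
handler.level_map = dict(handler.level_map)
handler.level_map[logging.DEBUG] = (None, 'green', False)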

Your comments are welcome, particularly any suggestions for improvements. Thanks for reading.

Sunday 19 December 2010

Logging documentation for Python 3.2 reorganised

After comments that the length of the logging documentation was making it hard to navigate, and after the addition of a new basic tutorial, the documentation has been reorganised. The library reference is supposed to be just that – a reference – so the tutorials have been moved to the HOWTO section, and the configuration and handler documentation has been moved into separate pages. The Python 3.2 logging documentation is therefore now split across reference pages (for the core logging module, logging.config and logging.handlers) and the tutorial material in the HOWTO section.

There’s a prominent sidebar at the start of the reference documentation pages pointing to the HOWTO sections.

I hope the results are easier to navigate and to understand. Please feel free to give your feedback.

Thursday 9 December 2010

Proposed changes to logging defaults

Following recent discussion on python-dev, some changes to logging’s default behaviour are likely to be implemented soon, perhaps as soon as 3.2b2 (3.2b1 was released just a few days ago). By “default behaviour” is meant behaviour in the absence of any explicit logging configuration.

The current behaviour, which dates from logging’s first appearance in Python, is as follows. When a logging event needs to be output, the logging package looks for handlers which have been configured by the application. If no such handlers are found, the package prints a message “No handlers could be found for logger XXX”, naming the logger that was used to log the event. (The message is only ever printed once, even if subsequent logging calls are made. Furthermore, the message is only printed when logging.raiseExceptions is True, which is the “development” mode of operation for logging. When logging.raiseExceptions is False, the “production” mode of operation for logging, the message isn’t printed, even once.)

This creates some problems when an application (or simple script, or interactive session – I’ll use script to mean any of these) meets the following criteria:

  • It uses a library which does logging under the covers.
  • It doesn’t configure any handlers.
  • The library code invoked logs events of severity WARNING or above.

Because the default verbosity for logging is WARNING, the logging package looks for handlers, and, not finding any, prints the one-off “No handlers could be found …” message. This may be a surprise to the author of the script and also to its users; it can generate support calls from users to the author. The message was provided as a means of warning developers, who would have been new to logging, about a possible misconfiguration. (Of course, when logging was introduced, everyone was new to it.) The message is less important now than it once was, because many people are familiar with how logging works and provide good support on comp.lang.python to people who are newcomers to logging, or who run into issues that are simple to resolve.

To deal with the problem described above, the approach suggested has been for library developers to add a NullHandler instance to their top-level logger. This prevents messages from library code from causing the misconfiguration message to be printed, because a handler — the NullHandler — is always found when handling events from the library. If a script developer adds logging events in their own code which need to be handled, then the misconfiguration message will still appear, and that’s as it should be.
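For example, a library author might put something like this in the library’s top-level package (mylib is a stand-in for the real package name; NullHandler is in the standard library from Python 2.7 and 3.1):

import logging

# Ensure a handler is always found for events from this library,
# without producing any output by default.
logging.getLogger('mylib').addHandler(logging.NullHandler())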

When the logging package was introduced, it was designed with the philosophy that logging is an adjunct to normal application activity — meaning that a script should work in exactly the same way with or without the presence of logging. So logging tries to work as unobtrusively as possible, and is set up so that in production use, exceptions that occur in logging (e.g. mismatched format string and parameters) will not be raised, allowing a script to carry on uninterrupted. Also pertinent is the Unix philosophy that verbosity should be kept to a minimum.

The above describes the current state of play, but things are now set to change. The rationale for changing is:

  • The present behaviour goes against the Zen of Python, which says “Errors should never pass silently. Unless explicitly silenced.”
  • Python core developers want to be able to use the logging package not just as an adjunct to doing real work in the standard library, but to actually do the real work when it comes to the output of warning and error messages.
  • In some cases exceptions can’t be usefully raised, and writing out a message is the only way to provide information about an error condition which has been detected. Providing it via logging provides more flexibility in dealing with it than would a bare write to sys.stderr.

These changes will, in due course, bring benefits such as being able to redirect standard library warning and error messages, which are currently written to sys.stderr, to some destination of a developer’s choosing. However, it may cause some inconvenience to existing users, because the proposed behaviour change is not backwards-compatible.

How will the change be implemented? This is still being thought about, but the current front runner is the idea of a “handler of last resort”. When no handler is found while processing an event which needs to be output, instead of printing the misconfiguration message, the system will use the “last resort” handler to handle the event. This handler will be (by default) a StreamHandler using sys.stderr as its stream, with its level set to WARNING. Details about how to preserve existing behaviour are still being worked out; one thought is that you could get the existing behaviour by setting the last-resort handler to None.
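In other words, a script like the following sketch, which configures no handlers at all, would print its warning to sys.stderr with minimal formatting instead of triggering the misconfiguration message:

import logging

logging.getLogger('demo').warning('something went wrong')
# Proposed behaviour: 'something went wrong' appears on sys.stderr,
# as if the format string were '%(message)s'.
# Current behaviour: 'No handlers could be found for logger "demo"'.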

What’s the impact on users of logging?

  • If you’re using a version of Python earlier than 3.2, there’s no impact. Things will stay as they are.
  • Script developers: If you configure logging (i.e. add handlers), there should be no impact – the behaviour you see shouldn’t change, as long as there is no situation where a handler isn’t found. If you don’t configure logging, then in place of a misconfiguration message, you’ll see events of severity WARNING and above printed to sys.stderr without any specific formatting (i.e. as if the format string were "%(message)s").
  • Library developers: If the script using your library configured logging, you should see no change in behaviour. Otherwise: if you added a NullHandler to your library’s top-level logger, you should see no change in behaviour, but remember — your warning and error messages will not be shown to end-users. If you reverse the NullHandler addition or didn’t add it in the first place, then instead of a misconfiguration message, end-users will see your warning and error messages on sys.stderr.
  • End users: If the script being run configured logging, you should see no change in behaviour. If it didn’t, then in place of a misconfiguration message, you’ll see events of severity WARNING and above printed to sys.stderr without any specific formatting.

I hope I’ve covered everything, but should any kind reader spot some error or omission, please let me know.

Please feel free to comment giving your views on the proposed change, whether you approve of it or whether you think it affects you adversely. Thanks for reading.

Friday 3 December 2010

Getting more control over LogRecord creation

Every logging event is represented by a LogRecord instance. When an event is logged and not filtered out by a logger’s level, a LogRecord is created, populated with information about the event and then passed to the handlers for that logger (and its ancestors, up to and including the logger where further propagation up the hierarchy is disabled). Before Python 3.2, there were only two places where this creation was done:

  1. Logger.makeRecord, which is called in the normal process of logging an event. This invokes LogRecord directly to create an instance.
  2. logging.makeLogRecord, which is called with a dictionary containing attributes to be added to the LogRecord. This is typically invoked when a suitable dictionary has been received over the network (e.g. in pickle form via a SocketHandler, or in JSON form via an HTTPHandler).

This has usually meant that if you need to do anything special with a LogRecord, you’ve had to do one of the following.

  1. Create your own Logger subclass, which overrides makeRecord(), and set it using logging.setLoggerClass before any loggers that you care about are instantiated.
  2. Add a Filter to a logger or handler, which does the necessary special manipulation you need when its filter() method is called.

The first approach would be a little unwieldy in the scenario where (say) several different libraries wanted to do different things. Each would attempt to set its own Logger subclass, and the one which did this last would win. The second approach works reasonably well for many cases, but does not allow you to e.g. use a specialized subclass of LogRecord. Library developers can set a suitable filter on their loggers, but they would have to remember to do this every time they introduced a new logger (which they would do simply by adding new packages or modules and doing logger = logging.getLogger(__name__) at module level). It’s probably one too many things to think about. They could also add the filter to a NullHandler attached to their top-level logger, but this would not be invoked if an application developer attached a handler to a lower-level library logger – so output from that handler would not reflect the intentions of the library developer.
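To illustrate the second approach, a filter which injects extra context into every record passing through it might look like this (a minimal sketch; the attribute name is just an example):

import logging

class ContextFilter(logging.Filter):
    def filter(self, record):
        # Annotate the record rather than filtering anything out
        record.custom_attribute = 0xdecafbad
        return True

handler = logging.StreamHandler()
handler.addFilter(ContextFilter())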

In Python 3.2, LogRecord creation will be done through a factory, which you can specify. The factory is just a callable you can set with logging.setLogRecordFactory(factory), and interrogate with logging.getLogRecordFactory(). The factory is invoked with the same signature as the LogRecord constructor, as LogRecord is the default setting for the factory:

factory(name, lvl, pathname, lineno, msg, args, exc_info, func=None, sinfo=None, **kwargs) 

The parameters are as follows:

  • name – the name of the logger used.
  • lvl – the numeric value of the event’s level.
  • pathname, lineno – the location in the source code where the logging call appears.
  • msg, args – the logging message and arguments.
  • exc_info – any exception info, or None.
  • func – the name of the function or method from which the logging call was made.
  • sinfo – a traceback such as is produced by traceback.print_stack. (This is not the same as the exception traceback.)
  • kwargs – currently empty, but this could change in future versions of logging.

This approach allows a custom factory to control all aspects of LogRecord creation. For example, you could return a subclass, or just add some additional attributes to the record once created, using a pattern similar to this:

old_factory = logging.getLogRecordFactory()

def record_factory(*args, **kwargs):
    record = old_factory(*args, **kwargs)
    record.custom_attribute = 0xdecafbad
    return record

logging.setLogRecordFactory(record_factory)

This pattern allows different libraries to chain factories together, and as long as they don’t overwrite each other’s attributes or unintentionally overwrite the attributes provided as standard, there should be no surprises. However, it should be borne in mind that each link in the chain adds run-time overhead to all logging operations, and the technique should only be used when the use of a Filter does not provide the desired result.
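Once a factory like the one above is installed, the added attribute can be used in format strings just like the standard ones. A quick sketch, assuming the record_factory above has been set:

import logging

logging.basicConfig(format='%(custom_attribute)x %(name)s %(message)s')
logging.warning('Hello')
# -> decafbad root Hello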

These changes have been added to Python 3.2 (i.e. in the py3k branch) and should be in the first beta. Feel free to try them out. Your feedback is, as always, most welcome.

Sunday 31 October 2010

logutils: Using recent logging features with older Python versions

Python’s development is a process of continuous improvement, and the standard library of course undergoes changes as new versions of Python are released. In the logging package, there have been numerous improvements in recent months:

  • Dictionary-based configuration (PEP 391)
  • QueueHandler for working with multiprocessing and in-process queues

In addition, a few bits of utility code have been posted on this blog.

Many of the enhancements have been added as new features which will emerge in Python 3.2, but cannot be added to Python 2.X because there are no Python 2.X branches where new features can be added. Python 2.7, the last 2.X release, is in maintenance mode and will only accept bug-fixes which maintain backwards compatibility.

Although Python 3.2+ is the way forward for Python development, many users will be constrained to use 2.X for some considerable time yet, mostly because libraries they depend on have not yet been ported to 3.X, but sometimes also because their codebase is too large to consider porting to 3.X, when viewed from a cost/benefit viewpoint, or perhaps because their code needs to remain workable under 2.X and they cannot maintain two separate codebases.

For those users who are forced to stick with Python 2.X but who want to use some of the features added recently to logging, the logutils project offers one way in which they can access these features. This is available via PyPI so it can be installed using pip or easy_install. The package is currently at release 0.2 and contains the following:

  • Dictionary-based configuration (PEP 391) which is broadly compatible with Python 2.7 and 3.2.
  • The LoggerAdapter class is easier to subclass.
  • The QueueHandler and QueueListener classes are available for use with in-process, multiprocessing and other types of queue, as described in this post.
  • The TestHandler and Matcher classes are available for use in unit tests, as described in this post.
  • An enhanced HTTPHandler allows the use of secure connections and credentials for logging in to websites.

The package needs Python 2.4 or later. Some features are not available in Python 2.4, due to limitations of that Python version. For example, str.format is only available from Python 2.6 onwards; LoggerAdapter relies on a feature added in Python 2.5, namely the extra parameter for adding additional context to logs; and named handlers are not available for Python 2.X versions < 2.7 and Python 3.X versions < 3.2. (As a consequence, incremental handler configuration is not available on the versions where named handlers are not available.)

Project downloads are here. You can post issues here (needs a Google account). Documentation is available here.

Thursday 28 October 2010

Supporting alternative formatting styles in logging

When logging was added to the Python standard library, the only way of formatting messages with variable content was to use the %-formatting method. Since then, Python has gained two new formatting approaches: string.Template (added in Python 2.4) and str.format (added in Python 2.6).

Now, Python’s logging package will provide improved support for these two additional formatting styles. The Formatter class in logging has been enhanced for Python 3.2 to take an additional, optional keyword parameter named style. This defaults to '%', but other possible values are '{' and '$', which correspond to the other two formatting styles. Backwards compatibility is maintained by default (as you would expect), but by explicitly specifying a style parameter, you get the ability to specify format strings which work with str.format or string.Template. Here’s an example console session to show the possibilities:

vinay@eta-jaunty:~$ python3.2
Python 3.2a3+ (py3k:85857, Oct 27 2010, 18:25:50)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> import logging
>>> root = logging.getLogger()
>>> root.setLevel(logging.DEBUG)
>>> handler = logging.StreamHandler()
>>> bf = logging.Formatter('{asctime} {name} {levelname:8s} {message}', style='{')
>>> handler.setFormatter(bf)
>>> root.addHandler(handler)
>>> logger = logging.getLogger('foo.bar')
>>> logger.debug('This is a DEBUG message')
2010-10-28 15:11:55,341 foo.bar DEBUG This is a DEBUG message
>>> logger.critical('This is a CRITICAL message')
2010-10-28 15:12:11,526 foo.bar CRITICAL This is a CRITICAL message
>>> df = logging.Formatter('$asctime $name ${levelname} $message', style='$')
>>> handler.setFormatter(df)
>>> logger.debug('This is a DEBUG message')
2010-10-28 15:13:06,924 foo.bar DEBUG This is a DEBUG message
>>> logger.critical('This is a CRITICAL message')
2010-10-28 15:13:11,494 foo.bar CRITICAL This is a CRITICAL message
>>>

Note that the formatting of logging messages for final output to logs is completely independent of how an individual logging message is constructed. That can still use %-formatting, as shown here:

>>> logger.error('This is an%s %s %s', 'other,', 'ERROR,', 'message')
2010-10-28 15:19:29,833 foo.bar ERROR This is another, ERROR, message
>>>

Logging calls (logger.debug(), logger.info() etc.) only take positional parameters for the actual logging message itself, with keyword parameters used only for determining options for how to handle the actual logging call (e.g. the exc_info keyword parameter to indicate that traceback information should be logged, or the extra keyword parameter to indicate additional contextual information to be added to the log). So you cannot directly make logging calls using str.format or string.Template syntax, because internally the logging package uses %-formatting to merge the format string and the variable arguments. There would be no changing this while preserving backward compatibility, since all the logging calls in existing code use %-format strings.

There is, however, a way that you can use {}- and $- formatting to construct your individual log messages. Recall that for a message you can use an arbitrary object as a message format string, and that the logging package will call str() on that object to get the actual format string. Consider the following two classes:

class BraceMessage(object):
    def __init__(self, fmt, *args, **kwargs):
        self.fmt = fmt
        self.args = args
        self.kwargs = kwargs

    def __str__(self):
        return self.fmt.format(*self.args, **self.kwargs)

class DollarMessage(object):
    def __init__(self, fmt, **kwargs):
        self.fmt = fmt
        self.kwargs = kwargs

    def __str__(self):
        from string import Template
        return Template(self.fmt).substitute(**self.kwargs)

Either of these can be used in place of a format string, to allow {}- or $-formatting to be used to build the actual “message” part which appears in the formatted log output in place of “%(message)s” or “{message}” or “$message”. It’s a little unwieldy to use the class names whenever you want to log something, but it’s quite palatable if you use an alias such as __ (double underscore – not to be confused with _, the single underscore used as a synonym/alias for gettext.gettext or its brethren).

The above classes, and recent additions to the logging package, will be published in a package called logutils, intended for use with Python 2.4 or later. This is already available for download via PyPI (and will be the subject of a future blog post). They can be used as follows:

>>> from logutils import BraceMessage as __

>>> print(__('Message with {0} {1}', 2, 'placeholders'))
Message with 2 placeholders
>>> class Point: pass
...
>>> p = Point()
>>> p.x = 0.5
>>> p.y = 0.5
>>> print(__('Message with coordinates: ({point.x:.2f}, {point.y:.2f})', point=p))
Message with coordinates: (0.50, 0.50)
>>> from logutils import DollarMessage as __
>>> print(__('Message with $num $what', num=2, what='placeholders'))
Message with 2 placeholders
>>>

One thing to note is that you pay no significant performance penalty with this approach: the actual formatting happens not when you make the logging call, but when (and if) the logged message is actually about to be output to a log by a handler. So the only slightly unusual thing which might trip you up is that the parentheses go around the format string and the arguments, not just the format string. That’s because the __ notation is just syntactic sugar for a constructor call to one of the XXXMessage classes.
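So a typical deferred-formatting call looks like this (reusing the alias from the session above):

>>> from logutils import BraceMessage as __
>>> logger.debug(__('Message with {0} {1}', 2, 'placeholders'))

If DEBUG is not enabled for logger, the str.format call never happens.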

Your comments are, as always, welcome. Thanks for reading.

Tuesday 26 October 2010

Shared state in logging: why?

It’s generally recognised that undisciplined use of global variables is a warning sign about problems with a design. In fact, the paper “Global Variables Considered Harmful” by Wulf and Shaw dates back to 1973. The word “global” and the phrase “global state” have therefore acquired bad connotations in many people’s minds, and that can lead to situations where you “throw the baby out with the bath-water”.

The concept of state shared between multiple software components doesn’t have to be treated in the same way as naïve usage of global variables. There are legitimate situations where state needs to be shared – you will almost certainly be using a computer with a display, a mouse and a keyboard, all of which are essentially shared state (meaning data, and operations on that data, too) for the windowing system your computer is using.

Python’s logging package uses some shared state; Logger objects are shared by all Python code in a single process, in that a call to logging.getLogger('foo.bar') will always return the same object, no matter where it’s called from. The fact that Loggers are singletons is for a specific, considered reason, and not just because I was taking a design short-cut or didn’t know how to do it any other way.

There are alternative logging libraries which make a point of the fact that their loggers are constructed anew each time, so that a call

logger = Logger('foo.bar')

will return a new object every time. These systems perhaps use loggers just as a vehicle for delivering events via their debug(), info() etc. methods; for stdlib logging, Loggers are something more. They play a part in how an application developer, system administrator or support team member controls the logging verbosity of an application. This is done through configuration, which needs to consider the verbosity of all components used in an application - including third-party components whose source you are not free to modify.

Logging verbosity is primarily controlled by configuring level, propagation behaviour and filters for a logger. You can, of course, also control it via configuring levels and filters on handlers; but the best way is to configure loggers where possible, because that avoids the overhead of dispatching events to handlers only to have them be thrown away due to the handler’s configuration.

The logging package relies on loggers being shared across modules, in order to offer this configurability. If my application is using a third-party library which logs some events to a logger named foo.bar, and if I want to configure the verbosity of this logger - by setting its level, say – I can only do this if I can be sure that

logger = logging.getLogger('foo.bar')

will return the same object to the library code, which logs to logger foo.bar, and to my configuration code, which sets that logger’s level to what I want. If the two calls to get that named logger were to return two entirely different objects, then the setLevel() call made by my configuration code would not affect the object used internally by the library. This means that I would essentially lose some level of control over logging verbosity.

When developers are essentially logging just for themselves – i.e. they are the primary, perhaps only, audience for their logging output – then these considerations may not matter too much. If they are not using third-party libraries which use logging, or using libraries whose logging verbosity they don’t care about, that may be another reason for not worrying about this loss of configurability. Since Python logging is intended for the whole Python developer community, some of whom will definitely need and want this level of configurability, the design supports it, using shared state. I know I use this feature myself regularly – say, when examining the SQL output from ORMs such as are provided by Django and SQLAlchemy. I don’t generally log the SQL output (which can be quite voluminous), but might do so as a result of coming across some problem, whether functional or performance-related. When the problem is identified, I can turn the logging of SQL statements off again. In larger application scenarios, this functionality is also available to those who have no means of changing the source code - application support teams or system administrators – assuming that the application developers have provided for this via the mechanisms offered by the logging package.
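As a concrete sketch of that SQL example (the logger name is the one used by Django’s database backends):

import logging

# Turn SQL statement logging on while investigating a problem...
logging.getLogger('django.db.backends').setLevel(logging.DEBUG)
# ...and off again once the problem is identified.
logging.getLogger('django.db.backends').setLevel(logging.WARNING)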

Another important (but related) design feature of Python logging is the use of hierarchical logger names. In stdlib logging, logger names are not just things which turn up in logging output. When you create a logger named foo.bar, you also conceptually create a logger named foo. This latter logger is not actually created unless and until a getLogger('foo') call is made, but the hierarchical namespace feature (and the way in which handler inheritance works with the logger hierarchy) allows you to turn logging verbosity up and down at multiple levels of a package hierarchy quite easily – for example, turn on all logging in the django logging namespace, or turn it on just for django.request, or perhaps just for django.db.backends – it puts the application developer/support team/sys admin in control, rather than just the library developer. And while the hierarchical feature might seem overkill for small projects and simple scripts, it doesn’t get in the way, and pays dividends when logging is used in larger systems.

Wednesday 20 October 2010

StreamHandler’s newline terminator now configurable

When StreamHandler writes a formatted log message to its stream, it adds a newline terminator. This behaviour is inherited by FileHandler and the other classes which derive from it (such as the rotating file handlers).

For most people, that’s what they want, since they get log messages on separate lines without having to explicitly code newlines in their log messages. However, some people want the flexibility to not have this newline automatically appended to logged messages.

Starting with Python 3.2, the message terminator will be configurable. This has been done by adding a terminator attribute to StreamHandler, which when emitting an event now writes the formatted message to its stream first, and then writes the terminator. If you don’t want newline termination for a handler, just set the handler instance’s terminator attribute to the empty string.
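For example, a sketch of suppressing the newline on a particular handler:

import logging

handler = logging.StreamHandler()
handler.terminator = ''  # no newline appended after each message
logging.getLogger().addHandler(handler)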

There’s a small chance that this approach will cause unexpected behaviour – in the unlikely case that someone is already setting a terminator attribute on a StreamHandler or one of its subclasses for their own, unrelated purposes. Other than that, the new behaviour should be backwards compatible.

If this change is likely to adversely affect you for any reason, please let me know, by replying to this post. The change has already been checked into the py3k branch, and you are welcome to try it out.

Implementing filters using callables

It’s been pointed out that for implementing logging filters, you shouldn’t have to go to the lengths of creating a Filter class just to define its one filter method. Of course, there are cases where you will need to use Filter classes, but there are also cases where it would be beneficial to just use a function. To cater for this, changes have been made to Python 3.2. Once this enters beta, you will be able to pass a function or other callable to the addFilter and removeFilter methods of Handler and Logger, and this will be called with the LogRecord as its single argument to decide whether the event is to be logged.
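For example, once the change lands, something like this sketch should work:

import logging

def warnings_only(record):
    # A truthy return value means "log this event"
    return record.levelno == logging.WARNING

handler = logging.StreamHandler()
handler.addFilter(warnings_only)
logging.getLogger().addHandler(handler)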

The previous functionality did not actually require you to create Filter subclasses – you could pass any instance that had a filter method with the appropriate semantics. The Filter class was an illustration of how to do filtering as well as providing a not uncommon use case – filtering out everything other than a specific sub-tree in the logger namespace.

However, with this change, you’re not forced to call your filtering method filter, so you could e.g. use different filtering methods in a single class, or different functions in a module.

The development documentation has been updated to reflect this change.

Monday 27 September 2010

Unit testing and logging

I’ve had questions in the past about using logging and unit testing together – what’s the best way of doing it?

Beyond the choice of unit testing frameworks (e.g. unittest, nose, py.test), unit testing appears to be one of those things (like configuration) where people have strong, and differing, opinions about how they like to do it. This is one reason there is no specific support for unit testing in logging. Besides, logging is one area where, functionally, your application should work in exactly the same way no matter how logging is configured, and even if logging is completely switched off (because there are no handlers configured, or the verbosity thresholds are set to be above CRITICAL). I am of course distinguishing here between application logging and request logging as practiced by web servers – the latter is part of the function of the web server, whereas application logging is meant as an operational and diagnostic aid for developers, system administrators and support teams, and is orthogonal to the application’s functionality. There are of course scenarios where logging is used to output messages to end users (typically, INFO or WARNING level messages), but it should be borne in mind that the application should keep going even if logging verbosity is turned off.

There are two things that would be useful when using logging and unit testing together:

  • Some way of capturing all the messages which are logged by the code being unit tested.
  • Some way of verifying that certain expectations with respect to logged messages are met.

Logging already provides a BufferingHandler class which allows you to capture LogRecords generated by logging activity. You can, for example, subclass this to store the LogRecord.__dict__ values rather than the LogRecords themselves – this will facilitate checking whether expectations are met. You typically don’t want to flush anything until the end of the test, though, so a handler which facilitates testing might look like this:

from logging.handlers import BufferingHandler

class TestHandler(BufferingHandler):
    def __init__(self):
        # BufferingHandler takes a "capacity" argument
        # so as to know when to flush. As we're overriding
        # shouldFlush anyway, we can set a capacity of zero.
        # You can call flush() manually to clear out the
        # buffer.
        BufferingHandler.__init__(self, 0)

    def shouldFlush(self):
        return False

    def emit(self, record):
        self.buffer.append(record.__dict__)

Now let’s consider the checking of expectations about what’s been logged. We could implement this in the TestHandler class directly, but it’s the sort of area where different people may want to do different things. The bare minimum we need in TestHandler would be something that looks for some kind of match between what’s been logged (the buffer of dictionaries) and the expected values. So as an idea, let’s delegate the details of matching to a separate Matcher class, which must have a matches method. Because matching of dictionaries is likely to crop up in tests other than those to do with logging, creating a separate Matcher class allows us to deploy the functionality in other scenarios. So, we can develop the TestHandler class a little:

from logging.handlers import BufferingHandler

class TestHandler(BufferingHandler):
    def __init__(self, matcher):
        # BufferingHandler takes a "capacity" argument
        # so as to know when to flush. As we're overriding
        # shouldFlush anyway, we can set a capacity of zero.
        # You can call flush() manually to clear out the
        # buffer.
        BufferingHandler.__init__(self, 0)
        self.matcher = matcher

    def shouldFlush(self):
        return False

    def emit(self, record):
        # Format the record first, so that record.message (and hence
        # the stored dict) includes the formatted message - see the
        # note about partial matching below.
        self.format(record)
        self.buffer.append(record.__dict__)

    def matches(self, **kwargs):
        """
        Look for a saved dict whose keys/values match the supplied arguments.
        """
        result = False
        for d in self.buffer:
            if self.matcher.matches(d, **kwargs):
                result = True
                break
        return result

The signature for the matches method allows us to pass just the keys we want to test for in the call. In implementing the Matcher class, we could do whatever we wanted – regular expression matching, for example – but we’ll keep it simple. Let’s assume that we want to either match values exactly, or else do partial matches for string values such as messages. (In order to ensure that a formatted message appears in a LogRecord’s __dict__, we need to call self.format(record) in the emit method.)

To decide which keys to do partial matches on, we can store a set of the relevant keys in a _partial_matches attribute. This is defined in the Matcher class, but you can replace it in a subclass or even an instance of Matcher if you need to. Here’s the Matcher class:

class Matcher(object):

    _partial_matches = ('msg', 'message')

    def matches(self, d, **kwargs):
        """
        Try to match a single dict with the supplied arguments.

        Keys whose values are strings and which are in self._partial_matches
        will be checked for partial (i.e. substring) matches. You can extend
        this scheme to (for example) do regular expression matching, etc.
        """
        result = True
        for k in kwargs:
            v = kwargs[k]
            dv = d.get(k)
            if not self.match_value(k, dv, v):
                result = False
                break
        return result

    def match_value(self, k, dv, v):
        """
        Try to match a single stored value (dv) with a supplied value (v).
        """
        if type(v) != type(dv):
            result = False
        elif type(dv) is not str or k not in self._partial_matches:
            result = (v == dv)
        else:
            result = dv.find(v) >= 0
        return result

Notice that the class is independent of logging and might prove useful elsewhere. Also note the use of str in the match_value method – you may want to replace this with basestring for Python 2.x. (Of course, I could have used isinstance, too.)

Now we can consider what a typical test case using TestHandler and Matcher might look like (some imports omitted):

import unittest

class LoggingTest(unittest.TestCase):
    def setUp(self):
        self.handler = h = TestHandler(Matcher())
        self.logger = l = logging.getLogger()
        l.addHandler(h)
        
    def tearDown(self):
        self.logger.removeHandler(self.handler)
        self.handler.close()
        
    def test_simple(self):
        "Simple test of logging test harness."
        # Just as a demo, let's log some messages.
        # Only one should show up in the log.
        self.logger.debug("This won't show up.")
        self.logger.info("Neither will this.")
        self.logger.warning("But this will.")
        h = self.handler
        self.assertTrue(h.matches(levelno=logging.WARNING))
        self.assertFalse(h.matches(levelno=logging.DEBUG))
        self.assertFalse(h.matches(levelno=logging.INFO))

    def test_partial(self):
        "Test of partial matching in logging test harness."
        # Just as a demo, let's log some messages.
        # Only one should show up in the log.
        self.logger.debug("This won't show up.")
        self.logger.info("Neither will this.")
        self.logger.warning("But this will.")
        h = self.handler
        self.assertTrue(h.matches(msg="ut th")) # from "But this will"
        self.assertTrue(h.matches(message="ut th")) # from "But this will"
        self.assertFalse(h.matches(message="either"))
        self.assertFalse(h.matches(message="won't"))

    def test_multiple(self):
        "Test of matching multiple values in logging test harness."
        # Just as a demo, let's log some messages.
        # Only one should show up in the log.
        self.logger.debug("This won't show up.")
        self.logger.info("Neither will this.")
        self.logger.warning("But this will.")
        self.logger.error("And so will this.")
        h = self.handler
        self.assertTrue(h.matches(levelno=logging.WARNING,
                                  funcName='test_multiple'))
        self.assertTrue(h.matches(levelno=logging.ERROR,
                                  funcName='test_multiple'))
        self.assertFalse(h.matches(levelno=logging.INFO))

if __name__ == '__main__':
    unittest.main()

The whole script is here, and if you run it, you should have no errors:

vinay@eta-jaunty:~/projects/scratch$ python2.7 testhdlr.py
...
----------------------------------------------------------------------
Ran 3 tests in 0.001s

OK
vinay@eta-jaunty:~/projects/scratch$ python3.2 testhdlr.py
...
----------------------------------------------------------------------
Ran 3 tests in 0.001s

OK

Is there anything I’ve missed? Anything you don’t like about these suggestions? Please feel free to comment. Thanks for reading.