Plumber Jack

Thursday, 9 December 2010

Proposed changes to logging defaults

Following recent discussion on python-dev, some changes to logging’s default behaviour are likely to be implemented soon, perhaps as soon as in 3.2b2 (3.2b1 was released just a few days ago). By default behaviour is meant behaviour in the absence of any explicit logging configuration.

The current behaviour, which dates from logging’s first appearance in Python, is as follows. When a logging event needs to be output, the logging package looks for handlers which have been configured by the application. If no such handlers are found, the package prints a message “No handlers could be found for logger XXX”, naming the logger that was used to log the event. (The message is only ever printed once, even if subsequent logging calls are made. Furthermore, the message is only printed when logging.raiseExceptions is True, which is the “development” mode of operation for logging. When logging.raiseExceptions is False, the “production” mode of operation for logging, the message isn’t printed, even once.)

This creates some problems when an application (or simple script, or interactive session – I’ll use script to mean any of these) meets the following criteria:

It uses a library which does logging under the covers.
It doesn’t configure any handlers.
The library code invoked logs events of severity WARNING or above.

Because the default verbosity for logging is WARNING, the logging package looks for handlers, and, not finding any, prints the one-off “No handlers could be found …” message. This may be a surprise to the author of the script and also for its users; it can generate support calls to the author from the users. The message was provided as a means of warning developers, who would have been new to logging, about a possible misconfiguration. (Of course, when logging was introduced, everyone was new to it.) The importance of the message is less now than it was before, because many people are familiar with how logging works and provide good support on comp.lang.python to people who are newcomers to logging, or who run into issues that are simple to resolve.

To deal with the problem described above, the approach suggested has been for library developers to add a NullHandler instance to their top-level logger. This prevents messages from library code from causing the misconfiguration message to be printed, because a handler — the NullHandler — is always found when handling events from the library. If a script developer adds logging events in their own code which need to be handled, then the misconfiguration message will still appear, and that’s as it should be.

When the logging package was introduced, this was done with a philosophy that logging is an adjunct to normal application activity — meaning that a script should work in exactly the same way with or without the presence of logging. So logging tries to work as unobtrusively as possible, and is set up so that in production use, exceptions that occur in logging (e.g. mismatched format string and parameters) will not be raised, allowing a script to carry on uninterrupted. Also pertinent is the Unix philosophy that verbosity should be kept to a minimum.

The above describes the current state of play, but things are now set to change. The rationale for changing is:

The present behaviour goes against the Zen of Python, which says “Errors should never pass silently. Unless explicitly silenced.”
Python core developers want to be able to use the logging package not just as an adjunct to doing real work in the standard library, but to actually do the real work when it comes to the output of warning and error messages.
In some cases exceptions can’t be usefully raised, and writing out a message is the only way to provide information about an error condition which has been detected. Providing it via logging provides more flexibility in dealing with it than would a bare write to sys.stderr.

These changes will, in due course, bring benefits such as being able to redirect standard library warning and error messages, which are currently written to sys.stderr, to some destination of a developer’s choosing. However, it may cause some inconvenience to existing users, because the proposed behaviour change is not backwards-compatible.

How will the change be implemented? This is still being thought about, but the current front runner is to implement through the idea of a “handler of last resort”. When no handler is found when processing an event which needs to be output, instead of printing the misconfiguration message, the system will use the “last resort” handler to handle the event. The “last resort” handler will be (by default) a StreamHandler using sys.stderr as its stream, with its level set to WARNING. Details about how to preserve existing behaviour are still being worked on; one thought is that you can get existing behaviour by setting the last resort handler to None.

What’s the impact on users of logging?

If you’re using a version of Python earlier than 3.2, there’s no impact. Things will stay as they are.
Script developers: If you configure logging (i.e. add handlers), there should be no impact – the behaviour you see shouldn’t change, as long as there is no situation where a handler isn’t found. If you don’t configure logging, then in place of a misconfiguration message, you’ll see events of severity WARNING and above printed to sys.stderr without any specific formatting (i.e. as if the format string were "%(message)s").
Library developers: If the script using your library configured logging, you should see no change in behaviour. Otherwise: if you added a NullHandler to your library’s top-level logger, you should see no change in behaviour, but remember — your warning and error messages will not be shown to end-users. If you reverse the NullHandler addition or didn’t add it in the first place, then instead of a misconfiguration message, end-users will see your warning and error messages on sys.stderr.
End users: If the script being run configured logging, you should see no change in behaviour. If it didn’t, then in place of a misconfiguration message, you’ll see events of severity WARNING and above printed to sys.stderr without any specific formatting.

I hope I’ve covered everything, but should any kind reader spot some error or omission, please let me know.

Please feel free to comment giving your views on the proposed change, whether you approve of it or whether you think it affects you adversely. Thanks for reading.

Friday, 3 December 2010

Getting more control over LogRecord creation

Every logging event is represented by a LogRecord instance. When an event is logged and not filtered out by a logger’s level, a LogRecord is created, populated with information about the event and then passed to the handlers for that logger (and its ancestors, up to and including the logger where further propagation up the hierarchy is disabled). Before Python 3.2, there were only two places where this creation was done:

Logger.makeRecord, which is called in the normal process of logging an event. This invoked LogRecord directly to create an instance.
logging.makeLogRecord, which is called with a dictionary containing attributes to be added to the LogRecord. This is typically invoked when a suitable dictionary has been received over the network (e.g. in pickle form via a SocketHandler, or in JSON form via an HTTPHandler).

This has usually meant that if you need to do anything special with a LogRecord, you’ve had to do one of the following.

Create your own Logger subclass, which overrides makeRecord(), and set it using logging.setLoggerClass before any loggers that you care about are instantiated.
Add a Filter to a logger or handler, which does the necessary special manipulation you need when its filter() method is called.

The first approach would be a little unwieldy in the scenario where (say) several different libraries wanted to do different things. Each would attempt to set its own Logger subclass, and the one which did this last would win. The second approach works reasonably well for many cases, but does not allow you to e.g. use a specialized subclass of LogRecord. Library developers can set a suitable filter on their loggers, but they would have to remember to do this every time they introduced a new logger (which they would do simply by adding new packages or modules and doing logger = logging.getLogger(__name__) at module level). It’s probably one too many things to think about. They could also add the filter to a NullHandler attached to their top-level logger, but this would not be invoked if an application developer attached a handler to a lower-level library logger – so output from that handler would not reflect the intentions of the library developer.

In Python 3.2, LogRecord creation will be done through a factory, which you can specify. The factory is just a callable you can set with logging.setLogRecordFactory(factory), and interrogate with logging.getLogRecordFactory(). The factory is invoked with the same signature as the LogRecord constructor, as LogRecord is the default setting for the factory:

factory(name, lvl, pathname, lineno, msg, args, exc_info, func=None, sinfo=None, **kwargs)

The parameters are as follows:

name – the name of the logger used.
lvl – the numeric value of the event’s level.
pathname, lineno – the location in the source code where the logging call appears.
msg, args – the logging message and arguments.
exc_info – any exception info, or None.
func – the name of the function or method from which the logging call was made.
sinfo – a traceback such as is produced by traceback.print_stack. (This is not the same as the exception traceback.)
kwargs – currently empty, but this could change in future versions of logging.

This approach allows a custom factory to control all aspects of LogRecord creation. For example, you could return a subclass, or just add some additional attributes to the record once created, using a pattern similar to this:

old_factory = logging.getLogRecordFactory()

def record_factory(*args, **kwargs):
    record = old_factory(*args, **kwargs)
    record.custom_attribute = 0xdecafbad 
    return record 
 
logging.setLogRecordFactory(record_factory)

This pattern allows different libraries to chain factories together, and as long as they don’t overwrite each other’s attributes or unintentionally overwrite the attributes provided as standard, there should be no surprises. However, it should be borne in mind that each link in the chain adds run-time overhead to all logging operations, and the technique should only be used when the use of a Filter does not provide the desired result.

These changes have been added to Python 3.2 (i.e. in the py3k branch) and should be in the first beta. Feel free to try them out. Your feedback is, as always, most welcome.

Sunday, 31 October 2010

logutils: Using recent logging features with older Python versions

Python’s development is a process of continuous improvement, and the standard library of course undergoes changes as new versions of Python are released. In the logging package, there have been numerous improvements in recent months:

Dictionary-based configuration (PEP 391)
QueueHandler for working with multiprocessing and in-process queues

In addition, there have been a few bits of utility code which have been posted here:

Classes which facilitate use of logging in particular scenarios, such as unit testing
Classes which facilitate building logging messages using str.format or string.Template

Many of the enhancements have been added as new features which will emerge in Python 3.2, but cannot be added to Python 2.X because there are no Python 2.X branches where new features can be added. Python 2.7, the last 2.X release, is in maintenance mode and will only accept bug-fixes which maintain backwards compatibility.

Although Python 3.2+ is the way forward for Python development, many users will be constrained to use 2.X for some considerable time yet, mostly because libraries they depend on have not yet been ported to 3.X, but sometimes also because their codebase is too large to consider porting to 3.X, when viewed from a cost/benefit viewpoint, or perhaps because their code needs to remain workable under 2.X and they cannot maintain two separate codebases.

For those users who are forced to stick with Python 2.X but who want to use some of the features added recently to logging, the logutils project offers one way in which they can access these features. This is available via PyPI so it can be installed using pip or easy_install. The package is currently at release 0.2 and contains the following:

Dictionary-based configuration (PEP 391) which is broadly compatible with Python 2.7 and 3.2.
The LoggerAdapter class is easier to subclass.
The QueueHandler and QueueListener classes are available for use with in-process, multiprocessing and other types of queue, as described in this post.
The TestHandler and Matcher classes are available for use in unit tests, as described in this post.
An enhanced HTTPHandler allows the use of secure connections and credentials for logging in to websites.

The package needs Python 2.4 or later. Some features are not available in Python 2.4, due to limitations of that Python version. For example, str.format is only available from Python 2.6 onwards; LoggerAdapter relies on a feature added in Python 2.5, namely the extra parameter for adding additional context to logs; and named handlers are not available for Python 2.X versions < 2.7 and Python 3.X versions < 3.2. (As a consequence of this, incremental handler configuration is not available on the versions where named handlers are not available.)

Project downloads are here. You can post issues here (needs a Google account). Documentation is available here.

Thursday, 28 October 2010

Supporting alternative formatting styles in logging

When logging was added to the Python standard library, the only way of formatting messages with variable content was to use the %-formatting method. Since then, Python has gained two new formatting approaches: string.Template (added in Python 2.4) and str.format (added in Python 2.6).

Now, Python’s logging package will provide improved support for these two additional formatting styles. The Formatter class in logging has been enhanced for Python 3.2 to take an additional, optional keyword parameter named style. This defaults to '%', but other possible values are '{' and '$', which correspond to the other two formatting styles. Backwards compatibility is maintained by default (as you would expect), but by explicitly specifying a style parameter , you get the ability to specify format strings which work with str.format or string.Template. Here’s an example console session to show the possibilities:

vinay@eta-jaunty:~$ python3.2
Python 3.2a3+ (py3k:85857, Oct 27 2010, 18:25:50) 
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> import logging
>>> root = logging.getLogger()
>>> root.setLevel(logging.DEBUG)
>>> handler = logging.StreamHandler()
>>> bf = logging.Formatter('{asctime} {name} {levelname:8s} {message}', style='{')
>>> handler.setFormatter(bf)
>>> root.addHandler(handler)
>>> logger = logging.getLogger('foo.bar')
>>> logger.debug('This is a DEBUG message')
2010-10-28 15:11:55,341 foo.bar DEBUG    This is a DEBUG message
>>> logger.critical('This is a CRITICAL message')
2010-10-28 15:12:11,526 foo.bar CRITICAL This is a CRITICAL message
>>> df = logging.Formatter('$asctime $name ${levelname} $message', style='$')
>>> handler.setFormatter(df)
>>> logger.debug('This is a DEBUG message')
2010-10-28 15:13:06,924 foo.bar DEBUG This is a DEBUG message
>>> logger.critical('This is a CRITICAL message')
2010-10-28 15:13:11,494 foo.bar CRITICAL This is a CRITICAL message
>>>

Note that the formatting of logging messages for final output to logs is completely independent of how an individual logging message is constructed. That can still use %-formatting, as shown here:

>>> logger.error('This is an%s %s %s', 'other,', 'ERROR,', 'message')
2010-10-28 15:19:29,833 foo.bar ERROR This is another, ERROR, message
>>>

Logging calls (logger.debug(), logger.info() etc.) only take positional parameters for the actual logging message itself, with keyword parameters used only for determining options for how to handle the actual logging call (e.g. the exc_info keyword parameter to indicate that traceback information should be logged, or the extra keyword parameter to indicate additional contextual information to be added to the log). So you cannot directly make logging calls using str.format or string.Template syntax, because internally the logging package uses %-formatting to merge the format string and the variable arguments. There would no changing this while preserving backward compatibility, since all logging calls which are out there in existing code will be using %-format strings.

There is, however, a way that you can use {}- and $- formatting to construct your individual log messages. Recall that for a message you can use an arbitrary object as a message format string, and that the logging package will call str() on that object to get the actual format string. Consider the following two classes:

class BraceMessage(object):
    def __init__(self, fmt, *args, **kwargs):
        self.fmt = fmt
        self.args = args
        self.kwargs = kwargs

    def __str__(self):
        return self.fmt.format(*self.args, **self.kwargs)

class DollarMessage(object):
    def __init__(self, fmt, **kwargs):
        self.fmt = fmt
        self.kwargs = kwargs
        
    def __str__(self):
        from string import Template
        return Template(self.fmt).substitute(**self.kwargs)

Either of these can be used in place of a format string, to allow {}- or $-formatting to be used to build the actual “message” part which appears in the formatted log output in place of “%(message)s” or “{message}” or “$message”. It’s a little unwieldy to use the class names whenever you want to log something, but it’s quite palatable if you use an alias such as __ (double underscore – not to be confused with _, the single underscore used as a synonym/alias for gettext.gettext or its brethren).

The above classes, and recent additions to the logging package, will be published in a package called logutils, intended for use with Python 2.4 or later. This is already available for download via PyPI (and will be the subject of a future blog post). They can be used as follows:

>>> from logutils import BraceMessage as __

>>> print(__('Message with {0} {1}', 2, 'placeholders'))
Message with 2 placeholders
>>> class Point: pass
... 
>>> p = Point()
>>> p.x = 0.5
>>> p.y = 0.5
>>> print(__('Message with coordinates: ({point.x:.2f}, {point.y:.2f})', point=p))
Message with coordinates: (0.50, 0.50)
>>> from logutils import DollarMessage as __
>>> print(__('Message with $num $what', num=2, what='placeholders'))
Message with 2 placeholders
>>>

One thing to note is that you pay no significant performance penalty with this approach: the actual formatting happens not when you make the logging call, but when (and if) the logged message is actually about to be output to a log by a handler. So the only slightly unusual thing which might trip you up is that the parentheses go around the format string and the arguments, not just the format string. That’s because the __ notation is just syntax sugar for a constructor call to one of the XXXMessage classes.

Your comments are, as always, welcome. Thanks for reading.

Tuesday, 26 October 2010

Shared state in logging: why?

It’s generally recognised that undisciplined use of global variables is a warning sign about problems with a design. In fact, the paper “Global Variables Considered Harmful” by Wulf and Shaw dates back to 1973. The word “global” and the phrase “global state” have therefore acquired bad connotations in many people’s minds, and that can lead to situations where you “throw the baby out with the bath-water”.

The concept of state shared between multiple software components doesn’t have to be treated in the same way as naïve usage of global variables. There are legitimate situations where state needs to be shared – you will almost certainly be using a computer with a display, a mouse and a keyboard, all of which are essentially shared state (meaning data, and operations on that data, too) for the windowing system your computer is using.

Python’s logging package uses some shared state; Logger objects are shared by all Python code in a single process, in that a call to logging.getLogger('foo.bar') will always return the same object, no matter where it’s called from. The fact that Loggers are singletons is for a specific, considered reason, and not just because I was taking a design short-cut or didn’t know how to do it any other way.

There are alternative logging libraries which make a point of the fact that their loggers are constructed anew each time, so that a call

logger = Logger('foo.bar')

will return a new object every time. These systems perhaps use loggers just as a vehicle for delivering events via their debug(), info() etc. methods; for stdlib logging, Loggers are something more. They play a part in how an application developer, system administrator or support team member controls the logging verbosity of an application. This is done through configuration, which needs to consider the verbosity of all components used in an application - including third-party components whose source you are not free to modify.

Logging verbosity is primarily controlled by configuring level, propagation behaviour and filters for a logger. You can, of course, also control it via configuring levels and filters on handlers; but the best way is to configure loggers where possible, because that avoids the overhead of dispatching events to handlers only to have them be thrown away due to the handler’s configuration.

The logging package relies on loggers being shared across modules, in order to offer this configurability. If my application is using a third-party library which logs some events to a logger named foo.bar, and if I want to configure the verbosity of this logger - by setting its level, say – I can only do this if I can be sure that

logger = logging.getLogger('foo.bar')

will return the same object to the library code, which logs to logger foo.bar, and to my configuration code, which sets that logger’s level to what I want. If the two calls to get that named logger were to return two entirely different objects, then the setLevel() call made by my configuration code would not affect the object used internally by the library. This means that I would essentially lose some level of control over logging verbosity.

When developers are essentially logging just for themselves – i.e. they are the primary, perhaps only, audience for their logging output – then these considerations may not matter too much. If they are not using third-party libraries which use logging, or using libraries whose logging verbosity they don’t care about, that may be another reason for not worrying about this loss of configurability. Since Python logging is intended for the whole Python developer community, some of whom will definitely need and want this level of configurability, the design supports it, using shared state. I know I use this feature myself regularly – say, when examining the SQL output from ORMs such as are provided by Django and SQLAlchemy. I don’t generally log the SQL output (which can be quite voluminous), but might do so as a result of coming across some problem, whether functional or performance-related. When the problem is identified, I can turn the logging of SQL statements off again. In larger application scenarios, this functionality is also available to those who have no means of changing the source code - application support teams or system administrators – assuming that the application developers have provided for this via the mechanisms offered by the logging package.

Another important (but related) design feature of Python logging is the use of hierarchical logger names. In stdlib logging, logger names are not just things which turn up in logging output. When you create a logger named foo.bar, you also conceptually create a logger named foo. This latter logger is not actually created unless and until a getLogger('foo') call is made, but the hierarchical namespace feature (and the way in which handler inheritance works with the logger hierarchy) allows you to turn logging verbosity up and down at multiple levels of a package hierarchy quite easily – for example, turn on all logging in the django logging namespace, or just turn it on just for django.request, or perhaps just for django.db.backends – it puts the application developer/support team/sys admin in control, rather than just the library developer. And while the hierarchical feature might seem overkill for small projects and simple scripts, it doesn’t get in the way, and pays dividends when logging is used in larger systems.

Wednesday, 20 October 2010

StreamHandler’s newline terminator now configurable

When StreamHandler writes a formatted log message to its stream, it adds a newline terminator. This behaviour is inherited by FileHandler and the other classes which derive from it (such as the rotating file handlers).

For most people, that’s what they want, since they get log messages on separate lines without having to explicitly code newlines in their log messages. However, some people want the flexibility to not have this newline automatically appended to logged messages.

Starting with Python 3.2, the message terminator will be configurable. This has been done by adding a terminator attribute to StreamHandler, which when emitting an event now writes the formatted message to its stream first, and then writes the terminator. If you don’t want newline termination for a handler, just set the handler instance’s terminator attribute to the empty string.

There’s a small chance that this approach will cause unexpected behaviour – in the unlikely instance that someone is setting a terminator attribute on an a StreamHandler or one of its subclasses for their own, unrelated purposes. Other than that, the new behaviour should be backwards compatible.

If this change is likely to adversely affect you for any reason, please let me know, by replying to this post. The change has already been checked into the py3k branch, and you are welcome to try it out.

Implementing filters using callables

It’s been pointed out that for implementing logging filters, you shouldn’t have to go to the lengths of creating a Filter class just to define its one filter method. Of course, there are cases where you will need to use Filter classes, but there are also cases where it would be beneficial to just use a function. To cater for this, changes have been made to Python 3.2. Once this enters beta, you will be able to pass a function or other callable to the addFilter and removeFilter methods of Handler and Logger, and this will be called with the LogRecord as its single argument to decide whether the event is to be logged.

The previous functionality did not actually require you to create Filter subclasses – you could pass any instance that had a filter method with the appropriate semantics. The Filter class was an illustration of how to do filtering as well as providing a not uncommon use case – filtering out everything other than a specific sub-tree in the logger namespace.

However, with this change, you’re not forced to call your filtering method filter, so you could e.g. use different filtering methods in a single class, or different functions in a module.

The development documentation has been updated to reflect this change.