Tuesday, 26 October 2010

Shared state in logging: why?

It’s generally recognised that undisciplined use of global variables is a warning sign about problems with a design. In fact, the paper “Global Variables Considered Harmful” by Wulf and Shaw dates back to 1973. The word “global” and the phrase “global state” have therefore acquired bad connotations in many people’s minds, and that can lead to situations where you “throw the baby out with the bath-water”.

The concept of state shared between multiple software components doesn’t have to be treated in the same way as naïve usage of global variables. There are legitimate situations where state needs to be shared – you will almost certainly be using a computer with a display, a mouse and a keyboard, all of which are essentially shared state (meaning data, and operations on that data, too) for the windowing system your computer is using.

Python’s logging package uses some shared state; Logger objects are shared by all Python code in a single process, in that a call to logging.getLogger('foo.bar') will always return the same object, no matter where it’s called from. The fact that Loggers are singletons is for a specific, considered reason, and not just because I was taking a design short-cut or didn’t know how to do it any other way.

There are alternative logging libraries which make a point of the fact that their loggers are constructed anew each time, so that a call

logger = Logger('foo.bar')

will return a new object every time. These systems perhaps use loggers just as a vehicle for delivering events via their debug(), info() etc. methods; for stdlib logging, Loggers are something more. They play a part in how an application developer, system administrator or support team member controls the logging verbosity of an application. This is done through configuration, which needs to consider the verbosity of all components used in an application - including third-party components whose source you are not free to modify.

Logging verbosity is primarily controlled by configuring level, propagation behaviour and filters for a logger. You can, of course, also control it via configuring levels and filters on handlers; but the best way is to configure loggers where possible, because that avoids the overhead of dispatching events to handlers only to have them be thrown away due to the handler’s configuration.

The logging package relies on loggers being shared across modules, in order to offer this configurability. If my application is using a third-party library which logs some events to a logger named foo.bar, and if I want to configure the verbosity of this logger - by setting its level, say – I can only do this if I can be sure that

logger = logging.getLogger('foo.bar')

will return the same object to the library code, which logs to logger foo.bar, and to my configuration code, which sets that logger’s level to what I want. If the two calls to get that named logger were to return two entirely different objects, then the setLevel() call made by my configuration code would not affect the object used internally by the library. This means that I would essentially lose some level of control over logging verbosity.

When developers are essentially logging just for themselves – i.e. they are the primary, perhaps only, audience for their logging output – then these considerations may not matter too much. If they are not using third-party libraries which use logging, or using libraries whose logging verbosity they don’t care about, that may be another reason for not worrying about this loss of configurability. Since Python logging is intended for the whole Python developer community, some of whom will definitely need and want this level of configurability, the design supports it, using shared state. I know I use this feature myself regularly – say, when examining the SQL output from ORMs such as are provided by Django and SQLAlchemy. I don’t generally log the SQL output (which can be quite voluminous), but might do so as a result of coming across some problem, whether functional or performance-related. When the problem is identified, I can turn the logging of SQL statements off again. In larger application scenarios, this functionality is also available to those who have no means of changing the source code - application support teams or system administrators – assuming that the application developers have provided for this via the mechanisms offered by the logging package.

Another important (but related) design feature of Python logging is the use of hierarchical logger names. In stdlib logging, logger names are not just things which turn up in logging output. When you create a logger named foo.bar, you also conceptually create a logger named foo. This latter logger is not actually created unless and until a getLogger('foo') call is made, but the hierarchical namespace feature (and the way in which handler inheritance works with the logger hierarchy) allows you to turn logging verbosity up and down at multiple levels of a  package hierarchy quite easily – for example, turn on all logging in the django logging namespace, or just turn it on just for django.request, or perhaps just for django.db.backends – it puts the application developer/support team/sys admin in control, rather than just the library developer. And while the hierarchical feature might seem overkill for small projects and simple scripts, it doesn’t get in the way, and pays dividends when logging is used in larger systems.

1 comment: