Sunday, 29 April 2012

Introspecting the logger hierarchy

Brandon Rhodes’ post about his logging_tree project got me thinking about providing this sort of functionality in the logging package. Though introspection of the logger hierarchy has been mentioned a couple of times in newsgroup postings over the years, it has never actually been requested as a fleshed-out feature, and until Brandon’s post, it was below my radar. Now, thanks to him, I have given it some attention.

There are a number of things which need to be considered when providing some way of introspecting the logger hierarchy:

  • In many systems, the logging hierarchy doesn’t change once it’s configured. However, there are also systems where it does – logging provides the facility for on-the-fly reconfiguration without having to restart a Python process. Therefore, a general introspection facility must act as a snapshot – the state of the hierarchy at a particular time when the snapshot was requested.
  • The introspection facility must return useful information for both built-in and user-defined Logger, Handler, Formatter, and Filter instances.
  • The data returned by the introspection facility should be, as far as practicable, immutable. Users should not expect to get direct access to individual logging elements (as there are already specific APIs for doing that).
  • The introspection facility should perhaps eschew any presentation of the hierarchy; this is likely to be very application-dependent, and it is debatable whether it is worth providing a “default” presentation. Example presentations would be a textual printout (such as logging_tree provides) or a depiction of the hierarchy in a browser-based or desktop GUI.

I’ve made an attempt at addressing the above points when implementing my own take on Brandon’s work. Although I have no definite plan to include this functionality in Python 3.3, I’m publishing this post to get feedback about the approach used, and also about whether something like this belongs in the core package.

The basic API is provided by a single function, get_snapshot(), which returns a dictionary. The keys of that dictionary are logger names, with '' being the key for the root logger. The corresponding values are instances of namedtuple subclasses; they are snapshots of the loggers, handlers, formatters and filters.

Here’s how it might work. Running the code snippet

import logging
import textwrap

from logtree import get_snapshot  # provided by logtree.py (see the gist linked below)

logging.getLogger('logtree.test')
logging.basicConfig(filename='logtree.log')
result = get_snapshot()
tw = textwrap.TextWrapper()
tw.subsequent_indent = ' ' * 4
for name, info in sorted(result.items()):
    print(tw.fill('%r -> %r' % (name, info)))
    print('-'*40)

would display something like

'' -> logging_RootLoggerInfo(class_='logging.RootLogger', level=30,
    propagate=1, children=(('logtree',
    logging_PlaceHolderInfo(class_='logging.PlaceHolder',
    children=(('logtree.test',
    logging_LoggerInfo(class_='logging.Logger', name='logtree.test',
    level=0, handlers=(), disabled=0, propagate=1, filters=(),
    children=())),))),), filters=(),
    handlers=(logging_FileHandlerInfo(class_='logging.FileHandler',
    encoding=None, level=0, delay=None,
    filename='C:\\temp\\logtree.log', mode='a', filters=(),
    formatter=logging_FormatterInfo(class_='logging.Formatter',
    style='%', datefmt=None,
    format='%(levelname)s:%(name)s:%(message)s')),))
----------------------------------------
'logtree' -> logging_PlaceHolderInfo(class_='logging.PlaceHolder',
    children=(('logtree.test',
    logging_LoggerInfo(class_='logging.Logger', name='logtree.test',
    level=0, handlers=(), disabled=0, propagate=1, filters=(),
    children=())),))
----------------------------------------
'logtree.test' -> logging_LoggerInfo(class_='logging.Logger',
    name='logtree.test', level=0, handlers=(), disabled=0,
    propagate=1, filters=(), children=())
----------------------------------------

Of course, this is very simplified formatting, and can be much improved on, but it gives the gist of what’s being returned. The various classes whose names end in Info are the namedtuple subclasses, and the corresponding instances represent particular logging objects. You can see that specific attributes are displayed; these can be customised quite flexibly.

Customising is done via a configuration dictionary. The full definition of get_snapshot() is:

def get_snapshot(config=None):
    return Snapper(config).snapshot()

If no configuration dictionary is specified, a default one – available as Snapper.DEFAULT_CONFIG – is used. You can make a copy of that, add or change elements in your copy, and then pass the copy to get_snapshot(). (You could also subclass the Snapper class if you really need to, but in most cases that shouldn’t be necessary.)
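
For example (a minimal sketch, not from the post): the snippet below copies the default configuration and asks for one extra attribute – StreamHandler’s terminator, which exists in Python 3.2 and later and is used here purely as an illustration. It assumes that logtree.py from the gist mentioned later is importable as logtree and exposes Snapper and get_snapshot.

import copy

from logtree import Snapper, get_snapshot  # logtree.py from the gist

# Work on a copy, leaving the shared default configuration untouched.
config = copy.deepcopy(Snapper.DEFAULT_CONFIG)

# Extend the space-separated attribute list for StreamHandler snapshots.
config['attrmap']['logging.StreamHandler'] += ' terminator'

snapshot = get_snapshot(config)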

The configuration dictionary currently has two keys – attrmap and format – whose values are themselves dictionaries. The configuration is used when converting each logging object to its snapshot representation. Each object, and each attribute of that object, is passed recursively through the conversion process. This process works roughly as follows:

  1. A dict instance is converted to another dict instance whose keys and values have passed through the conversion process.
  2. Instances of list and tuple are converted to a tuple instance whose elements are the converted source elements.
  3. Instances of int, float, and str are passed through unchanged. None is passed through unchanged. (If needed, this could also be applied to complex and decimal.Decimal types, but it seems unlikely these would be used in a logging configuration.)
  4. For other instances, a qualified class name is computed from instance.__class__.__name__ and instance.__module__ (if present). This name is used as the key into the format and attrmap configuration sub-dictionaries. The format dictionary is checked first: if the key is present, the value should be a callable which takes a single argument – the instance – and returns its representation. This is used, for example, to convert internal implementation class instances such as PercentStyle, StrFormatStyle and StringTemplateStyle (in Python 3.2 and later) to the strings '%', '{' and '$'. (The value of config['format']['logging.PercentStyle'] is lambda x: '%'.) If the key is not present in format, then attrmap is checked for the key. If present, the value should be a space-separated list of attribute names to bring into the snapshot. An attribute can be renamed in the snapshot by writing the entry as e.g. _fmt:format, where the instance attribute in the logging object is _fmt but the field name in the corresponding namedtuple is format. (Field names in named tuples can’t start with an underscore, so attributes whose names start with one must be renamed in this way.) All named tuple classes also have a class_ field, which holds the class name of the type represented by the named tuple. A sketch of this conversion process follows the list.
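
To make step 4 concrete, here is a rough sketch of what a conversion step might look like. This is my own illustration rather than the code in logtree.py – the names convert and _qualname are invented – and it glosses over details such as how a logger’s children are derived from the hierarchy and how module-less keys like 'TextIOWrapper' are matched:

from collections import namedtuple

def _qualname(obj):
    # Qualified class name, e.g. 'logging.FileHandler'; the module part
    # is omitted if no __module__ attribute can be found.
    module = getattr(obj, '__module__', None)
    name = obj.__class__.__name__
    return '%s.%s' % (module, name) if module else name

def convert(obj, config):
    if isinstance(obj, dict):
        # Rule 1: dicts are rebuilt with converted keys and values.
        return dict((convert(k, config), convert(v, config))
                    for k, v in obj.items())
    if isinstance(obj, (list, tuple)):
        # Rule 2: lists and tuples become tuples of converted elements.
        return tuple(convert(item, config) for item in obj)
    if obj is None or isinstance(obj, (int, float, str)):
        # Rule 3: simple scalar values pass through unchanged.
        return obj
    key = _qualname(obj)
    if key in config['format']:
        # Rule 4a: a registered callable supplies the representation,
        # e.g. a PercentStyle instance becomes '%'.
        return config['format'][key](obj)
    # Rule 4b: pull the listed attributes into a namedtuple, honouring
    # renames such as _fmt:format or baseFilename:filename.
    fields, values = ['class_'], [key]
    for spec in config['attrmap'].get(key, '').split():
        attr, _, rename = spec.partition(':')
        fields.append(rename or attr)
        values.append(convert(getattr(obj, attr, None), config))
    info_cls = namedtuple(key.replace('.', '_') + 'Info', fields)
    return info_cls(*values)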

This arrangement allows user-defined handler classes, for example, to be easily accommodated just by adding corresponding entries to the configuration dictionary. A part of the default configuration dictionary is shown below, followed by a sketch of how a custom handler might be hooked in:

DEFAULT_CONFIG = {
    'attrmap': {
        'TextIOWrapper': 'name mode encoding', # for Python 3.x
        'file': 'name mode encoding', # for Python 2.x
        'logging.Formatter': 'datefmt _fmt:format _style:style',
        'logging.Logger': 'disabled filters handlers level name '
                          'propagate children',
        'logging.StreamHandler': 'filters formatter level stream',
        'logging.FileHandler': 'filters formatter level '
            'baseFilename:filename mode encoding delay',
        'logging.handlers.SMTPHandler': 'filters formatter level mailhost '
            'mailport fromaddr toaddrs username password subject secure '
            'timeout',
    },
    'format': {
        'logging.PercentStyle': lambda x : '%',
        'logging.StrFormatStyle': lambda x : '{',
        'logging.StringTemplateStyle': lambda x : '$',
    },
}
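
As an illustration of hooking in a custom handler (again a hedged sketch of mine, not code from the post): the WebServiceHandler class and its endpoint and timeout attributes are invented for the example, the key follows the same module.ClassName pattern as the entries above (so '__main__.WebServiceHandler' for a class defined in a script), and logtree.py from the gist is assumed to be importable.

import copy
import logging

from logtree import Snapper, get_snapshot  # logtree.py from the gist

class WebServiceHandler(logging.Handler):
    # Hypothetical handler which would post records to an HTTP endpoint.
    def __init__(self, endpoint, timeout=5.0):
        logging.Handler.__init__(self)
        self.endpoint = endpoint
        self.timeout = timeout

    def emit(self, record):
        pass  # transport code omitted for the example

config = copy.deepcopy(Snapper.DEFAULT_CONFIG)
config['attrmap']['__main__.WebServiceHandler'] = (
    'filters formatter level endpoint timeout')

logging.getLogger('app').addHandler(WebServiceHandler('http://example.com/log'))
print(get_snapshot(config)['app'])

The printed namedtuple for the 'app' logger would then include a handler snapshot exposing the endpoint and timeout fields alongside the usual filters, formatter and level.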

Just as an exercise, I coded up an implementation of a presentation which echoes Brandon’s formatting. You can see it in this gist, which contains two files: logtree.py, which contains the introspection implementation, and lttest.py, a simple test which defines a logging configuration similar to Django’s default configuration and prints out representations using both Brandon’s logging_tree and the logtree.py implementation. The differences between the two representations are not especially significant, and relate to how individual handlers are represented.


Comments are welcome on the approach used, the desirability (or otherwise) of adding the logtree.py functionality to the stdlib, or any other pertinent point.

6 comments:

  1. This comment has been removed by the author.

  2. Nice and useful. User-defined classes might provide default configuration themselves via `attrmap` and `format` attributes. This would allow custom classes to be used without having to modify the config dictionary in the simplest case. Btw, why a space-separated list of attribute names instead of a plain list of strings?

    Replies
    1. Yes, the idea of making it data-driven is that it's easy to add support for custom classes. Space-separated attribute names just mirror how namedtuple does it - although it's not necessary to follow the same convention exactly, it does give people one less thing to remember.

  3. I did not actually find the bare logger and handler objects to be that un-friendly — once I had read through the source code and saw how they worked, it was easy to inspect their attributes and see which loggers were configured with which handlers. My recommendation would be to document and support that already-existing object interface, and maybe add a single walk() function like os.walk() but that lets someone iterate over the existing loggers.

    After all, documenting and letting people access the raw attributes is not a dead-end in Python, since if you ever wanted mylog.handlers to be something fancier than a list, you could stick a list-like object there instead; if mylog.propagate assignment someday needs to trigger an action, just make it a property.

    Replies
    1. It's not that they're unfriendly, but they can change at any time in a multi-threaded environment. The snapshot gives an immutable copy, and makes the API's contract clearer; although in Python you can access any field, including private ones, there shouldn't be a back-door API which could be used to manipulate the internals. For example, documenting Logger.handlers might suggest that e.g. direct adding of handlers to the list is OK. Sure, you can make the handlers a custom list subclass which intelligently handles appends and removes from itself, and make propagate a property; and that might happen at some point in the future.

      If an API gives a caller an iterator, the API creator can't assume that the caller will consume the iterator immediately to completion. What if just after you handed out such an iterator and it's been partially consumed, a configuration call (perhaps on another thread) changes the entire hierarchy, invalidating the iterator? The error handling gets more complicated, for what is an uncommon use case. I realise that in the common case this won't happen, but you have to consider the less common (but not uncommon) scenarios such as multi-threaded applications.

  4. A few use cases for logger introspection that I've encountered:

    * catching unit tests that don't clean up after themselves properly (take a snapshot of the loggers, run setUp/test/tearDown, take another snapshot, compare, complain if the snapshots differ)

    * debugging logger configuration issues (why don't my log messages show up? oh, that naughty disable_existing_loggers=True default for logging.config.fileConfig)

    Incidentally, is 'instance.__module__' in your post a typo? I'd have expected 'instance.__class__.__module__'.
