Thursday 26 July 2012

A potential security vulnerability in logging.fileConfig()

Although now superseded by the dictConfig() API proposed in PEP 391, the much older fileConfig() API has been widely used to configure logging. It is also possible to do on-the-fly reconfiguration of logging through logging’s listen() API, which listens on a socket for configuration data in either the format understood by fileConfig(), or JSON format for use with dictConfig(). Although listen() only ever binds to localhost and so cannot be reached from a remote machine, in certain scenarios the use of listen() can lead to code which you don’t control being executed in the process which calls listen(). To see how, consider a shared hosting environment where multiple users, who have no connection with each other other than being hosted on the same server, run processes. Ordinarily these processes are isolated from each other, but if one user’s process uses listen(), another user’s process could connect to the listened-on socket and send arbitrary configuration data. Because fileConfig() calls eval() under the hood to convert text elements in the configuration to Python objects for use in the configuration operation, a malicious user could send potentially arbitrary code in a logging configuration to a victim, with undesirable consequences for the victim when that code is run by eval(). (Thanks are due to Graham Dumpleton for reporting this vulnerability.)
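
To make the scenario concrete, here is a sketch of what another local user’s process could send, assuming the victim process has called logging.config.listen() with the default port (logging.config.DEFAULT_LOGGING_CONFIG_PORT, i.e. 9030), and using a harmless echo command as a stand-in for something nastier:

    import socket
    import struct

    # A fileConfig()-format payload: the args value is passed to eval() by
    # the configuration machinery, so code placed there runs in the victim
    # process as soon as the configuration is applied.
    CONFIG = """
    [loggers]
    keys=root

    [handlers]
    keys=h

    [formatters]
    keys=

    [logger_root]
    level=DEBUG
    handlers=h

    [handler_h]
    class=StreamHandler
    args=(__import__('os').system('echo Gotcha'),)
    """

    payload = CONFIG.encode('utf-8')
    sock = socket.create_connection(('localhost', 9030))
    # listen() expects a four-byte big-endian length prefix before the data
    sock.sendall(struct.pack('>L', len(payload)) + payload)
    sock.close()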

Why does fileConfig() use eval(), and how can the vulnerability be tackled? Although recent versions of Python provide the ast module to assist with processing Python source code, this module was not available when fileConfig() was written. The type of configuration data which needs to be evaluated can be more complex than a simple parser can cope with: for example, a configuration may refer to a handler’s class, whether it be a stdlib-included one such as handlers.WatchedFileHandler, or a user-defined or third-party handler such as mailinglogger.MailingLogger. The configuration can also specify the constructor arguments for a handler, which can be fairly involved to parse in an ad hoc fashion. For this reason, eval() was used.
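
For illustration, a typical handler section in the fileConfig() input format looks something like this (the handler name and file path are made up), and both the class and args values need evaluating in the context of the logging package’s namespace:

    [handler_file]
    class=handlers.WatchedFileHandler
    level=DEBUG
    args=('/var/log/myapp/app.log', 'a')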

Users have already been alerted to the vulnerability via an update to the documentation, but that is perhaps not enough. One way of addressing the vulnerability is not to use eval(), but to use some other, more sandboxed evaluation mechanism. One suggestion which has been made is to use ast.literal_eval(), but this only copes with Python literals – it does no name lookup, and so can’t cope with expressions like handlers.WatchedFileHandler.
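
For example (the exact error wording varies between Python versions):

    >>> from ast import literal_eval
    >>> literal_eval("('/var/log/app.log', 'a')")
    ('/var/log/app.log', 'a')
    >>> literal_eval('handlers.WatchedFileHandler')
    Traceback (most recent call last):
      ...
    ValueError: malformed node or string: ...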

As an experiment, I put together a simple expression evaluator which uses the ast module to parse input text and provides what is probably the bare minimum evaluation of AST nodes needed to support common use cases.
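
A sketch along those lines might look like this (restricted_eval() is an illustrative name, and the set of supported nodes is an assumption about what common configurations need):

    import ast

    def restricted_eval(source, context):
        """Evaluate an expression, allowing only literals, names looked up
        in the supplied context dict, attribute access, tuples, lists,
        dicts and calls. Anything else is rejected with ValueError.
        """
        def _eval(node):
            if isinstance(node, ast.Constant):   # numbers, strings, None etc.
                return node.value
            if isinstance(node, ast.Name):       # names resolve via context only
                try:
                    return context[node.id]
                except KeyError:
                    raise ValueError('unknown name: %r' % node.id)
            if isinstance(node, ast.Attribute):  # e.g. handlers.WatchedFileHandler
                if node.attr.startswith('_'):    # block private attribute access
                    raise ValueError('disallowed attribute: %r' % node.attr)
                return getattr(_eval(node.value), node.attr)
            if isinstance(node, ast.Tuple):
                return tuple(_eval(elt) for elt in node.elts)
            if isinstance(node, ast.List):
                return [_eval(elt) for elt in node.elts]
            if isinstance(node, ast.Dict):
                return dict((_eval(k), _eval(v))
                            for k, v in zip(node.keys, node.values))
            if isinstance(node, ast.Call):       # e.g. constructing a handler
                func = _eval(node.func)
                args = [_eval(arg) for arg in node.args]
                kwargs = dict((kw.arg, _eval(kw.value)) for kw in node.keywords)
                return func(*args, **kwargs)
            raise ValueError('disallowed construct: %s' % type(node).__name__)

        return _eval(ast.parse(source, mode='eval').body)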

The evaluator can be exercised with a simple test harness which evaluates user-entered expressions in the context of the logging package.
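
A minimal harness, reusing the restricted_eval() sketch above, might look like this:

    import logging
    import logging.handlers

    # Evaluate user-entered expressions in the context of the logging package.
    context = {'logging': logging, 'handlers': logging.handlers}

    while True:
        try:
            source = input('Enter an expression to evaluate: ')
        except EOFError:
            break
        if not source.strip():
            break
        try:
            print(restricted_eval(source, context))
        except (ValueError, SyntaxError) as e:
            print('Error: %s' % e)

With this in place, an expression such as handlers.WatchedFileHandler evaluates to the handler class, while something like open('/etc/passwd') is rejected, because open is not a name in the supplied context.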

However, limiting the scope of evaluation is not the complete answer, and perhaps not the correct one. A malicious user could still send a bogus configuration which, for example, just turns the verbosity of all loggers off, or configures a huge number of bogus loggers. A limited-evaluation scheme would certainly have to permit such a configuration, since a user might legitimately want to apply it; but if a malicious user sends the exact same “legal” configuration, it is still a security exploit, because the consequences may be undesirable to the victim.

But how is the listener to know whether the configuration is coming from a legitimate source (a client process controlled by the same user who is running the process which uses listen()) or from a malicious user (a client process controlled by some other user)? The simplest answer would appear to be a shared secret: when listen() is called, it is passed a text passphrase, which is also known to legitimate clients. When handling a configuration request received via the socket, the configuration is checked to see if it contains the passphrase. If it does, the request is processed; otherwise, it is ignored.

In the fileConfig() input data, the passphrase could be provided via a passphrase=secret entry in the [default] section. In the dictConfig() input data, the passphrase could be provided against the passphrase key in the dict which is passed to dictConfig(). The checking would be done in the request handler code before calling fileConfig() or dictConfig(). If the passphrase argument to the listen() call is None (the default, preserving the current behaviour), no passphrase checking would be done.
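
To illustrate, a listener-side check might look something like this (a sketch only, with an illustrative name; no such passphrase support currently exists in logging):

    import configparser

    def passphrase_ok(config, passphrase):
        """Return True if the configuration carries the expected secret.

        config is either fileConfig()-style INI text or a dict destined
        for dictConfig(); passphrase is the secret passed to listen().
        """
        if passphrase is None:              # default: current behaviour
            return True
        if isinstance(config, dict):        # dictConfig() input
            return config.get('passphrase') == passphrase
        cp = configparser.ConfigParser()    # fileConfig() input
        cp.read_string(config)
        if not cp.has_section('default'):
            return False
        return cp.get('default', 'passphrase', fallback=None) == passphrase

The request handler would call this before handing the configuration to fileConfig() or dictConfig(), ignoring the request if it returns False.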

As always, comments are welcome. Although this vulnerability is limited to specific scenarios, it would be good to address it as soon as possible. As Python 3.3 has entered beta, and is therefore in feature freeze, it is likely that a fix will go into Python 3.4 – the vulnerability is not currently considered serious enough to warrant fixing in earlier versions.

4 comments:

  1. Please have the listen facility turned off by default. I suspect that would greatly reduce the exposure of this issue - especially to those who are unaware that the facility even exists. (I am not trying to discourage solutions that you are exploring - just feel like they should be relevant only to someone who has enabled the listen feature).

    Replies
    1. If people aren't aware of the listen() functionality, they aren't affected by it. It's only those people who opt to use listen() that have to consider the issues discussed here.

  2. Couldn't you have used the compiler module (available since Python 2.2) for logging (introduced in Python 2.3) to obtain an AST?

    Replies
    1. Possibly - I didn't consider the security implications when I wrote the listen() API, which came after the fileConfig() function was already written and working. But as I say above, any functioning configuration system could be subverted in the specific scenario described, whether eval() or some other approach were used: so a shared secret seems to be the correct solution to this issue.
