Tuesday, 6 July 2010

Using a custom Formatter to deal with Unicode messages

Sometimes you want to log messages that contain Unicode, and the logging handlers do not all deal with Unicode in the same way. For example, FileHandler allows you to specify an encoding, which is then used to encode Unicode messages to bytes. In Python 2.x, SMTPHandler doesn't do any encoding, which can lead to a UnicodeEncodeError being raised when smtplib writes the message to a socket.
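As a minimal sketch of the FileHandler behaviour described above, the following writes a Unicode message through a FileHandler created with an explicit encoding and then reads back the raw bytes. The temporary file path and the logger name 'unicode_demo' are assumptions made for illustration only.

```python
import io
import logging
import os
import tempfile

# Hypothetical log file location, used only for this sketch.
log_path = os.path.join(tempfile.mkdtemp(), 'unicode-demo.log')

logger = logging.getLogger('unicode_demo')  # illustrative logger name
logger.propagate = False                    # keep the demo away from root handlers

# The encoding given here is used to encode Unicode messages to bytes.
handler = logging.FileHandler(log_path, encoding='utf-8')
handler.setFormatter(logging.Formatter('%(message)s'))
logger.addHandler(handler)

logger.error(u'accentu\u00e9')

# Flush and release the file before inspecting it.
handler.close()
logger.removeHandler(handler)

# Read the file in binary mode to see the encoded bytes.
with io.open(log_path, 'rb') as f:
    raw = f.read()
```

The file should contain the UTF-8 bytes for the message, with the final character stored as two bytes. SMTPHandler in Python 2.x has no comparable encoding parameter, so the encoding error described above can still occur there.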

To avoid such errors with SMTPHandler, you can use a Formatter that encodes the message for you, as in the following example:

import logging, logging.handlers

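class EncodingFormatter(logging.Formatter):
    """Formatter which encodes Unicode results to bytes (Python 2.x)."""

    def __init__(self, fmt, datefmt=None, encoding=None):
        logging.Formatter.__init__(self, fmt, datefmt)
        # Encoding to apply in format(); None means fall back to UTF-8.
        self.encoding = encoding

    def format(self, record):
        result = logging.Formatter.format(self, record)
        # `unicode` is the Python 2.x text type; byte strings pass through as-is.
        if isinstance(result, unicode):
            result = result.encode(self.encoding or 'utf-8')
        return result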

def main():
    root = logging.getLogger()
    # Assumes an SMTP server is listening on localhost, port 25.
    sh = logging.handlers.SMTPHandler(mailhost=('localhost', 25),
                                      fromaddr='vms@test.com',
                                      toaddrs='test@test.com',
                                      subject='Logged Event')
    root.addHandler(sh)
    # The formatter encodes the message before smtplib sends it.
    sh.setFormatter(EncodingFormatter('%(message)s', encoding='iso8859-1'))
    root.error(u'accentu\u00e9')
    
if __name__ == '__main__':
    main()
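
If you don't have a mail server handy, one way to check what the formatter produces is to format a hand-built LogRecord directly. The sketch below is an illustration rather than part of the example above: the text_type alias is an assumption added so that the same code runs on Python 2.x and 3.x, and the record's name, path and line number are placeholders. The formatter itself is aimed at Python 2.x; on Python 3.x handlers generally expect text rather than bytes, so this is only meant for inspecting the encoded output.

```python
import logging

# Assumption: alias the text type so this sketch runs on Python 2.x and 3.x
# (`unicode` on 2.x, `str` on 3.x).
text_type = type(u'')


class EncodingFormatter(logging.Formatter):

    def __init__(self, fmt, datefmt=None, encoding=None):
        logging.Formatter.__init__(self, fmt, datefmt)
        self.encoding = encoding

    def format(self, record):
        result = logging.Formatter.format(self, record)
        # Encode text results; fall back to UTF-8 if no encoding was given.
        if isinstance(result, text_type):
            result = result.encode(self.encoding or 'utf-8')
        return result


# Build a record by hand; 'demo', 'demo.py' and line 1 are placeholders.
record = logging.LogRecord('demo', logging.ERROR, 'demo.py', 1,
                           u'accentu\u00e9', (), None)

# Same message, formatted with an explicit encoding and with the default.
latin1 = EncodingFormatter('%(message)s', encoding='iso8859-1').format(record)
utf8 = EncodingFormatter('%(message)s').format(record)
```

With 'iso8859-1' the accented character is encoded as a single byte, whereas the UTF-8 fallback encodes it as two bytes, so the choice of encoding should match what the recipient's mail client expects.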