Tuesday, 6 July 2010

Using a custom Formatter to deal with Unicode messages

Sometimes, you want to use Unicode in messages, and different logging handlers deal with Unicode in different ways. For example, FileHandler allows you to specify an encoding, which is then used to encode Unicode messages to bytes. In Python 2.x, SMTPHandler doesn't do any encoding, which can lead to UnicodeEncodeErrors being raised when smtplib writes the message to a socket.
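For comparison, here is a minimal sketch of FileHandler's encoding parameter at work. It is written for Python 3, not the Python 2.x this post targets. The helper log_to_file and the logger name 'encoding-demo' are made up for illustration. The helper logs one message through a FileHandler opened with a given encoding and returns the raw bytes that reached the disk.

```python
import logging
import os
import tempfile

def log_to_file(message, encoding):
    """Hypothetical helper: log one message via a FileHandler opened with
    `encoding` and return the bytes written (line ending stripped)."""
    fd, path = tempfile.mkstemp(suffix='.log')
    os.close(fd)
    logger = logging.getLogger('encoding-demo')
    logger.propagate = False  # keep the demo output away from the root logger
    handler = logging.FileHandler(path, mode='w', encoding=encoding)
    handler.setFormatter(logging.Formatter('%(message)s'))
    logger.addHandler(handler)
    try:
        logger.error(message)
    finally:
        logger.removeHandler(handler)
        handler.close()
    with open(path, 'rb') as f:
        data = f.read()
    os.remove(path)
    return data.rstrip(b'\r\n')

print(log_to_file('accentu\u00e9', 'utf-8'))      # b'accentu\xc3\xa9'
print(log_to_file('accentu\u00e9', 'iso8859-1'))  # b'accentu\xe9'
```

The handler does the encoding itself, so the same text ends up as different bytes depending only on the encoding you pass in.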

To avoid these errors with SMTPHandler, you can use a Formatter which encodes the message for you, as in the following example:

```python
# Python 2.x
import logging
import logging.handlers

class EncodingFormatter(logging.Formatter):
    """Formatter which encodes the formatted message to bytes."""

    def __init__(self, fmt, datefmt=None, encoding=None):
        logging.Formatter.__init__(self, fmt, datefmt)
        # Encoding for Unicode output; UTF-8 is used if none is given
        self.encoding = encoding

    def format(self, record):
        result = logging.Formatter.format(self, record)
        # Only Unicode results need encoding; byte strings pass through as-is
        if isinstance(result, unicode):
            result = result.encode(self.encoding or 'utf-8')
        return result

def main():
    root = logging.getLogger()
    sh = logging.handlers.SMTPHandler(mailhost=('localhost', 25),
                                      fromaddr='vms@test.com',
                                      toaddrs='test@test.com',
                                      subject='Logged Event')
    root.addHandler(sh)
    # Encode messages as ISO-8859-1 before SMTPHandler hands them to smtplib
    sh.setFormatter(EncodingFormatter('%(message)s', encoding='iso8859-1'))
    root.error(u'accentu\u00e9')

if __name__ == '__main__':
    main()
```
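If you want to check what the formatter produces without a mail server on localhost, you can call its format() method directly on a hand-built LogRecord. The sketch below is a Python 3 port, which is an assumption rather than the code above: since str is already Unicode in Python 3, the isinstance check tests str instead of unicode. The stock Python 3 handlers expect format() to return str, so a bytes-returning formatter like this is only for illustration.

```python
import logging

class EncodingFormatter(logging.Formatter):
    """Python 3 port of the formatter above (illustrative assumption)."""

    def __init__(self, fmt, datefmt=None, encoding=None):
        super().__init__(fmt, datefmt)
        self.encoding = encoding

    def format(self, record):
        result = super().format(record)
        # In Python 3 every formatted message is a str (i.e. Unicode)
        if isinstance(result, str):
            result = result.encode(self.encoding or 'utf-8')
        return result

# Build a record by hand, so no handler or mail server is involved
record = logging.LogRecord('demo', logging.ERROR, 'demo.py', 1,
                           'accentu\u00e9', None, None)
print(EncodingFormatter('%(message)s', encoding='iso8859-1').format(record))
# b'accentu\xe9'
print(EncodingFormatter('%(message)s').format(record))
# b'accentu\xc3\xa9'
```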
 

2 comments:

  1. Hi,

    I think that's not enough to completely avoid the UnicodeEncodeError. If you specify an encoding of ASCII instead of iso8859-1, this code fails. We could change from

        result = result.encode(self.encoding or 'utf-8')

    to

        try:
            result = result.encode(self.encoding or 'utf-8')
        except UnicodeEncodeError:
            result = result.encode('utf-8', 'replace')

    But even then, another problem arises: SMTPHandler does not add the correct encoding headers (quoted-printable, base64) to the message, so the sent email looks ugly: accentué shows up as accentuĂ©.

    I can't see any simple solution to this without changing the code of SMTPHandler.

    Norbert

  2. Well, it doesn't make sense to use an ASCII encoding where your content can't be encoded as ASCII - so it's completely correct that a UnicodeEncodeError is raised in that case. But you could use any encoding which is compatible with your content.
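As a quick illustration of that point, the hypothetical helper below (a Python 3 sketch, not part of logging) checks which candidate encodings can represent a message: ASCII cannot represent é, while ISO-8859-1 and UTF-8 can.

```python
def try_encodings(text, encodings):
    """Hypothetical helper: map each encoding to whether it can encode text."""
    results = {}
    for enc in encodings:
        try:
            text.encode(enc)
            results[enc] = True
        except UnicodeEncodeError:
            # This encoding has no byte sequence for some character in text
            results[enc] = False
    return results

print(try_encodings('accentu\u00e9', ['ascii', 'iso8859-1', 'utf-8']))
# {'ascii': False, 'iso8859-1': True, 'utf-8': True}
```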

    Your other points relate not to SMTPHandler but to the smtplib module, because SMTPHandler just calls smtplib (specifically, the sendmail method of an SMTP class instance) to do the heavy lifting.

    If you need more customised behaviour, you don't need to change the code of SMTPHandler; you can achieve a custom implementation by subclassing SMTPHandler and then implementing custom code (perhaps using the email module) to create the MIME message and send it.
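A minimal sketch of such a subclass, written for Python 3, follows. The class name MIMESMTPHandler, its charset parameter and its build_message method are hypothetical, not part of the logging package. It also relies on attributes that SMTPHandler's constructor sets in current CPython (mailhost, mailport, fromaddr, toaddrs, username, password, timeout); treat that as an assumption about the implementation rather than documented API. The body goes into a MIMEText part with an explicit charset, so the email package adds the Content-Type and Content-Transfer-Encoding headers, and the subject is encoded with email.header.Header. STARTTLS (the secure argument) is left out for brevity.

```python
import logging
import logging.handlers
import smtplib
from email.header import Header
from email.mime.text import MIMEText

class MIMESMTPHandler(logging.handlers.SMTPHandler):
    """Hypothetical SMTPHandler subclass which sends a properly encoded MIME message."""

    def __init__(self, *args, charset='utf-8', **kwargs):
        super().__init__(*args, **kwargs)
        self.charset = charset

    def build_message(self, record):
        # MIMEText adds the charset and transfer-encoding headers for us
        msg = MIMEText(self.format(record), 'plain', self.charset)
        msg['From'] = self.fromaddr
        msg['To'] = ', '.join(self.toaddrs)
        msg['Subject'] = Header(self.getSubject(record), self.charset)
        return msg

    def emit(self, record):
        try:
            msg = self.build_message(record)
            port = self.mailport or smtplib.SMTP_PORT
            smtp = smtplib.SMTP(self.mailhost, port, timeout=self.timeout)
            try:
                if self.username:
                    smtp.login(self.username, self.password)
                smtp.sendmail(self.fromaddr, self.toaddrs, msg.as_string())
            finally:
                smtp.quit()
        except Exception:
            self.handleError(record)
```

Keeping message construction in its own method means the MIME output can be inspected without a mail server.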
