Processing email with Google App Engine for Java

Google App Engine (GAE) is all kinds of cool. It hosts your Java and Python apps and provides a set of APIs that give you the power, stability and scalability of Google’s infrastructure. It’s dead simple to deploy apps and it’s reasonably priced – starting at free for apps that don’t need too many resources.

But it’s a platform, which means you write code specifically for App Engine and you have to rely on Google’s APIs and services for much of your functionality. Mostly this works great, but sometimes there are bugs or subtleties in how App Engine works which can cause you pain.

At Songkick I maintain an internal application for processing snippets. The idea is cribbed from Google, where I used to work. Each week Songkick employees email in a short summary of what they did that week and what they plan to do next week. It’s a really useful way of getting a quick update on everything that’s going on in the company.

The Snippets app accepts the emails, parses them and compiles them into a weekly digest which it mails out to everyone at the company. It also makes the digests available on an internal web page and sends out some handy reminder emails. It’s written in Java and runs on GAE.

All pretty straightforward and simple to implement. Except that email processing in GAE is quirky and poorly documented. Although GAE implements the standard JavaMail API, it’s pretty hard to find clear examples of how to make this work in the general case. So, here’s some code that shows how we do it, and some of the pitfalls to look out for. This covers the basics of parsing emails using JavaMail and the specifics of handling incoming email in GAE.

You should also read Google’s own documentation on handling GAE email. I will point out where you need to diverge from Google’s advice.

Receiving Email in Google App Engine

In order for your GAE app to handle incoming email, you must first configure your instance properly.

Start by enabling inbound email for your app. If you don’t do this, none of the rest of the code will work. Add the following lines to the appengine-web.xml file (in /war/WEB-INF/ in your GAE app folder structure):

<inbound-services>
  <service>mail</service>
</inbound-services>

Next, configure a servlet to handle incoming emails in your web.xml file (also in /war/WEB-INF/). Add lines like this:

<servlet>
  <servlet-name>mailhandler</servlet-name>
  <servlet-class>com.yourappname.MailHandlerServlet</servlet-class>
</servlet>
<servlet-mapping>
  <servlet-name>mailhandler</servlet-name>
  <url-pattern>/_ah/mail/*</url-pattern>
</servlet-mapping>
<security-constraint>
  <web-resource-collection>
    <url-pattern>/_ah/mail/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>admin</role-name>
  </auth-constraint>
</security-constraint>

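The only change you have to make is to replace com.yourappname.MailHandlerServlet with the fully qualified name of the mail handling servlet class you write. I’ll cover what this class does in the next section. You can optionally change the <servlet-name>, but make sure the name is the same in both the <servlet> and <servlet-mapping> elements. The <security-constraint> restricts the /_ah/mail/ URLs to administrators, so outside users can’t post fake messages directly to your handler while App Engine itself still can.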

Your application is now set up to receive emails.

Two important notes

Email handling only really works when your app is deployed to GAE. You can simulate sending email when your app is running in the development server, but this does not let you test all the edge cases that come up when handling email. This is a massive pain. In practice you have to test in production.

To get email to your app, it has to be addressed to anything@appid.appspotmail.com, where appid is your application’s ID. For example, if your app is called mailinator, emails you send to any @mailinator.appspotmail.com address will be routed to MailHandlerServlet.
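App Engine delivers each message as a POST to /_ah/mail/ followed by the recipient address, so your servlet can work out which address a message was sent to from the request URI. That can be useful when one app handles several addresses, or when mail arrives through an alias and the To: header shows the friendly address instead. Here is a minimal sketch; MailAddressUtil and its methods are hypothetical helpers, and it assumes the address appears in the path without percent-encoding (decode it first if yours doesn’t).

```java
/**
 * Hypothetical helper for reading the recipient address out of the request
 * URI App Engine uses for incoming mail: /_ah/mail/<address>.
 * Assumes the address is not percent-encoded in the path.
 */
final class MailAddressUtil {
  private static final String PREFIX = "/_ah/mail/";

  private MailAddressUtil() {}

  /** Returns the recipient address, or null if the URI isn't an incoming-mail URI. */
  static String recipientFromUri(String requestUri) {
    if (requestUri == null || !requestUri.startsWith(PREFIX)) {
      return null;
    }
    String address = requestUri.substring(PREFIX.length());
    return address.isEmpty() ? null : address;
  }

  /** Returns the part before the '@', e.g. "snippets" for snippets@mailinator.appspotmail.com. */
  static String localPart(String address) {
    if (address == null) {
      return null;
    }
    int at = address.indexOf('@');
    return at < 0 ? address : address.substring(0, at);
  }
}
```

Inside doPost you would call MailAddressUtil.recipientFromUri(req.getRequestURI()) and branch on the local part if you route different addresses to different logic.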

Pro tip: set up an email alias that redirects from a more friendly address to your @mailinator.appspotmail.com email address.

Writing the mail handling servlet

Now that your app is configured to receive email, you need to handle incoming emails in a servlet. Assuming you configured <servlet-class> to be com.yourappname.MailHandlerServlet, create a new Java class called MailHandlerServlet under com.yourappname in your app’s src folder.

Your MailHandlerServlet class extends javax.servlet.http.HttpServlet and handles POST requests sent to it. The basic class looks like this:

package com.yourappname;

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@SuppressWarnings("serial")
public class MailHandlerServlet extends HttpServlet {
  public void doPost(HttpServletRequest req, HttpServletResponse resp)
    throws IOException {
  }
}

When an email arrives for your app, the doPost method is called and you can handle the email.

Up to now, we’ve mostly followed Google’s own documentation. Now we diverge.

Google tells you to use the standard JavaMail APIs to parse your emails. This will not work for emails sent from some clients, including Hotmail and Mac OS X’s Mail.app. Ouch. Parsing emails from these clients throws java.io.IOException: Truncated quoted printable data. For the gory details, see this Google Groups thread.

If you want to handle emails from a wide variety of clients, use the code at the bottom of that thread instead. I copied that code (kindly supplied by user moca) into a MimeUtils class, which you can download from Pastebin here. You’ll also need the Apache Commons IO library to use MimeUtils.

Once you have MimeUtils, processing emails is fairly straightforward. Here is the basic MailHandlerServlet class:

package com.yourappname;

import java.io.IOException;

import javax.mail.MessagingException;
import javax.mail.internet.MimeMessage;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@SuppressWarnings("serial")
public class MailHandlerServlet extends HttpServlet {
  public void doPost(HttpServletRequest req, HttpServletResponse resp)
      throws IOException {
    try {
      MimeMessage message = MimeUtils.createMimeMessage(req);

      if (processMessage(message)) {
        Debug.log("Incoming email handled");
      } else {
        Debug.log("Failed to handle incoming email");
      }
    } catch (MessagingException e) {
      Debug.log("MessagingException: " + e);
      e.printStackTrace();
    }
  }
}

I use a utility class called Debug to write to the GAE logs. You could replace all calls to Debug.log() with System.err.println().
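The Debug class itself isn’t shown. If you’d rather keep the Debug.log() calls than switch to System.err, a minimal stand-in built on java.util.logging could look like this. It is a hypothetical sketch, not the original class. On App Engine, java.util.logging output ends up in the app’s logs, but messages below the level configured in WEB-INF/logging.properties are dropped, so check that file if your INFO messages don’t appear.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical stand-in for the Debug helper, built on java.util.logging. */
final class Debug {
  private static final Logger LOG = Logger.getLogger(Debug.class.getName());

  private Debug() {}

  /** Logs an informational message. */
  static void log(String message) {
    LOG.info(message);
  }

  /** Logs a warning together with the exception's stack trace. */
  static void log(String message, Throwable t) {
    LOG.log(Level.WARNING, message, t);
  }
}
```

The two-argument overload lets you replace the Debug.log(...) plus e.printStackTrace() pairs with a single call that keeps the stack trace in the same log entry.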

This code extracts the incoming email from the HttpServletRequest using MimeUtils, then passes it to a method called processMessage. The rest is exception handling and logging.

So, what does processMessage do? It is responsible for parsing the email. Here is a slightly modified version of the processMessage method we use. This is designed to extract the text of the email body. Email attachments are ignored. It’s fairly easy to extend the code to extract and save attachments if you need them.
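If you do need attachments, JavaMail exposes what you need on each part: Part.getDisposition() (compare it with Part.ATTACHMENT), Part.getFileName() for the name and Part.getInputStream() for the bytes. Finding them is a recursive walk over nested multiparts that visits every part rather than stopping at the first text one. To keep this sketch self-contained it walks a simplified, hypothetical SimplePart model instead of real JavaMail objects; SimplePart and AttachmentFinder are illustrative names, and relying on the disposition alone is an assumption, since some clients label attachments differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical, simplified MIME part. In JavaMail the same values come from
 * Part.getDisposition(), Part.getFileName() and, for multipart containers,
 * Multipart.getBodyPart(i).
 */
final class SimplePart {
  final String disposition; // e.g. "attachment", "inline" or null
  final String fileName;    // null when the part carries no name
  final List<SimplePart> children;

  SimplePart(String disposition, String fileName, List<SimplePart> children) {
    this.disposition = disposition;
    this.fileName = fileName;
    this.children = children == null ? Collections.<SimplePart>emptyList() : children;
  }
}

final class AttachmentFinder {
  private AttachmentFinder() {}

  /** Walks the whole part tree and returns the file names of all attachments. */
  static List<String> attachmentNames(SimplePart root) {
    List<String> names = new ArrayList<>();
    collect(root, names);
    return names;
  }

  private static void collect(SimplePart part, List<String> names) {
    // Only parts explicitly marked "attachment" count in this sketch.
    if ("attachment".equalsIgnoreCase(part.disposition)) {
      names.add(part.fileName != null ? part.fileName : "(unnamed)");
    }
    // Recurse into nested multiparts, e.g. multipart/mixed holding multipart/alternative.
    for (SimplePart child : part.children) {
      collect(child, names);
    }
  }
}
```

Once you have a name, you can decide where the attachment’s bytes should go before reading them.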

I use the standard JavaMail API for email parsing, but handling multipart messages correctly is a little tricky. A multipart mail is a hierarchical data structure: each part of the multipart mail can potentially also be a multipart message, so you have to recurse through the parts until you find the one you want. This is handled in the aptly named handlePart method below:

package com.yourappname;

import java.io.IOException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;

import javax.mail.BodyPart;
import javax.mail.Message;
import javax.mail.MessagingException;
import javax.mail.Multipart;
import javax.mail.internet.MimeMessage;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@SuppressWarnings("serial")
public class MailHandlerServlet extends HttpServlet {
  public void doPost(HttpServletRequest req, HttpServletResponse resp)
      throws IOException {
    try {
      MimeMessage message = MimeUtils.createMimeMessage(req);

      if (processMessage(message)) {
        Debug.log("Incoming email handled");
      } else {
        Debug.log("Failed to handle incoming email");
      }
    } catch (MessagingException e) {
      Debug.log("MessagingException: " + e);
      e.printStackTrace();
    }
  }

  private boolean processMessage(MimeMessage message) {
    String date = getMessageDate(message);
    String from = "unknown";

    try {
      from = message.getFrom()[0].toString();
      Object content = MimeUtils.getContent(message);

      // isMimeType ignores parameters such as "; charset=UTF-8" and is
      // case-insensitive, unlike startsWith on getContentType()
      if (message.isMimeType("text/plain") || message.isMimeType("text/html")) {
        processMail(from, date, (String) content);
        return true;
      } else if (content instanceof Multipart) {
        Multipart mp = (Multipart) content;
        for (int i = 0; i < mp.getCount(); i++) {
          if (handlePart(from, date, mp.getBodyPart(i))) {
            return true;
          }
        }
        return false;
      } else {
        Debug.log("Unable to process message content - unknown content type");
      }
    } catch (Exception e) {
      // Covers IOException, MessagingException and anything unexpected,
      // e.g. a NullPointerException when the message has no From header
      Debug.log("Exception handling incoming email " + e);
    }

    return false;
  }

  private boolean handlePart(String from, String date, BodyPart part)
      throws MessagingException, IOException {
    if (part.isMimeType("text/plain") || part.isMimeType("text/html")) {
      processMail(from, date, (String) part.getContent());
      return true;
    } else {
      if (part.getContent() instanceof Multipart) {
        Multipart mp = (Multipart) part.getContent();
        Debug.log("Handling a multipart sub-message with " + mp.getCount() + " sub-parts");
        for (int i = 0; i < mp.getCount(); i++) {
          if (handlePart(from, date, mp.getBodyPart(i))) {
            return true;
          }
        }
        Debug.log("No text or HTML part in the multipart mime sub-message");
      }
      return false;
    }
  }

  private String getMessageDate(Message message) {
    Date when = null;
    try {
      when = message.getReceivedDate();
      if (when == null) {
        when = message.getSentDate();
      }
      if (when == null) {
        return null;
      }
    } catch (MessagingException e) {
      Debug.log("Cannot get message date: " + e);
      e.printStackTrace();
      return null;
    }

    DateFormat format = new SimpleDateFormat("EEE, d MMM yyyy HH:mm:ss");
    return format.format(when);
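  }

  // Stub so the class compiles; what it does with the parsed email is up to you
  private void processMail(String from, String date, String body) {
  }
}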

Download the code from Pastebin

The one remaining method is processMail which handles the parsed email. I will leave implementing this up to you. In the case of our snippet system, we store the parsed email in the GAE datastore (using the excellent Objectify for convenience), ready for later use.
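As a hypothetical sketch of one possible shape for processMail: it keeps parsed emails in an in-memory list standing in for the datastore. It is not Objectify and persists nothing, but it is handy for exercising the parsing logic locally. Snippet, SnippetStore and SnippetProcessor are illustrative names, and skipping blank bodies is an assumption. Note that body may contain HTML, since handlePart also accepts text/html parts.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical value object for one parsed email. */
final class Snippet {
  final String from;
  final String date; // may be null: getMessageDate returns null when no date is available
  final String body;

  Snippet(String from, String date, String body) {
    this.from = from;
    this.date = date;
    this.body = body;
  }
}

/** Hypothetical in-memory stand-in for the datastore; nothing is persisted. */
final class SnippetStore {
  private final List<Snippet> snippets = new ArrayList<>();

  synchronized void save(Snippet snippet) {
    snippets.add(snippet);
  }

  /** Returns a read-only copy of everything stored so far. */
  synchronized List<Snippet> all() {
    return Collections.unmodifiableList(new ArrayList<>(snippets));
  }
}

final class SnippetProcessor {
  private final SnippetStore store;

  SnippetProcessor(SnippetStore store) {
    this.store = store;
  }

  /** Same parameters as processMail in the servlet: trims the body and stores it. */
  void processMail(String from, String date, String body) {
    String trimmed = body == null ? "" : body.trim();
    if (trimmed.isEmpty()) {
      return; // nothing worth storing
    }
    store.save(new Snippet(from, date, trimmed));
  }
}
```

Swapping SnippetStore for a datastore-backed class later leaves the servlet’s call to processMail unchanged.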

If you implement this code, you have fully working GAE email handling. MailHandlerServlet processes emails sent from a very wide variety of clients and services. It extracts the date, sender and email body from the incoming email and passes it to processMail for handling.
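One detail worth knowing about the date: getMessageDate builds its SimpleDateFormat with the JVM’s default locale and time zone, and the formatted string doesn’t say which zone it is in. If your digest needs consistent dates, a variant that pins both could look like this. MailDates is a hypothetical helper, and UTC in the example below is just an illustrative choice of zone.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical helper: getMessageDate's pattern with a fixed locale and time zone. */
final class MailDates {
  private MailDates() {}

  /** Formats a message date, or returns null if there is no date. */
  static String format(Date when, TimeZone zone) {
    if (when == null) {
      return null;
    }
    // SimpleDateFormat isn't thread-safe, so build a fresh one per call,
    // as getMessageDate already does.
    DateFormat format = new SimpleDateFormat("EEE, d MMM yyyy HH:mm:ss", Locale.ENGLISH);
    format.setTimeZone(zone);
    return format.format(when);
  }
}
```

For example, MailDates.format(new Date(0L), TimeZone.getTimeZone("UTC")) returns "Thu, 1 Jan 1970 00:00:00".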

Pretty straightforward when you know how, but hard to get right if you are unfamiliar with the bugs in GAE and the subtleties of handling multipart emails via JavaMail.
