The corrosion of Aaron Stone

Convert a Postfix queue to mbox format

Today I wanted to convert the contents of my Postfix ‘deferred’ queue, which was all spam stuck in my MTA, to mbox format so that I could feed it into various spam-learning systems.

First I converted my Postfix queue to human-readable format with postcat:

cd /var/spool/postfix/deferred
mkdir spam
for i in {0..9} {A..F}; do mkdir spam/$i; done
# Glob only the hex queue subdirectories, so the new spam/ tree is not picked up
for i in [0-9A-F]/*; do postcat "$i" > "spam/$i"; done

This made a copy of each file in postcat format, but that’s only halfway there – postcat has its own output format that isn’t anything like an mbox or a maildir. Inspecting the output, and brushing up on my sed, I came up with this:

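sed -n '
  # Envelope records: build the mbox "From sender date" separator line
  /^\*\*\* ENVELOPE/,/^\*\*\* MESSAGE CONTENTS/ {
    /^message_arrival_time:/ {
      s/^message_arrival_time: \(.*\)$/\1/
      h
    }
    /^sender:/ {
      s/^sender: \(.*\)$/\1/
      H
      g
      s/\(.*\)\n\(.*\)/From \2 \1/
      p
    }
  }
  # Message contents: print them, dropping the postcat marker lines
  /^\*\*\* MESSAGE CONTENTS/,/^\*\*\* HEADER EXTRACTED/ {
    /^\*\*\* MESSAGE CONTENTS/ d
    # The end marker becomes the blank line that separates messages
    s/^\*\*\* HEADER EXTRACTED.*$//
    # Escape body lines that would look like a new message
    s/^\(>*From \)/>\1/
    p
  }' -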
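
The two less obvious moves in that script are the hold-space shuffle that assembles the "From " separator and the quoting of body lines that begin with "From ". Here is a minimal sketch that isolates each one so it can be tried without a real queue file. The function names from_line and quote_from, and the sample address and timestamp, are made up for illustration; real postcat output carries more envelope records than the two lines used here.

```shell
#!/bin/sh
# Build an mbox "From " separator from two postcat-style envelope lines.
# h stores the timestamp in the hold space, H appends the sender after a
# newline, g copies both back, and the final s command swaps their order.
from_line() {
  sed -n '
    /^message_arrival_time:/ {
      s/^message_arrival_time: \(.*\)$/\1/
      h
    }
    /^sender:/ {
      s/^sender: \(.*\)$/\1/
      H
      g
      s/\(.*\)\n\(.*\)/From \2 \1/
      p
    }'
}

# Prefix ">" to body lines starting with "From " (or already-quoted
# ">From "), so an mbox reader does not mistake them for a new message.
quote_from() {
  sed 's/^\(>*From \)/>\1/'
}

# Invented sample input, for illustration only.
printf 'message_arrival_time: Mon Mar  3 12:00:00 2008\nsender: spammer@example.com\n' | from_line
printf 'From the desk of\n>From earlier\nFrom: header@example.com\n' | quote_from
```

The first command prints "From spammer@example.com Mon Mar  3 12:00:00 2008". The second prints ">From the desk of" and ">>From earlier", and leaves the "From:" header line alone because the pattern requires a space after "From".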

A final pass over all of the converted messages (run from inside spam/, with the sed script above saved there as mkmbox.sh), and I had my mbox file to train SpamAssassin with:

for i in [0-9A-F]/*; do sh mkmbox.sh < "$i" >> spam.mbox; done
sa-learn --spam --mbox spam.mbox
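
Before handing the file to a learner, it may be worth checking that nothing was lost along the way. A rough sketch, assuming the layout used above (converted files under the hex subdirectories, spam.mbox in the current directory): count the "From " separator lines and compare with the number of converted files. The helper names count_messages and count_queue_files are hypothetical.

```shell
#!/bin/sh
# Count messages in an mbox file by counting its "From " separator lines.
# Quoted body lines start with ">From " and are therefore not counted.
count_messages() {
  grep -c '^From ' "$1"
}

# Count regular files under the hex subdirectories (0-9, A-F) of a directory.
count_queue_files() {
  n=0
  for f in "$1"/[0-9A-F]/*; do
    [ -f "$f" ] && n=$((n + 1))
  done
  echo "$n"
}

# Only report when run next to an existing spam.mbox.
if [ -f spam.mbox ]; then
  echo "messages in mbox: $(count_messages spam.mbox)"
  echo "converted files:  $(count_queue_files .)"
fi
```

If the two numbers differ, the likely suspects are files that lacked a sender record, which the sed script would have written without a separator line.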