
Comparing mailbox folders in Python

I recently moved email hosts and needed to copy over a large folder of archived messages. Since I had both my old account and new account in Mail.app, I took the naive route: select all the messages in the old archive folder and run “Copy to…”. It mostly went well, but the new archive folder came up five messages short.

One option to get these folders in sync is to delete the messages from the new archive folder and try again, but there was no guarantee that they would all transfer if given a second chance.

Another option is to use some simple Python code to print out the subjects of the missing messages, since Mail.app can export a standard mbox file and Python has a mailbox module that can read that file format.

To start, I exported the mailboxes by right-clicking on both mailbox folders and choosing “Export Mailbox”. Then I opened the Python REPL and started noodling around.

import mailbox

# Open the mbox files (paths are relative to the current working directory)
# Note: Mail.app actually exports an mbox folder that contains an mbox file
mbox1 = mailbox.mbox('Old-Archive.mbox/mbox')
mbox2 = mailbox.mbox('New-Archive.mbox/mbox')

# Compare messages by a header such as `subject`; distinct message
# objects never compare equal, so every message would look unique.
def subjects(mbox):
    result = []
    for message in mbox:
        result.append(message['subject'])
    return result

old_subjects = subjects(mbox1)
new_subjects = subjects(mbox2)

# Throw away dups within a mailbox
# and remove all `new_subjects` from the `old_subjects` set
diff = list(set(old_subjects) - set(new_subjects))

# Just the unsynced subjects are left
for subject in diff:
    print(subject)

It could be optimized and generalized, but that could lead down a rabbit hole (Relevant XKCD). It worked well for my needs and spit out the subjects of the five missing messages.
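For anyone who does want to take one step down that rabbit hole, here is a minimal sketch of a more general version. It addresses one limitation of the set-based diff: if two archived messages share a subject and only one of them fails to copy, the set difference is empty and the missing message goes unnoticed. The sketch counts header values with `collections.Counter` instead, and defaults to the `Message-ID` header on the assumption that it survives the copy unchanged, which may not hold for every mail client or host. The function names `header_counts` and `missing_values` are made up for illustration.

```python
import mailbox
from collections import Counter


def header_counts(path, header="Message-ID"):
    """Count how often each value of `header` occurs in the mbox file at `path`."""
    counts = Counter()
    # create=False makes a mistyped path raise an error instead of
    # silently creating an empty mailbox.
    box = mailbox.mbox(path, create=False)
    try:
        for message in box:
            value = message[header]
            # The value is None when the header is absent; anything else
            # is normalised to str so it can be used as a dictionary key.
            counts[str(value) if value is not None else None] += 1
    finally:
        box.close()
    return counts


def missing_values(old_path, new_path, header="Message-ID"):
    """Return header values that occur more often in the old mbox than in the new one."""
    remaining = header_counts(old_path, header) - header_counts(new_path, header)
    # elements() repeats each value once per missing copy.
    return sorted(remaining.elements(), key=str)
```

With the exports from above, `missing_values('Old-Archive.mbox/mbox', 'New-Archive.mbox/mbox')` would list the IDs of the messages that did not make it across, and passing `header='Subject'` gives a multiset-aware version of the subject diff.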