Friday, 28 May 2010

Removing Duplicate Email Messages After Maildir Conversion

The Problem

After converting to maildir format on cPanel/Exim servers, you may sometimes find duplicate emails in certain mail accounts. If there are only a few, you can delete them manually through the webmail interface. For an inbox with thousands of messages, however, you will need another method.

The Solution

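1) Install reformail, which is part of Courier maildrop. Only reformail and the libraries it needs are built here; the full maildrop package is not installed.
wget http://umn.dl.sourceforge.net/sourceforge/courier/maildrop-1.6.3.tar.bz2
tar -jxf maildrop-1.6.3.tar.bz2
cd maildrop-1.6.3
./configure
# build the helper libraries reformail needs, then reformail itself
for d in numlib liblock rfc822; do (cd "$d" && make); done
cd maildrop
make reformail
cp reformail /usr/local/bin
chmod 755 /usr/local/bin/reformail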


2) Change to the maildir of the affected mail account, i.e. the directory that contains its cur and new folders.

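3) Check the number of messages in the cur directory. unalias ls removes any alias that adds options (such as -a) which could change the count; if ls is not aliased, it just prints a harmless error.
unalias ls
ls cur | wc -l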
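The same count can be taken without touching aliases at all. Below is a minimal sketch using a hypothetical helper, count_messages, assuming find and awk are available. It counts regular files only, so it is unaffected by how ls is aliased and ignores anything in the directory that is not a message file.

```shell
# count_messages DIR: hypothetical helper that prints how many regular
# files exist under DIR (e.g. the cur folder of a maildir).
count_messages() {
  # awk prints the number of lines find produced; an empty DIR gives 0
  find "$1" -type f | awk 'END { print NR }'
}

# Usage, from inside the mail account's maildir:
#   count_messages cur
```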


4) Count the Message-ID headers in cur. If every message has one, this matches the number from Step 3.
for i in cur/*; do reformail -x Message-ID: <"$i"; done | wc -l

5) Count the unique Message-IDs, i.e. the number of mails that should remain once the duplicates are removed.
for i in cur/*; do reformail -x Message-ID: <"$i"; done | sort -u | wc -l
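Steps 4 and 5 can also be collapsed into one pass that prints three numbers: the total, the unique count, and their difference, which is the number Step 6 should report once the cache is large enough. This is a minimal sketch, assuming reformail is on the PATH and a POSIX shell with sort, uniq and awk; the helper name msgid_counts is hypothetical. LC_ALL=C makes sort and uniq compare lines byte by byte, so they agree on what counts as identical.

```shell
# msgid_counts DIR: hypothetical helper that extracts every Message-ID once
# (as in Steps 4 and 5) and prints "total unique duplicates".
msgid_counts() {
  for i in "$1"/*; do
    [ -f "$i" ] || continue            # skip the literal pattern if DIR is empty
    reformail -x Message-ID: <"$i"
  done | LC_ALL=C sort | LC_ALL=C uniq -c | awk '
    { total += $1; unique++ }          # $1 is the repeat count added by uniq -c
    END { print total + 0, unique + 0, total - unique }'
}

# Usage, from inside the mail account's maildir:
#   msgid_counts cur
```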

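6) Count how many messages will be deleted. reformail -D exits with status 0 when the message's Message-ID is already in the cache file (/tmp/dups), so the loop echoes each duplicate once.
rm -f /tmp/dups
for i in cur/*; do reformail -D 2000000 /tmp/dups <"$i" && echo "$i"; done | wc -l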


7) Add the results of Step 5 and Step 6 and make sure the sum equals the result of Step 4. If it doesn't, increase the 2000000: it is the size of the duplicate cache in bytes, and when the cache is too small reformail forgets older Message-IDs and misses some duplicates. Run rm -f /tmp/dups before each new attempt, then repeat Step 6 until the sum matches. Note down the value that worked.
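The trial and error in Step 7 can be scripted. This sketch assumes reformail is on the PATH and calls it exactly as Step 6 does; the helper name find_cache_size, the doubling strategy and the 10-attempt limit are illustrative choices, not part of the original procedure. It uses a private cache file from mktemp instead of /tmp/dups and deletes it before every attempt, because a cache left over from an earlier run would make every message look like a duplicate.

```shell
# find_cache_size DIR TOTAL UNIQUE [START]: hypothetical helper.
# TOTAL and UNIQUE are the results of Steps 4 and 5. Repeats the Step 6
# count, doubling the reformail -D cache size until UNIQUE + duplicates
# equals TOTAL, then prints that size. Gives up after 10 attempts.
# The body runs in a subshell ( ... ) so its variables stay local.
find_cache_size() (
  dir=$1 total=$2 unique=$3 size=${4:-2000000}
  tmp=$(mktemp -d) || exit 1
  cache="$tmp/dups"                  # private cache instead of /tmp/dups
  tries=0
  while [ "$tries" -lt 10 ]; do
    rm -f "$cache"                   # every attempt starts with an empty cache
    dups=0
    for i in "$dir"/*; do
      [ -f "$i" ] || continue
      # exit status 0 means reformail has already seen this Message-ID
      if reformail -D "$size" "$cache" <"$i"; then dups=$((dups + 1)); fi
    done
    if [ $((unique + dups)) -eq "$total" ]; then
      rm -rf "$tmp"
      echo "$size"
      exit 0
    fi
    size=$((size * 2))
    tries=$((tries + 1))
  done
  rm -rf "$tmp"
  exit 1
)

# Usage, with the numbers from Steps 4 and 5:
#   find_cache_size cur TOTAL UNIQUE
```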

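8) Delete the duplicates and verify the number of mails left. Empty the cache first: /tmp/dups still holds every Message-ID from Step 6, and reusing it would make reformail treat every message, including the first copies, as a duplicate.
rm -f /tmp/dups
for i in cur/*; do reformail -D 2000000 /tmp/dups <"$i" && rm -- "$i"; done
ls cur | wc -l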


Note: Use 2000000 in Step 8 only if the sums matched with that value in Step 7; otherwise use the value you ended up with there.

The count printed at the end of Step 8 should equal the result of Step 5.
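If you would rather not tune a cache size at all, the same keep-the-first-copy rule can be applied with plain awk, which remembers every Message-ID it has seen in memory instead of in a fixed-size file. This is a swapped-in technique, not what the procedure above uses: it parses the header block itself (including a Message-ID folded onto the next line), and the helper name list_duplicates is hypothetical. It only prints paths, so you can count or inspect them before deleting anything; the count should match Step 6.

```shell
# list_duplicates DIR: hypothetical helper. Prints every file in DIR whose
# Message-ID already appeared in an earlier file (shell glob order, the same
# order the Step 8 loop uses). The first copy of each message is never
# printed, and neither are files without a Message-ID header.
list_duplicates() {
  for i in "$1"/*; do
    [ -f "$i" ] || continue               # skip the literal pattern if DIR is empty
    id=$(awk '
      { sub(/\r$/, "") }                  # tolerate CRLF line endings
      /^$/ { exit }                       # a blank line ends the header block
      tolower($0) ~ /^message-id:/ {
        id = $0; sub(/^[^:]*:[ \t]*/, "", id); grab = 1; next
      }
      grab && /^[ \t]/ {                  # folded continuation of Message-ID
        v = $0; sub(/^[ \t]+/, "", v); id = id v; next
      }
      { grab = 0 }
      END { if (id != "") print id }
    ' <"$i")
    if [ -n "$id" ]; then printf '%s\t%s\n' "$id" "$i"; fi
  done | awk -F '\t' 'seen[$1]++ { print $2 }'   # print repeats only
}

# Usage: preview, count, then delete.
#   list_duplicates cur
#   list_duplicates cur | wc -l
#   list_duplicates cur | while IFS= read -r f; do rm -- "$f"; done
```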
