Download POP3 headers from a certain date (Python)

Developer (https://www.devze.com), 2023-04-08 22:40. Source: web
I'm trying to write POP3 and IMAP clients in Python using available libraries. They will download email headers (and subsequently entire email bodies) from various servers and save them in a MongoDB database. The problem I'm facing is that this client downloads emails in addition to a user's regular email client. So, assuming that a user might or might not leave emails on the server when downloading with his mail client, I'd like to fetch the headers but only collect them from a certain date onward, to avoid grabbing entire mailboxes every time I fetch the headers.

As far as I can see, the POP3 LIST command returns every message on the server, including those I have probably already downloaded. IMAP doesn't have this problem.

How do email clients handle this situation when dealing with POP3 servers?


Outlook logs in to a POP3 server and issues the STAT, LIST and UIDL commands; then if it decides the user has no new messages it logs out. I have observed Outlook doing this when tracing network traffic between a client and my DBMail POP3 server. I have seen Outlook fail to detect new messages on a POP3 server using this method. Thunderbird behaves similarly but I have never seen it fail to detect new messages.

Issue the LIST and UIDL commands to the server after logging in. LIST gives you an index number (the message's linear position in the mailbox) and the size of each message in octets. UIDL gives you the same index number and a unique ID string for each message, typically a computed hash value.
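As a rough illustration, both listings can be read with Python's standard `poplib` module. This is only a sketch: the helper names `connect`, `parse_pairs` and `read_listings` are made up for this example, and it assumes the server accepts POP3 over SSL and supports UIDL (which is an optional command).

```python
import poplib


def connect(host, user, password):
    """Log in over SSL (port 995 by default). Host and credentials are placeholders."""
    conn = poplib.POP3_SSL(host)
    conn.user(user)
    conn.pass_(password)
    return conn


def parse_pairs(lines):
    """Turn response lines such as b'1 1205' into {1: '1205'}."""
    pairs = {}
    for raw in lines:
        index, value = raw.decode("ascii", "replace").split(None, 1)
        pairs[int(index)] = value.strip()
    return pairs


def read_listings(conn):
    """Return {index: (size, uid)} for every message currently on the server.

    `conn` is assumed to be a logged-in poplib.POP3 or poplib.POP3_SSL object.
    """
    sizes = parse_pairs(conn.list()[1])  # LIST: index and size in octets
    uids = parse_pairs(conn.uidl()[1])   # UIDL: index and unique ID string
    return {i: (int(sizes[i]), uids[i]) for i in sizes if i in uids}
```

Both `conn.list()` and `conn.uidl()` return a `(response, lines, octets)` tuple, which is why the sketch only looks at element `[1]`.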

For each user you can store the size and hash value given by LIST and UIDL. If you see the same size and hash value, assume it is the same message. When a given message no longer appears in this list, assume it has been deleted and clear it from your local memory.
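The bookkeeping described above might look roughly like this in Python. It is a sketch under several assumptions: `current_pairs` and `sync` are hypothetical names, `conn` is an already logged-in `poplib` connection, the server supports the optional TOP command for fetching headers only, and persisting `seen` between runs (for example in MongoDB) is left out.

```python
from email.parser import BytesParser


def current_pairs(conn):
    """Return {(size, uid): index} for the messages now on the server."""
    sizes = dict(line.decode("ascii").split(None, 1) for line in conn.list()[1])
    uids = dict(line.decode("ascii").split(None, 1) for line in conn.uidl()[1])
    return {(int(sizes[i]), uids[i]): int(i) for i in uids if i in sizes}


def sync(conn, seen):
    """Fetch headers for messages whose (size, uid) pair has not been seen.

    `seen` is the set of (size, uid) pairs stored after the previous run.
    Returns (new_headers, updated_seen). Pairs that are no longer on the
    server are dropped from updated_seen, i.e. treated as deleted.
    """
    on_server = current_pairs(conn)
    new_headers = {}
    for pair, index in on_server.items():
        if pair not in seen:
            lines = conn.top(index, 0)[1]  # TOP n 0: headers, no body lines
            raw = b"\r\n".join(lines)
            new_headers[pair] = BytesParser().parsebytes(raw, headersonly=True)
    return new_headers, set(on_server)
```

Calling `sync` once per polling interval and saving the returned set means only messages that appeared since the last poll are fetched, regardless of what the user's own mail client has deleted in the meantime.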

For complete purity, remember the relative positions of the size/hash pairs in the message list, so that you can support the possibility that they may repeat. (My guess on Outlook's new message detection failure is that sometimes these values do repeat, at least for DBMail, but Outlook remembers them even after they are deleted, and forever considers them not new. If it were me, I would try to avoid this behavior.)
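One possible way to take relative positions into account is to store the pairs as an ordered list and compare the old and new lists as sequences. The sketch below uses `difflib.SequenceMatcher` from the standard library for that; this is my own choice of technique rather than anything a particular mail client is known to do, and `diff_ordered` is a hypothetical name.

```python
from difflib import SequenceMatcher


def diff_ordered(previous, current):
    """Compare two ordered lists of (size, uid) pairs.

    Returns (new_positions, deleted_pairs). new_positions holds 1-based
    message numbers in `current` with no counterpart in `previous`, so a
    repeated pair at a new position still counts as a new message.
    """
    matcher = SequenceMatcher(None, previous, current, autojunk=False)
    new_positions, deleted_pairs = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            deleted_pairs.extend(previous[i1:i2])
        if tag in ("insert", "replace"):
            new_positions.extend(range(j1 + 1, j2 + 1))
    return new_positions, deleted_pairs
```

A plain set comparison would treat a second message with the same size and ID as already seen; comparing ordered lists reports it as new because it occupies a position that did not exist before.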

Footnote: Remember that the headers are part of the message. Do not trust anything in the header for this reason: dates, senders, even server hand-off information can be easily faked and cannot be assumed unique.
