CollegeSpamChecker (2018-)

CollegeSpamChecker is a simple but pretty cool project that I made to fiddle around with imaplib, and later the threading library, in Python. It connects to a folder on an email server, collects all the mail in it, and displays a table showing who’s sending you the most email.
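
As a rough sketch of the "table of top senders" step (not CSC's actual code), the From headers could be tallied with a Counter once they have been collected. The function names `count_senders` and `format_table`, and the sample addresses, are made up for illustration.

```python
from collections import Counter
from email.utils import parseaddr


def count_senders(from_headers):
    """Tally how many messages each sender address accounts for."""
    counts = Counter()
    for header in from_headers:
        # parseaddr splits 'Name <addr@host>' into (name, address).
        _name, address = parseaddr(header)
        if address:
            counts[address.lower()] += 1
    return counts


def format_table(counts, limit=10):
    """Render the busiest senders as a plain-text table, most mail first."""
    rows = counts.most_common(limit)
    width = max((len(sender) for sender, _ in rows), default=0)
    width = max(width, len("Sender"))
    lines = [f"{'Sender':<{width}}  Emails"]
    lines += [f"{sender:<{width}}  {total}" for sender, total in rows]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical headers standing in for what an IMAP fetch would return.
    sample = [
        "Admissions <info@example.edu>",
        "info@example.edu",
        "Some College <hello@example.org>",
    ]
    print(format_table(count_senders(sample)))
```

Keying the count on the bare address rather than the display name keeps one sender from showing up as several rows just because the name in the header changes.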

CollegeSpamChecker (CSC for the rest of this article) started development in 2018 when, after taking some standardized tests, I began to get bombarded with college spam email. I have a folder called “college spam” on my personal email server, and any college spam I get is thrown into it. At first, CSC was great: it could process a few hundred emails relatively quickly, and I had no complaints. Aside from some formatting errors, it worked well…or so I thought.

Of course, as time went on, my college spam folder grew and grew…it’s now sitting at just under 3,000 emails. Instead of taking a minute, CSC now took about 5-6 minutes to do a spam analysis. This is when I thought, “it’s time for some multi-threaded goodness!” Of course, multi-threaded programs are confusing for newcomers to Python: there’s threading and multiprocessing, which look like they do the same thing but are vastly different (threads share one process, which suits waiting on a mail server, while multiprocessing spawns separate processes for CPU-heavy work), and then you have to figure out which one fits your use case. It’s a nightmare.

After a few hours of coding, I did get multi-threaded functionality working for CSC, and oh boy is it fast! It takes 30 seconds to process 3,000 emails with 20 threads. I also ran into some lovely errors from mail servers complaining about too many concurrent connections, so I built in handling to catch these errors. I also baked in an option to set the number of threads the analysis runs with.
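
One way the threaded analysis could be structured is sketched below; it is an illustration under assumptions, not CSC's real implementation. The names `split_into_chunks`, `fetch_all`, and `fetch_chunk` are hypothetical, and retrying failed chunks one at a time after the pool finishes is an assumed recovery strategy, since the post doesn't say how CSC handles the connection errors.

```python
from concurrent.futures import ThreadPoolExecutor
import imaplib


def split_into_chunks(items, parts):
    """Split items into at most `parts` roughly equal, non-empty chunks."""
    if not items:
        return []
    parts = max(1, min(parts, len(items)))
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks each take one leftover item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def fetch_all(message_ids, fetch_chunk, threads=20):
    """Run fetch_chunk over the message IDs with up to `threads` workers.

    fetch_chunk is a caller-supplied function (hypothetical here) that takes
    a list of message IDs and returns a list of results for them.
    """
    chunks = split_into_chunks(message_ids, threads)
    if not chunks:
        return []
    results, failed = [], []
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        futures = [pool.submit(fetch_chunk, chunk) for chunk in chunks]
        for future, chunk in zip(futures, chunks):
            try:
                results.extend(future.result())
            except imaplib.IMAP4.error:
                # e.g. the server refused another concurrent connection.
                failed.append(chunk)
    # Assumed fallback: retry failed chunks sequentially once the pool is idle.
    for chunk in failed:
        results.extend(fetch_chunk(chunk))
    return results
```

In real use, `fetch_chunk` would presumably open its own imaplib connection and return the sender headers for its chunk. Each of those connections counts toward the server's limit, which is why a lower thread count can be the safer setting on stricter servers.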

Lastly, because Gmail (and, by extension, Yahoo Mail) thinks that authenticating with a username and password is insecure (okay, very cool!), you have to allow insecure apps for CSC to work. To help with the situation, CSC can automatically fill in server names for Gmail, Yahoo Mail, and Outlook.com, and gives users links to turn “allow insecure apps” on (and then off again).
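
The server auto-fill could be as simple as a lookup keyed on the email domain. The sketch below is a guess at the idea rather than CSC's code: `KNOWN_IMAP_SERVERS` and `guess_imap_server` are hypothetical names, and the hostnames are the commonly documented IMAP hosts for each provider, which should be checked against the provider's current documentation.

```python
# Assumed IMAP hosts per provider; verify against each provider's docs.
KNOWN_IMAP_SERVERS = {
    "gmail.com": "imap.gmail.com",
    "yahoo.com": "imap.mail.yahoo.com",
    "outlook.com": "outlook.office365.com",
}


def guess_imap_server(email_address):
    """Return a known IMAP host for the address's domain, or None."""
    if "@" not in email_address:
        return None
    # Take everything after the last '@' as the domain.
    domain = email_address.rpartition("@")[2].strip().lower()
    return KNOWN_IMAP_SERVERS.get(domain)


if __name__ == "__main__":
    print(guess_imap_server("someone@gmail.com"))
    print(guess_imap_server("someone@example.org"))
```

Returning None for unknown domains leaves room to fall back to asking the user for the server name, which is what anyone on a personal mail server would need.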

And that’s CollegeSpamChecker! Of course, it can be used to analyze all sorts of other mail folders; it isn’t strictly for college spam. I’ve used it on my main inbox and found emails from services I forgot I had signed up for, so there’s a benefit to that! You could use it to see which marketing campaigns are the loudest (or the quietest). There’s a lot of cool potential with mail analysis.

If you want to try out CollegeSpamChecker for yourself, check out the source code here:

https://gitlab.com/o355/CollegeSpamChecker