Back in November I was looking for a project and noticed how hard it is to search and read through the 10 years of speeches, statements, and press releases on Ron Paul’s congressional website. So I set about using Python and BeautifulSoup to transform 10 years of MS Word-formatted HTML into stripped-down HTML that could be imported into a WordPress blog. The result is Ron Paul’s Brain.
I first used wget to download all of the pages, then used BeautifulSoup and regular expressions to strip them down to the most basic HTML. Automated stripping only got me so far, so I had to touch each one by hand to clean up the last little bits. Once everything was set, I made a WordPress eXtended RSS (WXR) file and imported it into the blog at Ron Paul’s Brain. Of course, just having them in a blog doesn’t help much if you’re trying to read up on what Ron Paul thinks about a certain topic, so I used a WordPress plugin called Similar Posts, which finds the most commonly used words in each post and links to other posts that use the same words. The result is a great website with a wealth of information that can keep you reading for hours.
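For anyone curious what "stripping down" Word HTML looks like, here is a minimal sketch of the idea. It is not the script I used for the site: to keep it self-contained it uses Python's standard-library `html.parser` instead of BeautifulSoup, and the tag whitelist (`ALLOWED_TAGS`), the `strip_word_html` function name, and the regex cleanup rules are all illustrative assumptions. It keeps a handful of basic tags, throws away every attribute except `href`, drops `<style>` blocks and Word's conditional comments, and unwraps everything else (`<span>`, `<font>`, `<o:p>`, and so on) so only the text survives.

```python
import re
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tags kept in the output. Anything else is unwrapped,
# meaning the tag itself is dropped but its text content is kept.
ALLOWED_TAGS = {"p", "a", "b", "strong", "i", "em", "ul", "ol", "li", "blockquote", "br"}
# Tags whose entire content is thrown away, not just the tag.
DROP_CONTENT = {"script", "style", "head", "title"}


class WordHtmlStripper(HTMLParser):
    """Rebuilds a document keeping only ALLOWED_TAGS and the href attribute."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &nbsp; etc. arrive as characters
        self.out = []
        self.skip_depth = 0  # > 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS:
            if tag == "a":
                href = dict(attrs).get("href")
                self.out.append(f'<a href="{escape(href)}">' if href else "<a>")
            else:
                self.out.append(f"<{tag}>")  # class/style attributes are discarded

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS and tag != "br":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(escape(data, quote=False))

    # handle_comment is not overridden, so comments (including Word's
    # <!--[if gte mso 9]> blocks) produce no output.


def strip_word_html(raw: str) -> str:
    parser = WordHtmlStripper()
    parser.feed(raw)
    parser.close()
    cleaned = "".join(parser.out)
    cleaned = cleaned.replace("\xa0", " ")  # Word pads text with non-breaking spaces
    cleaned = re.sub(r"\s+", " ", cleaned)  # collapse runs of whitespace
    cleaned = re.sub(r"\s*(</?p>)\s*", r"\1", cleaned)  # trim space around <p> tags
    cleaned = cleaned.replace("<p></p>", "")  # drop paragraphs left empty
    cleaned = cleaned.replace("</p>", "</p>\n")  # one paragraph per line
    return cleaned.strip()
```

Feeding it a typical Word paragraph such as `<p class="MsoNormal"><span style="font-size:12pt">Mr. Speaker,&nbsp;I rise</span><o:p></o:p></p>` gives back plain `<p>Mr. Speaker, I rise</p>`. A whitelist is the safer design here, because Word invents new junk tags faster than a blacklist can keep up. Even so, a pass like this won't catch everything, which is why each page still needed a final look by hand.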
I actually finished the site back in December but never got around to posting about it; now you can enjoy it. I will continue to post new speeches and press releases as they become available.