Hacker News RSS

Last Updated: 7/8/2009

Andrew Trusty has created a parameterized version of my script. You can use it to submit your own RSS feed and have it processed. My feed URL now redirects to Andrew's version, as it is being improved and I only have so much bandwidth to give.



AppEngine quotas bit Andrew. I've relaunched his codebase on a separate AppEngine app, located here.

Original Post

I have created a modified version of the Hacker News (HN) RSS feed which embeds the content of each linked article into the feed. Instead of showing just a link to the article and the discussion area on HN, the modified feed extracts the content of the article and displays that as well. This allows me to browse the article content from Google Reader instead of opening each article in its own tab. The technique for extracting the content is borrowed from Readability, which seems to work well for most pages.
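To give a feel for that kind of extraction, here is a much simplified sketch in the spirit of Readability. It is not the actual script and not Readability's real algorithm: the class and function names are made up for illustration, the scoring weights are assumptions, and it uses Python's standard `html.parser` so that it runs without extra dependencies. The idea is to group paragraphs by their enclosing container and keep the container whose paragraphs look most like prose (long, with plenty of commas).

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Group <p> text by the nearest enclosing container element."""

    CONTAINERS = {"body", "div", "td", "article", "section"}

    def __init__(self):
        super().__init__()
        self.stack = [0]        # ids of open containers; 0 is the document root
        self.next_id = 1
        self.paragraphs = {}    # container id -> list of paragraph strings
        self.chunks = None      # text pieces of the currently open <p>, if any

    def _flush(self):
        # Close the open paragraph (if any) and file it under its container.
        if self.chunks is not None:
            text = " ".join("".join(self.chunks).split())
            if text:
                self.paragraphs.setdefault(self.stack[-1], []).append(text)
            self.chunks = None

    def handle_starttag(self, tag, attrs):
        if tag in self.CONTAINERS:
            self._flush()
            self.stack.append(self.next_id)
            self.next_id += 1
        elif tag == "p":
            self._flush()       # tolerate a missing </p>
            self.chunks = []

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()
        elif tag in self.CONTAINERS:
            self._flush()
            if len(self.stack) > 1:
                self.stack.pop()

    def handle_data(self, data):
        if self.chunks is not None:
            self.chunks.append(data)


def score(paragraph):
    # Assumed weights: one base point, one per comma, up to three for length.
    return 1 + paragraph.count(",") + min(len(paragraph) // 100, 3)


def extract_main_text(html):
    """Return the paragraphs of the highest-scoring container, or ''."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    parser._flush()             # pick up a trailing unclosed <p>
    if not parser.paragraphs:
        return ""
    best = max(parser.paragraphs.values(), key=lambda ps: sum(map(score, ps)))
    return "\n\n".join(best)
```

Calling `extract_main_text(page_html)` on a page with a short navigation block and a longer article block should return the article's paragraphs. A real implementation would also need to keep the markup (images, links) rather than just the text.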

The feed is available here. It is generated upon request, with caching to speed it up, and errors are reported in the server logs. (Originally it was refreshed every 15 minutes by a cron job which sent me an email reporting all articles expanded and any parsing errors.) The parsing is done by a bit of Python code (formerly too ugly to show), which is available here. The HTML parsing is handled by Beautiful Soup, which can handle some types of malformed HTML. Non-absolute links are fixed so that images appear properly and links are not relative to the RSS URL.
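The link fixing can be sketched roughly as follows. This is an illustration rather than the actual script: the real code works on the Beautiful Soup tree, while this version uses a regular expression and the standard library's `urljoin` to stay dependency-free, and the function name `absolutize_links` is made up for the example.

```python
import re
from urllib.parse import urljoin

# Matches href="..." or src="..." with either quote style.
_URL_ATTR = re.compile(
    r'''\b(href|src)\s*=\s*(["'])(.*?)\2''',
    re.IGNORECASE | re.DOTALL,
)


def absolutize_links(html, page_url):
    """Rewrite href/src values so they resolve against the article's URL."""

    def fix(match):
        attr, quote, value = match.groups()
        # urljoin leaves already-absolute URLs untouched.
        absolute = urljoin(page_url, value.strip())
        return f"{attr}={quote}{absolute}{quote}"

    return _URL_ATTR.sub(fix, html)
```

For example, with the article at `http://example.com/blog/entry.html`, an image at `/img/a.png` is rewritten to `http://example.com/img/a.png`, so it still loads when the content is shown inside a feed reader. A regex is a blunt tool for HTML, which is one reason to do this on a parsed tree instead.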

BEFORE modification