Wednesday, July 08, 2009

Read Anything on Kindle

After a year of use, I must say that my Kindle [reviewed here] has turned out to be even more useful than I'd expected. I used to read a lot on my (backlit, eye-straining) laptop screen, but I've now found ways to shift pretty much all my heavy reading on to the Kindle. Here are some of the most useful (non-obvious) tricks and tips I've found for extracting online content:

(1) RSS feeds: Use kindlefeeder to read your favourite blogs or other content with a full RSS feed (e.g. NDPR reviews). Highly recommended.

(2) Partial feeds: Alas, not all content providers are so considerate as to provide users with the convenience of a full RSS feed. If they provide a partial RSS feed (e.g. Philosopher’s Digest, and most newspapers), you can use Calibre to automatically track the feed and extract the full text from the website, though setting this up requires some tinkering at first.

Update: You may be able to convert partial to full feeds using wizardrss.
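If you're not sure which kind of feed a site offers, one rough check is to save the feed and look for a full-content element. Full-text RSS 2.0 feeds often carry each post's body in `<content:encoded>`, and full-text Atom feeds use `<content>`, whereas partial feeds tend to offer only a `<description>` or `<summary>`. Here's a minimal bash sketch of that heuristic. The function name `feed_kind` is my own invention, and because some full feeds put the whole post inside `<description>`, a "possibly partial" verdict is worth confirming by eye.

```shell
#!/usr/bin/env bash
# Rough heuristic (an assumption, not a standard): guess whether a saved
# RSS/Atom feed file carries full post text or only summaries.
#   RSS 2.0 full-text feeds often use <content:encoded>
#   Atom full-text feeds use <content ...> (partial ones use <summary>)
feed_kind() {
  local feed=$1
  if grep -q -e '<content:encoded' -e '<content[ >]' "$feed"; then
    echo "full"
  else
    echo "possibly partial"   # full text may still be inside <description>
  fi
}
```

For example, `curl -s -o feed.xml "http://WEBSITE/feed"` followed by `feed_kind feed.xml` (the feed address here is only a placeholder).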

(I should note that Calibre also converts text-based [i.e. non-scanned] PDFs to Kindle format, which is a convenient alternative to emailing papers to your Amazon account for conversion.)
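If you have a whole folder of papers, Calibre's command-line tool `ebook-convert` (installed along with Calibre) can convert them in one go. Here's a minimal sketch, assuming the PDFs are text-based and that MOBI output suits your Kindle. The function name `pdfs_to_kindle` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: convert every text-based PDF in a folder to MOBI using Calibre's
# ebook-convert command-line tool. Scanned PDFs need a different approach.
pdfs_to_kindle() {
  local dir=${1:-.} pdf out
  for pdf in "$dir"/*.pdf; do
    [ -e "$pdf" ] || continue     # no PDFs at all: the glob stays literal
    out="${pdf%.pdf}.mobi"        # paper.pdf -> paper.mobi
    [ -e "$out" ] && continue     # already converted on an earlier run
    ebook-convert "$pdf" "$out" --output-profile kindle
  done
}
```

Running `pdfs_to_kindle ~/papers` would then leave a .mobi file next to each PDF, ready to copy across over USB. Skipping files that already have a .mobi means you can simply rerun it as new papers arrive.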

(3) Single pages: Other times, you come across interesting stand-alone articles, e.g. in newspapers, blogs, magazines, or the SEP. In such cases, you can use the Instapaper bookmarklet to instantly save the page. Instapaper then automatically extracts the text from your saved pages, and delivers them to your Kindle on demand. Very useful!

(4) Multiple pages: Sometimes online books are spread across many separate HTML pages, and you probably don't want to save each page one at a time. Fortunately, it's easy to automate the process. For example:

(i) Use a website mirroring tool to download all the pages. If you have (or create) a "table of contents" page that links to each of the other HTML files in the correct order, Calibre can easily compile them into an e-book. (I did something like this to get Sidgwick's Methods of Ethics on to my Kindle last year.)
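If the site doesn't supply a contents page, you can generate one. Below is a minimal bash sketch, assuming the mirrored pages sit in a single folder and that their file names fall into reading order once embedded numbers are compared numerically (GNU sort's `-V` option does this, so p2.html comes before p10.html). The name `make_toc` is hypothetical, and the file names are assumed to need no HTML escaping.

```shell
#!/usr/bin/env bash
# Sketch: print a "table of contents" page linking to every .html file
# in a folder, in natural order (p2 before p10). Assumes GNU sort (-V).
make_toc() {
  local dir=${1:-.} f
  printf '<html><body>\n'
  for f in "$dir"/*.html; do
    [ -e "$f" ] || continue
    f=${f##*/}                                  # keep just the file name
    [ "$f" = toc.html ] || printf '%s\n' "$f"   # don't link the TOC to itself
  done | sort -V | while IFS= read -r f; do
    printf '<p><a href="%s">%s</a></p>\n' "$f" "${f%.html}"
  done
  printf '</body></html>\n'
}
```

For example (the URL and folder are placeholders): `wget --mirror --no-parent http://WEBSITE/book/`, then `make_toc WEBSITE/book > WEBSITE/book/toc.html`, and finally `ebook-convert WEBSITE/book/toc.html book.mobi`. The function skips toc.html because the shell creates that output file before the listing runs. By default Calibre's HTML input follows the links in the starting page, which is what pulls the chapters in.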

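(ii) Alternatively, if the URLs are suitably systematic, you can use a bash script that loops through the pages and downloads the text of each in turn. For example, the following code uses the text-mode browser lynx to extract the text from each of http://WEBSITE/p1.html through to http://WEBSITE/p250.html, appending it all to a plain text file, "book.txt". (-dump prints the rendered page as text, -nolist omits the numbered list of links, and the large -width keeps paragraphs from being broken into short lines.)

```shell
# Append the plain text of p1.html ... p250.html, in order, to book.txt
for (( c=1; c<=250; c++ )); do
  lynx -dump -nolist -width=800 "http://WEBSITE/p$c.html" >> book.txt
done
```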

(You might then need to do a quick global 'search and replace' to cut out any extraneous header/footer text from each page, before transferring the file to your Kindle. Or you can use advanced text conversion tools in Calibre to add a hyperlinked table of contents! Essential for longer books.)
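For the "search and replace" step, one approach is to paste the offending header and footer lines into a separate file and delete every line of book.txt that matches one of them. Here's a minimal sketch using awk, assuming the boilerplate lines repeat identically on every page apart from leading or trailing spaces (lynx indents its output). The names `strip_boilerplate` and `junk.txt` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: delete every line of a text file that matches (ignoring leading
# and trailing spaces/tabs) one of the boilerplate lines listed in another file.
# Usage: strip_boilerplate junk.txt book.txt > book-clean.txt
strip_boilerplate() {
  awk '
    function trim(s) { sub(/^[ \t]+/, "", s); sub(/[ \t]+$/, "", s); return s }
    FILENAME == ARGV[1] {            # first file: the list of boilerplate lines
      if (trim($0) != "") junk[trim($0)] = 1
      next
    }
    !(trim($0) in junk)              # second file: print lines not in the list
  ' "$1" "$2"
}
```

Matching whole lines rather than substrings avoids deleting sentences that merely mention, say, the site's name, and blank lines in the list are ignored so the book's paragraph breaks survive.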

(5) Scanned PDFs: see the instructions in my old post, JSTOR to Amazon Kindle.

1 comment:

  1. Ok, I'm sold! If only to give my eyes a break for all the reading you gotta get through in postgrad work.
