Maciej Janicki's website
2020-11-08
science
workflow
Following science journals via RSS

Working in academia means having to follow the current state of the research. The “usual” way to do it is downloading PDFs from the websites of conferences and journals. Needless to say, this is terribly inefficient: it requires us to remember when something new is expected to appear, to navigate manually through the website, etc. Even worse, there is a recent trend to rely increasingly on social media (Twitter, ResearchGate) and bloated, proprietary, data-mining “bibliography managers” (like Mendeley) to get suggestions on what to read.

In this post, I describe an efficient, distraction-free and privacy-protecting workflow for keeping up with the latest research. The general idea is to use RSS feeds to track updates of the journals in a text-mode RSS reader. Furthermore, we want the link in the RSS item to send us directly to the PDF version of the paper (opening in our favourite reader), which usually requires a slight hack.

Use RSS to follow updates

Once upon a time, there was a universal way of following updates on any website: RSS feeds. The standard is dead simple: each site exposes an XML file containing a list of recent “posts”. They typically contain a title, publication date, a short summary and a link to the full article. There are dozens of clients, both text-based and graphical, that you can use to subscribe to such feeds and gather all updates in one place, without logging in to any accounts and giving away the information about which sources you follow and which articles you read.

Here’s how I browse the Journal of Data Mining & Digital Humanities (JDMDH) in my text-based client.
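To make the structure of a feed concrete, here is a minimal sketch in Python that reads the four fields mentioned above (title, publication date, summary, link) from an RSS 2.0 document. The feed content, the `journal.example.org` URLs and the function name `parse_feed` are made up for illustration; a real reader such as newsboat does much more than this.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 document standing in for a journal feed (not a real feed).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Journal</title>
    <link>https://journal.example.org/</link>
    <description>Hypothetical feed, for illustration only</description>
    <item>
      <title>A first paper</title>
      <link>https://journal.example.org/1001</link>
      <pubDate>Sun, 08 Nov 2020 10:00:00 +0000</pubDate>
      <description>Short summary of the first paper.</description>
    </item>
    <item>
      <title>A second paper</title>
      <link>https://journal.example.org/1002</link>
      <pubDate>Mon, 09 Nov 2020 09:30:00 +0000</pubDate>
      <description>Short summary of the second paper.</description>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text):
    """Return one dict per <item> with the four fields a reader displays."""
    root = ET.fromstring(xml_text)
    fields = ("title", "pubDate", "description", "link")
    return [
        # Missing elements become empty strings instead of None.
        {name: (item.findtext(name) or "").strip() for name in fields}
        for item in root.iter("item")
    ]


if __name__ == "__main__":
    for entry in parse_feed(SAMPLE_FEED):
        print(f'{entry["pubDate"]}  {entry["title"]}')
        print(f'    {entry["link"]}')
```

Note that the `link` field is what the reader passes on when you open an item, which is why the rest of this post is about where that link leads.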
In order to get the list of articles as an RSS feed, all you need to do is add the feed URL to your reader.

Shorten the way from an RSS item to PDF

In an ideal world, I would get the PDF on my desktop by hitting just one keystroke in newsboat. This is sometimes possible. Where it is not, a terminal-based web browser can minimize the pain and distraction caused by having to pass through a website.

In practice, most RSS feeds contain a link that leads not straight to
the PDF, but to a website from which a PDF can be downloaded. In the
easiest case, the URL straight to the PDF can be deduced from the website
URL. For example, for the “Journal of Data Mining & Digital Humanities”, the PDF URL follows from the URL of the paper's page by a simple, regular change.

In order to implement this, let’s start by setting the web browser used by newsboat (i.e. the program called when you hit the keybinding for “open”) to a custom script. First edit newsboat's configuration file so that it points to the script, then create the script file itself. Right now the script has one rule for the “Journal of Data Mining & Digital Humanities” and a default rule at the end, which opens everything else in a terminal-based web browser (w3m).
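The idea of a rule-based browser script can be sketched as follows. This is not the script from this post: it is written in Python, the `journal.example.org` rule is hypothetical, and the choice of `xdg-open` as the PDF reader command is an assumption. The arXiv rule relies on arXiv's public scheme, in which an abstract page under `/abs/` has its PDF under `/pdf/`.

```python
import re
import shutil
import subprocess
import sys
import tempfile
import urllib.request

# Assumptions, not taken from the post: the PDF reader command and the
# example.org rule. The fallback browser is w3m, as in the text.
PDF_READER = "xdg-open"
FALLBACK_BROWSER = "w3m"

# (pattern, replacement) pairs; the first pattern that matches wins.
PDF_RULES = [
    # arXiv: abstract page -> PDF
    (re.compile(r"^https?://arxiv\.org/abs/(.+)$"),
     r"https://arxiv.org/pdf/\1"),
    # Hypothetical journal whose PDFs live at <paper URL>/pdf
    (re.compile(r"^https?://journal\.example\.org/(\d+)/?$"),
     r"https://journal.example.org/\1/pdf"),
]


def pdf_url(url):
    """Return the PDF URL for a known source, or None if no rule applies."""
    for pattern, replacement in PDF_RULES:
        if pattern.match(url):
            return pattern.sub(replacement, url)
    return None


def open_item(url):
    """Open the PDF if a rule matches, otherwise fall back to the browser."""
    target = pdf_url(url)
    if target is None:
        # Default rule: everything else goes to the text-mode browser.
        subprocess.run([FALLBACK_BROWSER, url])
        return
    # Download to a temporary file so the reader gets a local path.
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
        with urllib.request.urlopen(target) as response:
            shutil.copyfileobj(response, tmp)
    subprocess.run([PDF_READER, tmp.name])


if __name__ == "__main__" and len(sys.argv) == 2:
    open_item(sys.argv[1])
```

Saved as an executable file, such a script takes the item URL as its only argument, so newsboat's `browser` setting can point to it. Adding a source means adding one more pair to `PDF_RULES`, and anything without a rule still opens in the browser.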
Now pressing the “open” key on an article from a source covered by a rule opens the PDF directly.

A small annoyance of this method is that you need to implement a separate rule for almost every source, because each of them might have a different way of turning the webpage URL into the PDF URL; you need to figure out the rule by comparing the two every time. arXiv, for example, needs a rule of its own.

Set up w3m to directly open PDFs

For some sites, the relation between the URLs might be unpredictable, or downloading the PDF directly might not work. For those, w3m can be set up to open PDFs itself.

Conclusion

The above workflow gives me a standard and minimum-distraction way to follow
the newly appearing articles. In particular, checking arXiv, which is updated
almost daily, allows me to keep up with the field. All sources are accessible
through the same text-based interface, which helps me focus on
the text. As in the other solutions that I present, it allows me to flexibly
and seamlessly combine the most suitable applications for the job.
Powered by Jekyll. Copyright © Maciej Janicki 2019-2020. Licensed under CC BY.