M. T. Homer Reid MIT Home Page

Stupid bash tricks

February 11, 2010

New! Updated prfetch and prlget to accommodate new APS PROLA website format.

March 14, 2009

New! Updated prlget to accommodate new APS website format.

June 05, 2006

New! Added jcpfetch for fetching papers from the Journal of Chemical Physics.

January 16, 2006

Here are some silly bash scripts for fetching papers from physics journals. The idea is that fetching a paper from a journal's website gets to be something of a hassle after a while. You have to go to the search site, type in the citation numbers, click through to the HTML page for the article you want, click "Download" or "Get PDF" or whatever, and then post-process and print the files from your command line. This gets tedious, particularly if you need to fetch six or eight papers all at once.

Thus these utilities. The first family of utilities, which includes prfetch, naturefetch, and sciencefetch, takes a volume and page number on the command line and fetches the PDF file for the corresponding paper. The second utility, prlget, fetches the table of contents for an entire issue of PRL, asks you one by one which articles you want to read, then fetches all the articles you selected and aggregates them into one large PDF file.
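As a rough illustration of the one-by-one query that prlget performs, here is a minimal sketch of such a selection loop. It is not the code from prlget itself: the function name choose_articles and the one-title-per-line input file are assumptions made for this example, and the Show (A)bstract option is left out to keep it short.

```shell
# Hypothetical sketch of a prlget-style selection loop (not the real prlget code).
# Usage: choose_articles TITLES_FILE
#   TITLES_FILE holds one article title per line.
#   Prompts go to stderr; the numbers of the chosen articles go to stdout.
choose_articles() {
    n=0
    # Titles are read from file descriptor 3 so that stdin stays free for answers.
    while IFS= read -r title <&3; do
        n=$((n + 1))
        printf '%4d: %s\n' "$n" "$title" >&2
        printf '(S)kip, (G)et? ' >&2
        # An empty answer (just hitting return) or end of input counts as skip.
        IFS= read -r answer || answer=""
        case "$answer" in
            g|G) echo "$n" ;;
            *)   ;;
        esac
    done 3< "$1"
}
```

For example, `choose_articles titles.txt > wanted.txt` would leave the numbers of the chosen articles in wanted.txt (both file names are hypothetical).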

Since both kinds of utility are text-only console tools, they are convenient when you're on vacation and can't download papers directly from the journal websites, but can ssh to your computer at work (from which you can presumably access the journal websites through your institution's paid subscriptions).

I am posting these scripts in the hope that some of these techniques will be useful to fellow shell programmers out there, and particularly in the hope that people will have suggestions for how to improve them, how to do things more efficiently, etc. If you are not a shell programmer, it is possible that they will work for you as is, but you should at least read through them to see what they are going to do before you try to run them, and I am not responsible if they make your computer explode! (Although I do think that is unlikely...) In addition to standard Unix utilities (bash, grep, cut, wc), you will need vim and lynx to make these work as they are written.
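Since the scripts depend on those tools, one simple way to check for them before running anything is sketched below. The function name missing_tools is made up for this example; command -v is the standard shell builtin for looking a command up on your PATH.

```shell
# Print the name of every tool in the argument list that is not on PATH.
# Returns 0 if all tools were found, 1 otherwise.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Example: check the tools named above before running any of the scripts.
# missing_tools bash grep cut wc vim lynx || echo "some required tools are missing" >&2
```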

The Scripts

Script        Description
jcpfetch      Fetch a paper from the Journal of Chemical Physics.
prfetch       Fetch one specific paper from an APS physics journal: Physical Review A-E, Physical Review Letters, or Reviews of Modern Physics.
naturefetch   Fetch papers from Nature. (Note, 1/17/2006: appears not to be working; they may have changed their web interface again. You can still try it.)
sciencefetch  Fetch papers from Science. (Note, 1/17/2006: appears not to be working; they may have changed their web interface again. You can still try it.)
prlget        Fetch the table of contents for one issue of Physical Review Letters, ask the user one by one which papers to fetch, then fetch all requested papers and combine them into a layout convenient for printing.
              Note: To use this script you need the following files as well.
pps           Utility script for collecting several PS or PDF files into one big PS file for sending to the printer.
blank.ps      Blank PostScript file needed by the pps script. Note that you will need to modify the BLANKPS variable in the pps script to point to the location where you store this file.
              Note: If you think this is a massively stupid way of doing what I am trying to do here, I completely agree with you, and would love to do something more intelligent.
mkduplex      Utility script for making a PostScript file print double-sided.
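
If you would rather not edit pps every time blank.ps moves, one possible variation is to let the environment override BLANKPS. This is only a sketch of the idea, not how pps is written; the default path and the check_blankps helper are hypothetical.

```shell
# Hypothetical variation on the BLANKPS setting; the default path is only an example.
# A BLANKPS value already set in the environment takes precedence.
BLANKPS="${BLANKPS:-$HOME/lib/blank.ps}"

# Fail early with a clear message if the blank page is not where BLANKPS says.
check_blankps() {
    if [ ! -r "$BLANKPS" ]; then
        echo "blank.ps not found at $BLANKPS; set BLANKPS to its location" >&2
        return 1
    fi
}
```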

Usage Examples

Fetch and view a PRL paper:

675 norika /tmp <> prfetch PRL 95 136804
fetching    PRL 95   136804: html... pdf...(1054650 bytes).
676 norika /tmp <> gv PRL.95.136804.pdf 

Fetch and print an RMP paper:

677 norika /tmp <>  prfetch RMP 78 17  
fetching    RMP 78   17    : html... pdf...(1757602 bytes).
678 norika /tmp <>  ppsp RMP.78.17.pdf 
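
The examples above suggest that the output file is named after the journal, volume, and page, and that the status line reports the size of the PDF in bytes. A minimal sketch of both steps follows; the function names are invented here, and using wc -c for the byte count is an assumption about how prfetch does it.

```shell
# Build the output file name seen in the examples above,
# e.g. "PRL 95 136804" -> "PRL.95.136804.pdf".
paper_filename() {
    echo "$1.$2.$3.pdf"
}

# Report the size of a downloaded file in the style of the status line above.
# wc -c prints a byte count; tr strips the padding some wc versions add.
report_size() {
    bytes=$(wc -c < "$1" | tr -d ' ')
    echo "pdf...($bytes bytes)."
}
```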

Fetch table of contents for an issue of PRL, ask you which papers you want, then fetch those papers and put them all in one big file:

533 norika /tmp <> prlget 96 01
fetching html document...
PRL for  14 January -  20 January 2006: 

   1: Quantum Metrology

(S)kip, Show (A)bstract, (G)et? Note: just hit return to skip

   2: Fermions in Optical Lattices Swept across Feshbach Resonances

(S)kip, Show (A)bstract, (G)et? 

   3: Emergence of Chaos in Quantum Systems Far from the Classical Limit

(S)kip, Show (A)bstract, (G)et? a
   (Received 11 April 2005; published 10 January 2006)

   The^ dynamical status of isolated quantum systems is unclear as
   conventional^ measures fail to detect chaos in such systems. However,
   when^ quantum systems are subjected to observation--as all experimental
   systems must^ be--their dynamics is no longer linear and, in
   the appropriate^ limit(s), the evolution of expectation values,
   conditioned on the observations,^ closely approaches the behavior
   of classical trajectories. Here we show,^ by analyzing a specific
   example, that microscopic continuously observed quantum^ systems,
   even far from any classical limit, can have a^ positive Lyapunov
   exponent, and thus be truly chaotic.

   2006 The American Physical Society
(S)kip, Show (A)bstract, (G)et? g

        * fetching paper 010403...

   4: Violation of the Entropic Area Law for Fermions

(S)kip, Show (A)bstract, (G)et? g

        * fetching paper 010404...

   5: Directed Spontaneous Emission from an Extended Ensemble of N Atoms:
   Timing Is Everything
(S)kip, Show (A)bstract, (G)et? 

This keeps going until the last entry in this week's edition of the journal:

   122: Erratum: Hydrogen Burning of ^17O in Classical Novae [Phys. Rev.
   Lett. 95, 031101 (2005)]
(S)kip, Show (A)bstract, (G)et? 

Get any others? 
At this prompt you can enter the numbers (between 1 and 122 in this case) of any articles you want but skipped the first time through. Otherwise just hit return. You will then get a bunch of output like this:
010403.pdf: PDF file detected, converting to PS
010403.ps: 4 pages, padding 0 times...
010404.pdf: PDF file detected, converting to PS
010404.ps: 4 pages, padding 0 times...
and then the final status printout telling you where the output files were written. The output is a file in your /tmp directory named, in this case, PRL.96.01.ps.
PS output written to /tmp/pps.ps.
PDF output written to /tmp/pps.pdf.
Thank you for your support.
Output in /tmp/PRL.96.01.ps.
Thank you for your support.
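
The "padding 0 times" lines suggest that pps pads each paper with blank pages so that the next paper starts on a fresh sheet. The exact rule pps uses is not stated here, so the sketch below simply assumes rounding the page count up to a multiple of 2. The names count_pages and pad_count are hypothetical, and counting %%Page: comments only works for PostScript files that follow the DSC conventions.

```shell
# Count pages in a DSC-conforming PostScript file by counting "%%Page:" comments.
count_pages() {
    grep -c '^%%Page:' "$1"
}

# Number of blank pages needed to round PAGES up to a multiple of MULTIPLE.
# MULTIPLE defaults to 2 (an even page count for double-sided printing);
# that default is an assumption, not something taken from pps.
pad_count() {
    pages=$1
    multiple=${2:-2}
    echo $(( (multiple - pages % multiple) % multiple ))
}
```

With these definitions, a 4-page paper needs no padding, which matches the output shown above, while a 5-page paper would get one blank page.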

Stupid bash tricks, by Homer Reid
Last Modified: 11/16/16