Org-board is a bookmarking and web archival system for Emacs Org mode, building on ideas from Pinboard. Org-board archives your bookmarks so that you can access them even when you’re not online, or when the site hosting them goes down. Org-board uses wget as a backend for archival, so any of its options can be used directly from org-board. This means you can download whole sites for archival with a couple of keystrokes, while keeping track of your archives from a simple Org file.
In org-board, a bookmark is represented by an Org heading of any level, with a URL property containing one or more URLs. Once such a heading is created, a call to org-board-archive creates a unique ID and directory for the entry via org-attach, archives the contents and requisites of the page(s) listed in the URL property using wget, and saves them inside the entry’s directory. A link to the (timestamped) root archive folder is stored in the ARCHIVED_AT property. Multiple archives can be made for each entry. Additional options to pass to wget can be specified via the WGET_OPTIONS property.
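The following minimal sketch shows how an entry’s properties could be combined into a single wget argument list. It is not org-board’s own code. The names my/org-board-default-switches and my/org-board-sketch-wget-args are hypothetical, and the defaults simply mirror the switches described for org-board-wget-switches. The sketch assumes that multi-valued properties are split on whitespace, which is what org-entry-get-multivalued-property does. The real command also handles the output directory and other details that are omitted here.

```elisp
;;; Hypothetical sketch: build a wget argument list from the Org entry at point.
(require 'org)

(defvar my/org-board-default-switches
  '("-e" "robots=off" "--page-requisites" "--adjust-extension" "--convert-links")
  "Hypothetical stand-in for `org-board-wget-switches'.")

(defun my/org-board-sketch-wget-args ()
  "Return default switches, then WGET_OPTIONS, then URLs for the entry at point.
Both properties are read as multi-valued, i.e. split on whitespace."
  (append my/org-board-default-switches
          (org-entry-get-multivalued-property (point) "WGET_OPTIONS")
          (org-entry-get-multivalued-property (point) "URL")))

;; Usage: an entry with two URLs and extra options.
(with-temp-buffer
  (insert "* Example bookmark\n"
          ":PROPERTIES:\n"
          ":URL: https://example.com/a https://example.com/b\n"
          ":WGET_OPTIONS: --recursive -l 1\n"
          ":END:\n")
  (org-mode)
  (goto-char (point-min))
  (my/org-board-sketch-wget-args))
;; => ("-e" "robots=off" "--page-requisites" "--adjust-extension"
;;     "--convert-links" "--recursive" "-l" "1"
;;     "https://example.com/a" "https://example.com/b")
```

When an entry has no WGET_OPTIONS property, org-entry-get-multivalued-property returns nil. In that case only the default switches and the URLs remain.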
- org-board-archive: archive the current entry, creating a unique ID and directory via org-attach if necessary.
- org-board-archive-dry-run: show the wget invocation that will run for this entry.
- org-board-new: prompt for a URL to add to the current entry’s properties, then archive it immediately.
- org-board-delete-all: delete all the archives for this entry by deleting the org-attach directory.
- org-board-open: open the bookmark at point in a browser. With a prefix argument, open it in the Emacs web browser.
- org-board-diff: use zdiff (which itself uses the pre-installed ediff) to recursively diff two archives of the same entry.
- org-board-cancel: cancel the current org-board archival process.
These are all bound in the org-board-keymap variable. To set it up, see the “Getting started” section below.
- org-board-wget-program: the path to the wget program.
- org-board-wget-switches: the command-line options passed to wget by default. These are:
  - -e robots=off: ignore robots.txt files.
  - --page-requisites: download all page requisites (CSS, images) for downloaded pages.
  - --adjust-extension: give pages that look like HTML a “.html” extension.
  - --convert-links: convert links in downloaded files so that they all work locally.
- org-board-agent-header-alist: an alist mapping agent names to their respective header/user-agent arguments. Set the WGET_OPTIONS property to the car of one of these entries (say, “Mac-OS-10.8”) and org-board will replace it with the corresponding value from the alist before calling wget. This is useful for sites that do not serve pages to wget (like Google Cache).
- org-board-wget-show-buffer: whether to show the archival process buffer (defaults to true).
- org-board-log-wget-invocation: whether to log the archival process command in the root of the archive directory (defaults to true).
- org-board-domain-regexp-alist: apply certain options when a domain matches a regular expression. See the docstring for details. As an example, this is used to make sure that wget does not send a User-Agent string when archiving from Google Cache, which will not normally serve pages to it.
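As an illustration, a configuration along the following lines could go in an init file. The values are examples, not recommendations. --wait=1 is just one ordinary wget option chosen for this sketch. Appending to org-board-wget-switches assumes that the variable holds a list of argument strings.

```elisp
;; Illustrative init-file settings; run only once org-board is loaded.
(with-eval-after-load 'org-board
  ;; Archive in the background without popping up the wget process buffer.
  (setq org-board-wget-show-buffer nil)
  ;; Keep the logged wget command line next to each archive.
  (setq org-board-log-wget-invocation t)
  ;; Add one extra wget switch on top of the defaults (example value).
  ;; Assumes the variable is a list of argument strings.
  (setq org-board-wget-switches
        (append org-board-wget-switches '("--wait=1"))))
```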
Options like “--header: 'Agent Blabla'” cannot be specified as properties, because the property API splits on spaces, and such an option has to be passed to the wget process as one argument. To work around this, add these kinds of options to org-board-agent-header-alist instead, where the property API is not involved.
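The small self-contained sketch below shows both the problem and the reason the alist avoids it. my/property-options reads a WGET_OPTIONS value through org-entry-get-multivalued-property; this is assumed here to match how org-board reads the property. my/expand-agent-options then splices in arguments from a hypothetical alist, my/agent-alist. Its (NAME ARG...) entry shape is an assumption made for illustration, so check the docstring of org-board-agent-header-alist for the real format.

```elisp
(require 'org)

;; Hypothetical alist in the spirit of `org-board-agent-header-alist'.
;; Each entry is (NAME ARG...), and each ARG stays a single argument,
;; so it may contain spaces.
(defvar my/agent-alist
  '(("My-Agent" "--header=User-Agent: Blabla")))

(defun my/property-options (value)
  "Split VALUE the way a multi-valued WGET_OPTIONS property is read."
  (with-temp-buffer
    (insert "* Entry\n:PROPERTIES:\n:WGET_OPTIONS: " value "\n:END:\n")
    (org-mode)
    (goto-char (point-min))
    (org-entry-get-multivalued-property (point) "WGET_OPTIONS")))

(defun my/expand-agent-options (options)
  "Replace each element of OPTIONS that names an entry in `my/agent-alist'
with that entry's arguments; leave all other options unchanged."
  (apply #'append
         (mapcar (lambda (opt)
                   (or (cdr (assoc opt my/agent-alist)) (list opt)))
                 options)))

;; Written directly as a property, the header is broken in two:
(my/property-options "--header=User-Agent: Blabla")
;; => ("--header=User-Agent:" "Blabla")

;; Referenced by name, it reaches wget as one argument:
(my/expand-agent-options (my/property-options "My-Agent --recursive"))
;; => ("--header=User-Agent: Blabla" "--recursive")
```

The sketch uses the non-destructive append rather than mapcan. This means the lists stored in the alist are never modified when results are concatenated.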
I recently found a list of articles on linkers that I wanted to bookmark and keep locally for offline reading. In a dedicated Org file for bookmarks I created this entry:
** TODO Linkers (20-part series)
   :PROPERTIES:
   :URL:          http://a3f.at/lists/linkers
   :WGET_OPTIONS: --recursive -l 1
   :END:
Here the URL property points to a page that already lists the URLs I wanted to download. I specified the --recursive option for wget along with a depth of 1 (-l 1) so that each linked page would be downloaded. With point inside the entry, I ran M-x org-board-archive. An org-attach directory was created and wget started downloading the pages into it. At the end, the entry looked like this:
** TODO Linkers (20-part series)
   :PROPERTIES:
   :URL:          http://a3f.at/lists/linkers
   :WGET_OPTIONS: --recursive -l 1
   :ID:           D3BCE79F-C465-45D5-847E-7733684B9812
   :ARCHIVED_AT:  [2016-08-30-Tue-15-03-56]
   :END:
The value in the ARCHIVED_AT property is a link that points to the root of the timestamped archive directory. The ID property was generated automatically by org-attach.
If you have zdiff installed from GNU ELPA, you can diff two archives made for the same entry to see how a page has changed over time. The diff recurses through the directory structure of both archives and highlights any changes.
There are two ways to install the package. One way is to clone this repository and load the Emacs Lisp file manually.
(load-file "/path/to/org-board.el")
Alternatively, you can download the package directly from Emacs using MELPA: M-x package-install RET org-board RET will take care of it.
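If MELPA is not yet one of your package archives, package-install will not find org-board. Here is a minimal sketch using the built-in package.el:

```elisp
;; Register MELPA, then install org-board only if it is missing.
(require 'package)
(add-to-list 'package-archives '("melpa" . "https://melpa.org/packages/") t)
(package-initialize)
(unless (package-installed-p 'org-board)
  (package-refresh-contents)
  (package-install 'org-board))
```

On Emacs 27 and later, packages are initialized automatically before the init file is loaded, so the explicit package-initialize call is usually unnecessary there.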
The following keymap is defined in org-board-keymap:
| Key | Command                    |
|-----+----------------------------|
| a   | org-board-archive          |
| r   | org-board-archive-dry-run  |
| n   | org-board-new              |
| k   | org-board-delete-all       |
| o   | org-board-open             |
| d   | org-board-diff             |
| c   | org-board-cancel           |
| O   | org-attach-reveal-in-emacs |
| ?   | Show help for this keymap. |
To install the keymap, give it a prefix key, e.g.:
(global-set-key (kbd "<f11>") org-board-keymap)
Then typing <f11> a runs org-board-archive, for example.
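If you would rather have the prefix available only in Org buffers, you can bind the same keymap in org-mode-map instead. C-c o is an arbitrary choice for this sketch; C-c followed by a letter is the key space reserved for users.

```elisp
;; Bind the org-board prefix map to C-c o, but only in Org buffers.
(with-eval-after-load 'org
  (require 'org-board)                 ; makes `org-board-keymap' available
  (define-key org-mode-map (kbd "C-c o") org-board-keymap))
```

With this binding, C-c o a runs org-board-archive in any Org buffer.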
The location of wget should be picked up automatically from the PATH environment variable. If it is not, customize the variable org-board-wget-program.
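To fall back to an explicit path only when Emacs cannot find wget, you can use something like the following. The path shown is a placeholder. Emacs searches exec-path, which is initialized from PATH at startup, so a GUI Emacs launched outside a shell may not see the same PATH as your terminal.

```elisp
;; Only override the program path when wget is not found on `exec-path'.
;; "/opt/local/bin/wget" is a placeholder; use your own wget location.
(unless (executable-find "wget")
  (setq org-board-wget-program "/opt/local/bin/wget"))
```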
Other options are already set so that archiving bookmarks works pretty much automatically. With no WGET_OPTIONS specified, org-board-archive will by default download just the page and its requisites (images and CSS), and nothing else.