kljohann/org-board

org-board.el

Motivation

Org-board is a bookmarking and web archival system for Emacs Org mode, building on ideas from Pinboard. Org-board archives your bookmarks so that you can access them even when you’re not online, or when the site hosting them goes down. Org-board uses wget as a backend for archival, so any of its options can be used directly from org-board. This means you can download whole sites for archival with a couple of keystrokes, while keeping track of your archives from a simple Org file.

Summary

In org-board, a bookmark is represented by an Org heading of any level, with a URL property containing one or more URLs. Once such a heading is created, a call to org-board-archive creates a unique ID and directory for the entry via org-attach, archives the contents and requisites of the page(s) listed in the URL property using wget, and saves them inside the entry’s directory. A link to the (timestamped) root archive folder is created in the property ARCHIVED_AT. Multiple archives can be made for each entry. Additional options to pass to wget can be specified via the property WGET_OPTIONS.
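The following minimal Emacs Lisp sketch shows how an entry's properties could map to a wget argument list. It is not org-board's code. The function my-org-board-sketch-args and the variable my-org-board-sketch-switches are hypothetical. The sketch assumes that URL and WGET_OPTIONS are read with Org's multivalued-property API, which splits on whitespace (this matches the Known limitations section), and that the results are appended after the default switches.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'org)

;; Hypothetical stand-in for `org-board-wget-switches'; the real
;; variable and its exact default value may differ.
(defvar my-org-board-sketch-switches
  '("--page-requisites" "--adjust-extension" "--convert-links")
  "Switches passed to wget on every run in this sketch.")

(defun my-org-board-sketch-args ()
  "Return a wget argument list for the Org entry at point.
Sketch only: default switches, then WGET_OPTIONS, then every URL."
  (let ((urls (org-entry-get-multivalued-property (point) "URL"))
        (opts (org-entry-get-multivalued-property (point) "WGET_OPTIONS")))
    (append my-org-board-sketch-switches opts urls)))
```

If point is on an entry whose URL property holds two space-separated URLs, both URLs appear at the end of the returned list, after the options.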

User commands

org-board-archive
archive the current entry, creating a unique ID and directory via org-attach if necessary.
org-board-archive-dry-run
show the wget invocation that will run for this entry.
org-board-new
prompt for a URL to add to the current entry’s properties, then archive it immediately.
org-board-delete-all
delete all the archives for this entry by deleting the org-attach directory.
org-board-open
open the bookmark at point in a browser. With a prefix argument, open it in the Emacs web browser.
org-board-diff
use ztree (which itself uses the pre-installed ediff) to recursively diff two archives of the same entry.
org-board-cancel
cancel the current org-board archival process.

These are all bound in the org-board-keymap variable. To set it up, see the “Getting started” section below.

Important options

org-board-wget-program
the path to the wget program.
org-board-wget-switches
the command-line options passed to wget by default. The defaults are:
-e robots=off
ignore robots.txt files.
--page-requisites
download all page requisites (CSS, images) for pages downloaded.
--adjust-extension
give pages that look like HTML a “.html” extension.
--convert-links
convert links in downloaded files so that they all work locally.
org-board-agent-header-alist
an alist mapping agent names to their respective header/user-agent arguments. Set a WGET_OPTIONS property to the car of one of these lists (say, “Mac-OS-10.8”) and org-board will replace it with its corresponding value in the alist before calling wget. This is useful for some sites that do not serve pages to wget (like Google Cache).
org-board-wget-show-buffer
whether to show the archival process buffer (defaults to true).
org-board-log-wget-invocation
whether to log the archival process command in the root of the archival directory (defaults to true).
org-board-domain-regexp-alist
apply certain options when a domain matches a regular expression. See the docstring for details. For example, this is used to make sure that wget does not send a User-Agent header when archiving from Google Cache, which will not normally serve pages to it.
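Here is a minimal configuration sketch for some of the options above. It assumes org-board is installed and that org-board-wget-switches holds a list of strings. The --wait=1 switch is a standard wget option that pauses one second between requests. It appears here only as an illustration, not as a recommended default.

```elisp
;; Configuration sketch; values are illustrative, not recommendations.
(with-eval-after-load 'org-board
  ;; Don't display the archival process buffer.
  (setq org-board-wget-show-buffer nil)
  ;; Keep a log of the wget command line in each archive's root directory.
  (setq org-board-log-wget-invocation t)
  ;; Assumes a list of strings: append a one-second pause between
  ;; requests to every archival run.
  (add-to-list 'org-board-wget-switches "--wait=1" t))
```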

Known limitations

Options like “--header: ‘Agent Blabla’” cannot be specified as properties, because the property API splits on spaces, while such an option has to be passed to the wget process as a single argument. To work around this, add such options to org-board-agent-header-alist instead, where the property API is not involved.
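You can reproduce this limitation with Org's own property API. The sketch below writes an option into a WGET_OPTIONS drawer and reads it back with org-entry-get-multivalued-property. That function splits on whitespace, so the header option comes back as two broken arguments. The helper my-org-board-split-options is hypothetical. The final form sketches the workaround. It assumes each org-board-agent-header-alist entry is a cons of a name and one argument string, and the name My-Agent is made up.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'org)

(defun my-org-board-split-options (value)
  "Return VALUE as it comes back from a WGET_OPTIONS property.
Hypothetical helper: it writes VALUE into a throwaway Org entry and
reads it back as a multivalued property, which splits on whitespace."
  (with-temp-buffer
    (insert "* Entry\n:PROPERTIES:\n:WGET_OPTIONS: " value "\n:END:\n")
    (org-mode)
    (goto-char (point-min))
    (org-entry-get-multivalued-property (point) "WGET_OPTIONS")))

;; (my-org-board-split-options "--header='User-Agent: Blabla'")
;; returns ("--header='User-Agent:" "Blabla'"): one option, two arguments.

;; Workaround (assumed alist shape): register the whole argument under a
;; name, then put only that name, My-Agent, in WGET_OPTIONS.  No shell is
;; involved, so the argument needs no quotes.
(with-eval-after-load 'org-board
  (add-to-list 'org-board-agent-header-alist
               '("My-Agent" . "--header=User-Agent: Blabla")))
```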

Example usage

Archiving

I recently found a list of articles on linkers that I wanted to bookmark and keep locally for offline reading. In a dedicated Org file for bookmarks I created this entry:

** TODO Linkers (20-part series)
:PROPERTIES:
:URL:          http://a3f.at/lists/linkers
:WGET_OPTIONS: --recursive -l 1
:END:

Here the URL property is a page that already lists the URLs I wanted to download. I specified wget's --recursive option along with a depth of 1 (-l 1) so that each linked page would be downloaded too. With point inside the entry, I ran “M-x org-board-archive”. An org-attach directory was created and wget started downloading the pages into it. At the end the entry looked like this:

** TODO Linkers (20-part series)
:PROPERTIES:
:URL:          http://a3f.at/lists/linkers
:WGET_OPTIONS: --recursive -l 1
:ID:           D3BCE79F-C465-45D5-847E-7733684B9812
:ARCHIVED_AT:  [2016-08-30-Tue-15-03-56]
:END:

The value in the ARCHIVED_AT property is a link that points to the root of the timestamped archival directory. The ID property was generated automatically by org-attach.

Diffing

If you have ztree installed from GNU ELPA, you can use org-board-diff to compare two archives of the same entry and see how a page has changed over time. The diff recurses through the directory structures of both archives and highlights any changes.
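If ztree is not installed yet, you can fetch it from GNU ELPA, which is in package-archives by default. The snippet below is a minimal sketch that uses only the built-in package library. After ztree is present, run M-x org-board-diff with point inside an entry that has at least two archives.

```elisp
(require 'package)
;; Refresh the archive index once, then install ztree if it is missing.
(unless (package-installed-p 'ztree)
  (package-refresh-contents)
  (package-install 'ztree))
```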

Getting started

Installation

There are two ways to install the package. One way is to clone this repository and load the Emacs Lisp file manually.

(load-file "/path/to/org-board.el")

Alternatively, you can install the package from MELPA: with MELPA in package-archives, M-x package-install RET org-board RET will take care of it.
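If MELPA is not configured yet, a typical init-file sketch looks like the following. The archive URL is MELPA's standard one, and the rest uses only the built-in package library.

```elisp
(require 'package)
;; Add MELPA to the package archives (appended after GNU ELPA).
(add-to-list 'package-archives '("melpa" . "https://melpa.org/packages/") t)
(package-initialize)
;; Install org-board on first start, then load it.
(unless (package-installed-p 'org-board)
  (package-refresh-contents)
  (package-install 'org-board))
(require 'org-board)
```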

Keybindings

The following keymap is defined in org-board-keymap:

Key   Command
a     org-board-archive
r     org-board-archive-dry-run
n     org-board-new
k     org-board-delete-all
o     org-board-open
d     org-board-diff
c     org-board-cancel
O     org-attach-reveal-in-emacs
?     Show help for this keymap.

To install the keymap, give it a prefix key, e.g.:

(global-set-key (kbd "<f11>") org-board-keymap)

Then typing <f11> a would run org-board-archive, for example.
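A global key is not the only option. As a sketch, you can bind the keymap only in Org buffers instead. C-c o is an arbitrary choice; Emacs's key binding conventions reserve C-c followed by a letter for users.

```elisp
(require 'org-board)
;; Bind the prefix in Org buffers only, so C-c o a runs org-board-archive.
(with-eval-after-load 'org
  (define-key org-mode-map (kbd "C-c o") org-board-keymap))
```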

Miscellaneous

The location of wget should be picked up automatically from the PATH environment variable. If it is not, then the variable org-board-wget-program can be customized.
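If wget is not on PATH, a sketch like the following sets the variable explicitly. The fallback path /opt/homebrew/bin/wget is hypothetical; replace it with the actual location on your system.

```elisp
(with-eval-after-load 'org-board
  ;; Prefer whatever wget is on PATH; otherwise use a fixed location
  ;; (hypothetical path, adjust for your system).
  (setq org-board-wget-program
        (or (executable-find "wget") "/opt/homebrew/bin/wget")))
```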

Other options are already set so that archiving bookmarks works more or less out of the box. With no WGET_OPTIONS specified, org-board-archive by default downloads just the page and its requisites (images and CSS), and nothing else.

About

Org mode's web archiver.
