Arquivo404: soft 404 linker to Arquivo.pt

Arquivo404 automatically fixes links to broken URLs.

If a broken URL on a given website was web-archived by Arquivo.pt, the arquivo404 script generates a customizable message containing a link to its web-archived version (memento). By default, it links to the oldest memento. If the URL was not web-archived, no message is presented.

It uses the Arquivo.pt Memento API to search for web-archived versions of the broken URL.

Other web archives that support the Memento protocol (RFC 7089) can be added.
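
For readers unfamiliar with Memento, the lookup can be pictured roughly as follows. This is a minimal sketch, not the code of arquivo404.js: it assumes the archive answers with a TimeMap in the link format described by RFC 7089, and parseTimemap and pickMemento are hypothetical helpers introduced here for illustration. The sample entries are made up.

```javascript
// Hypothetical helpers, not part of arquivo404.js.

// Extracts the mementos from a link-format TimeMap (RFC 7089).
function parseTimemap(text) {
  const mementos = [];
  // Each entry looks like: <URI>; rel="..."; datetime="..."
  // Attributes are captured up to the next "<" because datetime values contain commas.
  const entry = /<([^>]+)>([^<]*)/g;
  let match;
  while ((match = entry.exec(text)) !== null) {
    const rel = /rel="([^"]*)"/.exec(match[2]);
    const datetime = /datetime="([^"]*)"/.exec(match[2]);
    // rel may hold several values, e.g. "first memento".
    if (rel && datetime && rel[1].split(/\s+/).includes('memento')) {
      mementos.push({ url: match[1], date: new Date(datetime[1]) });
    }
  }
  return mementos;
}

// Picks the oldest (default) or most recent memento within an optional date range.
function pickMemento(mementos, { minDate, maxDate, criterion = 'oldest' } = {}) {
  const inRange = mementos
    .filter(m => (!minDate || m.date >= minDate) && (!maxDate || m.date <= maxDate))
    .sort((a, b) => a.date - b.date);
  if (inRange.length === 0) return null; // nothing archived in range
  return criterion === 'most-recent' ? inRange[inRange.length - 1] : inRange[0];
}

// Example TimeMap with made-up entries.
const sampleTimemap = [
  '<http://www.fccn.pt/SCCN/>; rel="original",',
  '<https://arquivo.pt/wayback/20100130000000/http://www.fccn.pt/SCCN/>; rel="first memento"; datetime="Sat, 30 Jan 2010 00:00:00 GMT",',
  '<https://arquivo.pt/wayback/20150130000000/http://www.fccn.pt/SCCN/>; rel="last memento"; datetime="Fri, 30 Jan 2015 00:00:00 GMT"'
].join('\n');

const oldest = pickMemento(parseTimemap(sampleTimemap));
console.log(oldest.url);
```

When no memento falls inside the requested range, pickMemento returns null, which corresponds to the case where no message is presented.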

Learn more at:

Examples of links to broken URLs fixed with arquivo404

One-line installation

The simplest way to install the arquivo404 script is to include it in the HTML element where the message will be presented. Here are 2 examples of one-liners that display the Arquivo404 message in English and in Portuguese, respectively.

EN

<script type="text/javascript" src="https://arquivo.pt/arquivo404.js" async defer onload="ARQUIVO_NOT_FOUND_404.message('<a href=\'{archivedURL}\'>View an archived version of the page from {date} at {archiveName}</a>').call();"></script>

PT

<script type="text/javascript" src="https://arquivo.pt/arquivo404.js" async defer onload="ARQUIVO_NOT_FOUND_404.call();"></script>

You may download the JavaScript file available at https://arquivo.pt/arquivo404.js to your web server and change the "src" attribute to its new path.

Methods to customize arquivo404 search and message

The Arquivo404 script exports a globally scoped variable: ARQUIVO_NOT_FOUND_404. This object provides methods to customize how the arquivo404 script searches for the broken URL and the message presented to the users.

| Method | Description | Arguments | Example |
|---|---|---|---|
| `messageElementId` | Sets the id of the HTML element to write the message. If none is given, a new `<div>` will be created for this purpose. It will be appended to the parent of the `<script>` element that was used to load this script. | `messageElementId : string` | `ARQUIVO_NOT_FOUND_404.messageElementId('messageDiv').call();` |
| `message` | Sets the message to be displayed by arquivo404. | `message : string` | `ARQUIVO_NOT_FOUND_404.messageElementId('messageDiv').message('<a href="{archivedURL}">View an archived version of the page from {date} at {archiveName}</a>').call();` |
| `setMinimumDate` | Specifies the oldest date allowed to be retrieved (optional). | `minDate : Date` | `ARQUIVO_NOT_FOUND_404.messageElementId('messageDiv').setMinimumDate(new Date("2010-01-30 GMT")).call();` |
| `setMaximumDate` | Specifies the most recent date allowed to be retrieved (optional). | `maxDate : Date` | `ARQUIVO_NOT_FOUND_404.messageElementId('messageDiv').setMaximumDate(new Date("2015-01-30 GMT")).call();` |
| `setMostRelevantMemento` | Specifies whether to pick the oldest or the most recent memento retrieved from the web archive within the minimum and maximum dates (if defined). By default it picks the oldest one. | `criterion : 'oldest'` or `'most-recent'` | `ARQUIVO_NOT_FOUND_404.messageElementId('messageDiv').setMostRelevantMemento('most-recent').call();` |
| `setDateFormatter` | Configures the date format used by the date tag in messages. The default formatting is YYYY-MM-DD. The argument is a function that receives a single JavaScript Date object and returns a string. | `dateFormatter : function` | `ARQUIVO_NOT_FOUND_404.messageElementId('messageDiv').setDateFormatter(date => [date.getMonth()+1, date.getDate(), date.getFullYear()].join('/')).message('<a href="{archivedURL}">View an archived version of the page from {date} at {archiveName}</a>').call();` |
| `addArchive` | Adds a web archive compliant with the Memento API protocol to search for web-archived versions of the broken URL. By default, arquivo404 uses the Arquivo.pt web archive. The argument of this function should have 3 properties: archiveApiUrl (URL to the timemap/link/ endpoint of the API), archiveName (archive name to be used with the archiveName tag in the message) and timeout (timeout for the API request). | `{ archiveApiUrl: string, archiveName: string, timeout: number }` | `ARQUIVO_NOT_FOUND_404.messageElementId('messageDiv').addArchive({ archiveApiUrl: 'https://web.archive.org/web/timemap/link/', archiveName: 'Internet Archive', timeout: 2000 }).call();` |
| `url` | Specifies a given URL to search in web archives. If this method isn't used, arquivo404 will search for the URL in window.location.href. | `url : string` | `ARQUIVO_NOT_FOUND_404.messageElementId('messageDiv').url('http://www.fccn.pt/SCCN/').call();` |
| `call` | Executes the arquivo404 script. | - | `ARQUIVO_NOT_FOUND_404.call()` |

Special tags for custom message

Messages can use tags between curly brackets to display the following dynamic information:

Tag Description
archiveName The name of the web archive preserving the content of the broken URL
archivedURL The URL that references the web-archived content
date The date when the content was web-archived. The default format is YYYY-MM-DD, but it can be customized using the setDateFormatter method.
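
To illustrate how such tags behave, here is a minimal sketch of placeholder substitution. It is not taken from arquivo404.js: renderMessage is a hypothetical helper, and the memento URL is a made-up value.

```javascript
// Hypothetical helper, not part of arquivo404.js.
// Replaces each {tag} in the template with the matching value; unknown tags are left untouched.
function renderMessage(template, values) {
  return template.replace(/\{(\w+)\}/g, (whole, tag) =>
    Object.prototype.hasOwnProperty.call(values, tag) ? values[tag] : whole
  );
}

const html = renderMessage(
  '<a href="{archivedURL}">View an archived version of the page from {date} at {archiveName}</a>',
  {
    archivedURL: 'https://arquivo.pt/wayback/20100130000000/http://www.fccn.pt/SCCN/', // made-up memento URL
    date: '2010-01-30',
    archiveName: 'Arquivo.pt'
  }
);
console.log(html);
```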

Usage examples

Presenting the message within a specific HTML element

  1. Import the arquivo404 script in the header of the soft 404 webpage:
<head>
...
<script type="text/javascript" src="https://arquivo.pt/arquivo404.js"></script>
...
</head>
  2. Create an empty <div> with a specific id (e.g. "messageDiv") where you want the arquivo404 message to be presented:
<body>
...
<div id="messageDiv"></div>
  3. Customize the ARQUIVO_NOT_FOUND_404 object using the messageElementId method to identify the created <div> and run the arquivo404 script by invoking the call() method:
<script type="text/javascript">
    ARQUIVO_NOT_FOUND_404
      .messageElementId('messageDiv')
      .call();
</script>
...
</body>

Customizing the message

The message displayed by the arquivo404 script can be customized using the message method:

<script type="text/javascript">
    ARQUIVO_NOT_FOUND_404
      .messageElementId('messageDiv')
      .message('Oops! The page you were searching for seems to be missing! <a href="{archivedURL}">Visit an archived version of the page from {date} at {archiveName}.</a>')
      .call();
</script>
...
</body>

Getting the most recent memento, instead of the oldest (default)

By default, Arquivo404 links to the oldest available version of the archived page.

This behaviour can be altered to instead display the most recent version:

<script type="text/javascript">
    ARQUIVO_NOT_FOUND_404
      .messageElementId('messageDiv')
      .setMostRelevantMemento('most-recent')
      .call();
</script>
...
</body>

Limiting the date range of the retrieved results

Suppose your website's domain has belonged to you since 1 January 2010 and belonged to someone else before that.

In that case, you will want to limit the retrieved results to the period during which the domain has been yours. The setMinimumDate method supports this:

<script type="text/javascript">
    ARQUIVO_NOT_FOUND_404
      .messageElementId('messageDiv')
      .setMinimumDate(new Date("2010-01-01 GMT")) 
      .call();
</script>
...
</body>
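
A note on the date argument: a string such as "2010-01-01 GMT" is not in the ISO 8601 format that the ECMAScript specification requires engines to parse, so its interpretation is left to each browser. As a hedged alternative, the sketch below builds the same instant in two unambiguous ways; either value could be passed to setMinimumDate or setMaximumDate.

```javascript
// Midnight UTC on 1 January 2010, written as an ISO 8601 string...
const fromIsoString = new Date('2010-01-01T00:00:00Z');
// ...or from numeric parts (months are zero-based, so 0 is January).
const fromUtcParts = new Date(Date.UTC(2010, 0, 1));

// Both expressions describe the same instant.
console.log(fromIsoString.getTime() === fromUtcParts.getTime());
```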

Specifying a given URL to search in web archives

Some websites redirect broken links to a soft 404 page, so window.location.href holds the address of the error page rather than the broken URL.

In these cases, by default the arquivo404 script would search for web-archived versions of the soft 404 error page, instead of the broken URL.

If the website keeps track of the broken URL that was requested (originalUrl in the example below), it can pass that URL to arquivo404 from its soft 404 page using the url method:

<script type="text/javascript">
    ARQUIVO_NOT_FOUND_404
      .messageElementId('messageDiv')
      .url(originalUrl) // Here we're assuming the original URL is stored in this variable
      .call();
</script>
...
</body>
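
How originalUrl gets its value depends on the website. As one possible approach, the sketch below assumes the server redirects to the soft 404 page with the broken URL in a query parameter named from. Both the parameter name and the getOriginalUrl helper are hypothetical and chosen only for this example.

```javascript
// Hypothetical helper: returns the broken URL carried in the "from" query
// parameter of the soft 404 page, or null when the parameter is absent.
// The parameter name "from" is an assumption made for this sketch.
function getOriginalUrl(pageUrl) {
  return new URL(pageUrl).searchParams.get('from');
}

// In the browser this would be called as getOriginalUrl(window.location.href).
const originalUrl = getOriginalUrl(
  'https://new.website.org/not-found?from=' +
    encodeURIComponent('https://new.website.org/reports/2009/')
);
console.log(originalUrl);
```

The returned value can then be passed to the url method whenever it is not null.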

Customizing date format in the message

By default, the date is displayed in the YYYY-MM-DD format. This can be changed using the setDateFormatter method:

<script type="text/javascript">
    function customDateFormatter(date) {
      // formats the date as M/D/YYYY (day and month are not zero-padded)
      return (date.getMonth()+1) + '/' + date.getDate() + '/' + date.getFullYear();
    }
    ARQUIVO_NOT_FOUND_404
      .messageElementId('messageDiv')
      .setDateFormatter(customDateFormatter)
      .message('<a href="{archivedURL}">View an archived version of the page from {date} at {archiveName}</a>')
      .call();
</script>
...
</body>
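
Note that getDate() and its siblings use the visitor's local time zone, so the displayed day may differ by one from the archive's UTC timestamp. If that matters, a formatter along these lines, using the UTC getters and zero padding, is one option. The name utcDateFormatter is hypothetical.

```javascript
// Formats a Date as DD/MM/YYYY using UTC fields, with zero-padded day and month.
function utcDateFormatter(date) {
  const day = String(date.getUTCDate()).padStart(2, '0');
  const month = String(date.getUTCMonth() + 1).padStart(2, '0'); // months are zero-based
  return day + '/' + month + '/' + date.getUTCFullYear();
}

console.log(utcDateFormatter(new Date('2010-01-05T00:00:00Z')));
```

It would be passed to arquivo404 the same way as customDateFormatter, through the setDateFormatter method.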

Adding web archives to search for the broken URL

Sometimes a broken URL isn't available in Arquivo.pt but it was preserved by other archives such as the Internet Archive. Arquivo404 supports adding web archives that support the Memento protocol, as long as they have CORS enabled.

<script type="text/javascript">
    ARQUIVO_NOT_FOUND_404
      .messageElementId('messageDiv')
      .addArchive({ // adding the Internet Archive
        timeout: 6000,
        archiveName: "Internet Archive",
        archiveApiUrl: "https://web.archive.org/web/timemap/link/" // MUST point to the timemap/link endpoint of the API
      })
      .call();
</script>
...
</body>

Calling versions from older domains of the website

Suppose that your website used to have the domain old.website.org but at some point in time it was changed to new.website.org.

If we want arquivo404 to search for broken URLs across both domains, we can combine the url and addArchive methods as follows:

<script type="text/javascript">
    ARQUIVO_NOT_FOUND_404
      .url(window.location.pathname + window.location.search)
      .addArchive( {timeout: 2000, archiveName: "Arquivo.pt", archiveApiUrl: "https://arquivo.pt/arquivo404server/timemap/link/https://new.website.org"} )
      .addArchive( {timeout: 2000, archiveName: "Arquivo.pt", archiveApiUrl: "https://arquivo.pt/arquivo404server/timemap/link/https://old.website.org"} )
      .call();
</script>
...
</body>

In the above example, if a user tries to visit new.website.org/pathname/, arquivo404 will search Arquivo.pt for mementos of both old.website.org/pathname/ and new.website.org/pathname/.
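
This works because the value passed to url is appended to each archiveApiUrl. Assuming plain string concatenation (an inference from the example, not something verified against the script's source), the resulting requests would look as follows; timemapRequestUrls is a hypothetical helper.

```javascript
// Hypothetical helper: joins each archive's timemap endpoint with the URL to look up.
function timemapRequestUrls(archives, url) {
  return archives.map(archive => archive.archiveApiUrl + url);
}

const archives = [
  { archiveName: 'Arquivo.pt', archiveApiUrl: 'https://arquivo.pt/arquivo404server/timemap/link/https://new.website.org' },
  { archiveName: 'Arquivo.pt', archiveApiUrl: 'https://arquivo.pt/arquivo404server/timemap/link/https://old.website.org' }
];

// In the browser, '/pathname/' would come from window.location.pathname + window.location.search.
const requests = timemapRequestUrls(archives, '/pathname/');
console.log(requests.join('\n'));
```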

A complete example

A functional example using all of the possible configurations is available in 404-page-example.html

How to test arquivo404?

If your website is new

  1. Suggest your website for automatic web-archiving
  2. Publish a test page on your website and write down its URL
  3. Use SavePageNow to archive the test page
  4. Wait 48 hours
  5. Remove the test page from your website to trigger a 404 error
  6. Try to open the URL of the test page and check if the arquivo404 message appears. If it does not appear, contact us.

If your website is already being preserved by Arquivo.pt

  1. Search for your website in Arquivo.pt
  2. Choose one of its old versions
  3. Browse until you find a web page that no longer exists on your current website
  4. Click on its original URL, displayed on the replay top bar
  5. Check if the arquivo404 message appears. If it does not appear, contact us.

Troubleshooting

Web Archives must have CORS enabled

The arquivo404 JavaScript requires that the Memento API has an open CORS policy. In practice, the web archive server should respond with the HTTP header: Access-Control-Allow-Origin: *
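
To make the requirement concrete: for a request sent without credentials, a browser only lets a script read the archive's response when the Access-Control-Allow-Origin header is * or matches the page's own origin. The sketch below captures that rule in a hypothetical helper, corsAllows, which can be fed the header value seen in the browser's network panel.

```javascript
// Hypothetical helper: tells whether a response carrying the given
// Access-Control-Allow-Origin value is readable from a page at pageOrigin.
function corsAllows(allowOriginHeader, pageOrigin) {
  if (!allowOriginHeader) return false; // header missing: the browser blocks the response
  const value = allowOriginHeader.trim();
  return value === '*' || value === pageOrigin;
}

console.log(corsAllows('*', 'https://new.website.org'));
console.log(corsAllows('https://other.example', 'https://new.website.org'));
console.log(corsAllows(null, 'https://new.website.org'));
```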

Remove redirects/rewrite rules of URLs to home page

If the current website rewrites/redirects URLs that reference missing pages (404 errors) to the home page, arquivo404 will not work.

WordPress websites frequently have a rule that rewrites URLs containing /index.php to /. If you cannot remove this rule, you can try this alternative one-liner to install arquivo404:

<script type="text/javascript" src="https://arquivo.pt/arquivo404.js" async defer onload="ARQUIVO_NOT_FOUND_404.url([ ...(window.location.href.split('/').slice(0,3)), 'index.php', ...(window.location.href.split('/').slice(3)) ].join('/')).call();"></script>
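
The expression inside onload is dense, so here is the same transformation written as a standalone function for readability. The name addIndexPhp is hypothetical; the logic mirrors the one-liner above.

```javascript
// Inserts "index.php" right after the scheme and host of a URL.
function addIndexPhp(href) {
  const parts = href.split('/'); // ['https:', '', 'host', ...path segments]
  return [...parts.slice(0, 3), 'index.php', ...parts.slice(3)].join('/');
}

console.log(addIndexPhp('https://example.org/2009/report/'));
```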