Arquivo404 automatically fixes links to broken URLs.
If a broken URL on a given website was web-archived by Arquivo.pt, the arquivo404 script generates a customizable message containing a link to its web-archived version (memento), by default the oldest one. If the URL was not web-archived, no message is presented.
It uses the Arquivo.pt Memento API to search for web-archived versions of the broken URL.
Other web archives that support the Memento protocol (RFC 7089) can be added.
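To make the Memento lookup more concrete, here is a minimal sketch of how a TimeMap in the `application/link-format` serialization described by RFC 7089 could be parsed into a list of mementos. It is not the arquivo404 implementation; the function name `parseTimemap` and the shape of the returned objects are assumptions made for illustration.

```javascript
// Illustrative sketch only: parses an application/link-format TimeMap
// (RFC 7089) and keeps the entries whose rel value includes "memento".
// parseTimemap and the { url, date } shape are hypothetical names.
function parseTimemap(text) {
  const mementos = [];
  // Each entry looks like: <URL>; rel="..."; datetime="...",
  // The datetime value contains a comma, so we match up to the next "<"
  // instead of splitting on commas.
  const entry = /<([^>]*)>([^<]*)/g;
  let match;
  while ((match = entry.exec(text)) !== null) {
    const [, url, params] = match;
    const rel = /rel="([^"]*)"/.exec(params);
    const datetime = /datetime="([^"]*)"/.exec(params);
    const isMemento = rel !== null && rel[1].split(/\s+/).includes('memento');
    if (isMemento && datetime !== null) {
      mementos.push({ url, date: new Date(datetime[1]) });
    }
  }
  return mementos;
}
```

Entries such as `rel="original"`, `rel="self"` or `rel="timegate"` are skipped, so only archived versions remain in the result.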
Learn more at:
- https://arquivo.pt/arquivo404en
- https://arquivo.pt/arquivo404 (in Portuguese)
- https://www.fct.pt/apoios/projectos/
- https://www.fccn.pt/SCCN/
- https://www.nau.edu.pt/pt/entidades/administracao-publica/fct/
- https://www.b-on.pt/sobre/index.aspx?area_id=3
- https://www.cert.rcts.pt/pt/sobre/filiacao/
- https://www.cienciavitae.pt/destaques/
- https://www.cienciavitae.pt/uploads/2018/11/Poster_CI%C3%8ANCIAVITAE.pdf
- https://ifilnova.pt/pt/pages/nuno-venturinha
- https://andremourao.com/courses
- https://webcurator.ddns.net/?p=134
- https://sobre.arquivo.pt/sobre-o-arquivo/sobre-o-arquivo/objectivos-do-arquivo-da-web-portuguesa
- https://sobre.arquivo.pt/sobre/publicacoes-1/automatic-identification-and-preservation-of-r-d
- https://sobre.arquivo.pt/pt/acerca/funcionamento-do-arquivo-pt/arquitectura/
- https://sobre.arquivo.pt/pt/acerca/funcionamento-do-arquivo-pt/tecnologia/
- https://sobre.arquivo.pt/en/about/system-functioning/technology/
- https://sobre.arquivo.pt/en/about/system-functioning/architecture/
The simplest way to install the arquivo404 script is to include it in the HTML element where the message will be presented. The two one-liners below display the Arquivo404 message in English and in Portuguese, respectively.
<script type="text/javascript" src="https://arquivo.pt/arquivo404.js" async defer onload="ARQUIVO_NOT_FOUND_404.message('<a href=\'{archivedURL}\'>View an archived version of the page from {date} at {archiveName}</a>').call();"></script>
<script type="text/javascript" src="https://arquivo.pt/arquivo404.js" async defer onload="ARQUIVO_NOT_FOUND_404.call();"></script>
You may download the JavaScript file available at https://arquivo.pt/arquivo404.js to your web server and change the "src" attribute to its new path.
The Arquivo404 script exports a globally scoped variable: `ARQUIVO_NOT_FOUND_404`. This object provides methods to customize how the arquivo404 script searches for the broken URL and the message presented to the users.
| Method | Description | Arguments | Example |
|---|---|---|---|
| messageElementId | Sets the id of the HTML element to write the message. If none is given, a new `<div>` will be created for this purpose. It will be appended to the parent of the `<script>` element that was used to load this script. | `messageElementId`: string | `ARQUIVO_NOT_FOUND_404.messageElementId('messageDiv').call();` |
| message | Sets the message to be displayed by arquivo404. | `message`: string | `ARQUIVO_NOT_FOUND_404.messageElementId('messageDiv').message('<a href="{archivedURL}">View an archived version of the page from {date} at {archiveName}</a>').call();` |
| setMinimumDate | Specifies the oldest date allowed to be retrieved (optional). | `minDate`: Date | `ARQUIVO_NOT_FOUND_404.messageElementId('messageDiv').setMinimumDate(new Date("2010-01-30 GMT")).call();` |
| setMaximumDate | Specifies the most recent date allowed to be retrieved (optional). | `maxDate`: Date | `ARQUIVO_NOT_FOUND_404.messageElementId('messageDiv').setMaximumDate(new Date("2015-01-30 GMT")).call();` |
| setMostRelevantMemento | Specifies whether to pick the oldest or the most recent memento retrieved from the web archive within the minimum and maximum dates (if defined). By default it picks the oldest one. | `criterion`: `'oldest'` \| `'most-recent'` | `ARQUIVO_NOT_FOUND_404.messageElementId('messageDiv').setMostRelevantMemento('most-recent').call();` |
| setDateFormatter | Configures date format using the `date` tag on messages. The default formatting is `YYYY-MM-DD`. `setDateFormatter`'s argument is a function that receives a single JavaScript `Date` object and returns a string. | `dateFormatter`: function | `ARQUIVO_NOT_FOUND_404.messageElementId('messageDiv').setDateFormatter(date => [date.getMonth()+1, date.getDate(), date.getFullYear()].join('/')).message('<a href="{archivedURL}">View an archived version of the page from {date} at {archiveName}</a>').call();` |
| addArchive | Adds a web archive compliant with the Memento API protocol to search for web-archived versions of the broken URL. By default, arquivo404 uses the Arquivo.pt web archive. The argument of this function should have 3 properties: `archiveApiUrl` (URL to the `timemap/link/` endpoint of the API), `archiveName` (archive name to be used with the `archiveName` tag in the message) and `timeout` (timeout for the API request). | `{ archiveApiUrl: string, archiveName: string, timeout: number }` | `ARQUIVO_NOT_FOUND_404.messageElementId('messageDiv').addArchive({ archiveApiUrl: 'https://web.archive.org/web/timemap/link/', archiveName: 'Internet Archive', timeout: 2000 }).call();` |
| url | Specifies a given URL to search in web archives. If this method isn't used, arquivo404 will search for the URL in `window.location.href`. | `url`: string | `ARQUIVO_NOT_FOUND_404.messageElementId('messageDiv').url('http://www.fccn.pt/SCCN/').call();` |
| call | Executes the arquivo404 script. | - | `ARQUIVO_NOT_FOUND_404.call()` |
Messages can use tags between curly brackets to display the following dynamic information:
| Tag | Description |
|---|---|
| archiveName | The name of the web archive preserving the content of the broken URL |
| archivedURL | The URL that references the web-archived content |
| date | The date when the content was web-archived. The default format is `YYYY-MM-DD`, but it can be customized using the `setDateFormatter` method. |
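The tag substitution can be pictured with the following minimal sketch. It only illustrates the behaviour described in the table and is not taken from the arquivo404 source; `renderMessage` is a hypothetical helper, and leaving unknown tags untouched is an assumption.

```javascript
// Illustrative sketch only: replaces {tag} placeholders in a message
// template with the matching values. renderMessage is a hypothetical name.
function renderMessage(template, values) {
  return template.replace(/\{(\w+)\}/g, (tag, name) =>
    // Assumption: tags without a matching value are left as written.
    Object.prototype.hasOwnProperty.call(values, name) ? values[name] : tag);
}

const exampleMessage = renderMessage(
  '<a href="{archivedURL}">View an archived version of the page from {date} at {archiveName}</a>',
  {
    archivedURL: 'https://arquivo.pt/wayback/20100130000000/http://www.fccn.pt/SCCN/', // made-up memento URL
    date: '2010-01-30',
    archiveName: 'Arquivo.pt'
  }
);
```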
- Import the arquivo404 script in the header of the soft 404 webpage:
<head>
...
<script type="text/javascript" src="https://arquivo.pt/arquivo404.js"></script>
...
</head>
- Create an empty `<div>` with a specific id (e.g. "messageDiv") where you want the arquivo404 message to be presented:
<body>
...
<div id="messageDiv"></div>
- Customize the `ARQUIVO_NOT_FOUND_404` object using the `messageElementId` method to identify the created `<div>` and run the arquivo404 script by invoking the `call()` method:
<script type="text/javascript">
ARQUIVO_NOT_FOUND_404
.messageElementId('messageDiv')
.call();
</script>
...
</body>
The message displayed by the arquivo404 script can be customized using the `message` method:
<script type="text/javascript">
ARQUIVO_NOT_FOUND_404
.messageElementId('messageDiv')
.message('Oops! The page you were searching for seems to be missing! <a href="{archivedURL}">Visit an archived version of the page from {date} at {archiveName}.</a>')
.call();
</script>
...
</body>
By default, Arquivo404 links to the oldest archived version of the page.
This behaviour can be changed to link to the most recent version instead:
<script type="text/javascript">
ARQUIVO_NOT_FOUND_404
.messageElementId('messageDiv')
.setMostRelevantMemento('most-recent')
.call();
</script>
...
</body>
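The effect of the two criteria can be sketched as follows. This is an illustration of the selection rule, not the script's actual code; `pickMemento` and the `{ url, date }` objects are hypothetical.

```javascript
// Illustrative sketch only: chooses one memento according to the criterion.
// Each memento is assumed to be an object of the form { url, date: Date }.
function pickMemento(mementos, criterion = 'oldest') {
  if (mementos.length === 0) return null;
  // Sort a copy by capture date, oldest first, leaving the input untouched.
  const sorted = [...mementos].sort((a, b) => a.date - b.date);
  return criterion === 'most-recent' ? sorted[sorted.length - 1] : sorted[0];
}
```

With three captures from 2008, 2012 and 2019, `pickMemento(list)` returns the 2008 capture and `pickMemento(list, 'most-recent')` returns the 2019 one.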
Suppose that the domain of your website has belonged to you since 1 January 2010 and belonged to someone else before that.
You may then want to limit the retrieved results to the period during which the website has been yours.
The `setMinimumDate` method supports this:
<script type="text/javascript">
ARQUIVO_NOT_FOUND_404
.messageElementId('messageDiv')
.setMinimumDate(new Date("2010-01-01 GMT"))
.call();
</script>
...
</body>
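The example above builds its bound with `new Date("2010-01-01 GMT")`, a string format whose parsing ECMAScript leaves to each implementation. A more portable way to build the same instant, together with a sketch of what a date filter does, might look like this. `withinRange` is a hypothetical helper, and treating both bounds as inclusive is an assumption rather than documented arquivo404 behaviour.

```javascript
// Date.UTC takes a zero-based month, so 0 means January.
const ownershipStart = new Date(Date.UTC(2010, 0, 1));

// Illustrative sketch only: keeps the mementos captured between the
// optional bounds. Inclusive bounds are an assumption.
function withinRange(mementos, minDate, maxDate) {
  return mementos.filter(({ date }) =>
    (minDate == null || date >= minDate) &&
    (maxDate == null || date <= maxDate));
}
```

The resulting `Date` can be passed directly, as in `.setMinimumDate(ownershipStart)`.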
Some websites redirect broken links to a soft 404 page that loses track of the broken URL (the URL in `window.location.href`).
In these cases, by default the arquivo404 script would search for web-archived versions of the soft 404 error page instead of the broken URL.
If the website kept track of the broken URL that was requested (`originalUrl` in the example below), it can inject it into its soft 404 page and pass it to the `url` method to solve this problem:
<script type="text/javascript">
ARQUIVO_NOT_FOUND_404
.messageElementId('messageDiv')
.url(originalUrl) // Here we're assuming the original URL is stored in this variable
.call();
</script>
...
</body>
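How `originalUrl` reaches the page depends on the website. As one possibility, suppose the server redirects to the soft 404 page with the broken URL percent-encoded in a query parameter, for example `/404.html?from=...`. Both this convention and the parameter name `from` are hypothetical; the sketch below only shows how such a value could be read back.

```javascript
// Illustrative sketch only: reads the broken URL from a query parameter of
// the soft 404 page. The "from" parameter is a hypothetical convention that
// the website itself would have to implement when redirecting.
function originalUrlFromQuery(pageUrl, paramName = 'from') {
  const value = new URL(pageUrl).searchParams.get(paramName);
  return value === null || value === '' ? null : value;
}

// In the browser this could be combined with the url method, falling back
// to the default window.location.href when the parameter is absent:
//
//   const originalUrl = originalUrlFromQuery(window.location.href);
//   ARQUIVO_NOT_FOUND_404
//     .messageElementId('messageDiv')
//     .url(originalUrl || window.location.href)
//     .call();
```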
By default, the date is displayed in the `YYYY-MM-DD` format. This can be changed using the `setDateFormatter` method:
<script type="text/javascript">
function customDateFormatter(date){
// formats the date into MM/DD/YYYY
return (date.getMonth()+1) + '/' + date.getDate() + '/' + date.getFullYear();
}
ARQUIVO_NOT_FOUND_404
.messageElementId('messageDiv')
.setDateFormatter(customDateFormatter)
.message('<a href="{archivedURL}">View an archived version of the page from {date} at {archiveName}</a>')
.call();
</script>
...
</body>
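As a variation, the sketch below formats the date as zero-padded DD/MM/YYYY. It uses the UTC getters so that the result does not depend on the visitor's time zone, which is a design choice of this sketch and differs from the example above, where the local-time getters are used. `formatDayMonthYear` is a hypothetical name.

```javascript
// Illustrative sketch only: formats a Date as DD/MM/YYYY with zero padding,
// reading the calendar fields in UTC.
function formatDayMonthYear(date) {
  const day = String(date.getUTCDate()).padStart(2, '0');
  const month = String(date.getUTCMonth() + 1).padStart(2, '0'); // months are zero-based
  return `${day}/${month}/${date.getUTCFullYear()}`;
}
```

It can be passed in the same way as `customDateFormatter`, for example `.setDateFormatter(formatDayMonthYear)`.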
Sometimes a broken URL isn't available in Arquivo.pt but it was preserved by other archives such as the Internet Archive. Arquivo404 supports adding web archives that support the Memento protocol, as long as they have CORS enabled.
<script type="text/javascript">
ARQUIVO_NOT_FOUND_404
.messageElementId('messageDiv')
.addArchive( { // adding the Internet Archive
timeout: 6000,
archiveName: "Internet Archive",
archiveApiUrl: "https://web.archive.org/web/timemap/link/" // MUST point towards the timemap/link endpoint of the API.
} )
.call();
</script>
...
</body>
Suppose that your website used to have the domain `old.website.org` but at some point it was changed to `new.website.org`.
If we want arquivo404 to search for broken URLs across both domains, we can combine the `url` and `addArchive` methods as follows:
<script type="text/javascript">
ARQUIVO_NOT_FOUND_404
.url(window.location.pathname + window.location.search)
.addArchive( {timeout: 2000, archiveName: "Arquivo.pt", archiveApiUrl: "https://arquivo.pt/arquivo404server/timemap/link/https://new.website.org"} )
.addArchive( {timeout: 2000, archiveName: "Arquivo.pt", archiveApiUrl: "https://arquivo.pt/arquivo404server/timemap/link/https://old.website.org"} )
.call();
</script>
...
</body>
In the above example, if a user tries to visit `new.website.org/pathname/`, arquivo404 will search Arquivo.pt for mementos of both `old.website.org/pathname/` and `new.website.org/pathname/`.
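The reason this works can be sketched under one assumption, suggested by the example but not confirmed here: the script builds each request by appending the value given to `url` to the `archiveApiUrl` of every configured archive. `timemapRequestUrls` is a hypothetical helper that makes the resulting lookups visible.

```javascript
// Illustrative sketch only. Assumption: the request URL is the archive's
// archiveApiUrl followed by the value passed to the url method.
function timemapRequestUrls(archives, url) {
  return archives.map(({ archiveApiUrl }) => archiveApiUrl + url);
}

const twoDomainLookups = timemapRequestUrls(
  [
    { archiveApiUrl: 'https://arquivo.pt/arquivo404server/timemap/link/https://new.website.org' },
    { archiveApiUrl: 'https://arquivo.pt/arquivo404server/timemap/link/https://old.website.org' }
  ],
  '/pathname/' // what window.location.pathname + window.location.search yields for this visit
);
```

Under that assumption, passing only the path and query string to `url` is what lets each `archiveApiUrl` supply its own domain.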
A functional example using all of the possible configurations is available on 404-page-example.html
- Suggest your website to start being automatically web-archived
- Publish a test page on your website and write down its URL
- SavePageNow the test page
- Wait 48 hours
- Remove the test page from your website to originate a 404 error
- Try to open the URL of the test page and check if the arquivo404 message appears. If it does not appear, contact us.
- Search for your website in Arquivo.pt
- Choose one of its old versions
- Browse until you find a web page that no longer exists on your current website
- Click on its original URL displayed on the replay top bar
- Check if the arquivo404 message appears. If it does not appear, contact us.
The arquivo404 JavaScript requires that the Memento API has an open CORS policy.
In practice, the web archive server should respond with the HTTP header: Access-Control-Allow-Origin: *
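The browser enforces this rule on its own, so no code is needed on the page. The sketch below merely restates the check for a request made without credentials, which can help when inspecting a response header copied from the browser's developer tools. `corsAllows` is a hypothetical helper, not part of arquivo404.

```javascript
// Illustrative sketch only: tells whether an Access-Control-Allow-Origin
// value permits a page served from pageOrigin to read the response
// (request made without credentials).
function corsAllows(allowOriginHeader, pageOrigin) {
  if (allowOriginHeader == null) return false; // header missing: blocked
  const value = allowOriginHeader.trim();
  return value === '*' || value === pageOrigin;
}
```

An archive that answers with `*` can be used from any website, while one that echoes a single origin can only be used from that site.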
If the current website rewrites or redirects URLs that reference missing pages (404 errors) to the home page, arquivo404 will not work.
WordPress websites frequently have a rule that rewrites URLs containing /index.php to /. If you cannot remove this rule, you can try this alternative script to install arquivo404:
<script type="text/javascript" src="https://arquivo.pt/arquivo404.js" async defer onload="ARQUIVO_NOT_FOUND_404.url([ ...(window.location.href.split('/').slice(0,3)), 'index.php', ...(window.location.href.split('/').slice(3)) ].join('/')).call();"></script>
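The expression inside `onload` is dense, so here is the same transformation written as a named function. `withIndexPhp` is a hypothetical name; the logic is the one used in the one-liner above.

```javascript
// Re-inserts "index.php" as the first path segment of an absolute URL.
// Splitting on "/" yields [scheme, "", host, ...path segments], so the
// first three parts are the origin and the rest is the path.
function withIndexPhp(href) {
  const parts = href.split('/');
  return [...parts.slice(0, 3), 'index.php', ...parts.slice(3)].join('/');
}
```

For instance, `withIndexPhp('https://example.org/a/b')` gives `https://example.org/index.php/a/b`, which is the URL arquivo404 then looks up in the archive. The one-liner is equivalent to `ARQUIVO_NOT_FOUND_404.url(withIndexPhp(window.location.href)).call();`.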