Description
Source API Endpoint / Documentation
https://support.ala.org.au/support/solutions/articles/6000196714-how-to-download-occurrence-records
Provider description
Atlas of Living Australia (ALA) aggregates open datasets from several sources around Australia. If you exclude iNaturalist Australia, they have over 2,000,000 images and nearly 40,000 sounds.
I don't know how many of those are openly licensed, but at a quick glance, every individual record I clicked on carried some variant of a CC license. According to their image-specific search tool, only 4,003 images are all rights reserved. A large number have "unrecognised" licenses, but here is an example of one that has a CC license URI in the rights field: https://images.ala.org.au/image/8486cc13-4da9-4dd3-a0f4-5a3d1feea1dc
Some of the "unrecognised" records simply have no license listed. I suspect the vast majority carry CC license URIs though.
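Since many "unrecognised" records apparently carry a CC license URI in the rights field, something like the following could recover the license type and version from that URI. This is only a sketch: the names `CCLicense` and `parse_cc_uri` are made up for illustration, and it assumes the rights value is a bare creativecommons.org URL rather than free text containing one.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class CCLicense(NamedTuple):
    slug: str                    # e.g. "by", "by-nc-sa", "cc0"
    version: str                 # e.g. "3.0"
    jurisdiction: Optional[str]  # e.g. "au"; None when the URI has none


def parse_cc_uri(rights: str) -> Optional[CCLicense]:
    """Return the license described by a creativecommons.org URI, or None."""
    parsed = urlparse(rights.strip())
    if parsed.netloc.lower().removeprefix("www.") != "creativecommons.org":
        return None
    parts = [p for p in parsed.path.lower().split("/") if p]
    # Expected path shapes:
    #   licenses/<slug>/<version>[/<jurisdiction>]
    #   publicdomain/zero/<version>
    if len(parts) >= 3 and parts[0] == "licenses":
        extra = parts[3] if len(parts) > 3 else None
        # Trailing "legalcode" or "deed..." segments are not jurisdictions.
        if extra is not None and (extra == "legalcode" or extra.startswith("deed")):
            extra = None
        return CCLicense(parts[1], parts[2], extra)
    if len(parts) >= 3 and parts[:2] == ["publicdomain", "zero"]:
        return CCLicense("cc0", parts[2], None)
    return None
```

Records for which this returns `None` would still need a manual look, since they may have no license at all.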
Licenses Provided
CC licences
Provider API Technical info
ALA's data is organised much like Europeana's: it is a collection of other sources, but it is also a source in its own right.
There is an API. Here's an example (page size set to 1): https://biocache-ws.ala.org.au/ws/occurrences/search?q=*%3A*&disableAllQualityFilters=true&qualityProfile=ALA&fq=multimedia%3A%22Image%22&fq=-data_resource_uid%3A%22dr1411%22&qc=-_nest_parent_%3A*&pageSize=1
{
"pageSize": 1,
"startIndex": 0,
"totalRecords": 2379535,
"sort": "score",
"dir": "asc",
"status": "OK",
"occurrences": [
{
"uuid": "c4262666-da59-4c89-964d-9ca5e4bcdb03",
"occurrenceID": "https://canbr.gov.au/photo/apii/id/dig/905",
"raw_catalogNumber": "dig 905.1",
"taxonConceptID": "https://id.biodiversity.org.au/node/apni/2902845",
"eventDate": 1126656000000,
"scientificName": "Acacia subcaerulea",
"vernacularName": "Blue-barked Acacia",
"taxonRank": "species",
"taxonRankID": 7000,
"kingdom": "Plantae",
"phylum": "Charophyta",
"classs": "Equisetopsida",
"order": "Fabales",
"family": "Fabaceae",
"genus": "Acacia",
"genusGuid": "https://id.biodiversity.org.au/taxon/apni/51471290",
"species": "Acacia subcaerulea",
"speciesGuid": "https://id.biodiversity.org.au/node/apni/2902845",
"year": 2005,
"month": "09",
"basisOfRecord": "HUMAN_OBSERVATION",
"dataResourceUid": "dr413",
"dataResourceName": "Australian Plant Image Index",
"assertions": [
"MODIFIED_DATE_INVALID",
"MISSING_TAXONRANK",
"TAXON_MISAPPLIED_MATCHED",
"LOCATION_NOT_SUPPLIED",
"COORDINATE_UNCERTAINTY_METERS_INVALID",
"MISSING_GEOREFERENCE_DATE",
"MISSING_GEOREFERENCEDBY",
"MISSING_GEOREFERENCEPROTOCOL",
"MISSING_GEOREFERENCESOURCES",
"MISSING_GEOREFERENCEVERIFICATIONSTATUS"
],
"speciesGroups": ["Plants", "Flowering plants", "Dicots"],
"image": "a31fb54a-255e-4d74-a647-105d36626cc5",
"images": ["a31fb54a-255e-4d74-a647-105d36626cc5"],
"spatiallyValid": true,
"recordedBy": ["Fagg, M."],
"collectors": ["Fagg, M."],
"raw_scientificName": "Acacia subcaerulea",
"raw_basisOfRecord": "HumanObservation",
"multimedia": ["Image"],
"license": "CC-BY 3.0 (Au)",
"imageUrl": "https://images.ala.org.au/image/proxyImage?imageId=a31fb54a-255e-4d74-a647-105d36626cc5",
"largeImageUrl": "https://images.ala.org.au/image/proxyImageThumbnailLarge?imageId=a31fb54a-255e-4d74-a647-105d36626cc5",
"smallImageUrl": "https://images.ala.org.au/image/proxyImageThumbnail?imageId=a31fb54a-255e-4d74-a647-105d36626cc5",
"thumbnailUrl": "https://images.ala.org.au/image/proxyImageThumbnail?imageId=a31fb54a-255e-4d74-a647-105d36626cc5",
"imageUrls": [
"https://images.ala.org.au/image/proxyImageThumbnailLarge?imageId=a31fb54a-255e-4d74-a647-105d36626cc5"
],
"geospatialKosher": "true",
"collector": ["Fagg, M."],
"namesLsid": "Acacia subcaerulea|https://id.biodiversity.org.au/node/apni/2902845|Blue-barked Acacia|Plantae|Fabaceae",
"left": 587970,
"right": 587970
}
],
"facetResults": [],
"query": "?q=*%3A*&disableAllQualityFilters=true&qualityProfile=ALA&fq=multimedia%3A%22Image%22&fq=-data_resource_uid%3A%22dr1411%22&qc=-_nest_parent_%3A*",
"urlParameters": "?q=*%3A*&disableAllQualityFilters=true&qualityProfile=ALA&fq=multimedia%3A%22Image%22&fq=-data_resource_uid%3A%22dr1411%22&qc=-_nest_parent_%3A*",
"queryTitle": "[all records]",
"activeFacetMap": {
"multimedia": {
"name": "multimedia",
"displayName": "Multimedia:\"Image\"",
"value": "\"Image\""
},
"-data_resource_uid": {
"name": "-data_resource_uid",
"displayName": "-<span>Data resource: iNaturalist Australia</span>",
"value": "\"dr1411\""
}
},
"activeFacetObj": {
"multimedia": [
{
"name": "multimedia",
"displayName": "Multimedia:\"Image\"",
"value": "multimedia:\"Image\""
}
],
"-data_resource_uid": [
{
"name": "-data_resource_uid",
"displayName": "-<span>Data resource: iNaturalist Australia</span>",
"value": "-data_resource_uid:\"dr1411\""
}
]
}
}
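For reference, paging through that search endpoint could look roughly like the sketch below. It only builds the request URLs and makes no network calls. It assumes the `startIndex` field echoed in the response is also accepted as a request parameter, which I have not verified. `page_urls` and the page size of 100 are made up for illustration.

```python
from typing import Iterator
from urllib.parse import urlencode

SEARCH_URL = "https://biocache-ws.ala.org.au/ws/occurrences/search"

# Same filters as the example query above: records with images,
# excluding iNaturalist Australia (dr1411).
BASE_PARAMS = [
    ("q", "*:*"),
    ("disableAllQualityFilters", "true"),
    ("qualityProfile", "ALA"),
    ("fq", 'multimedia:"Image"'),
    ("fq", '-data_resource_uid:"dr1411"'),
    ("qc", "-_nest_parent_:*"),
]


def page_urls(total_records: int, page_size: int = 100) -> Iterator[str]:
    """Yield one search URL per page, stepping startIndex by page_size."""
    for start in range(0, total_records, page_size):
        # ASSUMPTION: startIndex works as a request parameter.
        params = BASE_PARAMS + [("pageSize", page_size), ("startIndex", start)]
        yield f"{SEARCH_URL}?{urlencode(params)}"
```

With the `totalRecords` value above (2,379,535) and 100 records per page, this works out to 23,796 requests.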
However, I think the more powerful option is that they offer bulk downloads of individual queries. If you visit the "advanced search" page for the above query (https://biocache.ala.org.au/occurrence/search?q=*%3A*&disableAllQualityFilters=true&qualityProfile=ALA&fq=multimedia%3A%22Image%22&qc=-_nest_parent_%3A*&fq=-data_resource_uid%3A%22dr1411%22), there is a download button, which lets you export a CSV. The "download" is asynchronous: you trigger an export on their end, they generate a zip, and you get back a link later.
The API for that is documented here: https://docs.ala.org.au/openapi/index.html?urls.primaryName=occurrences#/Download
We'd need a DAG that completes this flow:
- Trigger a download
- Poke the status endpoint until it says it's complete
- Download the zip to disk
- Unzip it and upload the CSV to s3
- Then follow the iNaturalist approach (load the CSV into Postgres, etc)
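The polling and unzip steps of that flow could be sketched like this. The ALA calls themselves are deliberately left out: `get_status` is a hypothetical callable that would wrap whatever status endpoint the download API exposes, and the status strings ("finished", "failed", "cancelled") are assumptions, not values taken from the ALA docs. In a real DAG the polling would more likely be a sensor task than a loop.

```python
import io
import time
import zipfile
from typing import Callable


def wait_until_finished(
    get_status: Callable[[], str],
    poke_interval: float = 60.0,
    timeout: float = 6 * 60 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Call get_status() until it reports completion, failure, or timeout."""
    waited = 0.0
    while True:
        status = get_status()
        if status == "finished":  # ASSUMPTION: name of the "done" status
            return
        if status in ("failed", "cancelled"):  # ASSUMPTION: failure statuses
            raise RuntimeError(f"export ended with status {status!r}")
        if waited >= timeout:
            raise TimeoutError(f"export still {status!r} after {waited:.0f}s")
        sleep(poke_interval)
        waited += poke_interval


def extract_csvs(zip_bytes: bytes) -> dict[str, bytes]:
    """Return {member name: contents} for every CSV file in the archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {
            name: archive.read(name)
            for name in archive.namelist()
            if name.lower().endswith(".csv")
        }
```

`extract_csvs` reads the whole archive into memory, which is fine for a sketch but probably not for an export of a couple of million rows. Streaming to disk would be the safer choice there.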
ALA has their own image proxying with various sizes of thumbnails.
Note that each "occurrence" may have more than one image! The "occurrenceID" only points to the "main" image, I think? The other UUIDs in "images" all have proxied image URLs provided by ALA, and they were distinct in the cases where I saw this happening.
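To make the one-occurrence-to-many-images point concrete, here is a rough sketch that turns one occurrence from the search response into one record per image UUID. The URL patterns are copied from the example response and the image page linked earlier. The output field names, and the choice of "recordedBy" as creator and "scientificName" as title, are assumptions. Whether every image in an occurrence really shares the occurrence-level license is also unverified.

```python
from typing import Any

IMAGE_SERVICE = "https://images.ala.org.au/image"


def image_records(occurrence: dict[str, Any]) -> list[dict[str, Any]]:
    """Expand one occurrence into one record per distinct image UUID."""
    image_ids = list(occurrence.get("images") or [])
    main = occurrence.get("image")
    if main and main not in image_ids:
        image_ids.insert(0, main)

    recorded_by = occurrence.get("recordedBy") or []
    records = []
    # dict.fromkeys drops duplicate UUIDs while keeping their order.
    for image_id in dict.fromkeys(image_ids):
        records.append(
            {
                "foreign_identifier": image_id,
                "foreign_landing_url": f"{IMAGE_SERVICE}/{image_id}",
                "url": f"{IMAGE_SERVICE}/proxyImage?imageId={image_id}",
                "thumbnail_url": f"{IMAGE_SERVICE}/proxyImageThumbnail?imageId={image_id}",
                # ASSUMPTION: occurrence-level values apply to every image.
                "raw_license": occurrence.get("license"),
                "creator": "; ".join(recorded_by) or None,
                "title": occurrence.get("scientificName"),
                "occurrence_uuid": occurrence.get("uuid"),
            }
        )
    return records
```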
Checklist to complete before beginning development
- Verify there is a way to retrieve the entire relevant portion of the provider's collection in a systematic way via their API.
- Verify the API provides license info (license type and version; license URL provides both, and is preferred)
- Verify the API provides stable direct links to individual works.
- Verify the API provides a stable landing page URL to individual works.
- Note other info the API provides, such as thumbnails, dimensions, attribution info (required if non-CC0 licenses will be kept), title, description, other metadata, tags, etc.
- Attach example responses to API queries that have the relevant info.
Implementation
- 🙋 I would be interested in implementing this feature.
Status
📋 Backlog