rsimon edited this page Nov 25, 2014 · 1 revision

Pleiades+ ("Pleiades Plus") is a toponym extension to Pleiades based on GeoNames that is a spinoff output of Google Ancient Places (GAP) and Pelagios. Its initial development was spearheaded by Leif Isaksen.

Get the Data

Build It Yourself

Availability (License)

In the interests of openness and collaboration, Leif has made Pleiades+ freely available under a CC-zero license. This essentially means that you can do whatever you like with it, but please take into account the HEALTH WARNING and remember that it's always polite to give attribution, even when you don't have to. It's also good scholarly practice.

To the extent possible under law, Leif Isaksen has waived all copyright and related or neighboring rights to Pleiades+. This work is published from: United Kingdom.

What is it?

Pleiades (http://pleiades.stoa.org/) is an on-line, open-access publication that provides stable identifiers for geographic objects of interest for the study of antiquity while engaging scholars, students, and enthusiasts worldwide in curating descriptive information about those objects. Begun by digitizing the contents of the Barrington Atlas of the Greek and Roman World (R.J.A. Talbert, ed., Princeton, 2000), Pleiades has extensive coverage for the Greek and Roman world, and is beginning to expand into Ancient Near Eastern, Byzantine, Celtic, and Early Medieval geography. A key function for Pleiades is providing stable Uniform Resource Identifiers (URIs) for ancient places so that other online publications about the ancient world can make use of a common reference for geographic places, names, and locations thereby providing a basis for interoperation and cross-project services.

GeoNames is a much larger gazetteer of (mainly contemporary) locations, some with historic equivalents, and a much richer array of alternative toponyms. The GeoNames dataset is also available for download free of charge under a Creative Commons Attribution license (http://www.geonames.org/).

Pleiades+ attempts to match Pleiades and GeoNames identifiers together based on both location and toponym. When such matches are found, the additional toponyms from GeoNames are associated with the Pleiades URI. We have found that the main Pleiades gazetteer can be expanded in this manner by approximately 30% (from approx. 31,000 to 43,000 toponyms).

Why is this useful?

When another project attempts to align its existing geographic information with Pleiades (why?), some matches may fail because the two projects are not using the same toponyms. Projects that use Pleiades+ for the initial alignment process may find a significant improvement in their results, particularly if their dataset relies heavily on modern toponymy.

For example, Google Ancient Places uses Pleiades+ for geoparsing: identifying references to ancient places in texts. This is a two-step process. First, they tag possible references based on a string match with any entry in the gazetteer. Second, where the same name may refer to multiple places, they seek to identify which is the correct one. This is partly done by taking nearby places into account, so in both steps the more tags they have, the better their results. Increasing the number of toponyms increases the tagging rate.
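The first of those two steps can be illustrated with a minimal sketch. It is not GAP's actual code: the gazetteer contents, the placeholder IDs, the lowercase normalization, and the function name `tag_candidates` are all assumptions made for illustration, and only single-word toponyms are handled.

```python
import re

# Hypothetical miniature gazetteer: normalized toponym -> candidate place IDs.
# The IDs are illustrative placeholders, not real Pleiades identifiers.
GAZETTEER = {
    "athens": {"place-A"},
    "athenai": {"place-A"},
    "alexandria": {"place-B", "place-C"},  # one name, several candidate places
}

def tag_candidates(text, gazetteer):
    """Tag every word whose lowercased form matches a gazetteer entry.

    Returns (character offset, surface form, sorted candidate IDs) tuples.
    """
    tags = []
    for match in re.finditer(r"\w+", text):
        key = match.group(0).lower()
        if key in gazetteer:
            tags.append((match.start(), match.group(0), sorted(gazetteer[key])))
    return tags

tags = tag_candidates("He sailed from Athens to Alexandria.", GAZETTEER)
for offset, surface, candidates in tags:
    print(offset, surface, candidates)
```

A tag with more than one candidate, such as "Alexandria" here, is what the second (disambiguation) step would have to resolve; that step is not sketched.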

HEALTH WARNING

No list of alternative toponyms is, or ever could be, complete. Names simply don't work that way. Pleiades+ is merely an attempt to add additional toponyms to Pleiades and makes no claims whatsoever about completeness. It provides about 50% more toponyms than the original Pleiades data, although the distribution tends to reflect many additional toponyms for relatively few sites, rather than many sites with one or two extra toponyms. Other things you should know are:

  • Currently it is only using data from a subset of GeoNames' country datasets (see list below).
  • Matching has been done automatically based on a) a toponymic match, and b) the point coordinates of GeoNames being within the grid square of Pleiades/Barrington Atlas, so false positives may exist.
  • The centroid coordinates, where present, relate to the Barrington Atlas grid square, not the place. We use these as a proxy for the location when geoparsing.
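The grid-square test and the centroid proxy described above might look something like the following sketch. The `GridSquare` type, its field names, and the coordinates are made up for illustration; real Barrington Atlas grid squares would have to be derived from the atlas map data.

```python
from typing import NamedTuple

class GridSquare(NamedTuple):
    """A bounding box in decimal degrees (hypothetical representation)."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, lon, lat):
        # True when the point lies inside or on the edge of the square.
        return (self.min_lon <= lon <= self.max_lon
                and self.min_lat <= lat <= self.max_lat)

    def centroid(self):
        # Centre of the square, used as a stand-in for the place's location.
        return ((self.min_lon + self.max_lon) / 2,
                (self.min_lat + self.max_lat) / 2)

# Made-up square and points, not taken from the Barrington Atlas.
square = GridSquare(23.0, 37.0, 24.0, 38.0)
print(square.contains(23.7, 37.9))
print(square.contains(25.0, 37.9))
print(square.centroid())
```

Because the test only asks whether a point falls somewhere in the square, two different modern places with the same name inside one square would both match, which is one way false positives can arise.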

As time permits we will extend this page with a fuller set of instructions on how to regenerate Pleiades+ should you wish to. In the meantime we provide those from the original blog post. Feel free to contact Leif Isaksen for suggestions and code if necessary.

List of GeoNames countries and regions used (largely the Mediterranean littoral): (AL) Albania, (BA) Bosnia, (CY) Cyprus, (DZ) Algeria, (EG) Egypt, (ES) Spain, (FR) France, (GR) Greece, (HR) Croatia, (IL) Israel, (IT) Italy, (LB) Lebanon, (LY) Libya, (MA) Morocco, (MT) Malta, (PT) Portugal, (TN) Tunisia, (TR) Turkey

A tip from Kate Byrne (sometimes referred to as "Pleiades++"): in processing, results can sometimes be improved when Pleiades+ fails to match GeoNames by looking at the "synonym ring" of alternative names in GeoNames and trying all of those against Pleiades+.
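One way to read that tip is as a fallback lookup, sketched below. The dictionary layouts, the placeholder IDs, the lowercase normalization, and the function name `lookup_with_synonym_ring` are assumptions for illustration only, not part of Pleiades+ or GeoNames.

```python
def lookup_with_synonym_ring(name, pleiades_plus, geonames_rings):
    """Look a name up in Pleiades+, falling back to its GeoNames synonyms.

    pleiades_plus:  normalized toponym -> set of Pleiades IDs (hypothetical layout)
    geonames_rings: normalized toponym -> list of alternative names in GeoNames
    """
    key = name.strip().lower()
    if key in pleiades_plus:
        return set(pleiades_plus[key])
    # Direct lookup failed: try every name in the GeoNames synonym ring.
    hits = set()
    for alternative in geonames_rings.get(key, ()):
        hits |= pleiades_plus.get(alternative.strip().lower(), set())
    return hits

# Made-up data; "place-M" is a placeholder, not a real Pleiades identifier.
pleiades_plus = {"mediolanum": {"place-M"}}
geonames_rings = {"milan": ["Milano", "Mediolanum", "Mailand"]}

print(lookup_with_synonym_ring("Milan", pleiades_plus, geonames_rings))
```

The fallback widens recall at the cost of precision, so any hits found this way are probably best treated as candidates rather than confirmed matches.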

Acknowledgements

We are grateful to the folks at both Pleiades and GeoNames for making this work possible.

A technical summary in 11 stages

The following steps were undertaken in order to produce Pleiades+:

  1. The latest data dump is retrieved from Pleiades. This includes a table of Pleiades locations, a table of the Barrington Atlas Ids and a table of the Barrington Atlas Maps.
  2. The latest data dump is retrieved from GeoNames. This includes all data from countries covered by the Pleiades gazetteer, filtered to exclude irrelevant feature types (e.g. Airports, etc.).
  3. Alternative toponyms are extracted from the Pleiades and GeoNames gazetteers in order to produce ‘Toponym tables’ which map normalized toponyms to their identifiers (this equates to a ‘many-to-many’ mapping).
  4. A table of Barrington Atlas grid squares and their bounding coordinates is calculated from a table of the Barrington Atlas maps.
  5. The Barrington Atlas Ids table is expanded to extract the grid square(s) associated with each Pleiades identifier and joined to the Grid Square table in order to access its bounding coordinates.
  6. The Pleiades Toponym table is joined to the Barrington Atlas Ids table (and thus in turn to the Grid Square table) in order to ascertain bounding coordinates for each toponym, where known.
  7. The GeoNames Toponym table is rejoined to the GeoNames gazetteer in order to ascertain coordinates for each toponym.
  8. The Pleiades Toponym table is aligned with the GeoNames Toponym table in cases where the normalized toponyms are the same and the GeoNames coordinates fall within the bounding coordinates of the Pleiades toponym. This has the result of matching Pleiades identifiers to GeoNames identifiers.
  9. The resulting Pleiades-GeoNames matches are then rejoined to the GeoNames Toponym table in order to elicit all the other GeoNames toponyms associated with the GeoNames ID.
  10. A new CSV file is generated containing i) the original Pleiades Toponym table expanded to include a centroid for each grid square as a proxy location; and ii) the additional toponyms derived from GeoNames.
  11. The final list of results contains the following fields:
     a. The Pleiades identifier (mandatory)
     b. The normalized toponym (mandatory)
     c. The unnormalized toponym (mandatory)
     d. The source [Pleiades | GeoNames] (mandatory)
     e. The GeoNames Id (mandatory if GeoNames is the source)
     f. The Pleiades centroid x, y (where known)
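The core of stages 3 and 8 to 11 can be sketched in a few lines of Python. This is an illustrative reconstruction, not the original code: the normalization rule (strip accents, lowercase), the in-memory data layout, the IDs and coordinates, and the choice to skip GeoNames names already present in Pleiades are all assumptions.

```python
import unicodedata
from collections import defaultdict

def normalize(toponym):
    """Assumed normalization: strip accents, lowercase, trim whitespace."""
    decomposed = unicodedata.normalize("NFKD", toponym)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.lower().strip()

# Made-up inputs. Pleiades: id -> (names, grid-square bounding box as
# (min_lon, min_lat, max_lon, max_lat)). GeoNames: id -> (names, (lon, lat)).
PLEIADES = {"P1": (["Mediolanum"], (9.0, 45.0, 10.0, 46.0))}
GEONAMES = {
    "G1": (["Milano", "Mediolanum", "Milan"], (9.19, 45.46)),
    "G2": (["Mediolanum"], (0.5, 45.7)),  # same name, outside the square
}

def toponym_table(gazetteer):
    """Stage 3: normalized toponym -> set of identifiers (many-to-many)."""
    table = defaultdict(set)
    for ident, (names, _) in gazetteer.items():
        for name in names:
            table[normalize(name)].add(ident)
    return table

def in_box(point, box):
    lon, lat = point
    return box[0] <= lon <= box[2] and box[1] <= lat <= box[3]

def centroid(box):
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)

def match(pleiades, geonames):
    """Stage 8: same normalized toponym AND GeoNames point inside the square."""
    g_table = toponym_table(geonames)
    matches = set()
    for toponym, p_ids in toponym_table(pleiades).items():
        for p_id in p_ids:
            for g_id in g_table.get(toponym, ()):
                if in_box(geonames[g_id][1], pleiades[p_id][1]):
                    matches.add((p_id, g_id))
    return matches

def build_rows(pleiades, geonames, matches):
    """Stages 9-11: original Pleiades rows plus extra GeoNames toponyms."""
    rows = []
    for p_id, (names, box) in pleiades.items():
        for name in names:
            rows.append((p_id, normalize(name), name, "Pleiades", None, *centroid(box)))
    for p_id, g_id in sorted(matches):
        names, box = pleiades[p_id]
        known = {normalize(n) for n in names}
        for name in geonames[g_id][0]:
            if normalize(name) not in known:  # assumption: only add new names
                rows.append((p_id, normalize(name), name, "GeoNames", g_id, *centroid(box)))
    return rows

matches = match(PLEIADES, GEONAMES)
rows = build_rows(PLEIADES, GEONAMES, matches)
for row in rows:
    print(row)
```

Each row follows the field order listed in stage 11. Writing the rows out with Python's `csv` module would give a file of the general shape described in stage 10.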