University of Nottingham
Paper Scraper

A utility to scrape a collection of staff profile pages from the University of Nottingham to allow for collation of publication lists.

Setup

Download the latest version of the repository (no 'releases' are planned). You will need a local copy of Composer in the root of the downloaded repository. Run the command php composer.phar install. Beyond this, look at how to use the code by checking out the example.php file.

Examples

Below are a number of separate examples. See example.php for example scraping of an entire website and converting the output into HTML, including grouping publications into five-year batches.

Author List

To extract an iterable list of authors:

$url = 'URL TO PEOPLE/STAFF DIRECTORY';
$authors = new \Porcheron\UonPaperScraper\Authors($url); // specifying a URL causes a crawl to occur

foreach ($authors as &$author) {
	// each author object has surname(), otherNames(), and
	// url() methods to return their respective content
}

This list can also be converted to a JSON list using the standard PHP json_encode function:

\json_encode($authors);

A Staff Member's Publication List

To retrieve the list of publications for a single staff member:

$surname = 'SURNAME';
$otherNames = 'OTHER NAMES';
$url = 'URL TO STAFF MEMBER'S PUBLIC ESTAFFPROFILE PAGE';
$author = new \Porcheron\UonPaperScraper\Author($surname, $otherNames, $url);

$publications = $author->publications(true); // true causes a crawl to occur

Again, this $publications object is iterable, and can also be converted to a JSON string:

foreach ($publications as &$publication) {
	// each publication object has doi(), title(), year() and 
	// html() functions to return their respective content  
}

\json_encode($publications);

All Staff Members' Publication Lists as One List

To retrieve a combined list of publications from all staff, use:

$url = 'URL TO PEOPLE/STAFF DIRECTORY';
$authors = new \Porcheron\UonPaperScraper\Authors($url);
$publications = $authors->publications(true);

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
composer.json		composer.json
example.php		example.php

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

University of Nottingham
Paper Scraper

Setup

Examples

Author List

A Staff Member's Publication List

All Staff Members' Publication Lists as One List

About

Uh oh!

Releases

Packages

Languages

License

MixedRealityLab/uon-paper-scraper

Folders and files

Latest commit

History

Repository files navigation

University of Nottingham Paper Scraper

Setup

Examples

Author List

A Staff Member's Publication List

All Staff Members' Publication Lists as One List

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

University of Nottingham
Paper Scraper

Packages