Entity Extraction with PHP
yooper edited this page Aug 16, 2016
Entity extraction is performed by a third-party library. It depends on the Stanford Named Entity Recognizer (NER) Java jar files, and Java itself must be installed.
Download the latest jar files from Stanford; they are distributed as a downloadable zip.
To use the API available through PHP, do the following:
- unzip the download
- provide the path to the jar file as the 1st parameter to the StanfordNerTagger class
- provide the path to the trained classifier file as the 2nd parameter to the StanfordNerTagger class
- set the JAVA_HOME environment variable to the path of the Java installation you wish to use to run the jar files
- use the example below to help you extract entities with PHP Text Analysis
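Before constructing the tagger, it can help to confirm the prerequisites listed above. The following is a minimal sketch using only PHP built-ins; the function name `checkNerPrerequisites` and its arguments are hypothetical and are not part of PHP Text Analysis.

```php
<?php
/**
 * Hypothetical helper (not part of PHP Text Analysis): returns a list of
 * human-readable problems with the NER setup, or an empty array if none
 * were found.
 */
function checkNerPrerequisites(string $jarPath, string $classifierPath): array
{
    $problems = [];

    // JAVA_HOME should point at the Java installation used to run the jar.
    $javaHome = getenv('JAVA_HOME');
    if ($javaHome === false || $javaHome === '') {
        $problems[] = 'JAVA_HOME is not set';
    } elseif (!is_dir($javaHome)) {
        $problems[] = "JAVA_HOME does not point to a directory: {$javaHome}";
    }

    // 1st constructor parameter: the Stanford NER jar file.
    if (!is_file($jarPath)) {
        $problems[] = "Jar file not found: {$jarPath}";
    }

    // 2nd constructor parameter: the trained classifier file.
    if (!is_file($classifierPath)) {
        $problems[] = "Classifier file not found: {$classifierPath}";
    }

    return $problems;
}
```

If `JAVA_HOME` is missing from the environment of the PHP process, `putenv('JAVA_HOME=/path/to/java')` (with a placeholder path) may be enough to set it for the current request, assuming the library reads the variable from the process environment.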
use TextAnalysis\Taggers\StanfordNerTagger;
use TextAnalysis\Tokenizers\WhitespaceTokenizer;
use TextAnalysis\Documents\TokensDocument;

class EntityExtractionTest extends \PHPUnit_Framework_TestCase
{
    protected $text = "Marquette County is a county located in the Upper Peninsula of the US state of Michigan. As of the 2010 census, the population was 67,077.";

    public function testStanfordNer()
    {
        // Tokenize the sample text on whitespace and wrap the tokens in a document.
        $document = new TokensDocument((new WhitespaceTokenizer())->tokenize($this->text));

        // Paths to the Stanford NER jar and the trained classifier.
        $jarPath = get_storage_path('ner').'stanford-ner.jar';
        $classifierPath = get_storage_path('ner'.DIRECTORY_SEPARATOR."classifiers")."english.all.3class.distsim.crf.ser.gz";

        $tagger = new StanfordNerTagger($jarPath, $classifierPath);
        $output = $tagger->tag($document->getDocumentData());

        // The temporary file used by the tagger should exist and have the expected size.
        $this->assertFileExists($tagger->getTmpFilePath());
        $this->assertEquals(138, filesize($tagger->getTmpFilePath()));
        $this->assertEquals(['LOCATION','Michigan'], $output[15], "Did you set JAVA_HOME env variable?");
    }
}
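The tagger output can be post-processed to keep only the recognized entities. This is a sketch under assumptions: it assumes `tag()` returns one `[tag, token]` pair per token, in the order implied by the assertion above, and that non-entity tokens carry Stanford NER's background label `O`. The function name `extractEntities` and the sample data are hypothetical, and the array destructuring syntax requires PHP 7.1 or later.

```php
<?php
/**
 * Hypothetical helper: groups tagged tokens by entity type, skipping the
 * background label 'O'. Assumes each entry is a [tag, token] pair.
 */
function extractEntities(array $tagged): array
{
    $entities = [];
    foreach ($tagged as $pair) {
        [$tag, $token] = $pair;
        if ($tag === 'O') {
            continue; // not part of any named entity
        }
        $entities[$tag][] = $token;
    }
    return $entities;
}

// Hand-written sample in the assumed shape, not real tagger output.
$sample = [
    ['LOCATION', 'Marquette'],
    ['LOCATION', 'County'],
    ['O', 'is'],
    ['LOCATION', 'Michigan'],
];
$entities = extractEntities($sample);
// $entities['LOCATION'] === ['Marquette', 'County', 'Michigan']
```

Grouping by tag keeps the sketch short; merging adjacent tokens of the same type into a single multi-word entity would need the token positions as well.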