An NLP (Natural Language Processing) API built on the pke (Python Keyphrase Extraction) engine to extract keywords and analyse topics from text, returning a salience score for each keyword.
This library ships with supervised models trained on the SemEval-2010 dataset.
- The main production API URL is https://nlp-vqyb5tu4fq-ew.a.run.app. Note: this is likely to change.
- The base URL of the application is /api/v1 and must be prepended to every request.
- A token must be passed with every request to this application; it must be set as a header with the key X-Auth-Token.
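The two rules above (base-URL prefix plus auth header) can be combined into a small client-side helper. A minimal sketch in Python; the token value is a placeholder, and `build_request` is a hypothetical helper name:

```python
# Hypothetical client-side helper: prepend the /api/v1 base URL and
# attach the X-Auth-Token header. The token value is a placeholder.
HOST = "https://nlp-vqyb5tu4fq-ew.a.run.app"
BASE = "/api/v1"
TOKEN = "mytoken"  # replace with your real token

def build_request(path: str = "/") -> tuple:
    """Return the full URL and headers for a request to `path`."""
    return HOST + BASE + path, {"X-Auth-Token": TOKEN}

url, headers = build_request("/")
```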
Below is a list of endpoints the API serves.
➡️ GET /api/v1/
Heartbeat endpoint that doesn't require any authorisation.
Example response:
{
"status": 200,
"message": "PONG",
"error": false,
"data": null
}
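As a sanity check, the heartbeat body can be parsed like any other JSON response. A minimal sketch using the example body above:

```python
import json

# The example heartbeat response from above, as a raw JSON string.
body = '{"status": 200, "message": "PONG", "error": false, "data": null}'

payload = json.loads(body)
ok = payload["status"] == 200 and not payload["error"]
```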
➡️ POST /api/v1/
This endpoint extracts keywords from a given piece of text. The JSON body for the endpoint is described below. On successful submission, a list of objects is returned detailing each keyword and its salience score.
| Key | Example Value | Default Value | Required | Notes |
|---|---|---|---|---|
| language | "en" | en | ✅ | See below for available language keys |
| limit | 10 | 30 | ✅ | The number of keywords to extract |
| text | "My keywords" | N/A | ❌ | The content to extract the keywords from |
| stopwords | ["exclude"] | N/A | ❌ | Specific words to exclude |
| dirty | ["exclude"] | N/A | ❌ | Words that contain a substring to exclude |
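A request body built from the keys in the table might look like the following. The text and exclusion lists are made-up examples:

```python
import json

# Hypothetical request body using the keys from the table above.
body = {
    "language": "en",         # see the language keys below
    "limit": 10,              # number of keywords to extract
    "text": "Reddico is an SEO agency with unique technology.",
    "stopwords": ["agency"],  # exact words to exclude
    "dirty": ["tech"],        # exclude keywords containing this substring
}
payload = json.dumps(body)
```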
Example response:
{
"status": 200,
"message": "Successfully obtained keywords.",
"error": false,
"data": [
{
"term": "seo",
"salience": 165.13790907034348
},
{
"term": "reddico",
"salience": 100.51872726020909
},
{
"term": "serp",
"salience": 28.719636360059738
},
{
"term": "brands",
"salience": 25.899545450074672
},
{
"term": "insights",
"salience": 23.97899999016428
},
{
"term": "unique technology",
"salience": 21.539727270044803
},
{
"term": "blackrock",
"salience": 21.539727270044803
},
{
"term": "reddico digital",
"salience": 21.539727270044803
},
{
"term": "technology",
"salience": 21.251797342284842
},
{
"term": "learn",
"salience": 20.368295910502656
},
{
"term": "agency",
"salience": 20.04992044286311
},
{
"term": "optimised",
"salience": 19.43192398051029
},
{
"term": "talent",
"salience": 18.539727270044803
},
{
"term": "company",
"salience": 18.16462612505645
},
{
"term": "team",
"salience": 16.725550003417045
}
]
}
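Since `data` is a JSON array of `{term, salience}` objects, a client can pull the terms out directly. A minimal sketch using a trimmed copy of the response above:

```python
import json

# A trimmed copy of the example response above.
response = json.loads('''{
  "status": 200,
  "error": false,
  "data": [
    {"term": "seo", "salience": 165.13790907034348},
    {"term": "reddico", "salience": 100.51872726020909},
    {"term": "serp", "salience": 28.719636360059738}
  ]
}''')

# The terms, in descending order of salience.
terms = [item["term"] for item in response["data"]]
```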
The available languages and their keys are listed below.
"da": "danish"
"du": "dutch"
"en": "english"
"fi": "finnish"
"fr": "french"
"ge": "german"
"it": "italian"
"no": "norwegian"
"po": "portuguese"
"ro": "romanian"
"ru": "russian"
"sp": "spanish"
"sw": "swedish"
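Reproduced as a lookup table, the keys above let a client validate the `language` field before sending a request. A small sketch; `validate_language` is a hypothetical helper, not part of the API:

```python
# The language keys listed above, as a client-side lookup table.
LANGUAGES = {
    "da": "danish", "du": "dutch", "en": "english", "fi": "finnish",
    "fr": "french", "ge": "german", "it": "italian", "no": "norwegian",
    "po": "portuguese", "ro": "romanian", "ru": "russian",
    "sp": "spanish", "sw": "swedish",
}

def validate_language(key: str) -> str:
    """Raise if `key` is not one of the supported language keys."""
    if key not in LANGUAGES:
        raise ValueError(f"unsupported language key: {key!r}")
    return key
```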
To exclude words from the extraction you can pass either stopwords or dirty in the JSON body of the request; the difference is explained below. If you notice a word regularly occurring, please add it to ./exclude/stopwords.json or ./exclude/dirty.json and make a pull request.
Stopwords are specific words to exclude from the analysis.
Dirty words are compared by substring: if a keyword contains the word passed, it is excluded from the analysis.
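The difference between the two lists can be illustrated client-side. This is a sketch for illustration only; the API applies these rules server-side:

```python
def excluded(keyword: str, stopwords=(), dirty=()) -> bool:
    """Mimic the two exclusion rules: exact match vs. substring match."""
    if keyword in stopwords:                     # stopwords: exact match
        return True
    return any(sub in keyword for sub in dirty)  # dirty: substring match
```

So `"agency"` in stopwords drops only the keyword `agency`, while `"tech"` in dirty also drops `unique technology`.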
This library currently implements the following keyphrase extraction models:
- Unsupervised models
  - Statistical models
    - FirstPhrases
    - TfIdf
    - KPMiner (El-Beltagy and Rafea, 2010)
    - YAKE (Campos et al., 2020)
  - Graph-based models
    - TextRank (Mihalcea and Tarau, 2004)
    - SingleRank (Wan and Xiao, 2008)
    - TopicRank (Bougouin et al., 2013)
    - TopicalPageRank (Sterckx et al., 2015)
    - PositionRank (Florescu and Caragea, 2017)
    - MultipartiteRank (Boudin, 2018)
- Supervised models
  - Feature-based models
To get started with local development, follow the steps below.
This library relies on spacy (>= 3.2.3) for text processing and requires its models to be installed. To set up the project's dependencies, run the following setup script.
sudo chmod -R 777 ./bin
./bin/start.sh
Export the environment variable NLP_TOKEN to set the authorisation token used by the API. Subsequent requests should send the X-Auth-Token header with the value of the exported token.
export NLP_TOKEN=mytoken
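A client can then read the exported variable and attach it to each request. A minimal sketch, falling back to an empty string so it runs even when the variable is unset:

```python
import os

# Read the exported token; a real client should fail if it is missing
# rather than fall back to an empty string.
token = os.environ.get("NLP_TOKEN", "")
headers = {"X-Auth-Token": token}
```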
A Dockerfile is included in this project, so you can run the API locally.
docker build -t nlp .
docker run -it -p 8080:8080 nlp