Commit ac5789f

4.0.0 TextBlob Extension Support (#18)
- New custom attribute `doc._.blob`, `span._.blob`, `token._.blob`.
- Support for TextBlob extensions (https://textblob.readthedocs.io/en/dev/extensions.html#extensions).
- Docs are built using Material for MkDocs (https://squidfunk.github.io/mkdocs-material/) instead of Docusaurus.
1 parent f51b5ff commit ac5789f

75 files changed

+1530
-1768
lines changed

.github/workflows/pytest.yml

Lines changed: 5 additions & 1 deletion
@@ -16,7 +16,7 @@ jobs:
1616
runs-on: ubuntu-latest
1717
strategy:
1818
matrix:
19-
python-version: ["3.6", "3.7", "3.8", "3.9"]
19+
python-version: ["3.7", "3.8", "3.9"]
2020

2121
steps:
2222
- uses: actions/checkout@v2
@@ -33,6 +33,10 @@ jobs:
3333
pip install -r requirements.txt
3434
python -m textblob.download_corpora
3535
python -m spacy download en_core_web_sm
36+
pip install textblob-de
37+
python -m spacy download de_core_news_sm
38+
pip install textblob-fr
39+
python -m spacy download fr_core_news_sm
3640
pip install pytest
3741
- name: Test with pytest
3842
run: |

CONTRIBUTING.md

Lines changed: 23 additions & 7 deletions
@@ -1,11 +1,30 @@
1-
# Contiributing
1+
# Contributing
22

3-
spaCyTextBlob is happy to accept contributions from the community. Please review the guidelines below.
3+
*spacytextblob* is happy to accept contributions from the community. Please review the guidelines below.
4+
5+
## Development environment
6+
7+
### poetry
8+
9+
`poetry` is used to manage Python dependencies. See the poetry docs for installation instructions: [https://python-poetry.org/](https://python-poetry.org/). To install the dependencies and activate the poetry virtual environment, run the following commands:
10+
11+
```bash
12+
poetry install
13+
poetry shell
14+
```
15+
16+
### just
17+
18+
`just` is used to run scripts. See the just docs for instructions on how to install: [https://github.com/casey/just](https://github.com/casey/just).
419

520
## Code formatting
621

722
Please use [black](https://black.readthedocs.io/en/stable/) to format code before submitting a PR.
823

24+
```bash
25+
black spacytextblob
26+
```
27+
928
## Testing
1029

1130
Please validate that all tests pass before submitting a PR by running:
@@ -16,11 +35,8 @@ pytest
1635

1736
## Docs
1837

19-
To build the docs please run:
38+
To build the docs and inspect them visually, run:
2039

2140
```bash
22-
bash scripts/build_docs.sh
41+
just docs
2342
```
24-
25-
If you add new documentation using a jupyter notebook please make sure to update [scripts/build_docs.sh](scripts/build_docs.sh) to include the new notebook.
26-

Makefile

Lines changed: 0 additions & 11 deletions
This file was deleted.

README.md

Lines changed: 32 additions & 69 deletions
@@ -1,19 +1,14 @@
1-
# spaCyTextBlob <a href='https://spacytextblob.netlify.app/'><img src='website/static/img/logo-thumb-circle-250x250.png' align="right" height="139" /></a>
1+
# spacytextblob
22

33
[![PyPI version](https://badge.fury.io/py/spacytextblob.svg)](https://badge.fury.io/py/spacytextblob)
4-
[![pytest](https://github.com/SamEdwardes/spaCyTextBlob/actions/workflows/pytest.yml/badge.svg)](https://github.com/SamEdwardes/spaCyTextBlob/actions/workflows/pytest.yml)
4+
[![pytest](https://github.com/SamEdwardes/spacytextblob/actions/workflows/pytest.yml/badge.svg)](https://github.com/SamEdwardes/spacytextblob/actions/workflows/pytest.yml)
55
![PyPI - Downloads](https://img.shields.io/pypi/dm/spacytextblob?label=PyPi%20Downloads)
66
[![Netlify Status](https://api.netlify.com/api/v1/badges/e2f2caac-7239-45a2-b145-a00205c3befb/deploy-status)](https://app.netlify.com/sites/spacytextblob/deploys)
77

8-
9-
A TextBlob sentiment analysis pipeline compponent for spaCy.
10-
11-
Version 3.0 is a major version update providing support for spaCy 3.0's new interface for adding pipeline components. As a result, it is not backwards compatible with previous versions of spaCyTextBlob. For compatability with spaCy 2.0 please use `pip install spacytextblob==0.1.7`.
12-
13-
*Note that version 1.0, and 2.0 have been skipped. The numbering has been aligned with spaCy's version numbering in the hopes of making it easier to compar.*
8+
A TextBlob sentiment analysis pipeline component for spaCy.
149

1510
- [Docs](https://spacytextblob.netlify.app/)
16-
- [GitHub](https://github.com/SamEdwardes/spaCyTextBlob)
11+
- [GitHub](https://github.com/SamEdwardes/spacytextblob)
1712
- [PyPi](https://pypi.org/project/spacytextblob/)
1813

1914
## Table of Contents
@@ -25,119 +20,87 @@ Version 3.0 is a major version update providing support for spaCy 3.0's new inte
2520

2621
## Install
2722

28-
Install spaCyTextBlob from pypi.
23+
Install *spacytextblob* from PyPi.
2924

3025
```bash
3126
pip install spacytextblob
3227
```
3328

34-
TextBlob also requires some data to be downloaded before getting started.
29+
TextBlob requires additional data to be downloaded before getting started.
3530

3631
```bash
3732
python -m textblob.download_corpora
3833
```
3934

40-
spaCy requires that you download a model to get started.
35+
spaCy also requires that you download a model to get started.
4136

4237
```bash
4338
python -m spacy download en_core_web_sm
4439
```
4540

4641
## Quick Start
4742

48-
spaCyTextBlob allows you to access all of the attributes created by TextBlob sentiment method but within the spaCy framework. The code below will demonstrate how to use spaCyTextBlob on a simple string.
49-
50-
51-
```python
52-
text = "I had a really horrible day. It was the worst day ever! But every now and then I have a really good day that makes me happy."
53-
```
54-
55-
Using `spaCyTextBlob`:
56-
43+
*spacytextblob* allows you to access all of the attributes and methods of the `textblob.TextBlob` class from within the spaCy framework. The code below demonstrates how to use *spacytextblob* on a simple string.
5744

5845
```python
5946
import spacy
6047
from spacytextblob.spacytextblob import SpacyTextBlob
6148

6249
nlp = spacy.load('en_core_web_sm')
50+
text = "I had a really horrible day. It was the worst day ever! But every now and then I have a really good day that makes me happy."
6351
nlp.add_pipe("spacytextblob")
6452
doc = nlp(text)
65-
```
6653

54+
print(doc._.blob.polarity)
55+
# -0.125
6756

68-
```python
69-
print('Polarity:', doc._.polarity)
70-
```
71-
72-
Polarity: -0.125
57+
print(doc._.blob.subjectivity)
58+
# 0.9
7359

74-
75-
76-
```python
77-
print('Sujectivity:', doc._.subjectivity)
60+
print(doc._.blob.sentiment_assessments.assessments)
61+
# [(['really', 'horrible'], -1.0, 1.0, None), (['worst', '!'], -1.0, 1.0, None), (['really', 'good'], 0.7, 0.6000000000000001, None), (['happy'], 0.8, 1.0, None)]
7862
```
7963

80-
Sujectivity: 0.9
81-
82-
83-
84-
```python
85-
print('Assessments:', doc._.assessments)
86-
```
87-
88-
Assessments: [(['really', 'horrible'], -1.0, 1.0, None), (['worst', '!'], -1.0, 1.0, None), (['really', 'good'], 0.7, 0.6000000000000001, None), (['happy'], 0.8, 1.0, None)]
89-
90-
91-
Using `TextBlob`:
92-
64+
In comparison, here is how the same code would look using `TextBlob`:
9365

9466
```python
9567
from textblob import TextBlob
96-
blob = TextBlob(text)
97-
```
9868

69+
text = "I had a really horrible day. It was the worst day ever! But every now and then I have a really good day that makes me happy."
70+
blob = TextBlob(text)
9971

100-
```python
10172
print(blob.sentiment_assessments.polarity)
102-
```
103-
104-
-0.125
73+
# -0.125
10574

106-
107-
108-
```python
10975
print(blob.sentiment_assessments.subjectivity)
110-
```
111-
112-
0.9
113-
76+
# 0.9
11477

115-
116-
```python
11778
print(blob.sentiment_assessments.assessments)
79+
# [(['really', 'horrible'], -1.0, 1.0, None), (['worst', '!'], -1.0, 1.0, None), (['really', 'good'], 0.7, 0.6000000000000001, None), (['happy'], 0.8, 1.0, None)]
11880
```
11981

120-
[(['really', 'horrible'], -1.0, 1.0, None), (['worst', '!'], -1.0, 1.0, None), (['really', 'good'], 0.7, 0.6000000000000001, None), (['happy'], 0.8, 1.0, None)]
121-
122-
12382
## Quick Reference
12483

125-
spaCyTextBlob performs sentiment analysis using the [TextBlob](https://textblob.readthedocs.io/en/dev/quickstart.html) library. Adding spaCyTextBlob to a spaCy nlp pipeline provides access to three new extension attributes.
84+
*spacytextblob* performs sentiment analysis using the [TextBlob](https://textblob.readthedocs.io/en/dev/quickstart.html) library. Adding *spacytextblob* to a spaCy nlp pipeline creates a new extension attribute for the `Doc`, `Span`, and `Token` classes from spaCy.
85+
86+
- `Doc._.blob`
87+
- `Span._.blob`
88+
- `Token._.blob`
12689

127-
- `._.polarity`
128-
- `._.subjectivity`
129-
- `._.assessments`
90+
The `._.blob` attribute contains all of the methods and attributes that belong to the `textblob.TextBlob` class. Some of the common methods and attributes include:
13091

131-
These extension attributes can be accessed at the `Doc`, `Span`, or `Token` level.
92+
- **`._.blob.polarity`**: a float within the range [-1.0, 1.0].
93+
- **`._.blob.subjectivity`**: a float within the range [0.0, 1.0] where 0.0 is very objective and 1.0 is very subjective.
94+
- **`._.blob.sentiment_assessments.assessments`**: a list of polarity and subjectivity scores for the assessed tokens.
13295

133-
Polarity is a float within the range [-1.0, 1.0], subjectivity is a float within the range [0.0, 1.0] where 0.0 is very objective and 1.0 is very subjective, and assessments is a list of polarity and subjectivity scores for the assessed tokens.
96+
See the [textblob docs](https://textblob.readthedocs.io/en/dev/api_reference.html#textblob.blob.TextBlob) for the complete listing of all attributes and methods that are available in `._.blob`.
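
As a rough sanity check on how these values relate, the sketch below uses only the standard library and the assessments printed in the Quick Start output. It assumes the document-level scores are the plain averages of the per-assessment scores. That assumption holds for this example, but it is an observation about this output rather than a statement about TextBlob's internals. The `mean` helper is hypothetical.

```python
import math

# Assessments copied from the Quick Start output.
# Each tuple is read here as (words, polarity, subjectivity, label).
assessments = [
    (["really", "horrible"], -1.0, 1.0, None),
    (["worst", "!"], -1.0, 1.0, None),
    (["really", "good"], 0.7, 0.6000000000000001, None),
    (["happy"], 0.8, 1.0, None),
]


def mean(values):
    """Hypothetical helper: arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Average the second and third fields of every assessment.
polarity = mean(p for _, p, _, _ in assessments)
subjectivity = mean(s for _, _, s, _ in assessments)

# Compare against the document-level values shown in the Quick Start.
print(math.isclose(polarity, -0.125, abs_tol=1e-9))
print(math.isclose(subjectivity, 0.9, abs_tol=1e-9))
```

Up to floating-point rounding, the averages agree with the `-0.125` polarity and `0.9` subjectivity reported for the whole document.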
13497

13598
## Reference and Attribution
13699

137100
- TextBlob
138101
- [https://github.com/sloria/TextBlob](https://github.com/sloria/TextBlob)
139102
- [https://textblob.readthedocs.io/en/latest/](https://textblob.readthedocs.io/en/latest/)
140-
- negspaCy (for inpiration in writing pipeline and organizing repo)
103+
- negspaCy (for inspiration in writing pipeline and organizing repo)
141104
- [https://github.com/jenojp/negspacy](https://github.com/jenojp/negspacy)
142105
- spaCy custom components
143106
- [https://spacy.io/usage/processing-pipelines#custom-components](https://spacy.io/usage/processing-pipelines#custom-components)

docs/api.md

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
1+
# API Reference
2+
3+
## Custom attributes
4+
5+
When you add *spacytextblob* to your spaCy pipeline, it exposes a custom attribute `._.blob`. This attribute is available for the `Doc`, `Span`, and `Token` classes from spaCy.
6+
7+
- `Doc._.blob`
8+
- `Span._.blob`
9+
- `Token._.blob`
10+
11+
The section below outlines commonly accessed `._.blob` attributes and methods. See the [textblob docs](https://textblob.readthedocs.io/en/dev/api_reference.html#textblob.blob.TextBlob) for the complete listing of all attributes and methods that are available in `._.blob`.
12+
13+
### Attributes
14+
15+
| Name | Type | Description |
16+
|------|------|-------------|
17+
| `doc._.blob.polarity` | `float` | The polarity of the document. The polarity score is a float within the range [-1.0, 1.0]. |
18+
| `doc._.blob.subjectivity` | `float` | The subjectivity of the document. The subjectivity is a float within the range [0.0, 1.0] where 0.0 is very objective and 1.0 is very subjective. |
19+
| `doc._.blob.sentiment_assessments.assessments` | `list` | A list of tuples, one per assessed chunk of text. Each tuple holds the assessed words, a polarity score (a float within the range [-1.0, 1.0]), a subjectivity score (a float within the range [0.0, 1.0] where 0.0 is very objective and 1.0 is very subjective), and an optional label (`None` when absent). |
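
To make the shape of the assessments easier to work with, the sketch below unpacks each tuple into named fields and keeps only the negative ones. It is a minimal illustration that runs on the README's example output without spaCy or TextBlob installed. The `Assessment` type and its field names are hypothetical and not part of either library, and reading the fourth element as an optional label is an assumption.

```python
from typing import List, NamedTuple, Optional


class Assessment(NamedTuple):
    """Hypothetical helper type; not part of spacytextblob or TextBlob."""
    words: List[str]
    polarity: float
    subjectivity: float
    label: Optional[str]  # assumed meaning of the fourth element


# Example data copied from the README output for doc._.blob.sentiment_assessments.assessments.
raw = [
    (["really", "horrible"], -1.0, 1.0, None),
    (["worst", "!"], -1.0, 1.0, None),
    (["really", "good"], 0.7, 0.6000000000000001, None),
    (["happy"], 0.8, 1.0, None),
]

# Give each positional field a name.
assessments = [Assessment(*item) for item in raw]

# Keep only the chunks that were scored as negative.
negative = [a for a in assessments if a.polarity < 0]

for a in negative:
    print(" ".join(a.words), a.polarity, a.subjectivity)
```

With a real pipeline, `raw` would instead come from `doc._.blob.sentiment_assessments.assessments`.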
20+
21+
### Methods
22+
23+
**`doc._.blob.ngrams`**
24+
25+
| Name | Type | Description |
26+
|------|------|-------------|
27+
| n | `int` | The number of words to include in the ngram. By default `3`. |
28+
| RETURNS | `List[WordList]` | A list of `WordList` objects, one per n-gram. |
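
For readers unfamiliar with n-grams, here is a minimal standard-library sketch of the idea: a sliding window of `n` consecutive words. It is not TextBlob's implementation. The `word_ngrams` function is hypothetical, and it splits on whitespace only, whereas TextBlob applies its own tokenization.

```python
from typing import List


def word_ngrams(text: str, n: int = 3) -> List[List[str]]:
    """Hypothetical helper: return every run of n consecutive words.

    Words are obtained with a plain whitespace split, which is a
    simplification compared to a real tokenizer.
    """
    words = text.split()
    # The range is empty when the text has fewer than n words.
    return [words[i:i + n] for i in range(len(words) - n + 1)]


grams = word_ngrams("It was the worst day ever")
for gram in grams:
    print(gram)
```

In a pipeline, the equivalent call would be `doc._.blob.ngrams(n=3)`, which returns `WordList` objects rather than plain lists.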
29+
30+
31+
## Config
32+
33+
When adding *spacytextblob* to your spaCy pipeline you can optionally pass additional parameters into the `config` parameter:
34+
35+
| Name | Type | Description |
36+
|------|------|-------------|
37+
| `blob_only` | `bool` | If True, *spacytextblob* will only expose `._.blob` and not attempt to expose `._.polarity`, `._.subjectivity`, or `._.assessments`. This should always be set to True when using TextBlob extensions. By default `False`. |
38+
| `custom_blob` | `Dict[str, str]` | A dictionary that tells spaCy which function to use in place of `textblob.TextBlob`; for example, `TextBlobDE` from the German extension. The dictionary key is `"@misc"`, which tells spaCy to look in the misc section of the spaCy registry. The value is the string name of a function that you have registered with spaCy. See the [TextBlob extensions](tutorial/textblob_extensions.md) section for more details. |
39+
40+
41+
```python
42+
import spacy
43+
from spacytextblob.spacytextblob import SpacyTextBlob
44+
45+
nlp = spacy.load("de_core_news_sm")
46+
47+
nlp.add_pipe("spacytextblob", config={
48+
"blob_only": ..., # bool
49+
"custom_blob": ... # Dict[str, str]
50+
})
51+
```
52+
53+
### Examples
54+
55+
Using *spacytextblob* without an extension:
56+
57+
```python
58+
{! docs/static/reference_code/spacytextblob_example.py !}
59+
```
60+
61+
Using *spacytextblob* with an extension:
62+
63+
```python
64+
{! docs/static/reference_code/textblob_de_example.py !}
65+
```

docs/changelog.md

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
1+
# Changelog
2+
3+
## 4.0.0 (TBD)
4+
5+
- New custom attribute `doc._.blob`, `span._.blob`, `token._.blob`.
6+
- Support for TextBlob extensions (https://textblob.readthedocs.io/en/dev/extensions.html#extensions).
7+
- Docs are built using Material for MkDocs (https://squidfunk.github.io/mkdocs-material/) instead of Docusaurus.
8+
9+
## 3.0.1 (2021-05-05)
10+
11+
- Update the README on PyPi.
12+
13+
## 3.0 (2021-04-02)
14+
15+
- Dropped support for spaCy 2.0 API.
16+
17+
## 0.1.0 to 0.1.7
18+
19+
- Supports spaCy 2.0 API.

docs/contributing.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
{! CONTRIBUTING.md !}

docs/index.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
{! README.md !}
