Commit 7e94fee

Rename references to wrapper classes (#471)
* rename
* check workflow
* Add ftx package
* nltk tests are not needed
* fix nltk imports
* scripts
* fix tests
* fix doctest path
* fix test
* fix doctest
* fix codecov
* coverage drop too much
* path fixing
* Upload coverage inside project
* Upload coverage inside project
* Remove test depencency on nltk wrappers
* remove wrapper installation
* diciontary need nltk
1 parent e83165b commit 7e94fee

48 files changed: +534 -486 lines changed

.github/workflows/main.yml

+3-13
Original file line number | Diff line number | Diff line change
@@ -61,15 +61,7 @@ jobs:
6161
rm -rf texar-pytorch
6262
- name: Install Forte
6363
run: |
64-
pip install --use-feature=in-tree-build --progress-bar off .[ner,test,example,ir,wikipedia,augment,stave]
65-
- name: Install a few wrappers for testing
66-
run: |
67-
git clone https://github.com/asyml/forte-wrappers.git
68-
cd forte-wrappers
69-
pip install --use-feature=in-tree-build --progress-bar off .[nltk]
70-
cd ..
71-
# Remove them to avoid confusion.
72-
rm -rf forte-wrappers
64+
pip install --use-feature=in-tree-build --progress-bar off .[ner,test,example,wikipedia,augment,stave]
7365
- name: Build ontology
7466
run: |
7567
./scripts/build_ontology_specs.sh
@@ -87,11 +79,9 @@ jobs:
8779
if [[ ${{ matrix.torch-version }} != "1.5.0" ]]; then mypy .; fi
8880
- name: Test with pytest and run coverage
8981
run: |
90-
coverage run -m pytest
91-
- name: Test doctest and try to append the coverage
92-
run: |
82+
coverage run -m pytest tests
9383
coverage run --append -m pytest --doctest-modules forte
94-
- name: Upload to codecov
84+
- name: Upload coverage
9585
run: |
9686
codecov
9787
docs:

README.md

+116-108
@@ -10,61 +10,120 @@
1010
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/asyml/forte/blob/master/LICENSE)
1111
[![Chat](http://img.shields.io/badge/gitter.im-asyml/forte-blue.svg)](https://gitter.im/asyml/community)
1212

13+
**Forte** is a toolkit for building Natural Language Processing pipelines,
14+
featuring cross-task interaction, adaptable data-model interfaces and composable
15+
pipeline. Forte was originally developed in CMU and is actively contributed
16+
by [Petuum](https://petuum.com/)
17+
in collaboration with other institutes. This project is part of
18+
the [CASL Open Source](http://casl-project.ai/) family.
19+
20+
Forte provides a platform to assemble state-of-the-art NLP and ML technologies
21+
in a highly-composable fashion, including a wide spectrum of tasks ranging from
22+
Information Retrieval, Natural Language Understanding to Natural Language
23+
Generation.
1324

14-
**Forte** is a toolkit for building Natural Language Processing pipelines, featuring cross-task
15-
interaction, adaptable data-model interfaces and composable pipeline.
16-
Forte was originally developed in CMU and is actively contributed by [Petuum](https://petuum.com/)
17-
in collaboration with other institutes.
18-
This project is part of the [CASL Open Source](http://casl-project.ai/) family.
25+
### Download and Installation
26+
27+
To install the released version from PyPI:
28+
29+
```bash
30+
pip install forte
31+
```
32+
33+
To install from source,
34+
35+
```bash
36+
git clone https://github.com/asyml/forte.git
37+
cd forte
38+
pip install .
39+
```
40+
41+
To install some forte adapter for some
42+
existing [libraries](https://github.com/asyml/forte-wrappers#libraries-and-tools-supported):
1943

20-
Forte provides a platform to assemble
21-
state-of-the-art NLP and ML technologies in a highly-composable fashion, including a wide
22-
spectrum of tasks ranging from Information Retrieval, Natural Language Understanding to Natural
23-
Language Generation.
44+
```bash
45+
git clone https://github.com/asyml/forte-wrappers.git
46+
cd forte-wrappers
47+
# Change spacy to other tools. Check here https://github.com/asyml/forte-wrappers#libraries-and-tools-supported for available tools.
48+
pip install ."[spacy]"
49+
```
50+
51+
With Forte, it is extremely simple to build an integrated system that can search
52+
documents, analyze, extract information and generate language all in one place.
53+
This allows developers to fully utilize the strength of individual module,
54+
combine the results from each step, and enables the system to make fully
55+
informed decision at the end of the pipeline.
56+
57+
Forte not only makes it easy to integrate with arbitrary 3rd party tools (Check
58+
out these [examples](./examples)!), but also brings technology to you by
59+
offering a miscellaneous collection of deep learning modules via Texar, and a
60+
convenient model-data interface for casting tasks to models.
2461

25-
With Forte, it is extremely simple to build an integrated system that can search documents,
26-
analyze, extract information and generate language all in one place. This allows developers
27-
to fully utilize the strength of individual module, combine the results from each step, and enables
28-
the system to make fully informed decision at the end of the pipeline.
62+
### Library Example
2963

30-
Forte not only makes it easy to integrate with arbitrary 3rd party tools (Check out these [examples](./examples)!),
31-
but also brings technology to you by offering a miscellaneous collection of deep learning modules via Texar, and
32-
a convenient model-data interface for casting tasks to models.
64+
A simple code example that runs Named Entity Recognizer from Spacy (required
65+
installing forte spacy wrapper)
66+
67+
```python
68+
from forte import Pipeline
69+
from forte.data.readers import TerminalReader
70+
from forte.spacy import SpacyProcessor
71+
72+
for pack in Pipeline().set_reader(
73+
TerminalReader()
74+
).add(
75+
SpacyProcessor(), {"processors": "sentence, ner"}
76+
).initialize().process_dataset():
77+
for sentence in pack.get("ft.onto.base_ontology.Sentence"):
78+
print("The sentence is: ", sentence.text)
79+
print("The entities are: ")
80+
for ent in pack.get("ft.onto.base_ontology.EntityMention", sentence):
81+
print(ent.text, ent.ner_type)
82+
83+
```
84+
85+
Find more examples [here](./examples).
3386

3487
## Core Design Principles
3588

36-
The core design principle of Forte is the abstraction of NLP concepts and machine learning models. It
37-
not only separates data, model and tasks but also enables interactions between different components of
38-
the pipeline. Based on this principle, we make Forte:
39-
40-
* **Composable**: Forte helps users to decompose a problem into *data*, *models* and *tasks*.
41-
The tasks can further be divided into sub-tasks. A complex use case
42-
can be solved by composing heterogeneous modules via straightforward python APIs or declarative
43-
configuration files. The components (e.g. models or tasks) in the pipeline can be flexibly
44-
swapped in and out, as long as the API contracts are matched. This approach greatly improves module
45-
reusability, enables fast development and enhances the flexibility of using libraries.
46-
47-
* **Generalizable and Extensible**: Forte not only generalizes well on a wide
48-
range of NLP tasks, but also extends easily to new tasks or new domains. In particular, Forte
49-
provides the *Ontology* system that helps users define types according to their specific tasks.
50-
Users can declaratively specify the type through simple JSON files and our Code Generation tool
51-
will automatically generate ready-to-use python files for your project. Check out our
52-
[Ontology Generation documentation](./docs/ontology_generation.md) for more details.
53-
54-
* **Universal Data Flow**: Forte enables a universal data flow that supports seamless data flow between
55-
different steps. Central to Forte's composable architecture, a transparent data flow facilitates flexible
56-
process interventions and simple pipeline management. Adaptive to generic data formats, Forte is positioned as
57-
a perfect tool for data inspection, component swapping and result sharing.
58-
This is particularly helpful during team collaborations!
89+
The core design principle of Forte is the abstraction of NLP concepts and
90+
machine learning models. It not only separates data, model and tasks but also
91+
enables interactions between different components of the pipeline. Based on this
92+
principle, we make Forte:
93+
94+
* **Composable**: Forte helps users to decompose a problem into *data*, *models*
95+
and *tasks*. The tasks can further be divided into sub-tasks. A complex use
96+
case can be solved by composing heterogeneous modules via straightforward
97+
python APIs or declarative configuration files. The components (e.g. models or
98+
tasks) in the pipeline can be flexibly swapped in and out, as long as the API
99+
contracts are matched. This approach greatly improves module reusability,
100+
enables fast development and enhances the flexibility of using libraries.
101+
102+
* **Generalizable and Extensible**: Forte not only generalizes well on a wide
103+
range of NLP tasks, but also extends easily to new tasks or new domains. In
104+
particular, Forte provides the *Ontology* system that helps users define types
105+
according to their specific tasks. Users can declaratively specify the type
106+
through simple JSON files and our Code Generation tool will automatically
107+
generate ready-to-use python files for your project. Check out our
108+
[Ontology Generation documentation](./docs/ontology_generation.md) for more
109+
details.
110+
111+
* **Universal Data Flow**: Forte enables a universal data flow that supports
112+
seamless data flow between different steps. Central to Forte's composable
113+
architecture, a transparent data flow facilitates flexible process
114+
interventions and simple pipeline management. Adaptive to generic data
115+
formats, Forte is positioned as a perfect tool for data inspection, component
116+
swapping and result sharing. This is particularly helpful during team
117+
collaborations!
59118

60119
-----------------
61-
| ![forte_arch.jpg](https://raw.githubusercontent.com/asyml/forte/master/docs/_static/img/forte_arch.png) |
62-
|:--:|
63-
| *A high level Architecture of Forte showing how ontology and entries work with the pipeline.* |
120+
| ![forte_arch.jpg](https://raw.githubusercontent.com/asyml/forte/master/docs/_static/img/forte_arch.png)
121+
| |:--:| | *A high level Architecture of Forte showing how ontology and entries
122+
work with the pipeline.* |
64123
-----------------
65-
| ![forte_results.jpg](https://raw.githubusercontent.com/asyml/forte/master/docs/_static/img/forte_results.png) |
66-
|:--:|
67-
| *Forte stores results in data packs and use the ontology to represent task logic.* |
124+
| ![forte_results.jpg](https://raw.githubusercontent.com/asyml/forte/master/docs/_static/img/forte_results.png)
125+
| |:--:| | *Forte stores results in data packs and use the ontology to represent
126+
task logic.* |
68127
-----------------
69128

70129
## Package Overview
@@ -83,77 +142,21 @@ This is particularly helpful during team collaborations!
83142
<td><b> forte.processors </b></td>
84143
<td> a collection of processors for building NLP pipelines </td>
85144
</tr>
86-
<tr>
87-
<td><b> forte.trainer </b></td>
88-
<td> a collection of modules for training different NLP tasks </td>
89-
</tr>
90145
<tr>
91146
<td><b> ft.onto.base_ontology </b></td>
92147
<td> a module containing basic ontologies like Token, Sentence, Document etc </td>
93148
</tr>
94149
</table>
95150

96-
### Library API Example
97-
98-
A simple code example that runs Named Entity Recognizer
99-
100-
```python
101-
import yaml
102-
103-
from forte.pipeline import Pipeline
104-
from forte.data.readers import CoNLL03Reader
105-
from forte.processors.nlp import CoNLLNERPredictor
106-
from ft.onto.base_ontology import Token, Sentence
107-
from forte.common.configuration import Config
108-
109-
110-
config_data = yaml.safe_load(open("config_data.yml", "r"))
111-
config_model = yaml.safe_load(open("config_model.yml", "r"))
112-
113-
config = Config({}, default_hparams=None)
114-
config.add_hparam('config_data', config_data)
115-
config.add_hparam('config_model', config_model)
116-
117-
118-
pl = Pipeline()
119-
pl.set_reader(CoNLL03Reader())
120-
pl.add(CoNLLNERPredictor(), config=config)
121-
122-
pl.initialize()
123-
124-
for pack in pl.process_dataset(config.config_data.test_path):
125-
for pred_sentence in pack.get_data(context_type=Sentence, request={Token: {"fields": ["ner"]}}):
126-
print("============================")
127-
print(pred_sentence["context"])
128-
print("The entities are...")
129-
print(pred_sentence["Token"]["ner"])
130-
print("============================")
131-
132-
```
133-
134-
Find more examples [here](./examples).
135-
136-
### Download and Installation
137-
138-
To install the released version from PyPI:
139-
```bash
140-
pip install forte
141-
```
142-
143-
To install from source,
144-
```bash
145-
git clone https://github.com/asyml/forte.git
146-
cd forte
147-
pip install .
148-
```
149-
150151
### Getting Started
151152

152153
* [Examples](./examples)
153154
* [Documentation](https://asyml-forte.readthedocs.io/)
154-
* Currently we are working on some interesting [tutorials](https://github.com/asyml/forte/wiki)
155+
* Currently we are working on some
156+
interesting [tutorials](https://github.com/asyml/forte/wiki)
155157

156158
### Trouble Shooting
159+
157160
1. If you try to run `generate_ontology` script but encounter the following
158161
```
159162
Traceback (most recent call last):
@@ -167,18 +170,23 @@ pip install .
167170
raise PackageNotFoundError(name)
168171
importlib_metadata.PackageNotFoundError: forte
169172
```
170-
This is likely to be caused by multiple conflicting installation, such as
171-
installing both from source or from PIP. One way to solve this is to manually
172-
remove the script `~/anaconda3/bin/generate_ontology` and re-install the package.
173+
This is likely to be caused by multiple conflicting installation, such as
174+
installing both from source or from PIP. One way to solve this is to manually
175+
remove the script `~/anaconda3/bin/generate_ontology` and re-install the
176+
package.
173177
174178
### Contributing
175-
If you are interested in making enhancement to Forte, please first go over our [Code of Conduct](https://github.com/asyml/forte/blob/master/CODE_OF_CONDUCT.md) and [Contribution Guideline](https://github.com/asyml/forte/blob/master/CONTRIBUTING.md)
179+
180+
If you are interested in making enhancement to Forte, please first go over
181+
our [Code of Conduct](https://github.com/asyml/forte/blob/master/CODE_OF_CONDUCT.md)
182+
and [Contribution Guideline](https://github.com/asyml/forte/blob/master/CONTRIBUTING.md)
176183
177184
### License
178185
179186
[Apache License 2.0](./LICENSE)
180187
181188
### Companies and Universities Supporting Forte
189+
182190
<p float="left">
183191
<img src="https://raw.githubusercontent.com/asyml/forte/master/docs/_static/img/Petuum.png" width="200" align="top">
184192
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

examples/chatbot/chatbot_example.py

+2-2
@@ -16,6 +16,8 @@
1616
from termcolor import colored
1717
import torch
1818

19+
from forte.nltk import (
20+
NLTKSentenceSegmenter, NLTKWordTokenizer, NLTKPOSTagger)
1921
from forte.common.configuration import Config
2022
from forte.data.multi_pack import MultiPack
2123
from forte.data.readers import MultiPackTerminalReader
@@ -25,8 +27,6 @@
2527
from forte.processors.nlp import SRLPredictor
2628
from forte.processors.ir import SearchProcessor, BertBasedQueryCreator
2729
from forte.data.selector import NameMatchSelector
28-
from forte_wrapper.nltk import (
29-
NLTKSentenceSegmenter, NLTKWordTokenizer, NLTKPOSTagger)
3030
from ft.onto.base_ontology import PredicateLink, Sentence
3131

3232
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
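
For code that has to run against both the old `forte_wrapper.*` layout and the new `forte.*` layout shown in this diff, a fallback import is one option. The sketch below is not part of the commit: `import_first` is a hypothetical helper, and the need to support both layouts at once is an assumption.

```python
import importlib
from types import ModuleType


def import_first(*module_names: str) -> ModuleType:
    """Return the first module in `module_names` that can be imported.

    Hypothetical helper for illustration; it is not part of Forte.
    """
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Covers ModuleNotFoundError too; try the next candidate.
            continue
    raise ImportError(
        f"none of these modules could be imported: {module_names}"
    )
```

A caller could then write `nltk_wrappers = import_first("forte.nltk", "forte_wrapper.nltk")` to prefer the new name and fall back to the old one. Note that this also hides an `ImportError` raised inside a module that does exist, so it suits short transition periods better than permanent use.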

examples/chatbot/create_index.py

+1-1
@@ -25,7 +25,7 @@
2525
import texar.torch as tx
2626

2727
from forte.common.configuration import Config
28-
from forte_wrapper.faiss import EmbeddingBasedIndexer
28+
from forte.faiss import EmbeddingBasedIndexer
2929

3030
logging.basicConfig(level=logging.INFO)
3131

examples/clinical_pipeline/clinical_processing_pipeline.py

+4-4
@@ -4,10 +4,10 @@
44
import yaml
55
from mimic3_note_reader import Mimic3DischargeNoteReader
66

7-
from forte_wrapper.elastic import ElasticSearchPackIndexProcessor
8-
from forte_wrapper.nltk import NLTKSentenceSegmenter
9-
from forte_wrapper.hugginface.bio_ner_predictor import BioBERTNERPredictor
10-
from forte_wrapper.hugginface.transformers_processor import BERTTokenizer
7+
from forte.elastic import ElasticSearchPackIndexProcessor
8+
from forte.nltk import NLTKSentenceSegmenter
9+
from forte.hugginface.bio_ner_predictor import BioBERTNERPredictor
10+
from forte.hugginface.transformers_processor import BERTTokenizer
1111

1212
from forte.common.configuration import Config
1313
from forte.data.data_pack import DataPack

examples/clinical_pipeline/utterance_searcher.py

+1-1
@@ -3,7 +3,7 @@
33
import sqlite3
44
from typing import Dict, Any, Optional, List
55

6-
from forte_wrapper.elastic import ElasticSearchIndexer
6+
from forte.elastic import ElasticSearchIndexer
77

88
from forte.common import Resources, ProcessorConfigError
99
from forte.common.configuration import Config

examples/data_augmentation/data_select/data_select_and_augment_example.py

+1-1
@@ -15,10 +15,10 @@
1515
import logging
1616
import yaml
1717

18+
from forte.nltk import NLTKWordTokenizer, NLTKPOSTagger
1819
from forte.data.multi_pack import MultiPack
1920
from forte.pipeline import Pipeline
2021
from forte.processors.base.data_selector_for_da import RandomDataSelector
21-
from forte_wrapper.nltk import NLTKWordTokenizer, NLTKPOSTagger
2222
from forte.data.selector import AllPackSelector
2323
from forte.data.caster import MultiPackBoxer
2424
from forte.processors.data_augment import ReplacementDataAugmentProcessor

examples/data_augmentation/data_select/data_select_index_pipeline.py

+2-2
@@ -26,8 +26,8 @@
2626
from typing import Dict, Any
2727
import logging
2828

29-
from forte_wrapper.elastic import ElasticSearchIndexer
30-
from forte_wrapper.elastic import ElasticSearchPackIndexProcessor
29+
from forte.elastic import ElasticSearchIndexer
30+
from forte.elastic import ElasticSearchPackIndexProcessor
3131

3232
from forte.common.configuration import Config
3333
from forte.data.data_pack import DataPack

examples/passage_ranker/create_index.py

+1-1
@@ -18,7 +18,7 @@
1818

1919
import yaml
2020

21-
from forte_wrapper.elastic import ElasticSearchTextIndexProcessor
21+
from forte.elastic import ElasticSearchTextIndexProcessor
2222

2323
from forte.common.configuration import Config
2424
from forte.data.data_pack import DataPack

examples/passage_ranker/indexer_reranker_eval_pipeline.py

+1-1
@@ -16,7 +16,7 @@
1616

1717
import yaml
1818

19-
from forte_wrapper.elastic import ElasticSearchQueryCreator, \
19+
from forte.elastic import ElasticSearchQueryCreator, \
2020
ElasticSearchProcessor
2121

2222
from ms_marco_evaluator import MSMarcoEvaluator
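
The import changes in the example files all follow one pattern: `forte_wrapper.<name>` becomes `forte.<name>`. Downstream code with the same imports could apply that rename mechanically. The following is a minimal sketch under that assumption; `rename_wrapper_imports` is a hypothetical helper, not a script shipped with this commit, and it only touches lines that begin with `from` or `import`.

```python
import re

# Matches the start of an import statement that references the old
# top-level package, e.g. "from forte_wrapper.elastic import ...".
_OLD_IMPORT = re.compile(
    r"^([ \t]*(?:from|import)[ \t]+)forte_wrapper\.", re.MULTILINE
)


def rename_wrapper_imports(source: str) -> str:
    """Rewrite `forte_wrapper.<name>` imports to `forte.<name>`.

    Hypothetical helper for illustration only.
    """
    return _OLD_IMPORT.sub(r"\1forte.", source)


old_source = (
    "from forte_wrapper.elastic import ElasticSearchIndexer\n"
    "from forte.common.configuration import Config\n"
)
new_source = rename_wrapper_imports(old_source)
```

Restricting the pattern to import statements leaves strings and comments that mention `forte_wrapper` untouched, at the cost of missing dynamic imports or fully qualified references elsewhere in a file.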
