[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/asyml/forte/blob/master/LICENSE)
[![Chat](http://img.shields.io/badge/gitter.im-asyml/forte-blue.svg)](https://gitter.im/asyml/community)

+ **Forte** is a toolkit for building Natural Language Processing pipelines,
+ featuring cross-task interaction, adaptable data-model interfaces and composable
+ pipelines. Forte was originally developed at CMU and is actively contributed to
+ by [Petuum](https://petuum.com/)
+ in collaboration with other institutes. This project is part of
+ the [CASL Open Source](http://casl-project.ai/) family.
+
+ Forte provides a platform to assemble state-of-the-art NLP and ML technologies
+ in a highly composable fashion, covering a wide spectrum of tasks ranging from
+ Information Retrieval and Natural Language Understanding to Natural Language
+ Generation.

- **Forte** is a toolkit for building Natural Language Processing pipelines, featuring cross-task
- interaction, adaptable data-model interfaces and composable pipeline.
- Forte was originally developed in CMU and is actively contributed by [Petuum](https://petuum.com/)
- in collaboration with other institutes.
- This project is part of the [CASL Open Source](http://casl-project.ai/) family.
+ ### Download and Installation
+
+ To install the released version from PyPI:
+
+ ```bash
+ pip install forte
+ ```
+
+ To install from source:
+
+ ```bash
+ git clone https://github.com/asyml/forte.git
+ cd forte
+ pip install .
+ ```
+
+ To install a Forte adapter for one of the
+ supported [libraries](https://github.com/asyml/forte-wrappers#libraries-and-tools-supported):

- Forte provides a platform to assemble
- state-of-the-art NLP and ML technologies in a highly-composable fashion, including a wide
- spectrum of tasks ranging from Information Retrieval, Natural Language Understanding to Natural
- Language Generation.
+ ```bash
+ git clone https://github.com/asyml/forte-wrappers.git
+ cd forte-wrappers
+ # Replace spacy with another tool as needed. See https://github.com/asyml/forte-wrappers#libraries-and-tools-supported for the available tools.
+ pip install ".[spacy]"
+ ```
+
+ With Forte, it is simple to build an integrated system that can search
+ documents, analyze them, extract information and generate language, all in one
+ place. This lets developers make full use of the strengths of individual
+ modules and combine the results from each step, so that the system can make
+ fully informed decisions at the end of the pipeline.
+
+ Forte not only makes it easy to integrate with arbitrary third-party tools (check
+ out these [examples](./examples)!), but also offers a collection of deep
+ learning modules via Texar and a convenient model-data interface for casting
+ tasks to models.

- With Forte, it is extremely simple to build an integrated system that can search documents,
- analyze, extract information and generate language all in one place. This allows developers
- to fully utilize the strength of individual module, combine the results from each step, and enables
- the system to make fully informed decision at the end of the pipeline.
+ ### Library Example

- Forte not only makes it easy to integrate with arbitrary 3rd party tools (Check out these [examples](./examples)!),
- but also brings technology to you by offering a miscellaneous collection of deep learning modules via Texar, and
- a convenient model-data interface for casting tasks to models.
+ A simple code example that runs the Named Entity Recognizer from spaCy (requires
+ installing the Forte spaCy wrapper):
+
+ ```python
+ from forte import Pipeline
+ from forte.data.readers import TerminalReader
+ from forte.spacy import SpacyProcessor
+
+ # Read text from the terminal, then run spaCy sentence splitting and NER.
+ for pack in Pipeline().set_reader(
+     TerminalReader()
+ ).add(
+     SpacyProcessor(), {"processors": "sentence, ner"}
+ ).initialize().process_dataset():
+     for sentence in pack.get("ft.onto.base_ontology.Sentence"):
+         print("The sentence is: ", sentence.text)
+         print("The entities are: ")
+         # Restrict the entity mentions to those covered by this sentence.
+         for ent in pack.get("ft.onto.base_ontology.EntityMention", sentence):
+             print(ent.text, ent.ner_type)
+
+ ```
+
+ Find more examples [here](./examples).

## Core Design Principles

- The core design principle of Forte is the abstraction of NLP concepts and machine learning models. It
- not only separates data, model and tasks but also enables interactions between different components of
- the pipeline. Based on this principle, we make Forte:
-
- * **Composable**: Forte helps users to decompose a problem into *data*, *models* and *tasks*.
-   The tasks can further be divided into sub-tasks. A complex use case
-   can be solved by composing heterogeneous modules via straightforward python APIs or declarative
-   configuration files. The components (e.g. models or tasks) in the pipeline can be flexibly
-   swapped in and out, as long as the API contracts are matched. This approach greatly improves module
-   reusability, enables fast development and enhances the flexibility of using libraries.
-
- * **Generalizable and Extensible**: Forte not only generalizes well on a wide
-   range of NLP tasks, but also extends easily to new tasks or new domains. In particular, Forte
-   provides the *Ontology* system that helps users define types according to their specific tasks.
-   Users can declaratively specify the type through simple JSON files and our Code Generation tool
-   will automatically generate ready-to-use python files for your project. Check out our
-   [Ontology Generation documentation](./docs/ontology_generation.md) for more details.
-
- * **Universal Data Flow**: Forte enables a universal data flow that supports seamless data flow between
-   different steps. Central to Forte's composable architecture, a transparent data flow facilitates flexible
-   process interventions and simple pipeline management. Adaptive to generic data formats, Forte is positioned as
-   a perfect tool for data inspection, component swapping and result sharing.
-   This is particularly helpful during team collaborations!
+ The core design principle of Forte is the abstraction of NLP concepts and
+ machine learning models. It not only separates data, models and tasks but also
+ enables interactions between different components of the pipeline. Based on this
+ principle, we make Forte:
+
+ * **Composable**: Forte helps users decompose a problem into *data*, *models*
+   and *tasks*. The tasks can further be divided into sub-tasks. A complex use
+   case can be solved by composing heterogeneous modules via straightforward
+   Python APIs or declarative configuration files. The components (e.g. models or
+   tasks) in the pipeline can be flexibly swapped in and out, as long as the API
+   contracts are matched. This approach greatly improves module reusability,
+   enables fast development and enhances the flexibility of using libraries.
+
+ * **Generalizable and Extensible**: Forte not only generalizes well across a wide
+   range of NLP tasks, but also extends easily to new tasks or new domains. In
+   particular, Forte provides the *Ontology* system that helps users define types
+   according to their specific tasks. Users can declaratively specify types
+   through simple JSON files, and our Code Generation tool will automatically
+   generate ready-to-use Python files for their project. Check out our
+   [Ontology Generation documentation](./docs/ontology_generation.md) for more
+   details.
+
+ * **Universal Data Flow**: Forte provides a universal data flow that lets data
+   pass seamlessly between different steps. Central to Forte's composable
+   architecture, a transparent data flow facilitates flexible process
+   interventions and simple pipeline management. Because it adapts to generic
+   data formats, Forte is well suited to data inspection, component swapping
+   and result sharing. This is particularly helpful during team
+   collaboration!
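
The composability principle above can be illustrated with a minimal, library-free sketch. This is not Forte's API: `Pack`, `run_pipeline` and both processors are hypothetical stand-ins, written only to show how components that agree on a shared data contract can be swapped in and out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Pack:
    """Hypothetical stand-in for a data pack: raw text plus named annotations."""
    text: str
    annotations: Dict[str, List[str]] = field(default_factory=dict)


# The "API contract": a processor reads a pack and adds annotations in place.
Processor = Callable[[Pack], None]


def split_sentences(pack: Pack) -> None:
    """Naive sentence splitter that cuts the text on periods."""
    pack.annotations["Sentence"] = [
        part.strip() for part in pack.text.split(".") if part.strip()
    ]


def find_capitalized(pack: Pack) -> None:
    """Toy entity finder: capitalized words that do not start a sentence.

    It relies only on the "Sentence" annotations, not on which processor
    produced them, so the sentence splitter can be replaced freely.
    """
    pack.annotations["EntityMention"] = [
        word
        for sentence in pack.annotations["Sentence"]
        for word in sentence.split()[1:]
        if word[0].isupper()
    ]


def run_pipeline(texts: Iterable[str], processors: List[Processor]) -> List[Pack]:
    """Run every processor, in order, over a fresh pack for each text."""
    packs = []
    for text in texts:
        pack = Pack(text)
        for process in processors:
            process(pack)
        packs.append(pack)
    return packs


packs = run_pipeline(
    ["Forte was developed at CMU. It is maintained by Petuum."],
    [split_sentences, find_capitalized],
)
print(packs[0].annotations["EntityMention"])
```

Replacing `split_sentences` with any other function that fills the `"Sentence"` annotations leaves `find_capitalized` untouched, which is the kind of swap the principle describes.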

-----------------
- | ![forte_arch.jpg](https://raw.githubusercontent.com/asyml/forte/master/docs/_static/img/forte_arch.png) |
- | :--: |
- | *A high level Architecture of Forte showing how ontology and entries work with the pipeline.* |
+ | ![forte_arch.jpg](https://raw.githubusercontent.com/asyml/forte/master/docs/_static/img/forte_arch.png) |
+ | :--: |
+ | *A high-level architecture of Forte showing how ontology and entries work with the pipeline.* |
-----------------
- | ![forte_results.jpg](https://raw.githubusercontent.com/asyml/forte/master/docs/_static/img/forte_results.png) |
- | :--: |
- | *Forte stores results in data packs and use the ontology to represent task logic.* |
+ | ![forte_results.jpg](https://raw.githubusercontent.com/asyml/forte/master/docs/_static/img/forte_results.png) |
+ | :--: |
+ | *Forte stores results in data packs and uses the ontology to represent task logic.* |
-----------------

## Package Overview
@@ -83,77 +142,21 @@ This is particularly helpful during team collaborations!
        <td><b>forte.processors</b></td>
        <td>a collection of processors for building NLP pipelines</td>
    </tr>
-    <tr>
-        <td><b>forte.trainer</b></td>
-        <td>a collection of modules for training different NLP tasks</td>
-    </tr>
    <tr>
        <td><b>ft.onto.base_ontology</b></td>
        <td>a module containing basic ontologies like Token, Sentence, Document etc</td>
    </tr>
</table>

- ### Library API Example
-
- A simple code example that runs Named Entity Recognizer
-
- ```python
- import yaml
-
- from forte.pipeline import Pipeline
- from forte.data.readers import CoNLL03Reader
- from forte.processors.nlp import CoNLLNERPredictor
- from ft.onto.base_ontology import Token, Sentence
- from forte.common.configuration import Config
-
-
- config_data = yaml.safe_load(open("config_data.yml", "r"))
- config_model = yaml.safe_load(open("config_model.yml", "r"))
-
- config = Config({}, default_hparams=None)
- config.add_hparam('config_data', config_data)
- config.add_hparam('config_model', config_model)
-
-
- pl = Pipeline()
- pl.set_reader(CoNLL03Reader())
- pl.add(CoNLLNERPredictor(), config=config)
-
- pl.initialize()
-
- for pack in pl.process_dataset(config.config_data.test_path):
-     for pred_sentence in pack.get_data(context_type=Sentence, request={Token: {"fields": ["ner"]}}):
-         print("============================")
-         print(pred_sentence["context"])
-         print("The entities are...")
-         print(pred_sentence["Token"]["ner"])
-         print("============================")
-
- ```
-
- Find more examples [here](./examples).
-
- ### Download and Installation
-
- To install the released version from PyPI:
- ```bash
- pip install forte
- ```
-
- To install from source,
- ```bash
- git clone https://github.com/asyml/forte.git
- cd forte
- pip install .
- ```
-
-
150
151
### Getting Started
151
152
152
153
* [ Examples] ( ./examples )
153
154
* [ Documentation] ( https://asyml-forte.readthedocs.io/ )
154
- * Currently we are working on some interesting [ tutorials] ( https://github.com/asyml/forte/wiki )
155
+ * Currently we are working on some
156
+ interesting [ tutorials] ( https://github.com/asyml/forte/wiki )
155
157
156
158
### Trouble Shooting
159
+
157
160
1 . If you try to run ` generate_ontology ` script but encounter the following
158
161
```
159
162
Traceback (most recent call last):
@@ -167,18 +170,23 @@ pip install .
167
170
raise PackageNotFoundError(name)
168
171
importlib_metadata.PackageNotFoundError: forte
169
172
```
170
- This is likely to be caused by multiple conflicting installation, such as
171
- installing both from source or from PIP. One way to solve this is to manually
172
- remove the script `~/anaconda3/bin/generate_ontology` and re-install the package.
173
+ This is likely to be caused by multiple conflicting installation, such as
174
+ installing both from source or from PIP. One way to solve this is to manually
175
+ remove the script `~/anaconda3/bin/generate_ontology` and re-install the
176
+ package.

### Contributing
- If you are interested in making enhancement to Forte, please first go over our [Code of Conduct](https://github.com/asyml/forte/blob/master/CODE_OF_CONDUCT.md) and [Contribution Guideline](https://github.com/asyml/forte/blob/master/CONTRIBUTING.md)
+
+ If you are interested in making enhancements to Forte, please first go over
+ our [Code of Conduct](https://github.com/asyml/forte/blob/master/CODE_OF_CONDUCT.md)
+ and [Contribution Guideline](https://github.com/asyml/forte/blob/master/CONTRIBUTING.md).

### License

[Apache License 2.0](./LICENSE)

### Companies and Universities Supporting Forte
+
<p float="left">
   <img src="https://raw.githubusercontent.com/asyml/forte/master/docs/_static/img/Petuum.png" width="200" align="top">