Commit 812c5da

Merge remote-tracking branch 'origin/bug544fix' into bug544fix
2 parents dce6a6f + 5514dd8 commit 812c5da

52 files changed, +1431 -382 lines changed

.github/workflows/main.yml

+3-3
Original file line number | Diff line number | Diff line change
@@ -48,7 +48,7 @@ jobs:
4848
sudo apt-get install -y libsndfile1-dev
4949
python -m pip install --progress-bar off --upgrade pip
5050
pip install --progress-bar off Django django-guardian
51-
pip install --progress-bar off pylint==2.10.2 flake8==3.9.2 mypy==0.910 pytest==5.1.3 black==20.8b1
51+
pip install --progress-bar off pylint==2.10.2 flake8==3.9.2 mypy==0.931 pytest==5.1.3 black==20.8b1
5252
pip install --progress-bar off types-PyYAML==5.4.8 types-typed-ast==1.4.4 types-requests==2.25.6 types-dataclasses==0.1.7
5353
pip install --progress-bar off coverage codecov
5454
- name: Format check with Black
@@ -91,9 +91,9 @@ jobs:
9191
- name: Lint with pylint
9292
run: |
9393
pylint forte/
94-
- name: Lint main code with mypy when torch version is not 1.5.0
94+
- name: Lint main code with mypy when torch version is not 1.5.0 and python is 3.9
9595
run: |
96-
if [[ ${{ matrix.torch-version }} != "1.5.0" ]]; then mypy forte; fi
96+
if [[ ${{ matrix.torch-version }} != "1.5.0" && ${{ matrix.python-version }} == "3.9" ]]; then mypy forte; fi
9797
- name: Test with pytest and run coverage
9898
run: |
9999
coverage run -m pytest tests
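
After this change the `mypy` step runs only when both conditions hold. As a minimal sketch of that gate in Python (the function name `should_run_mypy` is hypothetical and not part of the repository; versions are compared as plain strings, as in the shell test):

```python
def should_run_mypy(torch_version: str, python_version: str) -> bool:
    """Mirror the workflow's shell condition using plain string comparison."""
    return torch_version != "1.5.0" and python_version == "3.9"


if __name__ == "__main__":
    # Hypothetical matrix entries, for illustration only.
    for torch_v, py_v in [("1.5.0", "3.9"), ("1.6.0", "3.9"), ("1.6.0", "3.7")]:
        print(torch_v, py_v, should_run_mypy(torch_v, py_v))
```

Only the second entry passes the gate, so type checking would run once per torch version other than 1.5.0 instead of once per Python version.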

.pre-commit-config.yaml

+14
@@ -0,0 +1,14 @@
1+
# See https://pre-commit.com for more information
2+
# See https://pre-commit.com/hooks.html for more hooks
3+
repos:
4+
- repo: https://github.com/pre-commit/pre-commit-hooks
5+
rev: v3.2.0
6+
hooks:
7+
- id: trailing-whitespace
8+
- id: end-of-file-fixer
9+
- id: check-yaml
10+
- id: check-added-large-files
11+
- repo: https://github.com/psf/black
12+
rev: 20.8b1
13+
hooks:
14+
- id: black
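
With a configuration like this in place, the usual local workflow is to install the tool, register the git hook, and optionally run every hook once over the whole tree. The sketch below only assembles those commands. The helper names `pre_commit_commands` and `run_commands` are hypothetical; `pre-commit install` and `pre-commit run --all-files` are standard pre-commit subcommands, and nothing is executed unless `dry_run=False` is passed.

```python
import subprocess
from typing import List


def pre_commit_commands() -> List[List[str]]:
    """Commands for a typical local pre-commit setup, in order."""
    return [
        ["pip", "install", "pre-commit"],       # install the tool
        ["pre-commit", "install"],              # register the git hook
        ["pre-commit", "run", "--all-files"],   # one-off run over the whole tree
    ]


def run_commands(dry_run: bool = True) -> List[str]:
    """Return the rendered commands; execute them only when dry_run is False."""
    rendered = []
    for cmd in pre_commit_commands():
        rendered.append(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return rendered


if __name__ == "__main__":
    for line in run_commands():
        print(line)
```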

CONTRIBUTING.md

+3-1
@@ -127,6 +127,8 @@ the [Google Python Style guide](http://google.github.io/styleguide/pyguide.html)
127127
project code is examined using `pylint`, `flake8`, `mypy`, `black` and `sphinx-build` which will be run
128128
automatically in CI. It's recommended that you should run these tests locally before submitting your pull request to save time. Refer to the github workflow [here](https://github.com/asyml/forte/blob/master/.github/workflows/main.yml) for detailed steps to carry out the tests. Basically what you need to do is to install the requirements (check out the `Install dependencies` sections) and run the commands (refer to the steps in `Format check with Black`, `Lint with flake8`, `Lint with pylint`, `Lint main code with mypy when torch version is not 1.5.0`, `Build Docs`, etc.).
129129

130+
We also recommend using the `pre-commit` tool, which automates format checking before each commit, since checking format is a repetitive process. The configuration file `.pre-commit-config.yaml` in the project root folder lists several plugins, including `black`, to check format. Developers only need to install the package with `pip install pre-commit`.
131+
130132
### Docstring
131133

132134
All public methods require docstring and type annotation. It is recommended to add docstring for all functions. The docstrings should follow the [`Comments and Docstrings` section](https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings) of Google Python Style guide. We will include a pylint plugin called [docparams](https://github.com/PyCQA/pylint/blob/main/pylint/extensions/docparams.rst) to validate the parameters of docstrings:
@@ -136,7 +138,7 @@ automatically in CI. It's recommended that you should run these tests locally be
136138

137139
You should take special care of the indentations in your documentation. Make sure the indents are consistent and follow the Google Style guide. All sections other than the heading should maintain a hanging indent of two or four spaces. Refer to the examples [here](https://google.github.io/styleguide/pyguide.html#383-functions-and-methods) for what is expected and what are the requirements for different sections like `args`, `lists`, `returns`, etc. Invalid indentations might trigger errors in `sphinx-build` and will cause confusing rendering of the documentation. You can run `sphinx-build` locally to see whether the generated docs look reasonable.
138140

139-
Another aspect that should be noted is the format of links or cross-references of python objects. Make sure to follow the [sphinx cross-referencing syntax](https://www.sphinx-doc.org/en/master/usage/restructuredtext/roles.html#xref-syntax). The references will be checked by [sphinx-build nit-picky mode](https://www.sphinx-doc.org/en/master/man/sphinx-build.html#cmdoption-sphinx-build-n) which raises warnings for all the missing and unresolvable links.
141+
Another aspect that should be noted is the format of links or cross-references of python objects. Make sure to follow the [sphinx cross-referencing syntax](https://www.sphinx-doc.org/en/master/usage/restructuredtext/roles.html#xref-syntax). The references will be checked by [sphinx-build nit-picky mode](https://www.sphinx-doc.org/en/master/man/sphinx-build.html#cmdoption-sphinx-build-n) which raises warnings for all the missing and unresolvable links.
140142

141143
### Git Commit Style
142144

docs/conf.py

+9-9
@@ -61,19 +61,19 @@
6161
master_doc = "index"
6262

6363
# General information about the project.
64-
project = u"Forte"
65-
copyright = u"2019, Forte"
66-
author = u"Forte"
64+
project = "Forte"
65+
copyright = "2019, Forte"
66+
author = "Forte"
6767

6868
# The version info for the project you're documenting, acts as replacement for
6969
# |version| and |release|, also used in various other places throughout the
7070
# built documents.
7171
#
7272
# The short X.Y version.
7373
# version = u'{}'.format(__version_short__)
74-
version = u"{}".format(__version__)
74+
version = "{}".format(__version__)
7575
# The full version, including alpha/beta/rc tags.
76-
release = u"{}".format(__version__)
76+
release = "{}".format(__version__)
7777

7878
# The language for content autogenerated by Sphinx. Refer to documentation
7979
# for a list of supported languages.
@@ -145,7 +145,7 @@
145145

146146
# The name for this set of Sphinx documents.
147147
# "<project> v<release> documentation" by default.
148-
html_title = u"Forte v0.1"
148+
html_title = "Forte v0.1"
149149

150150
# A shorter title for the navigation bar. Default is the same as html_title.
151151
# html_short_title = None
@@ -253,7 +253,7 @@
253253
# (source start file, target name, title,
254254
# author, documentclass [howto, manual, or own class]).
255255
latex_documents = [
256-
(master_doc, "forte.tex", u"Forte Documentation", u"Forte", "manual"),
256+
(master_doc, "forte.tex", "Forte Documentation", "Forte", "manual"),
257257
]
258258

259259
# The name of an image file (relative to this directory) to place at the top of
@@ -281,7 +281,7 @@
281281

282282
# One entry per manual page. List of tuples
283283
# (source start file, name, description, authors, manual section).
284-
man_pages = [(master_doc, "forte", u"Forte Documentation", [author], 1)]
284+
man_pages = [(master_doc, "forte", "Forte Documentation", [author], 1)]
285285

286286
# If true, show URL addresses after external links.
287287
# man_show_urls = False
@@ -296,7 +296,7 @@
296296
(
297297
master_doc,
298298
"forte",
299-
u"Forte Documentation",
299+
"Forte Documentation",
300300
author,
301301
"Forte",
302302
"One line description of project.",
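
The `u` prefixes removed in this file are redundant on Python 3, where every string literal is already a Unicode `str`. A quick check, assuming a Python 3 interpreter:

```python
# On Python 3 the u"" prefix is still accepted but has no effect.
legacy = u"Forte Documentation"
modern = "Forte Documentation"

same_value = legacy == modern
same_type = type(legacy) is type(modern) is str
print(same_value, same_type)
```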

docs/index.rst

+139-16
@@ -1,36 +1,159 @@
11
Welcome to Forte's documentation!
22
******************************************
3+
This outline is currently **in progress** so many sections are empty.
34

5+
6+
Overview
7+
====================
8+
9+
**Forte** is a toolkit for building Natural Language Processing pipelines, featuring cross-task
10+
interaction, adaptable data-model interfaces and more. It provides a platform to assemble
11+
state-of-the-art NLP and ML technologies in a highly-composable fashion, including a wide
12+
spectrum of tasks ranging from Information Retrieval, Natural Language Understanding to Natural
13+
Language Generation.
14+
15+
With Forte, it is extremely simple to build an integrated system that can search documents,
16+
analyze and extract information and generate language all in one place. This allows the developer
17+
to fully utilize and combine the strengths and results of each step, and allows the system to
18+
make fully informed decisions at the end of the pipeline.
19+
20+
While it is quite easy to combine arbitrary 3rd party tools (Check out these `examples <index_appendices.html>`_ !),
21+
Forte also brings technology to you by supporting deep learning via Texar, and by providing a convenient
22+
model data interface that allows users to cast tasks to models.
23+
24+
25+
Core Design Principles
26+
------------------------
27+
28+
29+
The core design principle of Forte is the abstraction of NLP concepts and machine learning models,
30+
which provides better separation between data, model and tasks, but enables interactions
31+
between different components of the pipeline. Based on this, we make Forte:
32+
33+
* **Composable**: Forte helps users to decompose a problem into *data*, *models* and *tasks*. The tasks can further be divided into sub-tasks. A complex use case can be solved by composing heterogeneous modules via straightforward python APIs or declarative configuration files. The components (e.g. models or tasks) in the pipeline can be flexibly swapped in and out, as long as the API contracts are matched. The approach greatly improves module reusability, enables fast development and makes the library flexible for user needs.
34+
35+
* **Generalizable and Extensible**: Forte is designed not only to support a wide range of NLP tasks, but also to be extensible to new tasks and new domains. In particular, Forte provides the *Ontology* system that helps users define types according to their tasks. Users can simply specify the types declaratively through JSON files. Our Code Generation tool will automatically generate python files ready to be used in your project. Check out our `Ontology Generation documentation <toc/ontology_generation.html>`_ for more details.
36+
37+
* **Transparent Data Flow**: Central to Forte's composable architecture is a universal data format that supports seamless data flow between different steps. Forte advocates a transparent data flow to facilitate flexible process intervention and simple pipeline control. Combined with the general data format, Forte makes a perfect tool for data inspection, component swapping and result sharing. This is particularly helpful during team collaborations!
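
The composability described in these principles can be illustrated without Forte itself. The sketch below is plain Python, not Forte's API: the `Component` alias, `run_pipeline`, and the two toy steps are hypothetical names chosen for illustration. It shows how steps that share one contract (take a data dict, return a data dict) can be swapped in and out freely.

```python
from typing import Callable, Dict, List

# Shared contract: every component maps a data dict to a new data dict.
Component = Callable[[Dict[str, object]], Dict[str, object]]


def tokenize(data: Dict[str, object]) -> Dict[str, object]:
    """Toy step: split the text on whitespace."""
    out = dict(data)
    out["tokens"] = str(data["text"]).split()
    return out


def count_tokens(data: Dict[str, object]) -> Dict[str, object]:
    """Toy step: record how many tokens the previous step produced."""
    out = dict(data)
    out["n_tokens"] = len(data["tokens"])
    return out


def run_pipeline(components: List[Component], data: Dict[str, object]) -> Dict[str, object]:
    """Apply each component in order, passing the data dict along."""
    for component in components:
        data = component(data)
    return data


result = run_pipeline([tokenize, count_tokens], {"text": "Forte builds NLP pipelines"})
print(result["n_tokens"])
```

Because each step depends only on the shared data dict, replacing `tokenize` with another function that fills `tokens` leaves `count_tokens` untouched.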
38+
39+
.. image:: _static/img/forte_arch.png
40+
41+
.. image:: _static/img/forte_results.png
42+
43+
Package Overview
44+
-----------------
45+
.. list-table:: Title
46+
:widths: 25 75
47+
:header-rows: 1
48+
49+
* - Package Name
50+
- Package Description
51+
* - :class:`~forte`
52+
- an open-source toolkit for NLP
53+
* - :class:`~forte.data.readers`
54+
- a data module for reading different formats of text data like CoNLL, Ontonotes etc
55+
* - :class:`~forte.processors`
56+
- a collection of processors for building NLP pipelines
57+
* - :class:`~forte.trainer`
58+
- a collection of modules for training different NLP tasks
59+
* - :class:`~ft.onto.base_ontology`
60+
- a module containing basic ontologies like Token, Sentence, Document etc
61+
62+
63+
64+
Library API example
65+
--------------------
66+
A simple code example that runs a Named Entity Recognizer:
67+
68+
.. code-block:: python
69+
70+
import yaml
71+
72+
from forte.pipeline import Pipeline
73+
from forte.data.readers import CoNLL03Reader
74+
from forte.processors.nlp import CoNLLNERPredictor
75+
from ft.onto.base_ontology import Token, Sentence
76+
from forte.common.configuration import Config
77+
78+
79+
config_data = yaml.safe_load(open("config_data.yml", "r"))
80+
config_model = yaml.safe_load(open("config_model.yml", "r"))
81+
82+
config = Config({}, default_hparams=None)
83+
config.add_hparam('config_data', config_data)
84+
config.add_hparam('config_model', config_model)
85+
86+
87+
pl = Pipeline()
88+
pl.set_reader(CoNLL03Reader())
89+
pl.add(CoNLLNERPredictor(), config=config)
90+
91+
pl.initialize()
92+
93+
for pack in pl.process_dataset(config.config_data.test_path):
94+
    for pred_sentence in pack.get_data(context_type=Sentence, request={Token: {"fields": ["ner"]}}):
95+
        print("============================")
96+
        print(pred_sentence["context"])
97+
        print("The entities are...")
98+
        print(pred_sentence["Token"]["ner"])
99+
        print("============================")
100+
101+
102+
103+
Many more examples are available `here <index_appendices.html>`_. We are also working on assembling some
104+
interesting `tutorials <https://github.com/asyml/forte/wiki>`_
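
To make the shape of the NER loop's output concrete, the sketch below uses a hand-written dictionary in place of a real `pred_sentence`. It assumes the `ner` field behaves like a list of BIO-style tag strings. Both the sample values and the helper `entity_tags` are hypothetical, and the real field may be an array type.

```python
# Hypothetical stand-in for one item yielded by pack.get_data(...).
pred_sentence = {
    "context": "EU rejects German call",
    "Token": {"ner": ["B-ORG", "O", "B-MISC", "O"]},
}


def entity_tags(sentence):
    """Return the NER tags that are not the 'O' (outside) label."""
    return [tag for tag in sentence["Token"]["ner"] if tag != "O"]


print(pred_sentence["context"])
print(entity_tags(pred_sentence))
```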
105+
106+
107+
Download and Installation
108+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
109+
Download the repository through
110+
111+
```bash
112+
git clone https://github.com/asyml/forte.git
113+
```
114+
115+
After `cd` into `forte`, you can install it through
116+
117+
```bash
118+
pip install .
119+
```
120+
121+
122+
License
123+
~~~~~~~~~
124+
125+
`Apache License 2.0 <https://github.com/asyml/forte/blob/master/LICENSE>`_
126+
127+
128+
----------------
129+
130+
131+
Overview
132+
====================
4133
.. toctree::
5134
:maxdepth: 2
6-
7-
outline.md
135+
136+
toc/overview.md
8137

9138

139+
140+
NLP with Forte
141+
====================
10142
.. toctree::
11143
:maxdepth: 2
12144

13-
tutorial/get_started.md
145+
index_toc.rst
14146

147+
APPENDICES
148+
===========
15149
.. toctree::
16150
:maxdepth: 2
17151

18-
tutorial/examples.md
19-
tutorial/ontology_generation.md
20-
tutorial/audio_processing.md
21-
tutorial/data_pack.md
152+
index_appendices.rst
22153

23154
API
24155
====
25-
26156
.. toctree::
27157
:maxdepth: 2
28158

29-
code/common.rst
30-
code/data.rst
31-
code/pipeline.rst
32-
code/processors.rst
33-
code/models.rst
34-
code/training_system.rst
35-
code/data_aug.rst
36-
code/vocabulary.rst
159+
index_api.rst

docs/index_api.rst

+25
@@ -0,0 +1,25 @@
1+
.. role:: hidden
2+
:class: hidden-section
3+
4+
API
5+
*********************
6+
7+
.. toctree::
8+
:maxdepth: 2
9+
10+
11+
code/common.rst
12+
13+
code/data.rst
14+
15+
code/pipeline.rst
16+
17+
code/processors.rst
18+
19+
code/models.rst
20+
21+
code/training_system.rst
22+
23+
code/data_aug.rst
24+
25+
code/vocabulary.rst

docs/index_appendices.rst

+54
@@ -0,0 +1,54 @@
1+
.. role:: hidden
2+
:class: hidden-section
3+
4+
APPENDICES
5+
*********************
6+
7+
8+
Glossary
9+
============
10+
.. toctree::
11+
:maxdepth: 2
12+
13+
* DataPack: a data class that stores structured data and supports efficient data retrieval.
14+
- `DataPack Example <https://github.com/asyml/forte/blob/master/docs/tutorial/handling_structued_data.ipynb>`_
15+
- API: :class:`~forte.data.data_pack.DataPack`
16+
17+
* Pipeline: an inference system that contains a set of processing components.
18+
- `Pipeline Example <https://github.com/asyml/forte/tree/master/examples/pipelines>`_
19+
- API: :class:`~forte.pipeline.Pipeline`
20+
21+
* Ontology: a system that defines the relations between NLP annotations, for example, the relation between words and documents, or between two words.
22+
- `An ontology Example <https://github.com/asyml/forte/tree/master/examples/ontology>`_
23+
- `An ontology tutorial <https://github.com/asyml/forte/blob/0c1dec1311f27eae150287a8aa405632b265e03e/docs/tutorial/ontology_generation.md>`_
24+
25+
26+
27+
.. rst-class:: page-break
28+
29+
30+
Examples
31+
==========
32+
.. toctree::
33+
:maxdepth: 2
34+
35+
Rich examples are included to demonstrate the use of Forte, including
36+
implementation of cutting-edge models/algorithms and system construction.
37+
38+
More examples are continuously added...
39+
40+
41+
* `Data Reading: Showcasing how to read structured data. <https://github.com/asyml/forte/tree/master/examples/wiki_parser>`_
42+
* `Serialization: Showcasing how to serialize and deserialize data. <https://github.com/asyml/forte/tree/master/examples/serialization>`_
43+
* `NER: Train a LSTM-CRF named entity recognizer. <https://github.com/asyml/forte/tree/master/examples/ner>`_
44+
* `BERT Passage Reranker <https://github.com/asyml/forte/tree/master/examples/passage_reranker>`_
45+
* `Chat Bot: This example showcases the use of Forte to build a retrieval-based chatbot and perform text analysis on the retrieved results. <https://github.com/asyml/forte/tree/master/examples/chatbot>`_
46+
* `Audio Reading: a simple speech processing example that showcases Forte's capability to support a wide range of audio processing tasks. <https://github.com/asyml/forte/tree/master/examples/audio>`_
47+
* `Classification: a text classification example that supports various formats of table-like datasets <https://github.com/asyml/forte/tree/master/examples/classification>`_
48+
* `Clinical Pipeline: a project handling clinical datasets that shows how to make Forte and Stave work side by side. <https://github.com/asyml/forte/tree/master/examples/clinical_pipeline>`_
49+
* `Content Rewriter: an example that rewrites a sentence based on a given table. <https://github.com/asyml/forte/tree/master/examples/content_rewriter>`_
50+
* `Data Augmentation: this example demonstrates the usage of forte/models/da_rl/MetaAugmentationWrapper, that wraps a BERT Masked Language Model data augmentation model to perform this RL adaptive learning with a BERT-based text classifier downstream model. <https://github.com/asyml/forte/tree/master/examples/data_augmentation>`_
51+
* `SRL: a semantic role labeling example <https://github.com/asyml/forte/tree/master/examples/srl>`_
52+
* `Tagging: an implementation of CNN-BiLSTM-CRF model, built on top of Texar and Pytorch <https://github.com/asyml/forte/tree/master/examples/tagging>`_
53+
* `Twitter sentiment analysis: this example shows the use of Forte to perform sentiment analysis on the user's retrieved tweets <https://github.com/asyml/forte/tree/master/examples/twitter_sentiment_analysis>`_
54+
* `Visualization: visualize datapack data <https://github.com/asyml/forte/tree/master/examples/visualize>`_
