|
1 | 1 | Welcome to Forte's documentation!
|
2 | 2 | ******************************************
|
| 3 | +This outline is currently **in progress** so many sections are empty. |
3 | 4 |
|
| 5 | + |
| 6 | +Overview |
| 7 | +==================== |
| 8 | + |
| 9 | +**Forte** is a toolkit for building Natural Language Processing pipelines, featuring cross-task |
| 10 | +interaction, adaptable data-model interfaces and more. It provides a platform to assemble |
| 11 | +state-of-the-art NLP and ML technologies in a highly composable fashion, covering a wide |
| 12 | +spectrum of tasks ranging from Information Retrieval and Natural Language Understanding to Natural |
| 13 | +Language Generation. |
| 14 | + |
| 15 | +With Forte, it is simple to build an integrated system that can search documents, |
| 16 | +analyze and extract information, and generate language all in one place. This allows the developer |
| 17 | +to fully utilize and combine the strengths and results of each step, and allows the system to |
| 18 | +make fully informed decisions at the end of the pipeline. |
| 19 | + |
| 20 | +While it is quite easy to combine arbitrary third-party tools (check out these `examples <index_appendices.html>`_!), |
| 21 | +Forte also supports deep learning via Texar and provides a convenient |
| 22 | +model-data interface that allows users to cast tasks to models. |
| 23 | + |
| 24 | + |
| 25 | +Core Design Principles |
| 26 | +------------------------ |
| 27 | + |
| 28 | + |
| 29 | +The core design principle of Forte is the abstraction of NLP concepts and machine learning models, |
| 30 | +which provides better separation between data, models and tasks while still enabling interactions |
| 31 | +between different components of the pipeline. Based on this, we make Forte: |
| 32 | + |
| 33 | +* **Composable**: Forte helps users decompose a problem into *data*, *models* and *tasks*. Tasks can be further divided into sub-tasks. A complex use case can be solved by composing heterogeneous modules via straightforward Python APIs or declarative configuration files. The components (e.g. models or tasks) in the pipeline can be flexibly swapped in and out, as long as the API contracts are matched. This approach greatly improves module reusability, enables fast development and makes the library flexible for user needs. |
| 34 | + |
| 35 | +* **Generalizable and Extensible**: Forte is designed to generalize across a wide range of NLP tasks and to be extensible to new tasks and new domains. In particular, Forte provides the *Ontology* system, which helps users define types according to their tasks. Users can specify the types declaratively through JSON files, and our code generation tool will automatically generate Python files ready to be used in your project. Check out our `Ontology Generation documentation <toc/ontology_generation.html>`_ for more details. |
| 36 | + |
| 37 | +* **Transparent Data Flow**: Central to Forte's composable architecture is a universal data format that supports seamless data flow between different steps. Forte advocates a transparent data flow to facilitate flexible process intervention and simple pipeline control. Combined with the general data format, Forte makes a perfect tool for data inspection, component swapping and result sharing. This is particularly helpful during team collaborations! |
| 38 | + |
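To make the declarative type definition mentioned above more concrete, here is a minimal sketch of what such a JSON specification might look like, built and written out from Python. The field names (``name``, ``definitions``, ``entry_name``, ``parent_entry``, ``attributes``) and the parent class path are assumptions modeled on Forte's ontology specification, and the ``example.onto.Word`` entry is purely hypothetical; check the Ontology Generation documentation for the authoritative format.

```python
import json

# Hypothetical ontology spec. The field layout and the parent class path are
# assumptions to verify against the Ontology Generation documentation; the
# entry name and its attribute are illustrative only.
ontology_spec = {
    "name": "example_ontology",
    "definitions": [
        {
            "entry_name": "example.onto.Word",
            "parent_entry": "forte.data.ontology.top.Annotation",
            "description": "A word with a part-of-speech tag.",
            "attributes": [
                {"name": "pos", "type": "str"},
            ],
        }
    ],
}


def write_spec(spec, path):
    """Write the spec as a JSON file that a code generator could consume."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(spec, f, indent=2)


def entry_names(spec):
    """Return the fully qualified names of all types declared in the spec."""
    return [definition["entry_name"] for definition in spec["definitions"]]


if __name__ == "__main__":
    write_spec(ontology_spec, "example_ontology.json")
    print(entry_names(ontology_spec))
```

The point of the sketch is only that a new type is described as data rather than hand-written code; the generated Python classes would come from the code generation tool, which is not invoked here.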
| 39 | +.. image:: _static/img/forte_arch.png |
| 40 | + |
| 41 | +.. image:: _static/img/forte_results.png |
| 42 | + |
| 43 | +Package Overview |
| 44 | +----------------- |
| 45 | +.. list-table:: Title |
| 46 | + :widths: 25 75 |
| 47 | + :header-rows: 1 |
| 48 | + |
| 49 | + * - Package Name |
| 50 | + - Package Description |
| 51 | + * - :class:`~forte` |
| 52 | + - an open-source toolkit for NLP |
| 53 | + * - :class:`~forte.data.readers` |
| 54 | + - a data module for reading text data in different formats, such as CoNLL and OntoNotes |
| 55 | + * - :class:`~forte.processors` |
| 56 | + - a collection of processors for building NLP pipelines |
| 57 | + * - :class:`~forte.trainer` |
| 58 | + - a collection of modules for training different NLP tasks |
| 59 | + * - :class:`~ft.onto.base_ontology` |
| 60 | + - a module containing basic ontology types such as Token, Sentence and Document |
| 61 | + |
| 62 | + |
| 63 | + |
| 64 | +Library API example |
| 65 | +-------------------- |
| 66 | +A simple code example that runs a Named Entity Recognizer: |
| 67 | + |
| 68 | +.. code-block:: python |
| 69 | +
|
| 70 | +    import yaml |
| 71 | +
|
| 72 | +    from forte.pipeline import Pipeline |
| 73 | +    from forte.data.readers import CoNLL03Reader |
| 74 | +    from forte.processors.nlp import CoNLLNERPredictor |
| 75 | +    from ft.onto.base_ontology import Token, Sentence |
| 76 | +    from forte.common.configuration import Config |
| 77 | +
|
| 78 | +
|
| 79 | +    config_data = yaml.safe_load(open("config_data.yml", "r")) |
| 80 | +    config_model = yaml.safe_load(open("config_model.yml", "r")) |
| 81 | +
|
| 82 | +    config = Config({}, default_hparams=None) |
| 83 | +    config.add_hparam('config_data', config_data) |
| 84 | +    config.add_hparam('config_model', config_model) |
| 85 | +
|
| 86 | +
|
| 87 | +    pl = Pipeline() |
| 88 | +    pl.set_reader(CoNLL03Reader()) |
| 89 | +    pl.add(CoNLLNERPredictor(), config=config) |
| 90 | +
|
| 91 | +    pl.initialize() |
| 92 | +
|
| 93 | +    for pack in pl.process_dataset(config.config_data.test_path): |
| 94 | +        for pred_sentence in pack.get_data(context_type=Sentence, request={Token: {"fields": ["ner"]}}): |
| 95 | +            print("============================") |
| 96 | +            print(pred_sentence["context"]) |
| 97 | +            print("The entities are...") |
| 98 | +            print(pred_sentence["Token"]["ner"]) |
| 99 | +            print("============================") |
| 100 | +
|
| 101 | +
|
| 102 | +
|
| 103 | +Many more examples are available `here <index_appendices.html>`_. We are also working on assembling some |
| 104 | +interesting `tutorials <https://github.com/asyml/forte/wiki>`_. |
| 105 | + |
| 106 | + |
| 107 | +Download and Installation |
| 108 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 109 | +Download the repository with: |
| 110 | + |
| 111 | +```bash |
| 112 | +git clone https://github.com/asyml/forte.git |
| 113 | +``` |
| 114 | + |
| 115 | +After changing into the ``forte`` directory, you can install it with: |
| 116 | + |
| 117 | +```bash |
| 118 | +pip install . |
| 119 | +``` |
| 120 | + |
| 121 | + |
| 122 | +License |
| 123 | +~~~~~~~~~ |
| 124 | + |
| 125 | +`Apache License 2.0 <https://github.com/asyml/forte/blob/master/LICENSE>`_ |
| 126 | + |
| 127 | + |
| 128 | +---------------- |
| 129 | + |
| 130 | + |
| 131 | +Overview |
| 132 | +==================== |
4 | 133 | .. toctree::
|
5 | 134 | :maxdepth: 2
|
6 |
| - |
7 |
| - outline.md |
| 135 | + |
| 136 | + toc/overview.md |
8 | 137 |
|
9 | 138 |
|
| 139 | + |
| 140 | +NLP with Forte |
| 141 | +==================== |
10 | 142 | .. toctree::
|
11 | 143 | :maxdepth: 2
|
12 | 144 |
|
13 |
| - tutorial/get_started.md |
| 145 | + index_toc.rst |
14 | 146 |
|
| 147 | +APPENDICES |
| 148 | +=========== |
15 | 149 | .. toctree::
|
16 | 150 | :maxdepth: 2
|
17 | 151 |
|
18 |
| - tutorial/examples.md |
19 |
| - tutorial/ontology_generation.md |
20 |
| - tutorial/audio_processing.md |
21 |
| - tutorial/data_pack.md |
| 152 | + index_appendices.rst |
22 | 153 |
|
23 | 154 | API
|
24 | 155 | ====
|
25 |
| - |
26 | 156 | .. toctree::
|
27 | 157 | :maxdepth: 2
|
28 | 158 |
|
29 |
| - code/common.rst |
30 |
| - code/data.rst |
31 |
| - code/pipeline.rst |
32 |
| - code/processors.rst |
33 |
| - code/models.rst |
34 |
| - code/training_system.rst |
35 |
| - code/data_aug.rst |
36 |
| - code/vocabulary.rst |
| 159 | + index_api.rst |