(C) 2019-2020 by Damir Cavar, Oren Baldinger, Joshua Herring, Anurag Kumar, Aarushi Bisht, Jagpreet Chawla, Semiring Inc.
The JSON-NLP schema is a standard middleware format that converts the output of various NLP pipelines and chains into a uniform JSON format.
This project provides the JSON-NLP specification: a proposed JavaScript Object Notation (JSON) format and JSON Schema for Natural Language Processing (NLP) annotations, together with example use cases for common NLP pipelines such as spaCy, Stanford CoreNLP, OpenNLP, and LingPipe.
The proposed schema provides an abstraction and normalization layer, and a uniform API for NLP annotations generated by various heterogeneous NLP pipelines and components.
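The idea of a normalization layer can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two raw pipeline outputs are invented, the function names are hypothetical, and the field names (`tokenList`, `id`, `text`, `pos`) are placeholders rather than the actual JSON-NLP schema properties, which are defined in the specification.

```python
import json

# Hypothetical raw outputs from two different pipelines.
# Both shapes are invented for this example.
pipeline_a_output = [("The", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")]
pipeline_b_output = {"words": ["The", "cat", "sleeps"], "tags": ["DT", "NN", "VBZ"]}


def build_doc(pairs):
    """Build the illustrative uniform layout from (text, tag) pairs.

    The field names are placeholders, not the real JSON-NLP properties.
    """
    return {
        "tokenList": [
            {"id": i, "text": text, "pos": tag}
            for i, (text, tag) in enumerate(pairs, start=1)
        ]
    }


def normalize_a(tagged):
    """Pipeline A already yields (text, tag) pairs."""
    return build_doc(tagged)


def normalize_b(result):
    """Pipeline B yields parallel lists that must be zipped first."""
    return build_doc(zip(result["words"], result["tags"]))


def token_texts(doc):
    """Downstream code reads one layout, whatever pipeline produced it."""
    return [tok["text"] for tok in doc["tokenList"]]


doc_a = normalize_a(pipeline_a_output)
doc_b = normalize_b(pipeline_b_output)

print(json.dumps(doc_a, indent=2))
print(token_texts(doc_a) == token_texts(doc_b))
```

Once each pipeline has its own small converter, consumer code such as `token_texts` no longer needs to know which pipeline produced the annotations. Note that the tag sets still differ between the two documents; the sketch unifies only the structure, not the tag inventories.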
The whitepaper (Cavar et al., to appear 2019; see arXiv) provides an exhaustive list of major NLP components and describes the data structures that these components can generate. For the majority of these components, it explains and discusses how to convert their output to JSON-NLP. In addition to a detailed description of the JSON-NLP schema properties, it provides and discusses various JSON-NLP examples.
The JSON-NLP schema provides a wider annotation framework than most common NLP pipelines might need. Various related projects explain the advanced properties of the JSON-NLP format.
Standards used:
JSON-NLP is accompanied by numerous tools and technologies. There are wrappers in Python, Java, Go, and many other languages. For more details, see:
- Go JSON-NLP module
- Python JSON-NLP module
- NLP API example pipeline in Python
- NLP example code
- spaCy JSON-NLP wrapper
- Flair JSON-NLP wrapper
- Xrenner JSON-NLP wrapper
- Polyglot JSON-NLP wrapper
- NLTK JSON-NLP wrapper
- Stanford CoreNLP JSON-NLP wrapper and Java code
There is also a JSON-NLP Visualizer:
For licensing details see the LICENSE file. The JSON-NLP code and specification are published under the Apache License Version 2.0.
To cite this work, refer to the paper:
Damir Cavar, Oren Baldinger, Joshua Herring, Umang Mehta, Yiwen Zhang, Shantanu Bedekar, Shreejith Panicker (2019) An Annotation Encoding Schema for Natural Language Processing using JSON: NLP JSON Schema Version 0.1. Technical Report, NLP Lab, Indiana University, Version 1.0 from November 2018.
(Link to paper is coming soon)
The code, documentation, and examples are brought to you by NLP-Lab.org and Semiring.
(C) 2018-2020 The Semiring