
Commit 6755b5f

Committed by: Arun Tejasvi Chaganty

Initialized with protobuf and tests
1 parent 6417157 commit 6755b5f

12 files changed: 3523 additions, 0 deletions

.travis.yml

Lines changed: 16 additions & 0 deletions
# this file is *not* meant to cover or endorse the use of travis, but rather to
# help confirm pull requests to this project.

language: python

env:
  - TOXENV=py27
  - TOXENV=py33
  - TOXENV=py34

install: pip install tox

script: tox

notifications:
  email: false

MANIFEST.in

Lines changed: 5 additions & 0 deletions
# Include the license file
include LICENSE.txt

# Include the data files
recursive-include data *

README.rst

Lines changed: 30 additions & 0 deletions
Stanford CoreNLP Python Bindings
================================

This package contains Python bindings for `Stanford CoreNLP
<https://github.com/stanfordnlp/CoreNLP>`_'s protobuf specifications, as
generated by ``protoc``. These bindings can be used to parse binary data
produced by, e.g., the `Stanford CoreNLP server
<https://stanfordnlp.github.io/CoreNLP/corenlp-server.html>`_.

----

Usage::

    from corenlp_protobuf import Document

    # document.dat contains a serialized Document.
    with open('document.dat', 'rb') as f:
        buffer = f.read()
    doc = Document()
    doc.ParseFromString(buffer)

    # You can access the sentences from doc.sentence.
    sentence = doc.sentence[0]

    # You can access any property within a sentence.
    print(sentence.text)

    # Likewise for tokens.
    token = sentence.token[0]
    print(token.lemma)
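As a companion to the usage above, here is a minimal, hypothetical sketch of how a file like document.dat might be produced from a locally running Stanford CoreNLP server. The URL, annotator list, property names, serializer class, and the use of the requests library are assumptions, not part of this package; also note the server's response may be length-delimited (Java's writeDelimitedTo), in which case parseFromDelimitedString from this package (see corenlp_protobuf/__init__.py below) can be used instead of ParseFromString::

    # Hypothetical sketch: ask a locally running CoreNLP server (assumed at
    # http://localhost:9000) for a protobuf-serialized annotation and save
    # the raw response bytes to document.dat. All server-side parameter
    # names here are assumptions.
    import json
    import requests

    props = {
        'annotators': 'tokenize,ssplit,pos,lemma',
        'outputFormat': 'serialized',
        'serializer': 'edu.stanford.nlp.pipeline.ProtobufAnnotationSerializer',
    }
    r = requests.post('http://localhost:9000/',
                      params={'properties': json.dumps(props)},
                      data='This is a simple sentence.'.encode('utf-8'))
    r.raise_for_status()

    with open('document.dat', 'wb') as f:
        f.write(r.content)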

corenlp_protobuf/CoreNLP_pb2.py

Lines changed: 2681 additions & 0 deletions
Generated by protoc from CoreNLP's protobuf specification; contents not shown here.

corenlp_protobuf/__init__.py

Lines changed: 27 additions & 0 deletions
from __future__ import absolute_import

from google.protobuf.internal.decoder import _DecodeVarint
from .CoreNLP_pb2 import *


def parseFromDelimitedString(obj, buf, offset=0):
    """
    Stanford CoreNLP uses the Java "writeDelimitedTo" function, which
    writes the size (and offset) of the buffer before writing the object.
    This function handles parsing this message starting from offset 0.

    @returns how many bytes of @buf were consumed.
    """
    size, pos = _DecodeVarint(buf, offset)
    obj.ParseFromString(buf[offset + pos:offset + pos + size])
    return pos + size


def to_text(sentence):
    """
    Helper routine that converts a Sentence protobuf to a string from its tokens.
    """
    text = ""
    for i, tok in enumerate(sentence.token):
        if i != 0:
            text += tok.before
        text += tok.word
    return text
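A minimal sketch of how these two helpers might be used together, assuming delimited.dat is a hypothetical file that holds a single Document written with Java's writeDelimitedTo::

    # Minimal sketch: read one length-delimited Document from a hypothetical
    # file 'delimited.dat' and print each sentence's text, reconstructed
    # from its tokens.
    from corenlp_protobuf import Document, parseFromDelimitedString, to_text

    with open('delimited.dat', 'rb') as f:
        buf = f.read()

    doc = Document()
    consumed = parseFromDelimitedString(doc, buf)  # parses from offset 0
    assert consumed == len(buf)  # holds if the file contains exactly one message

    for sentence in doc.sentence:
        print(to_text(sentence))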
