Commit e3ff6d3

authored
Merge pull request #3 from ArgLab/main
Resource fix (#2)
2 parents 893c0af + 638073b commit e3ff6d3

41 files changed

+512
-53
lines changed

.gitignore

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
# Ignore those pesky files
2+
3+
__pycache__/
4+
*.egg-info/
5+
build/

README.md

Lines changed: 16 additions & 11 deletions
@@ -15,16 +15,18 @@ cleanup work for all clones/forks with unmerged changes.
1515
## Installation
1616

1717
1. Clone repo from Github
18-
1. `cd AWE_LanguageTool`
19-
1. `pip install -e .`
20-
1. NOTE: should this just be included in AWE Components? That's what we use to interface with the other NLP stuff
18+
2. `cd AWE_LanguageTool`
19+
3. `pip install -e .`
2120

22-
## Running
21+
## LanguageTool Configuration & Running
2322

24-
The system is ran using a client-server model.
25-
The server runs the relevant Java command to start the `languagetool-server.jar` file.
26-
This files comes from directly from the original Language Tool.
27-
The client handles wrapping the output and adding in additional error classification categories.
23+
The system is run using a client-server model. The server runs the relevant Java command to start the `languagetool-server.jar` file (which comes from LanguageTool). The client wraps the output and adds additional error classification categories.
24+
25+
Before running LanguageTool, *you must first copy/write a server config file into* `awe_languagetool/LanguageTool5_5/`. We've provided a sample config file called `languagetool.tmp.cfg`. If you copy this file, you must rename it to `languagetool.cfg`.
26+
27+
By default, LT slows down noticeably when it receives many requests at once; you can modify the server settings for LT in `awe_languagetool/LanguageTool5_5/languagetool.cfg`. See [this forum post](https://forum.languagetool.org/t/too-many-parallel-requests/8290/3) for a reasonable server config file. For a full description of all server config options, see [LT5.5 Source Code](https://github.com/languagetool-org/languagetool/blob/c6321ab5837a9e1ae5501d746f947f5706b4b274/languagetool-server/src/main/java/org/languagetool/server/HTTPServerConfig.java).
28+
29+
With the Python LT wrapper, the server can be started from anywhere in the project. However, if you decide to run the Java command directly (see below), you must run it from within the `awe_languagetool/LanguageTool5_5/` directory.
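
The copy-and-rename step for the config file can be scripted. The following is a minimal sketch, not part of the package: `install_sample_config` is a hypothetical helper, and the paths in the usage comment assume the sample file sits at the repository root and the script is run from there.

```python
import shutil
from pathlib import Path


def install_sample_config(sample: Path, target_dir: Path) -> Path:
    """Copy the sample config into target_dir under the name languagetool.cfg."""
    target = target_dir / "languagetool.cfg"
    if target.exists():
        # Leave an existing (possibly hand-tuned) config untouched.
        return target
    shutil.copyfile(sample, target)
    return target


# Assumed usage, run from the repository root:
# install_sample_config(Path("languagetool.tmp.cfg"),
#                       Path("awe_languagetool/LanguageTool5_5"))
```

The helper refuses to overwrite an existing `languagetool.cfg`, so rerunning it after tuning the server settings is harmless.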
2830

2931
1. Start the server
3032

@@ -36,18 +38,21 @@ languagetoolServer.runServer()
3638
This can also be run directly using the Java command.
3739
Note that it has not been fully tested which directory this command needs to be run from.
3840
If running this does not work, see the `languagetoolServer.py` file for more information about how the system is started.
41+
3942
```bash
40-
java -cp languagetool-server.jar org.languagetool.server.HTTPServer --port {port} --allow-origin "*"
43+
java -cp languagetool-server.jar org.languagetool.server.HTTPServer --config languagetool.cfg --port {port} --allow-origin "*"
4144
```
4245

4346
1. Connect the client (requires another terminal)
4447

4548
```python
4649
from awe_languagetool import languagetoolClient
50+
import asyncio
51+
4752
client = languagetoolClient.languagetoolClient()
4853
text_to_process = '...'
49-
output = client.summarizeText(text_to_process)
50-
# Example of output
54+
output = asyncio.run(client.summarizeText(text_to_process))
55+
# Example output
5156
# {
5257
# 'wordcounts': {
5358
# 'tokens': 0,

awe_languagetool/languagetoolClient.py

Lines changed: 27 additions & 19 deletions
@@ -56,31 +56,39 @@ class languagetoolClient:
5656
def __init__(self, port=8081):
5757

5858
self.port = port
59-
self.MAPPING_PATH = \
60-
resources.path('awe_languagetool',
61-
'languagetool_rulemapping.json')
62-
print(self.MAPPING_PATH)
6359

60+
# As of python 3.11, resources.path() is deprecated, and the following
61+
# code is the 'equivalent' replacement.
62+
path_context = resources.as_file(
63+
resources.files('awe_languagetool').joinpath('languagetool_rulemapping.json')
64+
)
65+
with path_context as out_path:
66+
self.MAPPING_PATH = str(out_path)
67+
print(str(self.MAPPING_PATH))
68+
69+
# The importlib.resources objects behave differently than standard paths
70+
# so it is necessary to adjust the code to use the resource context functions
71+
# rather than standard file functions.
6472
if not os.path.exists(self.MAPPING_PATH):
6573
raise mappingPathError(
6674
"Trying to load AWE Workbench Lexicon Module \
6775
without supporting datafiles"
6876
)
69-
fo = open(self.MAPPING_PATH, "r")
70-
jsonContent = fo.read()
71-
self.ruleInfo = json.loads(jsonContent)
72-
fo.close()
73-
for rule in self.ruleInfo:
74-
for subrule in self.ruleInfo[rule]:
75-
[cat, subcat] = self.ruleInfo[rule][subrule]
76-
if cleanstring(cat).title() not in self.categoryList:
77-
self.categoryList.append(cleanstring(cat).title())
78-
subcat_name = cleanstring(cat).title() + ': ' + cleanstring(subcat).title()
79-
if cleanstring(subcat_name) not in self.subcategoryList:
80-
self.subcategoryList.append(cleanstring(subcat_name))
81-
82-
self.categoryList.sort()
83-
self.subcategoryList.sort()
77+
78+
with open(self.MAPPING_PATH, "r") as fo:
79+
jsonContent = fo.read()
80+
self.ruleInfo = json.loads(jsonContent)
81+
for rule in self.ruleInfo:
82+
for subrule in self.ruleInfo[rule]:
83+
[cat, subcat] = self.ruleInfo[rule][subrule]
84+
if cleanstring(cat).title() not in self.categoryList:
85+
self.categoryList.append(cleanstring(cat).title())
86+
subcat_name = cleanstring(cat).title() + ': ' + cleanstring(subcat).title()
87+
if cleanstring(subcat_name) not in self.subcategoryList:
88+
self.subcategoryList.append(cleanstring(subcat_name))
89+
90+
self.categoryList.sort()
91+
self.subcategoryList.sort()
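
The `resources.path()` replacement used in this commit follows a general pattern: build a traversable reference with `files()` and `joinpath()`, then turn it into a real filesystem path with `as_file()`. Below is a minimal sketch of that pattern; `resource_path` is a hypothetical helper, and the standard-library `email` package stands in for `awe_languagetool` so that the example runs anywhere.

```python
from importlib import resources


def resource_path(package: str, *parts: str) -> str:
    """Return a filesystem path for a resource inside an installed package."""
    ref = resources.files(package)
    for part in parts:
        # Chained joinpath() calls, one per path segment.
        ref = ref.joinpath(part)
    with resources.as_file(ref) as path:
        # For a package installed as ordinary files, this path remains valid
        # after the context exits. For a zipped package, as_file() extracts to
        # a temporary file that is cleaned up on exit, so in that case the
        # path should only be used inside the with-block.
        return str(path)


# Stand-in example using a stdlib package with a nested sub-package.
print(resource_path("email", "mime", "__init__.py"))
```

Because the commit stores the path and uses it after the `with` block has exited, it relies on the package being installed as regular files on disk, which is the case for an editable (`pip install -e .`) install.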
8492

8593
def make_printable(self, s):
8694
"""Replace non-printable characters in a string."""

awe_languagetool/languagetoolServer.py

Lines changed: 16 additions & 21 deletions
@@ -16,39 +16,34 @@
1616
from importlib import resources
1717

1818

19-
def runServer(fileName=None, port=8081):
19+
def runServer(fileName=None, port=8081, config_file="languagetool.cfg"):
2020
'''
2121
Runs the LanguageTool server, using `importlib.resources` to find the
2222
jar file.
2323
'''
24-
# In order for python 3.9 to work we have to make a slight hack to the
25-
# language tools module in order to ensure that this works. To do that
26-
# we first import the tool and then set the origin explicitly.
27-
#
28-
# This cheap hack does just that by using the submodule_search_location
29-
# value which *does* seem to be set by default to supply the location
30-
# for origin. Having done that we can then go about the rest of it
31-
# without error.
32-
import platform
33-
if (platform.python_version()[0:3] == "3.9"):
34-
import awe_languagetool.LanguageTool5_5
35-
LTSpec = awe_languagetool.LanguageTool5_5.__spec__
36-
LTSpec.origin = LTSpec.submodule_search_locations[0]
37-
38-
with resources.path('awe_languagetool.LanguageTool5_5',
39-
'languagetool-server.jar') as LANGUAGE_TOOL_PATH:
24+
# NOTE: in testing on Python 3.9, the dotted 'package.sub' string did not work here,
25+
# so the path is built with chained joinpath() calls; this worked on both 3.9 and 3.11.
26+
with resources.as_file(
27+
resources.files('awe_languagetool').joinpath('LanguageTool5_5').joinpath('languagetool-server.jar')
28+
) as LANGUAGE_TOOL_PATH:
29+
print("Setting Language Path:", LANGUAGE_TOOL_PATH)
4030
MAPPING_PATH = os.path.dirname(LANGUAGE_TOOL_PATH)
4131

42-
4332
try:
4433
os.chdir(MAPPING_PATH)
34+
print("Changed Dir to {}".format(MAPPING_PATH))
4535
except FileNotFoundError:
4636
print("Path not found starting LanguageTool: ", MAPPING_PATH)
4737
raise
4838

49-
language_tool_command = f"java -cp languagetool-server.jar \
50-
org.languagetool.server.HTTPServer \
51-
--port {port} --allow-origin \"*\""
39+
if not config_file:
40+
language_tool_command = f"java -cp languagetool-server.jar \
41+
org.languagetool.server.HTTPServer \
42+
--port {port} --allow-origin \"*\""
43+
else:
44+
language_tool_command = f"java -cp languagetool-server.jar \
45+
org.languagetool.server.HTTPServer \
46+
--config {config_file} --port {port} --allow-origin \"*\""
5247

5348
runner = subprocess.Popen(language_tool_command, shell=True)
5449
if not runner:
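
The two branches above differ only in the `--config` flag, so the command can also be assembled as an argument list. This is a sketch of an alternative, not what the commit does: `build_command` is a hypothetical helper, and passing a list to `subprocess.Popen` without `shell=True` is a swapped-in approach that avoids shell quoting of `"*"`.

```python
from typing import List, Optional


def build_command(port: int = 8081,
                  config_file: Optional[str] = "languagetool.cfg") -> List[str]:
    """Assemble the LanguageTool server command as an argument list."""
    cmd = [
        "java", "-cp", "languagetool-server.jar",
        "org.languagetool.server.HTTPServer",
    ]
    if config_file:
        # Only pass --config when a config file name was given.
        cmd += ["--config", config_file]
    cmd += ["--port", str(port), "--allow-origin", "*"]
    return cmd


# Assumed usage (directory name taken from runServer above):
# runner = subprocess.Popen(build_command(), cwd=MAPPING_PATH)
print(build_command())
```

With an argument list, the `cwd` argument of `Popen` could replace the `os.chdir()` call, which would leave the calling process's working directory unchanged.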

awe_languagetool/setup/data.py

Lines changed: 3 additions & 2 deletions
@@ -47,8 +47,9 @@ def extra_build_commands(develop=False, install=False):
4747
# approach. #
4848
#################################################################
4949

50-
dir_name = \
51-
resources.path('awe_languagetool', '')
50+
# resources.path() is deprecated as of Python 3.11; use files()/as_file() instead:
51+
with resources.as_file(resources.files('awe_languagetool')) as path:
52+
dir_name = str(path)
5253

5354
extension = ".zip"
5455
os.makedirs(dir_name, exist_ok=True)

languagetool.tmp.cfg

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
1+
# This config file comes from LT5.5 forums https://forum.languagetool.org/t/languagetool-5-5-slower-than-4-5/7396
2+
# The full list of variables which you can modify in this config file is described in their github repo:
3+
# languagetool/languagetool-server/src/main/java/org/languagetool/server/HTTPServerConfig.java
4+
maxTextLength=50000
5+
cacheSize=3000
6+
pipelineCaching=true
7+
maxPipelinePoolSize=500
8+
pipelineExpireTimeInSeconds=3600
9+
pipelinePrewarming=true
10+
11+
# maxCheckThreads=6 and maxWorkQueueSize=100 are good values for a server with 8 cores. For more cores, increase the values accordingly.
12+
maxCheckThreads=3
13+
maxWorkQueueSize=50
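
The comment in the config gives values only for an 8-core server and says to increase them "accordingly" for more cores. One possible reading is linear scaling, which is an assumption here and not something the LanguageTool forum post states. Under that assumption, the sample values of 3 and 50 correspond to a 4-core machine. `suggest_thread_settings` is a hypothetical helper.

```python
def suggest_thread_settings(cores: int) -> dict:
    """Scale the 8-core reference values (6 threads, queue of 100) linearly."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return {
        # ASSUMPTION: linear scaling from the 8-core reference point.
        "maxCheckThreads": max(1, cores * 6 // 8),
        "maxWorkQueueSize": max(1, cores * 100 // 8),
    }


for n in (4, 8, 16):
    print(n, suggest_thread_settings(n))
```

Treat the output as a starting point for tuning, not as a recommendation from LanguageTool.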

tests/__init__.py

Whitespace-only changes.

tests/test_languagetoolClient.py

Lines changed: 146 additions & 0 deletions
@@ -0,0 +1,146 @@
1+
# --- [ Test: languagetoolClient.py ] ----------------------------------------------
2+
#
3+
# Set of "sanity tests" for languagetool client. Ideally, these should be run on
4+
# a fresh install to ensure that methods like 'path()' are working for the
5+
# current version of python.
6+
#
7+
# Author: Caleb Scott (cwscott3@ncsu.edu)
8+
# ----------------------------------------------------------------------------------
9+
10+
# --- [ IMPORTS ] ------------------------------------------------------------------
11+
12+
import unittest
13+
import asyncio
14+
import time
15+
16+
from awe_languagetool import languagetoolClient
17+
from awe_languagetool.languagetoolClient import mappingPathError
18+
19+
# --- [ CONSTS ] -------------------------------------------------------------------
20+
21+
# This describes the desired mapping path pointing to a rulemapping.json file
22+
EXPECTED_RULEMAP_PATH = "awe_languagetool/languagetool_rulemapping.json"
23+
24+
TEXT_BASE = "test_texts/"
25+
LION_TEXT = TEXT_BASE + "lion.txt"
26+
27+
CENSORSHIP_TEXTS = [
28+
"censorship1.txt",
29+
"censorship2.txt",
30+
"censorship3.txt",
31+
"censorship4.txt",
32+
"censorship5.txt",
33+
"censorship6.txt",
34+
"censorship7.txt",
35+
"censorship8.txt",
36+
"censorship9.txt",
37+
"censorship10.txt",
38+
"censorship11.txt",
39+
"censorship12.txt",
40+
"censorship13.txt",
41+
"censorship14.txt",
42+
"censorship15.txt",
43+
"censorship16.txt",
44+
"censorship17.txt",
45+
"censorship18.txt",
46+
"censorship19.txt",
47+
"censorship20.txt",
48+
"censorship21.txt",
49+
"censorship22.txt",
50+
"censorship23.txt",
51+
"censorship24.txt",
52+
"censorship25.txt"
53+
]
54+
55+
# --- [ SETUP ] --------------------------------------------------------------------
56+
57+
# Make sure you are running the server before running client tests.
58+
# NOTE: you must ensure that the java server is not already running; otherwise,
59+
# this setup step will fail saying that the port is already in use.
60+
61+
# languagetoolServer.runServer()
62+
63+
# --- [ CLASSES ] ------------------------------------------------------------------
64+
65+
class LanguageToolClientTest(unittest.TestCase):
66+
67+
def test_client_init(self):
68+
"""
69+
Basic test to see if the client properly aligns its paths
70+
for files in its package.
71+
"""
72+
try:
73+
new_client = languagetoolClient.languagetoolClient()
74+
self.assertTrue(
75+
new_client.MAPPING_PATH.endswith(EXPECTED_RULEMAP_PATH)
76+
)
77+
except mappingPathError as e:
78+
self.fail()
79+
80+
def test_client_timing_java_single(self):
81+
"""
82+
Attempt to start up a client, and pass in a sample text.
83+
We are measuring runtime of how well the server responds to a
84+
single request.
85+
86+
Precondition: languagetoolServer is running.
87+
"""
88+
try:
89+
client = languagetoolClient.languagetoolClient()
90+
with open(LION_TEXT, 'r') as text_file:
91+
92+
# Grab the text
93+
sample_text = text_file.read()
94+
95+
# Start timing benchmark
96+
start = time.time()
97+
output = asyncio.run(client.summarizeText(sample_text))
98+
end = time.time()
99+
100+
# Show results
101+
print()
102+
print("---------[ TIMING BENCHMARK ]---------")
103+
print(f"Time Elapsed: {end - start}")
104+
print("--------------------------------------")
105+
except mappingPathError as e:
106+
self.fail()
107+
108+
def test_client_timing_java_multiple_sequential(self):
109+
"""
110+
Attempt to start up a client, and pass in a sample text.
111+
We are measuring runtime of how well the server responds to
112+
many sequential requests.
113+
114+
Precondition: languagetoolServer is running.
115+
"""
116+
try:
117+
client = languagetoolClient.languagetoolClient()
118+
119+
# Pre-load all texts
120+
all_texts = {}
121+
index = 0
122+
for text_filename in CENSORSHIP_TEXTS:
123+
with open(TEXT_BASE + text_filename) as censorship_file:
124+
all_texts[index] = censorship_file.read()
125+
index = index + 1
126+
127+
# Make multiple requests
128+
start = time.time()
129+
output = asyncio.run(
130+
client.summarizeMultipleTexts(
131+
list(all_texts.keys()),
132+
list(all_texts.values())
133+
)
134+
)
135+
end = time.time()
136+
137+
# Show results
138+
print()
139+
print("---------[ TIMING BENCHMARK ]---------")
140+
print(f"Time Elapsed: {end - start}")
141+
print("--------------------------------------")
142+
143+
except mappingPathError as e:
144+
self.fail()
145+
146+
# --- [ END ] ----------------------------------------------------------------------
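
Both timing tests repeat the same start/stop/print sequence around `time.time()`. A small context manager could factor that out. This is a sketch only: `timed` is a hypothetical helper, and `time.perf_counter()` is swapped in for `time.time()` because it is a monotonic clock intended for measuring short durations.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str):
    """Measure the wall-clock duration of the enclosed block."""
    result = {"label": label, "seconds": None}
    start = time.perf_counter()
    try:
        yield result
    finally:
        # Record and report the elapsed time even if the block raises.
        result["seconds"] = time.perf_counter() - start
        print(f"[{label}] Time Elapsed: {result['seconds']:.3f}s")


# Stand-in workload; in the tests this would wrap the asyncio.run(...) call.
with timed("example") as t:
    sum(range(1000))
```

Since the elapsed time is returned in `t["seconds"]`, a test could also assert an upper bound on it, whereas the tests above only print the measurement.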

tests/test_languagetoolServer.py

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
1+
# --- [ Test: languagetoolServer.py ] ----------------------------------------------
2+
#
3+
# Set of "sanity tests" for languagetool server. Ideally, these should be run on
4+
# a fresh install to ensure that methods like 'path()' are working for the
5+
# current version of python.
6+
#
7+
# Author: Caleb Scott (cwscott3@ncsu.edu)
8+
# ----------------------------------------------------------------------------------
9+
10+
# --- [ IMPORTS ] ------------------------------------------------------------------
11+
12+
import unittest
13+
14+
from awe_languagetool import languagetoolServer
15+
16+
# --- [ CONSTS ] -------------------------------------------------------------------
17+
18+
# This describes the desired mapping path directory (where languagetool jar is)
19+
EXPECTED_MAPPING_PATH = "awe_languagetool/LanguageTool5_5"
20+
21+
# This describes the full path to the language tool jar file
22+
EXPECTED_LANGTOOL_PATH = "awe_languagetool/LanguageTool5_5/languagetool-server.jar"
23+
24+
# --- [ SETUP ] --------------------------------------------------------------------
25+
26+
# --- [ CLASSES ] ------------------------------------------------------------------
27+
28+
class LanguageToolServerTest(unittest.TestCase):
29+
30+
def test_server_init(self):
31+
try:
32+
languagetoolServer.runServer()
33+
except FileNotFoundError as e:
34+
self.fail()
35+
except ChildProcessError as e:
36+
self.fail()
37+
38+
# --- [ END ] ----------------------------------------------------------------------

tests/test_texts/censorship1.txt

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
1+
I do not believe that censorship should be an option for people that find a book, magazine, or movie offensive. I believe that if a person decides to take a book off the shelf because he/she is offended, they are obviously paranoid. If a few cuss words are found in a book and a person doesn't want their children to read that book, then don't allow them to get it. Don't make all of the other people suffer. Censorship is an unnecessary solution to something that isn't even a problem. You know what they say, 'If it ain't broken, don't fix it.'
2+
3+
First of all, people have a variety of opinions. Some people think that violence is ok and some people don't think it's ok. If one person decides to put a book off the shelf because they find it offensive, what will the next person think? They may find violence perfectly acceptable and they might allow their kids to read the book. Then it would create a big conflict.
4+
5+
Lastly, parents need to allow their kids to grow up and mature. They also need their kids to know the difference between right and wrong. If they don't read books or movies with violence or cuss words when they're young, what will they do when they discover it when they're older? As a child, it is a time to learn life lessons and begin to form clear distinctions between right and wrong. Shielding your children from it will only hinder their progress in figuring things out and learning to think for themselves as adults (Paterson, 2009).
6+
7+
Censorship is the same thing that Hitler did to prevent people to think for themselves and discover that his tactics were a great evil to society. In other words, censorship is part of fascism and should not even be considered in American society. The citizens of the U.S. have the responsibility of exposing the truth and teaching our children the differences between right and wrong. It's one of the ideals that help make this country great
