Merged
21 changes: 9 additions & 12 deletions README.md
@@ -15,22 +15,18 @@ cleanup work for all clones/forks with unmerged changes.
## Installation

1. Clone repo from Github
1. `cd AWE_LanguageTool`
1. `pip install -e .`
1. NOTE: should this just be included in AWE Components? That's what we use to interface with the other NLP stuff
2. `cd AWE_LanguageTool`
3. `pip install -e .`

## Running
## LanguageTool Configuration & Running

The system is ran using a client-server model.
The server runs the relevant Java command to start the `languagetool-server.jar` file.
This files comes from directly from the original Language Tool.
The client handles wrapping the output and adding in additional error classification categories.
The system is run using a client-server model. The server runs the relevant Java command to start the `languagetool-server.jar` file (which comes from LanguageTool). The client wraps the output and adds additional error classification categories.

### LanguageTool Configuration & Running
Before running LanguageTool, *you must first copy/write a server config file into* `awe_languagetool/LanguageTool5_5/`. We've provided a sample config file called `languagetool.tmp.cfg` in the repository root. If you copy this file, you must rename it to `languagetool.cfg`, which is the file name `runServer()` expects by default.

With the python LT wrapper, this can be run from anywhere in the project. However, if you decide to run the java command directly (see below), this needs to be run within the `awe_languagetool/LanguageTool5_5/` directory.
By default, LT slows down considerably under many concurrent requests; you can tune the server settings for LT in `awe_languagetool/LanguageTool5_5/languagetool.cfg`. See [this forum post](https://forum.languagetool.org/t/too-many-parallel-requests/8290/3) for a reasonable server config file. For a full description of all server config options, see the [LT5.5 source code](https://github.com/languagetool-org/languagetool/blob/c6321ab5837a9e1ae5501d746f947f5706b4b274/languagetool-server/src/main/java/org/languagetool/server/HTTPServerConfig.java).

By default, LT runs pretty slow with too many incoming requests; you can modify the server settings for LT in `awe_languagetool/LanguageTool5_5/languagetool.cfg`. See [this forum post](https://forum.languagetool.org/t/too-many-parallel-requests/8290/3) on a decent server config file.
With the Python LT wrapper, the server can be started from anywhere in the project. However, if you run the Java command directly (see below), you must run it from within the `awe_languagetool/LanguageTool5_5/` directory.
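A minimal sketch of the config copy step in Python, assuming it is called with the repository root and that the template sits at the root (where this PR adds `languagetool.tmp.cfg`). `install_lt_config` is a hypothetical helper, not part of the package. It leaves an existing `languagetool.cfg` alone so local tuning is not overwritten.

```python
import shutil
from pathlib import Path


def install_lt_config(repo_root: Path) -> Path:
    """Copy languagetool.tmp.cfg into LanguageTool5_5/ as languagetool.cfg (hypothetical helper)."""
    template = repo_root / "languagetool.tmp.cfg"  # assumed location: repository root
    target_dir = repo_root / "awe_languagetool" / "LanguageTool5_5"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "languagetool.cfg"  # the name runServer() uses by default
    # Only copy when no config exists yet, so hand-tuned settings survive.
    if not target.exists():
        shutil.copyfile(template, target)
    return target
```

For example, `install_lt_config(Path.cwd())` when run from the repository root.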

1. Start the server

@@ -42,6 +38,7 @@ languagetoolServer.runServer()
This can also be run directly using the Java command.
Note that we have not fully tested which directory this command needs to be run from.
If running this does not work, see the `languagetoolServer.py` file for more information about how the system is started.

```bash
java -cp languagetool-server.jar org.languagetool.server.HTTPServer --config languagetool.cfg --port {port} --allow-origin "*"
```
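The Java server may take a few seconds to start before it accepts requests, so a client started right after it can fail. A minimal sketch for waiting on it from Python, assuming the default port 8081 used by `runServer()` and `languagetoolClient`; `wait_for_server` is a hypothetical helper, not part of this repo. It only checks that the port accepts TCP connections, not that LanguageTool has finished loading its language data.

```python
import socket
import time


def wait_for_server(host: str = "localhost", port: int = 8081, timeout: float = 30.0) -> bool:
    """Poll host:port until it accepts a TCP connection; give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Connection refused or timed out: the server is not up yet.
            time.sleep(0.5)
    return False
```

A caller could check `wait_for_server()` before calling `client.summarizeText(...)`.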
@@ -55,7 +52,7 @@ import asyncio
client = languagetoolClient.languagetoolClient()
text_to_process = '...'
output = asyncio.run(client.summarizeText(text_to_process))
# Example of output
# Example output
# {
# 'wordcounts': {
# 'tokens': 0,
10 changes: 0 additions & 10 deletions awe_languagetool/LanguageTool5_5/languagetool.cfg

This file was deleted.

5 changes: 4 additions & 1 deletion awe_languagetool/languagetoolClient.py
@@ -56,6 +56,9 @@ class languagetoolClient:
def __init__(self, port=8081):

self.port = port

# As of python 3.11, resources.path() is deprecated, and the following
# code is the 'equivalent' replacement.
path_context = resources.as_file(
resources.files('awe_languagetool').joinpath('languagetool_rulemapping.json')
)
@@ -71,7 +74,7 @@ def __init__(self, port=8081):
"Trying to load AWE Workbench Lexicon Module \
without supporting datafiles"
)
# Adjusting for context functions.

with open(self.MAPPING_PATH, "r") as fo:
jsonContent = fo.read()
self.ruleInfo = json.loads(jsonContent)
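The `resources.files()` + `resources.as_file()` pattern used here can be wrapped in a small helper. A minimal sketch; `load_json_resource` is a hypothetical name, not a function in this repo. It chains `joinpath()` one segment at a time (the workaround noted in `languagetoolServer.py` for Python 3.9) and reads the file inside the `with` block, so it stays correct even if `as_file()` has to extract a temporary copy.

```python
import json
from importlib import resources


def load_json_resource(package: str, *parts: str):
    """Load a JSON data file shipped inside `package` (hypothetical helper, Python 3.9+)."""
    resource = resources.files(package)
    # One joinpath() per path segment instead of a single 'sub/file' string.
    for part in parts:
        resource = resource.joinpath(part)
    # as_file() yields a real filesystem path; it is only guaranteed valid inside this block.
    with resources.as_file(resource) as path:
        with open(path, "r") as fo:
            return json.load(fo)
```

For example, `load_json_resource("awe_languagetool", "languagetool_rulemapping.json")` would return the same data as `self.ruleInfo` above.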
17 changes: 1 addition & 16 deletions awe_languagetool/languagetoolServer.py
@@ -21,20 +21,6 @@ def runServer(fileName=None, port=8081, config_file="languagetool.cfg"):
Runs the LanguageTool server, using `importlib.resources` to find the
jar file.
'''
# In order for python 3.9 to work we have to make a slight hack to the
# language tools module in order to ensure that this works. To do that
# we first import the tool and then set the origin explicitly.
#
# This cheap hack does just that by using the submodule_search_location
# value which *does* seem to be set by default to supply the location
# for origin. Having done that we can then go about the rest of it
# without error.
# import platform
# if (platform.python_version()[0:3] == "3.9"):
# import awe_languagetool.LanguageTool5_5
# LTSpec = awe_languagetool.LanguageTool5_5.__spec__
# LTSpec.origin = LTSpec.submodule_search_locations[0]

# NOTE: after playing with python3.9, it does not like the 'package.sub' string.
# So, I added multiple 'joinpaths'; this worked for both 3.9 and 3.11
with resources.as_file(
@@ -43,15 +29,14 @@ def runServer(fileName=None, port=8081, config_file="languagetool.cfg"):
print("Setting Language Path:", LANGUAGE_TOOL_PATH)
MAPPING_PATH = os.path.dirname(LANGUAGE_TOOL_PATH)


try:
os.chdir(MAPPING_PATH)
print("Changed Dir to {}".format(MAPPING_PATH))
except FileNotFoundError:
print("Path not found starting LanguageTool: ", MAPPING_PATH)
raise

if config_file == "":
if not config_file:
language_tool_command = f"java -cp languagetool-server.jar \
org.languagetool.server.HTTPServer \
--port {port} --allow-origin \"*\""
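The switch from `if config_file == "":` to `if not config_file:` means `None` now also takes the no-config branch, whereas with `== ""` a `None` value would not. A minimal sketch of the same decision as an argument list, mirroring the Java command in the README; `build_server_command` is a hypothetical helper, not the repo's code.

```python
from typing import List, Optional


def build_server_command(port: int = 8081, config_file: Optional[str] = "languagetool.cfg") -> List[str]:
    """Argument list for the LanguageTool HTTP server (hypothetical helper)."""
    cmd = ["java", "-cp", "languagetool-server.jar", "org.languagetool.server.HTTPServer"]
    # Truthiness check: both "" and None mean "start without a config file".
    if config_file:
        cmd += ["--config", config_file]
    # As a list element, "*" needs no shell quoting.
    cmd += ["--port", str(port), "--allow-origin", "*"]
    return cmd
```

Passing this list to `subprocess.Popen(cmd, cwd=...)` would also avoid the process-wide `os.chdir()` call; that is a design suggestion, not what `runServer()` currently does.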
2 changes: 0 additions & 2 deletions awe_languagetool/setup/data.py
@@ -50,8 +50,6 @@ def extra_build_commands(develop=False, install=False):
# Since python 3.11, we cannot use 'path', but this:
with resources.as_file(resources.files('awe_languagetool')) as path:
dir_name = str(path)
#dir_name = \
# resources.path('awe_languagetool', '')

extension = ".zip"
os.makedirs(dir_name, exist_ok=True)
13 changes: 13 additions & 0 deletions languagetool.tmp.cfg
@@ -0,0 +1,13 @@
# This config file comes from LT5.5 forums https://forum.languagetool.org/t/languagetool-5-5-slower-than-4-5/7396
# The full list of variables which you can modify in this config file is described in their github repo:
# languagetool/languagetool-server/src/main/java/org/languagetool/server/HTTPServerConfig.java
maxTextLength=50000
cacheSize=3000
pipelineCaching=true
maxPipelinePoolSize=500
pipelineExpireTimeInSeconds=3600
pipelinePrewarming=true

# maxCheckThreads=6 and maxWorkQueueSize=100 are good values for a server with 8 cores. For more cores, increase the values accordingly.
maxCheckThreads=3
maxWorkQueueSize=50
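The comment above suggests increasing `maxCheckThreads` and `maxWorkQueueSize` with core count. A minimal sketch of one way to pick values, assuming linear scaling from the 8-core reference in that comment (6 threads, queue of 100); the proportional rule is our assumption, not something LanguageTool prescribes. For 4 cores it yields the 3/50 values in this file.

```python
import os

# Reference point taken from the comment in languagetool.tmp.cfg.
REF_CORES, REF_THREADS, REF_QUEUE = 8, 6, 100


def suggest_lt_limits(cores=None):
    """Scale the two limits linearly with CPU count (assumed rule, not an LT default)."""
    if cores is None:
        cores = os.cpu_count() or 1
    threads = max(1, round(REF_THREADS * cores / REF_CORES))
    queue = max(1, round(REF_QUEUE * cores / REF_CORES))
    return {"maxCheckThreads": threads, "maxWorkQueueSize": queue}


if __name__ == "__main__":
    # Print in the key=value form used by languagetool.cfg.
    for key, value in suggest_lt_limits().items():
        print(f"{key}={value}")
```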