
Commit c3e3010: First Commit (0 parents)


45 files changed, 1793 additions, 0 deletions

.gitignore

Lines changed: 5 additions & 0 deletions
```
.venv/
.vscode/
.idea/
__pycache__
__pycache__/*
```

LICENSE

Lines changed: 21 additions & 0 deletions
```
MIT License

Copyright (c) <year> <real name>

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
```

README.md

Lines changed: 127 additions & 0 deletions

# Example

This project was generated with the Hypergol framework.

Please see the documentation for instructions: [https://hypergol.readthedocs.io/en/latest/](https://hypergol.readthedocs.io/en/latest/)

### Initialise git

Hypergol is heavily integrated with git: all projects must be in a git repository to ensure code and data lineage (i.e., to record which data was created by which version of the code).

Initialise git with:

```
git init .
```

Create the first commit (datasets record the last commit when they are created, and without one there is nothing to record). Nothing is staged yet, so the commit has to be allowed to be empty:

```
git commit --allow-empty -m "First Commit!"
```

The project is now in a "dirty" state (as it is any time a file is changed but the change is not committed to the repo). If you run a pipeline or train a model in this state, the last commit will be recorded, but that commit will not represent the code that is running! Add the changes and commit:

```
git add .
git commit -m "All the files!"
```

If there are files that should never be checked into git, add them to the `.gitignore` file before running `git add .`.

Alternatively, individual files can be added to git with `git add <filename>`.
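
Datasets only record the last commit, so it can help to check for uncommitted changes before starting a pipeline. The following is a minimal sketch and is not part of Hypergol. `has_uncommitted_changes` and `current_commit_if_clean` are hypothetical helpers. They call `git status --porcelain`, which prints nothing when the working tree is clean, and `git rev-parse HEAD`.

```python
import subprocess
from typing import Optional


def has_uncommitted_changes(porcelainOutput: str) -> bool:
    # `git status --porcelain` prints one line per modified or untracked file
    return any(line.strip() for line in porcelainOutput.splitlines())


def current_commit_if_clean(repoDirectory: str = '.') -> Optional[str]:
    # Hypothetical helper: HEAD's hash if the working tree is clean, otherwise None
    status = subprocess.run(
        ['git', 'status', '--porcelain'],
        cwd=repoDirectory, capture_output=True, text=True, check=True,
    ).stdout
    if has_uncommitted_changes(status):
        return None
    return subprocess.run(
        ['git', 'rev-parse', 'HEAD'],
        cwd=repoDirectory, capture_output=True, text=True, check=True,
    ).stdout.strip()
```

Right after a file is edited, `current_commit_if_clean()` returns `None`. Once the change is committed, it returns the commit hash that a dataset would record.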

### Make the virtual environment

Having a dedicated virtual environment, fully described by the project's `requirements.txt`, is the recommended practice. Don't forget to `deactivate` the current virtual environment first! The environment's files are listed in the project's `.gitignore` file and will be ignored by git.

```
deactivate
./make_venv.sh
source .venv/bin/activate
```
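
You can confirm from Python that the project's environment, and not a previously active one, is in use. Compare `sys.prefix` with `sys.base_prefix`: inside a `venv` they differ, and `sys.prefix` points at the environment directory. The helper names and the `.venv` default below are illustrative assumptions, not part of Hypergol.

```python
import sys
from pathlib import Path


def in_virtual_environment() -> bool:
    # Inside a venv, sys.prefix is the environment while sys.base_prefix
    # is still the base Python installation
    return sys.prefix != sys.base_prefix


def using_project_venv(projectDirectory: str, venvName: str = '.venv') -> bool:
    # Hypothetical check: is the running interpreter the one in <projectDirectory>/.venv?
    expected = (Path(projectDirectory) / venvName).resolve()
    return in_virtual_environment() and Path(sys.prefix).resolve() == expected
```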

### How to list existing Datasets (in Jupyter)

```
import sys
sys.path.insert(0, '<project_directory>/example')
from hypergol import HypergolProject
from data_models.example_datamodel_class import ExampleDatamodelClass
project = HypergolProject(
    projectDirectory='<project_directory>/example',
    dataDirectory='<data_directory>'
)
# Load a single dataset by its data type and name
ds = project.datasetFactory.get(dataType=ExampleDatamodelClass, name='sentences')
# Print every dataset whose name matches `pattern` as executable code
project.list_datasets(pattern='.*', asCode=True);
```

This will list all existing datasets that match `pattern` as self-contained executable code.

### How to start TensorBoard

It is recommended to start it in a `screen` session (`screen -S tensorboard`) so that it keeps running if you close the terminal window or disconnect from a remote Linux machine (reconnect with `screen -x tensorboard`). In the project directory, run:

```
screen -S tensorboard
source .venv/bin/activate
tensorboard --logdir=<data_directory>/example/tensorboard/
```

### How to train your model

After implementing all components and required functions, run:

```
./train_example.sh
```

This executes the model manager's `run()` function with the prescribed schedule (training steps, evaluation steps, etc.). Training can be stopped with Ctrl-C without corrupting the output dataset (datasets must be closed properly to generate their chk file, after which they are read-only). This is possible because the entire training happens inside a `try/finally` block.
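
The Ctrl-C behaviour relies on standard Python semantics: pressing Ctrl-C raises `KeyboardInterrupt`, and a `finally` clause runs before that exception propagates. The sketch below shows why an interrupted loop still ends with a closed dataset. `DatasetWriter` is a hypothetical stand-in, not Hypergol's class, and its `close()` only imitates writing a chk file by hashing what was appended.

```python
import hashlib


class DatasetWriter:
    """Hypothetical stand-in for a dataset writer; not Hypergol's implementation."""

    def __init__(self):
        self.lines = []
        self.checksum = None  # set only when the writer is closed

    def append(self, line: str):
        self.lines.append(line)

    def close(self):
        # Stand-in for writing the chk file: a hash of everything appended
        self.checksum = hashlib.sha1('\n'.join(self.lines).encode('utf-8')).hexdigest()


def train(writer: DatasetWriter, steps: int, interruptAt: int):
    try:
        for step in range(steps):
            if step == interruptAt:
                raise KeyboardInterrupt  # simulates the user pressing Ctrl-C
            writer.append(f'step {step}')
    finally:
        writer.close()  # runs on normal exit, on errors and on Ctrl-C


writer = DatasetWriter()
try:
    train(writer, steps=10, interruptAt=3)
except KeyboardInterrupt:
    pass  # the interrupt still propagates, but close() has already run
# writer.lines holds steps 0 to 2 and writer.checksum is set
```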

### How to serve your model

In the generated `models/serve_example.py` file, specify the directory of the model to be served:

```
MODEL_DIRECTORY = '<data_directory>/example/<branch>/models/<ModelName>/<epoch_number>'
```

Then start serving with the following command (the port and host can be set in the shell script):

```
./serve_example.sh
```

### How to call your model from Python with requests

```
import json
import requests
response = json.loads(requests.get('http://0.0.0.0:8000', headers={'accept': 'application/json'}).text)
modelLongName = response['model']
```

This lets you verify that the intended model is being served. The generated training script puts the training day and the commit hash at that point into the long name, so that the exact conditions of training are known at serving time. The long name should be used in logging to identify which model created an output. From v0.0.10, the long name is also returned in the `x-model-long-name` header of the `/output` endpoint's response.
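
One way to attach the long name to every log line is a standard-library `logging.LoggerAdapter` that injects it into each record. The logger name, the format string and the `'<model_long_name>'` value are placeholders. The real long name is whatever the served model reports.

```python
import io
import logging


def make_model_logger(modelLongName: str, stream) -> logging.LoggerAdapter:
    logger = logging.getLogger('model_outputs')  # hypothetical logger name
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    # %(modelLongName)s is filled from the adapter's `extra` dict
    handler.setFormatter(logging.Formatter('%(modelLongName)s %(message)s'))
    logger.handlers = [handler]
    logger.propagate = False
    return logging.LoggerAdapter(logger, {'modelLongName': modelLongName})


buffer = io.StringIO()
log = make_model_logger('<model_long_name>', buffer)
log.info('processed article %s', 42)
print(buffer.getvalue(), end='')  # prints: <model_long_name> processed article 42
```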

To get the model's response to a list of objects, see the example below. Replace `ExampleModelOutput` with the correct output type and load a dataset into `ds` (use `list_datasets` from above to do this).

```
import sys
sys.path.insert(0, '<project_directory>/example')
import json
import requests
from itertools import islice
from data_models.example_model_output import ExampleModelOutput

with ds.open('r') as dsr:
    values = [value.to_data() for value in islice(dsr, 10)]

response = requests.post(
    'http://0.0.0.0:8000/output',
    headers={
        'accept': 'application/json',
        'Content-Type': 'application/json',
    },
    data=json.dumps(values)
)
outputs = [ExampleModelOutput.from_data(v) for v in json.loads(response.text)]
modelLongName = response.headers['x-model-long-name']
```

Large-scale evaluation through the API is not recommended: the per-object overhead is too high and the server is single-threaded.

data_models/__init__.py

Whitespace-only changes.

data_models/article.py

Lines changed: 32 additions & 0 deletions
```python
from typing import List
from datetime import datetime

from hypergol import BaseData

from data_models.sentence import Sentence


class Article(BaseData):

    def __init__(self, articleId: int, url: str, title: str, text: str, publishDate: datetime, sentences: List[Sentence]):
        self.articleId = articleId
        self.url = url
        self.title = title
        self.text = text
        self.publishDate = publishDate
        self.sentences = sentences

    def get_id(self):
        return (self.articleId, )

    def to_data(self):
        data = self.__dict__.copy()
        data['publishDate'] = data['publishDate'].isoformat()
        data['sentences'] = [v.to_data() for v in data['sentences']]
        return data

    @classmethod
    def from_data(cls, data):
        data['publishDate'] = datetime.fromisoformat(data['publishDate'])
        data['sentences'] = [Sentence.from_data(v) for v in data['sentences']]
        return cls(**data)
```
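
`to_data` has to return something JSON-serialisable, so nested objects and `datetime` fields are converted explicitly. The self-contained sketch below mirrors that pattern with simplified, hypothetical stand-in classes (`MiniSentence` and `MiniArticle`, which do not inherit from `BaseData`). It checks that a JSON round trip restores the same values.

```python
import json
from datetime import datetime
from typing import List


class MiniSentence:
    # Hypothetical stand-in for data_models.sentence.Sentence
    def __init__(self, sentenceId: int, words: List[str]):
        self.sentenceId = sentenceId
        self.words = words

    def to_data(self):
        return self.__dict__.copy()

    @classmethod
    def from_data(cls, data):
        return cls(**data)


class MiniArticle:
    def __init__(self, articleId: int, publishDate: datetime, sentences: List[MiniSentence]):
        self.articleId = articleId
        self.publishDate = publishDate
        self.sentences = sentences

    def to_data(self):
        data = self.__dict__.copy()
        data['publishDate'] = data['publishDate'].isoformat()  # datetime -> str
        data['sentences'] = [v.to_data() for v in data['sentences']]  # objects -> dicts
        return data

    @classmethod
    def from_data(cls, data):
        data = dict(data)  # copy so the caller's dict is not modified
        data['publishDate'] = datetime.fromisoformat(data['publishDate'])
        data['sentences'] = [MiniSentence.from_data(v) for v in data['sentences']]
        return cls(**data)


article = MiniArticle(1, datetime(2020, 5, 17, 9, 30), [MiniSentence(0, ['Hello', 'world'])])
restored = MiniArticle.from_data(json.loads(json.dumps(article.to_data())))
```

Unlike the generated `from_data` above, this sketch copies `data` before converting it, so the dictionary passed in is left unchanged.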

data_models/article_page.py

Lines changed: 12 additions & 0 deletions
```python
from hypergol import BaseData


class ArticlePage(BaseData):

    def __init__(self, articlePageId: int, url: str, body: str):
        self.articlePageId = articlePageId
        self.url = url
        self.body = body

    def get_id(self):
        return (self.articlePageId, )
```

data_models/article_text.py

Lines changed: 26 additions & 0 deletions
```python
from datetime import datetime

from hypergol import BaseData


class ArticleText(BaseData):

    def __init__(self, articleTextId: int, publishDate: datetime, title: str, text: str, url: str):
        self.articleTextId = articleTextId
        self.publishDate = publishDate
        self.title = title
        self.text = text
        self.url = url

    def get_id(self):
        return (self.articleTextId, )

    def to_data(self):
        data = self.__dict__.copy()
        data['publishDate'] = data['publishDate'].isoformat()
        return data

    @classmethod
    def from_data(cls, data):
        data['publishDate'] = datetime.fromisoformat(data['publishDate'])
        return cls(**data)
```

data_models/evaluation_output.py

Lines changed: 28 additions & 0 deletions
```python
from hypergol import BaseData


class EvaluationOutput(BaseData):

    def __init__(self, articleId: int, sentenceId: int, inputs: object, outputs: object, targets: object):
        self.articleId = articleId
        self.sentenceId = sentenceId
        self.inputs = inputs
        self.outputs = outputs
        self.targets = targets

    def get_id(self):
        return (self.articleId, self.sentenceId, )

    def to_data(self):
        data = self.__dict__.copy()
        data['inputs'] = BaseData.to_string(data['inputs'])
        data['outputs'] = BaseData.to_string(data['outputs'])
        data['targets'] = BaseData.to_string(data['targets'])
        return data

    @classmethod
    def from_data(cls, data):
        data['inputs'] = BaseData.from_string(data['inputs'])
        data['outputs'] = BaseData.from_string(data['outputs'])
        data['targets'] = BaseData.from_string(data['targets'])
        return cls(**data)
```
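
Fields typed as `object` (here `inputs`, `outputs` and `targets`) cannot be written to JSON directly, which is why they go through `BaseData.to_string` and `BaseData.from_string`. This commit does not show how those helpers encode values. The sketch below assumes a pickle-plus-base64 scheme purely for illustration. `object_to_string` and `object_from_string` are hypothetical names, and the sketch shows how an arbitrary Python object can travel inside a JSON string field.

```python
import base64
import json
import pickle


def object_to_string(value: object) -> str:
    # Assumed scheme: pickle to bytes, then base64 so the result is plain ASCII text
    return base64.b64encode(pickle.dumps(value)).decode('ascii')


def object_from_string(text: str) -> object:
    # Only unpickle data you trust: pickle can execute code while loading
    return pickle.loads(base64.b64decode(text.encode('ascii')))


record = {'articleId': 7, 'sentenceId': 2, 'outputs': object_to_string({'tags': ('NN', 'VB')})}
decoded = object_from_string(json.loads(json.dumps(record))['outputs'])
```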

data_models/labelled_article.py

Lines changed: 12 additions & 0 deletions
```python
from hypergol import BaseData


class LabelledArticle(BaseData):

    def __init__(self, labelledArticleId: int, articleId: int, labelId: int):
        self.labelledArticleId = labelledArticleId
        self.articleId = articleId
        self.labelId = labelId

    def get_id(self):
        return (self.labelledArticleId, )
```

data_models/model_output.py

Lines changed: 14 additions & 0 deletions
```python
from typing import List

from hypergol import BaseData


class ModelOutput(BaseData):

    def __init__(self, articleId: int, sentenceId: int, posTags: List[str]):
        self.articleId = articleId
        self.sentenceId = sentenceId
        self.posTags = posTags

    def get_id(self):
        return (self.articleId, self.sentenceId, )
```
