Commit 26f409d

Merge pull request #118 from JetBrains-Research/readme_update: README update

2 parents d5ee4d1 + 62c95d0

File tree

1 file changed: +30 additions, -10 deletions
README.md

Lines changed: 30 additions & 10 deletions
@@ -14,9 +14,31 @@ You can easily install model through the PIP:
 pip install code2seq
 ```
 
-## Usage
+## Dataset mining
 
-Minimal code example to run the model:
+To prepare your own dataset with a storage format supported by this implementation, use one of the following:
+1. The original dataset preprocessing from the vanilla repository
+2. [`astminer`](https://github.com/JetBrains-Research/astminer):
+a tool for mining path-based representations and more, with support for multiple languages.
+3. [`PSIMiner`](https://github.com/JetBrains-Research/psiminer):
+a tool for extracting PSI trees from the IntelliJ Platform and creating datasets from them.
+## Available checkpoints
+
+### Method name prediction
+| Dataset (with link) | Checkpoint | # epochs | F1-score | Precision | Recall | ChrF  |
+|---------------------|------------|----------|----------|-----------|--------|-------|
+| [Java-small](https://s3.eu-west-1.amazonaws.com/datasets.ml.labs.aws.intellij.net/java-paths-methods/java-small.tar.gz) | [link](https://s3.eu-west-1.amazonaws.com/datasets.ml.labs.aws.intellij.net/checkpoints/code2seq_java_small.ckpt) | 11 | 41.49 | 54.26 | 33.59 | 30.21 |
+| [Java-med](https://s3.eu-west-1.amazonaws.com/datasets.ml.labs.aws.intellij.net/java-paths-methods/java-med.tar.gz) | [link](https://s3.eu-west-1.amazonaws.com/datasets.ml.labs.aws.intellij.net/checkpoints/code2seq_java_med.ckpt) | 10 | 48.17 | 58.87 | 40.76 | 42.32 |
+
+## Configuration
+
+The model is fully configurable by a standalone YAML file.
+Navigate to the [config](config) directory to see examples of configs.
+
+## Examples
+
+Model training may be done via the PyTorch Lightning trainer.
+See its [documentation](https://pytorch-lightning.readthedocs.io/en/latest/common/trainer.html) for more information.
 
 
 ```python
 from argparse import ArgumentParser
@@ -29,20 +51,21 @@ from code2seq.model import Code2Seq
 
 
 def train(config: DictConfig):
-    # Load data module
+    # Define data module
     data_module = PathContextDataModule(config.data_folder, config.data)
-    data_module.prepare_data()
-    data_module.setup()
 
-    # Load model
+    # Define model
     model = Code2Seq(
         config.model,
         config.optimizer,
         data_module.vocabulary,
         config.train.teacher_forcing
     )
 
-    trainer = Trainer(max_epochs=config.hyper_parameters.n_epochs)
+    # Define hyper parameters
+    trainer = Trainer(max_epochs=config.train.n_epochs)
+
+    # Train model
     trainer.fit(model, datamodule=data_module)
 
 
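The hunk above moves the epochs key from `config.hyper_parameters.n_epochs` to `config.train.n_epochs`, grouping it with `teacher_forcing`. As a hedged illustration of the nested shape the updated `train()` now expects, here is a minimal sketch: the key paths mirror the attributes accessed in the diff, while all inner fields and values are hypothetical placeholders.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the OmegaConf DictConfig loaded from YAML.
# Key paths match what train() reads after this commit:
#   config.data_folder, config.data, config.model,
#   config.optimizer, config.train.n_epochs, config.train.teacher_forcing
config = SimpleNamespace(
    data_folder="path/to/dataset",                    # placeholder path
    data=SimpleNamespace(batch_size=512),             # hypothetical field
    model=SimpleNamespace(embedding_size=128),        # hypothetical field
    optimizer=SimpleNamespace(lr=0.001),              # hypothetical field
    train=SimpleNamespace(n_epochs=10, teacher_forcing=1.0),
)

# The Trainer in the diff now reads epochs from the train group:
max_epochs = config.train.n_epochs
```

The corresponding YAML would nest `n_epochs` and `teacher_forcing` under a `train` key rather than `hyper_parameters`; see the example configs in the `config` directory referenced by the new README text for the authoritative layout.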

@@ -54,6 +77,3 @@ if __name__ == "__main__":
     __config = OmegaConf.load(__args.config)
     train(__config)
 ```
-
-Navigate to [config](config) directory to see examples of configs.
-If you have any questions, then feel free to open the issue.
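The new "Dataset mining" section points at tools that emit path-based representations. As background, here is a small hedged sketch of splitting one line of the space-and-comma-separated path-context format popularized by the original code2seq preprocessing (a label followed by `start_token,path,end_token` triples); the exact on-disk format this implementation expects should be checked against the repositories linked in the diff, and the example line below is purely illustrative.

```python
def parse_path_context_line(line: str):
    """Split '<label> <ctx> <ctx> ...' where each ctx is a
    comma-separated 'start_token,path,end_token' triple."""
    label, *raw_contexts = line.strip().split(" ")
    contexts = [tuple(ctx.split(",")) for ctx in raw_contexts]
    return label, contexts

# Hypothetical example: label subtokens are conventionally joined by '|'
label, contexts = parse_path_context_line(
    "get|name this,Cls0Nm1,name name,Nm1Ret0,string"
)
```

Here `label` is `"get|name"` and each context is a 3-tuple of start token, path, and end token, which is the unit a `PathContextDataModule` would batch for the model.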

0 commit comments
