Commit fbc1935

Update README.md
1 parent 723d334 commit fbc1935

1 file changed

+47
-53
lines changed

README.md

Lines changed: 47 additions & 53 deletions
Original file line number | Diff line number | Diff line change
@@ -6,7 +6,7 @@
66

77
codeBERT is a package to **automatically check if your code documentation is up-to-date**. codeBERT currently works for Python code.
88

9-
*code-bert present version is available for Linux and Mac only. We are working on the Windows release. Please hang on*
9+
*If you are using the source distribution, the present version is available for Linux and Mac only. We are working on the Windows release. Please hang on.*
1010

1111

1212
This is [CodistAI](https://codist-ai.com/) open source version to easily use the fine tuned model based on our open source MLM code model [codeBERT-small-v2](https://huggingface.co/codistai/codeBERT-small-v2)
@@ -16,12 +16,9 @@ This is [CodistAI](https://codist-ai.com/) open source version to easily use the
1616

1717
## 🏆 code-bert output
1818

19-
Given a function body `f` as a string of code tokens (including special tokens such as `indent` and `dedent`) and a doc string `d` as a string of natural language tokens, predict whether `f` and `d` are associated or not (meaning, whether they represent the same concept or not)
19+
Given a function `f` and a docstring `d`, code-bert predicts whether `f` and `d` match (that is, whether they represent the same concept).
2020

21-
22-
It will produce a report like the following -
23-
24-
`run_pipeline -r test_files`
21+
A report lists all the functions whose docstring does not match, as follows:
2522

2623
```
2724
======== Analysing test_files/inner_dir/test_code_get.py =========
@@ -33,42 +30,7 @@ No
3330
```
3431

3532

36-
## An example
37-
38-
Let's consider the following code
39-
40-
```python
41-
from pathlib import Path
42-
43-
def get_file(filename):
44-
"""
45-
opens a url
46-
"""
47-
if not Path(filename).is_file():
48-
return None
49-
return open(filename, "rb")
50-
51-
```
52-
53-
💡 Using our other open source library [tree-hugger](https://github.com/autosoft-dev/tree-hugger), it is fairly trivial to get the code and separate out the function body and the docstring with a single API call.
54-
55-
We can then use the [`process_code`](https://github.com/autosoft-dev/code-bert/blob/2dd35f16fa2cdb96f75e21bb0a9393aa3164d885/code_bert/core/data_reader.py#L136) method from this present repo to process the code lines into the format that [codeBERT-small-v2](https://huggingface.co/codistai/codeBERT-small-v2) expects.
56-
57-
Doing the above two steps properly would produce something like the following
58-
59-
- **Function** - `def get file ( filename ) : indent if not path ( filename ) . is file ( ) : indent return none dedent return open ( filename , "rb" ) dedent`
60-
61-
- **Doc String** - `opens a url`
62-
63-
Ideally then we need some model to run the following Pseudocode
64-
65-
```python
66-
match, confidence = model(function, docstring)
67-
```
68-
69-
And ideally, in this case, the match should be `False`
70-
71-
## code-bert CLI
33+
## code-bert local setup
7234

7335
**The entire code base is built and available for Python3.6+**
7436

@@ -90,7 +52,7 @@ We have provided very easy to use CLI commands to achieve all these, and at scal
9052

9153
-----------
9254

93-
Assuming that model is downloaded and ready, you can run the following command to analyze one file or a directory containing a bunch of files
55+
You can run the following command to analyze one file or a directory containing a bunch of files
9456

9557
```
9658
usage: run_pipeline [-h] [-f FILE_NAME] [-r RECURSIVE] [-m]
@@ -104,14 +66,25 @@ optional arguments:
10466
-m, --show_match Shall we show the matches? (Default false)
10567
```
10668

107-
So, let's say you have a directory called `test_files` with some python files in it. This is how you can analyze them
69+
## code-bert Docker
10870

109-
`run_pipeline -r test_files`
71+
It has been requested by our users and here it is! You will not need to go through any painful setup process at all. We have Dockerized the entire thing for you. Here are the steps to use it.
72+
73+
- Pull the image `docker pull codistai/codebert`
74+
75+
- Assuming that you have a bunch of files to be analyzed under `test_files` in your present working directory, run this command - `docker run -v "$(pwd)"/test_files:/usr/src/app/test_files -it codistai/codebert run_pipeline -r test_files`
76+
77+
- If you wish to analyze any other directory, simply change the mounting option in the `docker run` command (the path after `-v`, which should follow the format `full/local/path:/usr/src/app/<mount_dir_name>`) and also mention the same `<mount_dir_name>` after the `run_pipeline` command.
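
The mount rule in the last step can be sketched in Python. This is only an illustration under stated assumptions: `build_docker_command` is a hypothetical helper that is not part of code-bert, and it assumes the mounted directory keeps its own name inside the container. The image name, the container path `/usr/src/app` and the `run_pipeline -r` call are taken from the commands above.

```python
from pathlib import Path
from typing import List

IMAGE = "codistai/codebert"      # image name from the pull step above
CONTAINER_ROOT = "/usr/src/app"  # container path used in the README's example


def build_docker_command(local_dir: str) -> List[str]:
    """Build the `docker run` argument list for analyzing `local_dir`.

    Hypothetical helper (not part of code-bert): the directory is mounted
    under CONTAINER_ROOT with its own name, and that same name is passed
    to `run_pipeline -r`.
    """
    local_path = Path(local_dir).resolve()  # docker -v needs a full local path
    mount_name = local_path.name            # plays the role of <mount_dir_name>
    volume = f"{local_path}:{CONTAINER_ROOT}/{mount_name}"
    return ["docker", "run", "-v", volume, "-it", IMAGE,
            "run_pipeline", "-r", mount_name]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_docker_command("test_files")))
```

The returned list can be passed to `subprocess.run` if you prefer to launch the container from a script.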
11078

111-
A prompt will appear to confirm the model location. Once you confirm that then the algorithm will take one file at a time and analyze that, recursively on the whole directory.
11279

113-
🏆 It should produce a report like the following -
11480

81+
## 🎮 code-bert example
82+
83+
Let's say you have a directory called `test_files` with some Python files in it. Here is how to run the analysis:
84+
85+
`run_pipeline -r test_files`
86+
87+
The algorithm analyzes one file at a time, recursing through the whole directory, and prints a report of the function-docstring pairs that do not match.
11588

11689
```
11790
======== Analysing test_files/test_code_add.py =========
@@ -124,7 +97,8 @@ No
12497
******************************************************************
12598
```
12699

127-
You can optionally pass the `--show_match` flag like so `run_pipeline -r test_files --show_match` and then it will also show you the function docstring pairs where they match. By default it will only show you the mis-matches. So, with this flag set the report will look like this
100+
101+
You can optionally pass the `--show_match` flag like so `run_pipeline -r test_files --show_match` to print both matching and mismatching function-docstring pairs.
128102

129103
```
130104
======== Analysing test_files/test_code_add.py =========
@@ -148,14 +122,34 @@ No
148122
******************************************************************
149123
```
150124

151-
## code-bert Docker
152125

153-
It has been requested by our users and here it is! You will not need to go through any painful setup process at all. We have Dockerized the entire thing for you. Here are the steps to use it.
154126

155-
- Pull the image `docker pull codistai/codebert`
127+
## 💡 code-bert logic
156128

157-
- Assuming that you have a bunch of files to be analyzed under `test_files` in your present working directory, run this command - `docker run -v "$(pwd)"/test_files:/usr/src/app/test_files -it codistai/codebert run_pipeline -r test_files`
129+
Let's consider the following code
130+
131+
```python
from pathlib import Path

def get_file(filename):
    """
    opens a url
    """
    if not Path(filename).is_file():
        return None
    return open(filename, "rb")

142+
```
143+
1. Mine source code to get function-docstring pairs using [tree-hugger](https://github.com/autosoft-dev/tree-hugger)
144+
2. Prepare the function and docstring data to fit the input format expected by the [codeBERT-small-v2](https://huggingface.co/codistai/codeBERT-small-v2) model.
145+
- **Function** - `def get file ( filename ) : indent if not path ( filename ) . is file ( ) : indent return none dedent return open ( filename , "rb" ) dedent`
146+
147+
- **Doc String** - `opens a url`
148+
149+
3. Run the model
150+
```python
151+
match, confidence = model(function, docstring)
152+
```
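
To make step 2 more concrete, here is a minimal sketch of how a function could be flattened into the token string shown above. It is an approximation built only on Python's standard `tokenize` and `ast` modules; it is not the repository's `process_code` method and it does not use tree-hugger. The names `flatten_function` and `extract_docstring` are hypothetical, and the lowercasing and underscore-splitting rules are inferred from the example output, not from the actual implementation.

```python
import ast
import io
import tokenize

# Token types that carry no content in the flattened form.
SKIPPED = {tokenize.NEWLINE, tokenize.NL, tokenize.COMMENT, tokenize.ENDMARKER}


def flatten_function(source):
    """Flatten a function's source into a space-separated token string.

    Assumed rules (inferred from the README example): block structure becomes
    `indent`/`dedent`, names are lowercased and split on underscores, and the
    docstring is dropped.
    """
    pieces = []
    prev_type = None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIPPED:
            continue
        if tok.type == tokenize.INDENT:
            pieces.append("indent")
        elif tok.type == tokenize.DEDENT:
            pieces.append("dedent")
        elif tok.type == tokenize.STRING and prev_type == tokenize.INDENT:
            # Simplification: any string that opens a block is treated as a
            # docstring and skipped.
            pass
        elif tok.type == tokenize.NAME:
            pieces.extend(p for p in tok.string.lower().split("_") if p)
        else:
            pieces.append(tok.string)
        prev_type = tok.type
    return " ".join(pieces)


def extract_docstring(source):
    """Return the cleaned docstring of the first definition in `source`."""
    return ast.get_docstring(ast.parse(source).body[0]) or ""


if __name__ == "__main__":
    code = (
        "def get_file(filename):\n"
        '    """\n'
        "    opens a url\n"
        '    """\n'
        "    if not Path(filename).is_file():\n"
        "        return None\n"
        '    return open(filename, "rb")\n'
    )
    print(flatten_function(code))
    print(extract_docstring(code))
```

The two resulting strings correspond to the `function` and `docstring` arguments in the pseudocode of step 3.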
158153

159-
- If you wish to analyze any other directory, simply change the mounting option in the `docker run` command (the path after `-v`, which should follow the format `full/local/path:/usr/src/app/<mount_dir_name>`) and also mention the same `<mount_dir_name>` after the `run_pipeline` command.
160154

161155
Stay tuned!
