README.md (47 additions & 53 deletions)
Original file line number
Diff line number
Diff line change
@@ -6,7 +6,7 @@
6
6
7
7
codeBERT is a package to **automatically check if your code documentation is up-to-date**. codeBERT currently works for Python code.
8
8
9
-
*code-bert present version is available for Linux and Mac only. We are working on the Windows release. Please hang on*
9
+
*If you are using the source distribution, the present version is available for Linux and Mac only. We are working on the Windows release. Please hang on.*
10
10
11
11
12
12
This is the [CodistAI](https://codist-ai.com/) open source version, which makes it easy to use the fine-tuned model based on our open source MLM code model [codeBERT-small-v2](https://huggingface.co/codistai/codeBERT-small-v2)
@@ -16,12 +16,9 @@ This is [CodistAI](https://codist-ai.com/) open source version to easily use the
16
16
17
17
## 🏆 code-bert output
18
18
19
-
Given a function body `f` as a string of code tokens (including special tokens such as `indent` and `dedent`) and a doc string `d` as a string of Natural Language tokens. Predict whether `f` and `d` are associated or not (meaning, whether they represent the same concept or not)
19
+
Given a function `f` and a doc string `d`, code-bert predicts whether `f` and `d` match (meaning, whether they represent the same concept or not)
20
20
21
-
22
-
It will produce a report like the following -
23
-
24
-
`run_pipeline -r test_files`
21
+
A report lists all the functions whose docstring does not match, as follows:
💡 Using another of our open source libraries, [tree-hugger](https://github.com/autosoft-dev/tree-hugger), it is fairly trivial to get the code and separate out the function body and the docstring with a single API call.
54
-
55
-
We can then use the [`process_code`](https://github.com/autosoft-dev/code-bert/blob/2dd35f16fa2cdb96f75e21bb0a9393aa3164d885/code_bert/core/data_reader.py#L136) method from this present repo to convert the code lines into the format that [codeBERT-small-v2](https://huggingface.co/codistai/codeBERT-small-v2) expects.
56
-
57
-
Doing the above two steps properly would produce something like the following
58
-
59
-
- **Function** - `def get file ( filename ) : indent if not path ( filename ) . is file ( ) : indent return none dedent return open ( filename , "rb" ) dedent`
60
-
61
-
- **Doc String** - `opens a url`
62
-
63
-
Ideally then we need some model to run the following Pseudocode
64
-
65
-
```python
66
-
match, confidence = model(function, docstring)
67
-
```
68
-
69
-
And ideally, in this case, the match should be `False`
70
-
71
-
## code-bert CLI
33
+
## code-bert local setup
72
34
73
35
**The entire code base is built and available for Python 3.6+**
74
36
@@ -90,7 +52,7 @@ We have provided very easy to use CLI commands to achieve all these, and at scal
90
52
91
53
-----------
92
54
93
-
Assuming that model is downloaded and ready, you can run the following command to analyze one file or a directory containing a bunch of files
55
+
You can run the following command to analyze one file or a directory containing a bunch of files
-m, --show_match Shall we show the matches? (Default false)
105
67
```
106
68
107
-
So, let's say you have a directory called `test_files` with some python files in it. This is how you can analyze them
69
+
## code-bert Docker
108
70
109
-
`run_pipeline -r test_files`
71
+
It has been requested by our users, and here it is! You will not need to go through any painful setup process at all. We have Dockerized the entire thing for you. Here are the steps to use it.
72
+
73
+
- Pull the image `docker pull codistai/codebert`
74
+
75
+
- Assuming that you have a bunch of files to be analyzed under `test_files` in your present working directory, run this command - `docker run -v "$(pwd)"/test_files:/usr/src/app/test_files -it codistai/codebert run_pipeline -r test_files`
76
+
77
+
- If you wish to analyze any other directory, simply change the mounting option in the `docker run` command (the path after `-v`; the format should be `full/local/path:/usr/src/app/<mount_dir_name>`) and pass the same `<mount_dir_name>` after the `run_pipeline -r` option.
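
To make the mount rule concrete, here is a minimal sketch that assembles the `docker run` arguments for an arbitrary local directory. The image name, the `/usr/src/app` container root and the `run_pipeline -r` call are taken from the steps above; the helper `docker_run_args` and its parameters are hypothetical names used only for illustration and are not part of code-bert.

```python
from pathlib import Path

IMAGE = "codistai/codebert"       # image name from the pull step above
CONTAINER_ROOT = "/usr/src/app"   # container path used in the README command


def docker_run_args(local_dir, mount_dir_name=None):
    """Hypothetical helper: build the argument list for `docker run`.

    The same <mount_dir_name> is used on the container side of `-v`
    and as the argument of `run_pipeline -r`.
    """
    local = Path(local_dir).resolve()      # `-v` needs the full local path
    name = mount_dir_name or local.name    # default: reuse the directory name
    return [
        "docker", "run",
        "-v", f"{local}:{CONTAINER_ROOT}/{name}",
        "-it", IMAGE,
        "run_pipeline", "-r", name,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(" ".join(docker_run_args("test_files")))
```

The list can be passed to `subprocess.run` as-is if you prefer to launch the container from Python; printing it first is a cheap way to check the mount path.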
110
78
111
-
A prompt will appear to confirm the model location. Once you confirm that then the algorithm will take one file at a time and analyze that, recursively on the whole directory.
112
79
113
-
🏆 It should produce a report like the following -
114
80
81
+
## 🎮 code-bert example
82
+
83
+
Let's say you have a directory called `test_files` with some Python files in it. Here is how to run the analysis:
84
+
85
+
`run_pipeline -r test_files`
86
+
87
+
The algorithm analyzes one file at a time, recursing through the whole directory, and prints a report of the function-docstring pairs that do not match.
You can optionally pass the `--show_match` flag like so `run_pipeline -r test_files --show_match` and then it will also show you the function docstring pairs where they match. By default it will only show you the mis-matches. So, with this flag set the report will look like this
100
+
101
+
You can optionally pass the `--show_match` flag like so `run_pipeline -r test_files --show_match` to print both matching and mismatching function-docstring pairs.
It has been requested by our users, and here it is! You will not need to go through any painful setup process at all. We have Dockerized the entire thing for you. Here are the steps to use it.
154
126
155
-
- Pull the image `docker pull codistai/codebert`
127
+
## 💡 code-bert logic
156
128
157
-
- Assuming that you have a bunch of files to be analyzed under `test_files` in your present working directory, run this command - `docker run -v "$(pwd)"/test_files:/usr/src/app/test_files -it codistai/codebert run_pipeline -r test_files`
129
+
Let's consider the following code
130
+
131
+
```python
132
+
from pathlib import Path
133
+
134
+
def get_file(filename):
135
+
    """
136
+
    opens a url
137
+
    """
138
+
    if not Path(filename).is_file():
139
+
        return None
140
+
    return open(filename, "rb")
141
+
142
+
```
143
+
1. Mine source code to get function-docstring pairs using [tree-hugger](https://github.com/autosoft-dev/tree-hugger)
144
+
2. Prepare the function and docstring data to fit the input format expected by the [codeBERT-small-v2](https://huggingface.co/codistai/codeBERT-small-v2) model.
145
+
- **Function** - `def get file ( filename ) : indent if not path ( filename ) . is file ( ) : indent return none dedent return open ( filename , "rb" ) dedent`
146
+
147
+
- **Doc String** - `opens a url`
148
+
149
+
3. Run the model
150
+
```python
151
+
match, confidence = model(function, docstring)
152
+
```
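
Steps 1 and 2 can be approximated with the Python standard library alone. The sketch below is an assumption-laden stand-in, not the code-bert implementation: it uses `ast` instead of tree-hugger and `tokenize` instead of the repo's `process_code`, and the function names `to_model_tokens` and `function_docstring_pairs` are hypothetical. The formatting rules (skip the docstring, emit `indent`/`dedent`, split names on underscores, lowercase them) are inferred from the example strings above. It needs Python 3.8+ for `ast.get_source_segment`.

```python
import ast
import io
import tokenize

SOURCE = '''
from pathlib import Path

def get_file(filename):
    """
    opens a url
    """
    if not Path(filename).is_file():
        return None
    return open(filename, "rb")
'''

# Token types that carry no information for the flattened token string.
IGNORED = {tokenize.NEWLINE, tokenize.NL, tokenize.COMMENT, tokenize.ENDMARKER}


def to_model_tokens(function_source):
    """Flatten one function into a space-separated token string (step 2 stand-in)."""
    out = []
    first_indent_seen = False
    expect_docstring = False
    for tok in tokenize.generate_tokens(io.StringIO(function_source).readline):
        if tok.type in IGNORED:
            continue
        if tok.type == tokenize.INDENT:
            # A string right after the first indent is the function docstring.
            expect_docstring = not first_indent_seen
            first_indent_seen = True
            out.append("indent")
            continue
        is_docstring = expect_docstring and tok.type == tokenize.STRING
        expect_docstring = False
        if is_docstring:
            continue
        if tok.type == tokenize.DEDENT:
            out.append("dedent")
        elif tok.type == tokenize.NAME:
            # get_file -> "get file", None -> "none"
            out.extend(part.lower() for part in tok.string.split("_") if part)
        else:
            out.append(tok.string)
    return " ".join(out)


def function_docstring_pairs(source):
    """Yield (token_string, docstring) for every documented function (step 1 stand-in)."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            docstring = ast.get_docstring(node)
            if docstring is not None:
                function_source = ast.get_source_segment(source, node)
                yield to_model_tokens(function_source), docstring


if __name__ == "__main__":
    for function, docstring in function_docstring_pairs(SOURCE):
        print("Function  :", function)
        print("Doc String:", docstring)
```

Each yielded pair is what step 3 would hand to the model as `function` and `docstring`; the model call itself is left out because it depends on the downloaded fine-tuned weights.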
158
153
159
-
- If you wish to analyze any other directory, simply change the mounting option in the `docker run` command (the path after `-v`; the format should be `full/local/path:/usr/src/app/<mount_dir_name>`) and pass the same `<mount_dir_name>` after the `run_pipeline -r` option.