Commit 46e7802

The final result I am happy with
1 parent 4564f99 commit 46e7802

26 files changed: 726 additions and 64 deletions

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+*.pyc
+__pycache__

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2025 wpdevelopment11
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.md

Lines changed: 119 additions & 0 deletions
@@ -0,0 +1,119 @@
+## Description
+
+Find the fenced code blocks in the Markdown file that don't have the language specified.
+Detect the language from the block contents and insert the language name after the starting fence.
+Print the resulting code blocks or edit the files in-place.
+
+Under the hood it uses [Magika](https://github.com/google/magika) (recommended) or [Guesslang](https://github.com/yoeo/guesslang) deep learning models to detect the language.
+
+Tested on Windows and Linux.
+
+## Install
+
+```bash
+git clone https://github.com/wpdevelopment11/codeblocks
+cd codeblocks
+python3 -m venv .venv
+source .venv/bin/activate
+
+# Install one of them:
+pip install magika==0.6.1 # Recommended
+# Or
+pip install guesslang # May not work,
+# depending on your Python version
+# and OS combination.
+```
+
+> **Note:** <a id="guesslang"></a>
+> [Guesslang](https://github.com/yoeo/guesslang) is not maintained. I got it working on Windows with Python 3.10.
+>
+> First, run `pip install tensorflow==2.13`.
+>
+> Next, copy the [guesslang directory](https://github.com/yoeo/guesslang/tree/master/guesslang) to the top-level directory of your project.
+> Start the Python shell with `python` and run `import guesslang` to check that it's installed properly.
+
+## Usage
+
+```bash
+python3 codeblocks.py [--edit] path ...
+```
+
+* `--edit`
+
+Edit files by inserting the language.
+By default, files are not modified;
+instead, code blocks for which the language can be detected are printed to the terminal.
+
+* `path`
+
+Paths to process.
+Can be Markdown files or directories, or any combination of them.
+Directories are processed recursively.
+
+### Insert the language names in all Markdown files in a directory
+
+This command will edit your files, so make a backup first.
+
+```bash
+python3 codeblocks.py --edit /path/to/dir
+```
+
+### Insert the language names in specified file(s) only
+
+```bash
+python3 codeblocks.py --edit /path/to/file.md
+```
+
+### Print code blocks with the detected language, without modifying files
+
+```bash
+python3 codeblocks.py /path/to/file.md
+```
+
+## Run tests
+
+```bash
+python3 -m unittest discover test
+```
+
+## Limitations
+
+* A line that consists of three or more backticks is always detected as a fenced code block.
+Normal Markdown parsers treat such a line as a fence only if it is indented by at most three spaces outside of a list item,
+and by at most seven spaces otherwise.
+
+## Motivation
+
+The language names in the fenced code blocks are commonly used for syntax highlighting.
+
+Some people forget to or don't know how to specify the language.
+This leads to code that is not highlighted and hard to read.
+This script is intended to solve that issue.
+
+Example:
+
+* Before:
+
+````
+```
+def print_table():
+    for num in range(10):
+        sqr = num * num
+        print(f"{num}^2\t= {sqr}")
+
+print_table()
+```
+````
+
+* After:
+
+````python
+```python
+def print_table():
+    for num in range(10):
+        sqr = num * num
+        print(f"{num}^2\t= {sqr}")
+
+print_table()
+```
+````
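
The Limitations section of the README above says the script treats any line of three or more backticks as a fence, while Markdown parsers also take indentation into account. As a rough sketch of a stricter check for lines outside list items, an opening fence could be limited to at most three leading spaces. The helper name `is_opening_fence` and the regex are illustrative assumptions, not part of this commit, and tilde fences are ignored.

```python
import re

# Hypothetical helper, not part of codeblocks.py.
# Outside a list item, a backtick fence may be indented by at most three
# spaces, and the text after the backticks must not contain a backtick.
OPENING_FENCE = re.compile(r"^ {0,3}(`{3,})([^`]*)$")

def is_opening_fence(line):
    """Return True if the line looks like an opening backtick fence."""
    return OPENING_FENCE.match(line.rstrip("\r\n")) is not None

print(is_opening_fence("```"))           # True
print(is_opening_fence("   ```python"))  # True
print(is_opening_fence("    ```"))       # False: four spaces of indentation
print(is_opening_fence("```a`b```"))     # False: backtick after the fence
```

Handling fences inside list items would also require tracking the indentation of the enclosing item, which this sketch does not attempt.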

codeblocks.py

Lines changed: 91 additions & 64 deletions
@@ -1,28 +1,30 @@
-import glob
-import sys
-import os.path
-
 from enum import Enum
+from itertools import chain
 from tempfile import NamedTemporaryFile
-from shutil import copy
-from os import unlink
+
+import argparse
+import glob
+import os
+import os.path
+import re
+import shutil
+import sys
 
 class Codeblock(Enum):
     IN = 1
     IN_WITH_LANG = 2
     OUT = 3
 
+INTLINE_PATTERN = re.compile("```+[^`]+`")
 DEFAULT_LANG = "txt"
-EDIT_FILES = True
 
 try:
     from magika import Magika
     magika = Magika()
     def guess_language(code):
         codebytes = code.encode(encoding="utf-8")
         lang = magika.identify_bytes(codebytes).prediction.output.label
-        if lang == "unknown": return DEFAULT_LANG
-        return lang
+        return lang if lang != "unknown" else DEFAULT_LANG
 except ImportError:
     try:
         from guesslang import Guess
@@ -34,61 +36,86 @@ def guess_language(code):
         print("Magika or Guesslang is required to run this script. Install one of them to proceed!")
         sys.exit(1)
 
-if len(sys.argv) != 2:
-    print("Usage: python codeblocks.py dir")
-    sys.exit(1)
+def add_language(files, edit_files):
+    def is_made_of_char(str, char):
+        return len(str) == str.count(char)
+    # newline="" is important. See below.
+    temp = NamedTemporaryFile("w+", encoding="utf-8", newline="", delete=False)
+    for file in files:
+        blockstate = Codeblock.OUT
+        code = []
+        # The argument newline="" is important here and in the NamedTemporaryFile() call above.
+        # We don't want to change line endings in a file which we're editing.
+        with open(file, encoding="utf-8", newline="") as f:
+            for linenum, line in enumerate(f, 1):
+                stripped = line.strip()
+                if stripped.startswith("```") and not INTLINE_PATTERN.match(stripped) \
+                    and (blockstate == Codeblock.OUT or is_made_of_char(stripped, "`") and len(stripped) >= backticks_num):
+                    if blockstate == Codeblock.IN_WITH_LANG:
+                        blockstate = Codeblock.OUT
+                        if edit_files: temp.write(line)
+                    elif blockstate == Codeblock.IN:
+                        blockstate = Codeblock.OUT
+                        code_str = "\n".join(line.removeprefix(indent).rstrip() for line in code) + "\n"
+                        lang = guess_language(code_str) if code_str else ""
+                        if edit_files:
+                            # When editing files, txt is not very useful edit.
+                            if lang == "txt": lang = ""
+                            fence_start = "`" * backticks_num
+                            temp.write(fence.replace(fence_start, fence_start + lang))
+                            temp.writelines(code)
+                            temp.write(line)
+                        elif lang:
+                            print(f"{file}:{linenum - len(code) - 1}")
+                            print(("`" * backticks_num) + lang + "\n" + code_str + stripped)
+                            print()
+                        code = []
+                    elif is_made_of_char(stripped, "`"):
+                        backticks_num = len(stripped)
+                        blockstate = Codeblock.IN
+                        count = len(line) - len(line.lstrip())
+                        indent = line[:count]
+                        fence = line
+                    else:
+                        backticks_num = stripped.count("`")
+                        blockstate = Codeblock.IN_WITH_LANG
+                        if edit_files: temp.write(line)
+                elif blockstate == Codeblock.IN:
+                    code.append(line)
+                elif edit_files:
+                    temp.write(line)
+        if edit_files:
+            if code:
+                # non-terminated fence
+                temp.write(fence)
+                temp.writelines(code)
+            temp.flush()
+            shutil.copy(temp.name, file)
+            temp.seek(0)
+            temp.truncate(0)
+    temp.close()
+    os.unlink(temp.name)
 
-files = glob.iglob("**/*.md", root_dir=sys.argv[1], recursive=True)
-# newline="" is important. See below.
-temp = NamedTemporaryFile("w+", encoding="utf-8", newline="", delete=False)
+def main():
+    parser = argparse.ArgumentParser(description="Detect and insert the language in the Markdown code blocks.")
+    parser.add_argument("--edit", action="store_true", help="Edit files by inserting the language")
+    parser.add_argument("path", nargs="+", help="Paths to process")
+    args = parser.parse_args()
 
-for file in files:
-    fullpath = os.path.join(sys.argv[1], file)
-    blockstate = Codeblock.OUT
-    code = []
-    fence = ""
-    # The argument newline="" is important here and in the NamedTemporaryFile() call above.
-    # We don't want to change line endings in a file which we're editing.
-    for linenum, line in enumerate(open(fullpath, encoding="utf-8", newline=""), 1):
-        if line.strip().startswith("```"):
-            if blockstate == Codeblock.IN_WITH_LANG:
-                blockstate = Codeblock.OUT
-                if EDIT_FILES: temp.write(line)
-            elif blockstate == Codeblock.IN:
-                blockstate = Codeblock.OUT
-                indent = len(fence) - len(fence.lstrip())
-                code_str = "\n".join(line[indent:].strip() for line in code) + "\n"
-                lang = guess_language(code_str) if code_str else ""
-                if EDIT_FILES:
-                    # When editing files, txt is not very useful edit.
-                    if lang == "txt": lang = ""
-                    temp.write(fence.replace("```", f"```{lang}"))
-                    temp.writelines(code)
-                    temp.write(line)
-                elif lang:
-                    print(f"{fullpath}:{linenum - len(code) - 1}")
-                    print(f"```{lang}\n" + code_str + "```")
-                    print()
-                code = []
-            elif line.strip() == "```":
-                blockstate = Codeblock.IN
-                fence = line
-            else:
-                blockstate = Codeblock.IN_WITH_LANG
-                if EDIT_FILES: temp.write(line)
-        elif blockstate == Codeblock.IN:
-            code.append(line)
-        elif EDIT_FILES:
-            temp.write(line)
-    if EDIT_FILES:
-        if code:
-            # non-terminated fence
-            temp.write(fence)
-            temp.writelines(code)
-        temp.flush()
-        copy(temp.name, fullpath)
-        temp.seek(0)
-        temp.truncate(0)
+    files = []
+    iters = []
+
+    for path in args.path:
+        if os.path.isfile(path):
+            files.append(path)
+        elif os.path.isdir(path):
+            iters.append(glob.iglob(os.path.join(path, "**", "*.md"), recursive=True))
+        else:
+            print(f"Path doesn't exist: \"{path}\"")
+            sys.exit(1)
+
+    iters.append(iter(files))
+    add_language(chain.from_iterable(iters), args.edit)
 
-temp.close()
-unlink(temp.name)
+if __name__ == "__main__":
+    main()
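
The comments in `add_language()` above stress that `newline=""` matters for keeping line endings intact. A small sketch of why, assuming a throwaway temporary file with Windows line endings (the helper names `read_lines` and `demo_line_endings` are made up for this illustration and do not appear in the commit):

```python
import os
import tempfile

def read_lines(path, newline):
    """Read all lines of a UTF-8 text file using the given newline mode."""
    with open(path, encoding="utf-8", newline=newline) as f:
        return list(f)

def demo_line_endings():
    # Write a small file with CRLF line endings in binary mode,
    # so nothing is translated on the way in.
    fd, path = tempfile.mkstemp(suffix=".md")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"first\r\nsecond\r\n")
        translated = read_lines(path, None)  # default mode: "\r\n" is read as "\n"
        preserved = read_lines(path, "")     # newline="": endings are returned untouched
    finally:
        os.unlink(path)
    return translated, preserved

translated, preserved = demo_line_endings()
print(translated)  # ['first\n', 'second\n']
print(preserved)   # ['first\r\n', 'second\r\n']
```

With the default mode, writing the lines back would silently normalize the endings, which is presumably what the commit's comments are guarding against.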

test/files/block_termination.md

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+```
+
+```blah
+
+import os
+
+def foo():
+    for i in range(10):
+        print(i)
+
+def foobar():
+    pass
+
+foo()
+```
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+```python
+
+```blah
+
+import os
+
+def foo():
+    for i in range(10):
+        print(i)
+
+def foobar():
+    pass
+
+foo()
+```
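
The two fixtures above exercise block termination: inside an open block, the line made of three backticks followed by `blah` must not close it, so the whole body is detected as Python. A minimal sketch of the closing rule, mirroring the condition in `add_language()` (the helper name `closes_block` is an assumption for illustration only):

```python
def closes_block(line, opening_backticks):
    """Return True if the line can close a block opened with that many backticks."""
    stripped = line.strip()
    # Same checks as in the diff: the line starts with ```, consists only of
    # backticks, and is at least as long as the opening fence.
    return (
        stripped.startswith("```")
        and len(stripped) == stripped.count("`")
        and len(stripped) >= opening_backticks
    )

print(closes_block("```blah\n", 3))  # False: extra text after the backticks
print(closes_block("```\n", 3))      # True
print(closes_block("````\n", 3))     # True: a longer fence may close a shorter one
print(closes_block("```\n", 4))      # False: shorter than the opening fence
```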

test/files/inline_code.md

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+```console.log("I like JavaScript");```
+
+```
+import os
+
+def foo():
+    for i in range(10):
+        print(i)
+
+def foobar():
+    pass
+
+foo()
+```

test/files/inline_code_result.md

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+```console.log("I like JavaScript");```
+
+```python
+import os
+
+def foo():
+    for i in range(10):
+        print(i)
+
+def foobar():
+    pass
+
+foo()
+```
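
The inline-code fixtures above rely on `INTLINE_PATTERN` to tell a one-line code span written with triple backticks apart from a real fence. A short sketch of how that pattern behaves on the fixture's first line; the pattern is copied from the diff, while the wrapper `is_inline_code` is a hypothetical name added for illustration:

```python
import re

# Pattern copied from the diff: three or more backticks, some text without
# backticks, then another backtick on the same line.
INTLINE_PATTERN = re.compile("```+[^`]+`")

def is_inline_code(line):
    """Return True if the line starts with inline code rather than a fence."""
    return INTLINE_PATTERN.match(line.strip()) is not None

print(is_inline_code('```console.log("I like JavaScript");```'))  # True
print(is_inline_code("```python"))  # False: no backtick after the language name
print(is_inline_code("```"))        # False: a bare fence
```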
