|
1 | 1 | # Summarize |
2 | 2 |
|
3 | | -A go utility that will capture files with an extension pattern into a single markdown formatted |
4 | | -file that looks like: |
| 3 | +The **Summarize** package was designed for developers who want to use Artificial Intelligence while |
| 4 | +working on a project. The `summarize` command gives you a powerful interface, managed by arguments and environment |
| 5 | +variables, that defines which extensions to include or exclude and which substrings to avoid while parsing paths. The binary has |
| 6 | +concurrency built in and enforces a size limit on the output file. It ignores its default output directory so it won't |
| 7 | +recursively build summaries upon itself. By default it writes to a new directory called `summaries`, which it tries to create in the |
| 8 | +current working directory; I recommend that you add it to your `.gitignore` and `.dockerignore`. |
| 9 | + |
| 10 | +I've found it useful to use the `make summary` command in all of my projects. This way, if I need to ask an AI a |
| 11 | +question about a piece of code, I can capture the source code of the entire directory quickly and then just `cat` the |
| 12 | +output file path provided and _voilà_! The `-print` argument prints the summary contents to STDOUT |
| 13 | +instead of the `Summary generated: summaries/summary.2025.07.29.08.59.03.UTC.md` message that it would normally print. |
| 14 | + |
| 15 | +The **Environment** can be used to control the default behavior of the `summarize` binary, so that you aren't required |
| 16 | +to type the arguments out each time. If you use _JSON_ all the time, you can enable its output format on every command |
| 17 | +by setting `SUMMARIZE_ALWAYS_JSON`. If you always want to write the summary to a file, you can set `SUMMARIZE_ALWAYS_WRITE`. |
| 18 | +If you want to always print the summary to STDOUT instead of the success message, you can set |
| 19 | +`SUMMARIZE_ALWAYS_PRINT`. If you want to compress the rendered summary every time, you can set |
| 20 | +`SUMMARIZE_ALWAYS_COMPRESS`. These `SUMMARIZE_ALWAYS_*` environment variables customize the |
| 21 | +runtime behavior of the `summarize` application. |
| 22 | + |
| 23 | +When the `summarize` binary runs, it'll do its best to ignore files that it can't render as text. This includes |
| 24 | +images, videos, binary files, and text files that are commonly linked to secrets. |
| 25 | + |
| 26 | +The developer experience of `summarize` is designed for quick use: just run `summarize` from |
| 27 | +wherever you wish to summarize. The `-d` for **source directory** defaults to `.` and the `-o`/`-f` for **output path** |
| 28 | +defaults to a new timestamped file (`-f`) in the (`-o`) `summaries/` directory relative to `.`. The `-i` and `-x` flags |
| 29 | +define which file extensions to <b>i</b>nclude and e<b>x</b>clude, such as `go,ts,py`. The `-s` flag is used to |
| 30 | +**skip** paths that contain any of the given substrings. Dotfiles can be ignored entirely by passing the `-ndf` flag. |
| 31 | + |
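To make the interplay of `-i`, `-x`, `-s`, and `-ndf` concrete, here is a minimal Go sketch of a path filter. The `filter` type and its `keep` method are hypothetical names, not taken from the `summarize` source, and the order of the checks, the handling of extensionless names such as `Makefile`, and the reading of "dotfile" as a base name starting with `.` are all assumptions.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// filter is a hypothetical stand-in for the -i, -x, -s and -ndf settings.
type filter struct {
	include    []string // extensions to include, e.g. "go", "ts"
	exclude    []string // extensions to exclude
	skip       []string // substrings that disqualify a path
	noDotfiles bool     // assumed meaning of -ndf
}

// contains reports whether list holds s exactly.
func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// keep reports whether path should appear in the summary.
func (f filter) keep(path string) bool {
	base := filepath.Base(path)
	if f.noDotfiles && strings.HasPrefix(base, ".") {
		return false
	}
	for _, sub := range f.skip {
		if strings.Contains(path, sub) {
			return false
		}
	}
	ext := strings.TrimPrefix(filepath.Ext(base), ".")
	if ext == "" {
		ext = base // assumption: extensionless names match by full name
	}
	if contains(f.exclude, ext) {
		return false
	}
	return contains(f.include, ext)
}

func main() {
	f := filter{
		include:    []string{"go", "md", "Makefile"},
		exclude:    []string{"log"},
		skip:       []string{"node_modules/", "summaries/"},
		noDotfiles: true,
	}
	for _, p := range []string{"cmd/main.go", "node_modules/a/b.go", "app.log", "Makefile"} {
		fmt.Printf("%-22s keep=%t\n", p, f.keep(p))
	}
}
```

Checking skip substrings before extensions keeps whole directories such as `node_modules/` out regardless of what they contain, which is one plausible reading of how `-s` is meant to behave.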
| 32 | +Performance of the application can be tuned using `-mf=<int>` to set the **Max Files** that will be processed |
| 33 | +concurrently. The default is 369. The `-max=<int64>` flag sets a limit on how large the rendered summary can become. |
| 34 | + |
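The two tuning knobs above can be sketched in Go with a buffered channel acting as a counting semaphore. This is an illustration only, not the binary's implementation: `processAll` and `checkSize` are hypothetical names, and returning an error when the size limit is exceeded is an assumption, since this README does not say what the tool does in that case.

```go
package main

import (
	"fmt"
	"sync"
)

// processAll calls work on every path with at most maxFiles calls in flight.
// Results are returned in the same order as paths.
func processAll(paths []string, maxFiles int, work func(string) string) []string {
	if maxFiles < 1 {
		maxFiles = 1 // guard: a zero-capacity semaphore would block forever
	}
	results := make([]string, len(paths))
	sem := make(chan struct{}, maxFiles) // counting semaphore
	var wg sync.WaitGroup
	for i, p := range paths {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxFiles workers are busy
		go func(i int, p string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			results[i] = work(p)
		}(i, p)
	}
	wg.Wait()
	return results
}

// checkSize returns an error when size exceeds max.
// Assumption: the real tool's behavior at the limit may differ.
func checkSize(size, max int64) error {
	if size > max {
		return fmt.Errorf("summary is %d bytes, limit is %d", size, max)
	}
	return nil
}

func main() {
	paths := []string{"main.go", "README.md", "Makefile"}
	out := processAll(paths, 2, func(p string) string { return "### `" + p + "`" })
	var total int64
	for _, s := range out {
		fmt.Println(s)
		total += int64(len(s))
	}
	if err := checkSize(total, 1<<20); err != nil {
		fmt.Println(err)
	}
}
```

Each goroutine writes only to its own index of `results`, so no extra locking is needed, and `wg.Wait()` ensures all writes are visible before the slice is returned.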
| 35 | +Once the program finishes running, the rendered file will look similar to: |
5 | 36 |
|
6 | 37 | ```md |
7 | 38 | # Project Summary |
8 | 39 |
|
9 | | -### `filename.ext` |
| 40 | +<AI prompt description> |
| 41 | + |
| 42 | +### `filename.go` |
| 43 | + |
| 44 | +<File Info> |
10 | 45 |
|
11 | 46 | <full source code> |
12 | 47 |
|
13 | | -### `filename.ext` |
| 48 | +### `filename.cs` |
| 49 | + |
| 50 | +<File Info> |
| 51 | + |
| 52 | +<full source code> |
14 | 53 |
|
15 | 54 | ... etc. |
16 | 55 |
|
@@ -49,19 +88,139 @@ cd ~/work/anotherProject |
49 | 88 | summarize -d anotherProject -o /home/user/summaries/anotherProject |
50 | 89 | ``` |
51 | 90 |
|
52 | | -Since `figtree` is designed to be very functional, its lightweight but feature |
53 | | -intense design through simple biology memetics makes it well suited for this program. |
54 | | - |
55 | 91 | ## Options |
56 | 92 |
|
57 | | -| Name | Argument | Type | Usage | |
58 | | -|-----------------|----------|----------|--------------------------------------------------------| |
59 | | -| `kSourceDir` | -d` | `string` | Source directory path. | |
60 | | -| `kOutputDir` | -o` | `string` | Summary destination output directory path. | |
61 | | -| `kExcludeExt` | `-x` | `list` | Comma separated string list of extensions to exclude. | |
62 | | -| `kSkipContains` | `-s` | `list` | Comma separated string to filename substrings to skip. | |
63 | | -| `kIncludeExt` | `-i` | `list` | Comma separated string of extensions to include. | |
64 | | -| `kFilename` | `-f` | `string` | Summary filename (writes to `-o` dir). | |
| 93 | +| Name | Argument | Type | Usage | |
| 94 | +|------------------|----------|----------|-------------------------------------------------------------------| |
| 95 | +| `kSourceDir` | `-d` | `string` | Source directory path. | |
| 96 | +| `kOutputDir` | `-o` | `string` | Summary destination output directory path. | |
| 97 | +| `kExcludeExt` | `-x` | `list` | Comma separated string list of extensions to exclude. | |
| 98 | +| `kSkipContains`  | `-s`     | `list`   | Comma separated string of filename substrings to skip.            |
| 99 | +| `kIncludeExt` | `-i` | `list` | Comma separated string of extensions to include. | |
| 100 | +| `kFilename` | `-f` | `string` | Summary filename (writes to `-o` dir). | |
| 101 | +| `kVersion` | `-v` | `bool` | When `true`, the binary version is shown | |
| 102 | +| `kCompress` | `-gz` | `bool` | When `true`, **gzip** is used on the contents of the summary | |
| 103 | +| `kMaxOutputSize` | `-max` | `int64` | Maximum size of the generated summary allowed | |
| 104 | +| `kPrint` | `-print` | `bool` | Uses STDOUT to write contents of summary | |
| 105 | +| `kWrite` | `-write` | `bool` | Uses the filesystem to save contents of summary | |
| 106 | +| `kDebug` | `-debug` | `bool` | When `true`, extra content is written to STDOUT aside from report | |
| 107 | + |
| 108 | + |
| 109 | +## Environment |
| 110 | + |
| 111 | +| Environment Variable | Type | Default Value | Usage | |
| 112 | +|-----------------------------|----------|------------------------|-------------------------------------------------------------------------------------------------------------| |
| 113 | +| `SUMMARIZE_CONFIG_FILE` | `String` | `./config.yaml` | Contents of the YAML Configuration to use for [figtree](https://github.com/andreimerlescu/figtree). | |
| 114 | +| `SUMMARIZE_IGNORE_CONTAINS` | `List` | \* see below | Add items to this default list by creating your own new list here, they get concatenated. | |
| 115 | +| `SUMMARIZE_INCLUDE_EXT`     | `List`   | \* \* see below        | Add extensions to include in the summary in this environment variable, comma separated.                     |
| 116 | +| `SUMMARIZE_EXCLUDE_EXT`     | `List`   | \* \* \* see below     | Add exclusionary extensions to ignore to this environment variable, comma separated.                        |
| 117 | +| `SUMMARIZE_ALWAYS_PRINT` | `Bool` | `false` | When `true`, the `-print` will write the summary to STDOUT. | |
| 118 | +| `SUMMARIZE_ALWAYS_WRITE` | `Bool` | `false` | When `true`, the `-write` will write to a new file on the disk. | |
| 119 | +| `SUMMARIZE_ALWAYS_JSON` | `Bool` | `false` | When `true`, the `-json` flag will render JSON output to the console. | |
| 120 | +| `SUMMARIZE_ALWAYS_COMPRESS` | `Bool` | `false` | When `true`, the `-gz` flag will use gzip to compress the summary contents and appends `.gz` to the output. | |
| 121 | + |
| 122 | + |
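The `-gz` / `SUMMARIZE_ALWAYS_COMPRESS` row above says the summary is gzip-compressed and `.gz` is appended to the output name. A minimal Go sketch of that idea, using only the standard library, might look like the following; `compress` and `decompress` are hypothetical helpers, and the file name `summary.md` is a placeholder rather than the tool's real naming scheme.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// compress gzips data and returns the output name with ".gz" appended
// together with the compressed bytes.
func compress(name string, data []byte) (string, []byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return "", nil, err
	}
	if err := zw.Close(); err != nil { // Close flushes the gzip footer
		return "", nil, err
	}
	return name + ".gz", buf.Bytes(), nil
}

// decompress reverses compress; used here to verify the round trip.
func decompress(data []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	name, gz, err := compress("summary.md", []byte("# Project Summary\n"))
	if err != nil {
		panic(err)
	}
	plain, err := decompress(gz)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: %d compressed bytes, round-trips to %q\n", name, len(gz), plain)
}
```

A file produced this way can be read back in a shell with standard tools such as `gunzip -c`.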
| 123 | +### \* Default `SUMMARIZE_IGNORE_CONTAINS` Value |
| 124 | + |
| 125 | +```json |
| 126 | +7z,gz,xz,zst,zstd,bz,bz2,bzip2,zip,tar,rar,lz4,lzma,cab,arj,crt,cert,cer,key,pub,asc,pem,p12,pfx,jks,keystore,id_rsa,id_dsa,id_ed25519,id_ecdsa,gpg,pgp,exe,dll,so,dylib,bin,out,o,obj,a,lib,dSYM,class,pyc,pyo,__pycache__,jar,war,ear,apk,ipa,dex,odex,wasm,node,beam,elc,iso,img,dmg,vhd,vdi,vmdk,qcow2,db,sqlite,sqlite3,db3,mdb,accdb,sdf,ldb,log,trace,dump,crash,jpg,jpeg,png,gif,bmp,tiff,tif,webp,ico,svg,heic,heif,raw,cr2,nef,dng,mp3,wav,flac,aac,ogg,wma,m4a,opus,aiff,mp4,avi,mov,mkv,webm,flv,wmv,m4v,3gp,ogv,ttf,otf,woff,woff2,eot,fon,pfb,pfm,pdf,doc,docx,xls,xlsx,ppt,pptx,odt,ods,odp,rtf,suo,sln,user,ncb,pdb,ipch,ilk,tlog,idb,aps,res,iml,idea,vscode,project,classpath,factorypath,prefs,vcxproj,vcproj,filters,xcworkspace,xcuserstate,xcscheme,pbxproj,DS_Store,Thumbs.db,desktop.ini,lock,sum,resolved,tmp,temp,swp,swo,bak,backup,orig,rej,patch,~,old,new,part,incomplete,map,min.js,min.css,bundle.js,bundle.css,chunk.js,dat,data,cache,pid,sock,pack,idx,rev,pickle,pkl,npy,npz,mat,rdata,rds |
| 127 | +``` |
| 128 | + |
| 129 | +```go |
| 130 | + |
| 131 | +// defaultExclude are the -exc list of extensions that will be skipped automatically |
| 132 | +defaultExclude = []string{ |
| 133 | + // Compressed archives |
| 134 | + "7z", "gz", "xz", "zst", "zstd", "bz", "bz2", "bzip2", "zip", "tar", "rar", "lz4", "lzma", "cab", "arj", |
| 135 | + |
| 136 | + // Encryption, certificates, and sensitive keys |
| 137 | + "crt", "cert", "cer", "key", "pub", "asc", "pem", "p12", "pfx", "jks", "keystore", |
| 138 | + "id_rsa", "id_dsa", "id_ed25519", "id_ecdsa", "gpg", "pgp", |
| 139 | + |
| 140 | + // Binary & executable artifacts |
| 141 | + "exe", "dll", "so", "dylib", "bin", "out", "o", "obj", "a", "lib", "dSYM", |
| 142 | + "class", "pyc", "pyo", "__pycache__", |
| 143 | + "jar", "war", "ear", "apk", "ipa", "dex", "odex", |
| 144 | + "wasm", "node", "beam", "elc", |
| 145 | + |
| 146 | + // System and disk images |
| 147 | + "iso", "img", "dmg", "vhd", "vdi", "vmdk", "qcow2", |
| 148 | + |
| 149 | + // Database files |
| 150 | + "db", "sqlite", "sqlite3", "db3", "mdb", "accdb", "sdf", "ldb", |
| 151 | + |
| 152 | + // Log files |
| 153 | + "log", "trace", "dump", "crash", |
| 154 | + |
| 155 | + // Media files - Images |
| 156 | + "jpg", "jpeg", "png", "gif", "bmp", "tiff", "tif", "webp", "ico", "svg", "heic", "heif", "raw", "cr2", "nef", "dng", |
| 157 | + |
| 158 | + // Media files - Audio |
| 159 | + "mp3", "wav", "flac", "aac", "ogg", "wma", "m4a", "opus", "aiff", |
| 160 | + |
| 161 | + // Media files - Video |
| 162 | + "mp4", "avi", "mov", "mkv", "webm", "flv", "wmv", "m4v", "3gp", "ogv", |
| 163 | + |
| 164 | + // Font files |
| 165 | + "ttf", "otf", "woff", "woff2", "eot", "fon", "pfb", "pfm", |
| 166 | + |
| 167 | + // Document formats (typically not source code) |
| 168 | + "pdf", "doc", "docx", "xls", "xlsx", "ppt", "pptx", "odt", "ods", "odp", "rtf", |
| 169 | + |
| 170 | + // IDE/Editor/Tooling artifacts |
| 171 | + "suo", "sln", "user", "ncb", "pdb", "ipch", "ilk", "tlog", "idb", "aps", "res", |
| 172 | + "iml", "idea", "vscode", "project", "classpath", "factorypath", "prefs", |
| 173 | + "vcxproj", "vcproj", "filters", "xcworkspace", "xcuserstate", "xcscheme", "pbxproj", |
| 174 | + "DS_Store", "Thumbs.db", "desktop.ini", |
| 175 | + |
| 176 | + // Package manager and build artifacts |
| 177 | + "lock", "sum", "resolved", // package-lock.json, go.sum, yarn.lock, etc. |
| 178 | + |
| 179 | + // Temporary and backup files |
| 180 | + "tmp", "temp", "swp", "swo", "bak", "backup", "orig", "rej", "patch", |
| 181 | + "~", "old", "new", "part", "incomplete", |
| 182 | + |
| 183 | + // Source maps and minified files (usually generated) |
| 184 | + "map", "min.js", "min.css", "bundle.js", "bundle.css", "chunk.js", |
| 185 | + |
| 186 | + // Configuration that's typically binary or generated |
| 187 | + "dat", "data", "cache", "pid", "sock", |
| 188 | + |
| 189 | + // Version control artifacts (though usually in ignored directories) |
| 190 | + "pack", "idx", "rev", |
| 191 | + |
| 192 | + // Other binary formats |
| 193 | + "pickle", "pkl", "npy", "npz", "mat", "rdata", "rds", |
| 194 | +} |
| 195 | + |
| 196 | +``` |
| 197 | + |
| 198 | +### \* \* Default `SUMMARIZE_INCLUDE_EXT` |
| 199 | + |
| 200 | +```json |
| 201 | +go,ts,tf,sh,py,js,Makefile,mod,Dockerfile,dockerignore,gitignore,esconfigs,md |
| 202 | +``` |
| 203 | + |
| 204 | +```go |
| 205 | +// defaultInclude are the -inc list of extensions that will be included in the summary |
| 206 | +defaultInclude = []string{ |
| 207 | + "go", "ts", "tf", "sh", "py", "js", "Makefile", "mod", "Dockerfile", "dockerignore", "gitignore", "esconfigs", "md", |
| 208 | +} |
| 209 | +``` |
| 210 | + |
| 211 | +### \* \* \* Default `SUMMARIZE_EXCLUDE_EXT` |
| 212 | + |
| 213 | +```json |
| 214 | +.min.js,.min.css,.git/,.svn/,.vscode/,.vs/,.idea/,logs/,secrets/,.venv/,/site-packages,.terraform/,summaries/,node_modules/,/tmp,tmp/,logs/ |
| 215 | +``` |
| 216 | + |
| 217 | +```go |
| 218 | +// defaultAvoid are the -avoid list of substrings in file path names to avoid in the summary |
| 219 | +defaultAvoid = []string{ |
| 220 | + ".min.js", ".min.css", ".git/", ".svn/", ".vscode/", ".vs/", ".idea/", "logs/", "secrets/", |
| 221 | + ".venv/", "/site-packages", ".terraform/", "summaries/", "node_modules/", "/tmp", "tmp/", "logs/", |
| 222 | +} |
| 223 | +``` |
65 | 224 |
|
66 | 225 | ## Contribution |
67 | 226 |
|
|