
Commit 37cfb63

updated examples and descriptions
1 parent 6ed01ad commit 37cfb63

3 files changed

+22
-19
lines changed

searchindex.js

Lines changed: 1 addition & 1 deletion

searchindex.json

Lines changed: 1 addition & 1 deletion

searching-files-and-filenames.html

Lines changed: 20 additions & 17 deletions
@@ -28,19 +28,22 @@
2828
document.getElementById('sidebar').setAttribute('aria-hidden', sidebar !== 'visible');
2929
Array.from(document.querySelectorAll('#sidebar a')).forEach(function(link) {
3030
link.setAttribute('tabIndex', sidebar === 'visible' ? 0 : -1);
31-
});</script><div class=content id=content><main><div class=sidetoc><nav class=pagetoc></nav></div><h1 id=searching-files-and-filenames><a class=header href=#searching-files-and-filenames>Searching Files and Filenames</a></h1><p>In this chapter, you'll learn how to search file contents based on literal strings or regular expressions. After that, you'll learn how to locate files based on their names and other properties like size, last modified, etc.<blockquote><p><img alt=info src=./images/info.svg> The <a href=https://github.com/learnbyexample/cli-computing/tree/master/example_files>example_files</a> directory has the scripts used in this chapter.</blockquote><h2 id=grep><a class=header href=#grep>grep</a></h2><p>Quoting from <a href=https://en.wikipedia.org/wiki/Grep>wikipedia</a>:<blockquote><p><strong><code>grep</code></strong> is a command-line utility for searching plain-text data sets for lines that match a regular expression. Its name comes from the <code>ed</code> command <code>g/re/p</code> (<strong>g</strong>lobally search a <strong>r</strong>egular <strong>e</strong>xpression and <strong>p</strong>rint), which has the same effect.</blockquote><p>The <code>grep</code> command has lots and lots of features, so much so that I wrote <a href=https://github.com/learnbyexample/learn_gnugrep_ripgrep>a book</a> about it. The most common usage is filtering lines from the input using a regular expression (regexp).<h3 id=common-options><a class=header href=#common-options>Common options</a></h3><p>Commonly used options are shown below. 
Examples will be discussed in later sections.<ul><li><code>--color=auto</code> highlight the matching portions, filenames, line numbers, etc using colors<li><code>-i</code> ignore case while matching<li><code>-v</code> print non-matching lines<li><code>-n</code> prefix line numbers for matching lines<li><code>-c</code> display only the count of number of matching lines<li><code>-l</code> print only the filenames matching the given expression<li><code>-L</code> print filenames NOT matching the pattern<li><code>-w</code> match pattern only as whole words<li><code>-x</code> match pattern only as whole lines<li><code>-F</code> interpret pattern as a fixed string (i.e. not a regular expression)<li><code>-o</code> print only matching parts<li><code>-A N</code> print matching line and <code>N</code> number of lines after the matched line<li><code>-B N</code> print matching line and <code>N</code> number of lines before the matched line<li><code>-C N</code> print matching line and <code>N</code> number of lines before and after the matched line<li><code>-m N</code> print a maximum of <code>N</code> matching lines<li><code>-q</code> no standard output, quit immediately if match found, useful in scripts<li><code>-s</code> suppress error messages, useful in scripts<li><code>-r</code> recursively search all files in the specified input folders (by default searches current directory)<li><code>-R</code> like <code>-r</code>, but follows symbolic links as well<li><code>-h</code> do not prefix filename for matching lines (default behavior for single input file)<li><code>-H</code> prefix filename for matching lines (default behavior for multiple input files)</ul><h3 id=literal-search><a class=header href=#literal-search>Literal search</a></h3><p>All of the following examples would be suited for <code>-F</code> option as these do not use regular expressions. <code>grep</code> is smart enough to do the right thing in such cases.<pre><code class=language-bash># lines containing 'an'
32-
$ printf 'apple\nbanana\nmango' | grep 'an'
31+
});</script><div class=content id=content><main><div class=sidetoc><nav class=pagetoc></nav></div><h1 id=searching-files-and-filenames><a class=header href=#searching-files-and-filenames>Searching Files and Filenames</a></h1><p>This chapter will show how to search file contents based on literal strings or regular expressions. After that, you'll learn how to locate files based on their names and other properties like size, last modified, etc.<blockquote><p><img alt=info src=./images/info.svg> The <a href=https://github.com/learnbyexample/cli-computing/tree/master/example_files>example_files</a> directory has the scripts used in this chapter.</blockquote><h2 id=grep><a class=header href=#grep>grep</a></h2><p>Quoting from <a href=https://en.wikipedia.org/wiki/Grep>wikipedia</a>:<blockquote><p><strong><code>grep</code></strong> is a command-line utility for searching plain-text data sets for lines that match a regular expression. Its name comes from the <code>ed</code> command <code>g/re/p</code> (<strong>g</strong>lobally search a <strong>r</strong>egular <strong>e</strong>xpression and <strong>p</strong>rint), which has the same effect.</blockquote><p>The <code>grep</code> command has lots and lots of features, so much so that I wrote <a href=https://github.com/learnbyexample/learn_gnugrep_ripgrep>a book</a> about it. The most common usage is filtering lines from the input using a regular expression (regexp).<h3 id=common-options><a class=header href=#common-options>Common options</a></h3><p>Commonly used options are shown below. 
Examples will be discussed in later sections.<ul><li><code>--color=auto</code> highlight the matching portions, filenames, line numbers, etc. using colors<li><code>-i</code> ignore case while matching<li><code>-v</code> print non-matching lines<li><code>-n</code> prefix line numbers for matching lines<li><code>-c</code> display only the count of matching lines<li><code>-l</code> print only the names of files that contain a match<li><code>-L</code> print only the names of files that do NOT contain a match<li><code>-w</code> match pattern only as whole words<li><code>-x</code> match pattern only as whole lines<li><code>-F</code> interpret pattern as a fixed string (i.e. not a regular expression)<li><code>-o</code> print only matching parts<li><code>-A N</code> print matching line and <code>N</code> number of lines after the matched line<li><code>-B N</code> print matching line and <code>N</code> number of lines before the matched line<li><code>-C N</code> print matching line and <code>N</code> number of lines before and after the matched line<li><code>-m N</code> print a maximum of <code>N</code> matching lines<li><code>-q</code> no standard output, quit immediately if a match is found, useful in scripts<li><code>-s</code> suppress error messages, useful in scripts<li><code>-r</code> recursively search all files in the specified input folders (searches the current directory if no input is given)<li><code>-R</code> like <code>-r</code>, but follows symbolic links as well<li><code>-h</code> do not prefix filename for matching lines (default behavior for single input file)<li><code>-H</code> prefix filename for matching lines (default behavior for multiple input files)</ul><h3 id=literal-search><a class=header href=#literal-search>Literal search</a></h3><p>The following examples would all be suited for the <code>-F</code> option as these do not use regular expressions. <code>grep</code> is smart enough to do the right thing in such cases.<pre><code class=language-bash># lines containing 'an'
32+
$ printf 'apple\nbanana\nmango\nfig\ntango\n' | grep 'an'
3333
banana
3434
mango
35+
tango
3536

3637
# case insensitive matching
37-
$ printf 'Cat\ncut\ncOnCaT\n' | grep -i 'cat'
38+
$ printf 'Cat\ncut\ncOnCaT\nfour cats\n' | grep -i 'cat'
3839
Cat
3940
cOnCaT
41+
four cats
4042

4143
# match only whole words
42-
$ printf 'par value\nheir apparent\n' | grep -w 'par'
44+
$ printf 'par value\nheir apparent\ntar-par' | grep -w 'par'
4345
par value
46+
tar-par
4447

4548
# count empty lines
4649
$ printf 'hi\n\nhello\n\n\n\nbye\n' | grep -cx ''
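The literal search examples in this hunk can be pushed a little further with a small sketch, assuming GNU grep (the input lines are made up for illustration). Without `-F`, `.` is a regexp metacharacter that matches any character. With `-w`, the hyphen in `tar-par` counts as a word boundary, while an underscore is treated as part of the word.

```shell
#!/usr/bin/env bash
# A minimal sketch assuming GNU grep; input lines are made up for illustration

# without -F, '.' matches any character, so both lines match
printf 'a.b\naxb\n' | grep 'a.b'
# a.b
# axb

# with -F, the pattern is a fixed string, so only the literal 'a.b' matches
printf 'a.b\naxb\n' | grep -F 'a.b'
# a.b

# -w: '-' is not a word character but '_' is, so 'par_t' is not a whole-word match
printf 'par value\ntar-par\npar_t\n' | grep -w 'par'
# par value
# tar-par
```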
@@ -91,29 +94,29 @@
9194
teal
9295
light blue
9396
brown
94-
</code></pre><h3 id=perl-compatible-regular-expression><a class=header href=#perl-compatible-regular-expression>Perl Compatible Regular Expression</a></h3><p>PCRE has lot more features compared to BRE/ERE. Here are some examples:<pre><code class=language-bash># numbers >= 100, uses possessive quantifiers
97+
</code></pre><h3 id=perl-compatible-regular-expression><a class=header href=#perl-compatible-regular-expression>Perl Compatible Regular Expression</a></h3><p>PCRE has many advanced features compared to BRE/ERE. Here are some examples:<pre><code class=language-bash># numbers >= 100, uses possessive quantifiers
9598
$ echo '0501 035 154 12 26 98234' | grep -oP '0*+\d{3,}'
9699
0501
97100
154
98101
98234
99102

100103
# extract digits only if preceded by =
101-
$ echo '100 foo=42, bar=314' | grep -oP '=\K\d+'
104+
$ echo '100 apple=42, fig=314 red:255' | grep -oP '=\K\d+'
102105
42
103106
314
104107

105-
# all digits and optional hyphen combo from the start of string
106-
$ echo '123-87-593 42 foo' | grep -oP '\G\d+-?'
108+
# all digits and optional hyphen combo from the start of the line
109+
$ echo '123-87-593 42 fig 314-12-111' | grep -oP '\G\d+-?'
107110
123-
108111
87-
109112
593
110113

111114
# all whole words except 'bat' and 'map'
112-
$ echo 'car bat cod map combat' | grep -oP '\b(bat|map)\b(*SKIP)(*F)|\w+'
113-
car
115+
$ echo 'car2 bat cod map combat' | grep -oP '\b(bat|map)\b(*SKIP)(*F)|\w+'
116+
car2
114117
cod
115118
combat
116-
</code></pre><h3 id=recursive-search><a class=header href=#recursive-search>Recursive search</a></h3><p>You can use the <code>-r</code> option to search recursively within the specified directories. By default, the current directory will be searched. Use <code>-R</code> if you want symbolic links found within the input directories to be followed as well. You do not need <code>-R</code> option for specifying symbolic links as arguments.<p>Here are some basic examples. Recursive search will work as if <code>-H</code> option was specified as well, even if only one file was matched. Also, hidden files are included by default.<pre><code class=language-bash># change to the 'scripts' directory and source the 'grep.sh' script
119+
</code></pre><p>See <code>man pcrepattern</code> or <a href=https://www.pcre.org/original/doc/html/pcrepattern.html>PCRE online manual</a> for documentation.<h3 id=recursive-search><a class=header href=#recursive-search>Recursive search</a></h3><p>You can use the <code>-r</code> option to search recursively within the specified directories. By default, the current directory will be searched. Use <code>-R</code> if you want symbolic links found within the input directories to be followed as well. You do not need <code>-R</code> option for specifying symbolic links as arguments.<p>Here are some basic examples. Recursive search will work as if <code>-H</code> option was specified as well, even if only one file was matched. Also, hidden files are included by default.<pre><code class=language-bash># change to the 'scripts' directory and source the 'grep.sh' script
117120
$ source grep.sh
118121
$ ls -AF
119122
backups/ colors_1 colors_2 .hidden projects/
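A sketch contrasting the PCRE features used in this hunk, assuming GNU grep built with PCRE support (`-P` may be unavailable in other grep implementations). Dropping `\G` shows why the anchored version stops at the first space, and the lookbehind `(?<==)` is one way to see what `\K` does in the `=\K\d+` example.

```shell
#!/usr/bin/env bash
# A minimal sketch assuming GNU grep built with PCRE support (-P)
s='123-87-593 42 fig 314-12-111'

# without \G, every digits/optional-hyphen run on the line is extracted:
# 123- 87- 593 42 314- 12- 111 (one per line)
echo "$s" | grep -oP '\d+-?'

# with \G, each match must start where the previous one ended,
# so extraction stops at the first space: 123- 87- 593
echo "$s" | grep -oP '\G\d+-?'

# \K discards the text matched so far, similar to the lookbehind (?<==)
# both commands print 42 and 314
t='100 apple=42, fig=314 red:255'
echo "$t" | grep -oP '=\K\d+'
echo "$t" | grep -oP '(?<==)\d+'
```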
@@ -179,14 +182,14 @@
179182

180183
# using -r option avoids running the command in such cases
181184
$ grep -rlZ 'violet' | xargs -r0 grep -L 'brown'
182-
</code></pre></blockquote><blockquote><p><img alt=warning src=images/warning.svg> <img alt=warning src=images/warning.svg> Do not use <code>xargs -P</code> to combine the output of parallel runs, as you are likely to get a mangled result. The <a href=https://www.gnu.org/software/parallel/>parallel</a> command would be a better option. See <a href=https://unix.stackexchange.com/q/104778/109046>unix.stackexchange: xargs vs parallel</a> for more details. See also <a href=https://unix.stackexchange.com/q/24954/109046>unix.stackexchange: when to use xargs</a>.</blockquote><h3 id=further-reading><a class=header href=#further-reading>Further Reading</a></h3><ul><li>My ebook <a href=https://github.com/learnbyexample/learn_gnugrep_ripgrep>GNU GREP and RIPGREP</a> <ul><li>See also my blog post <a href=https://learnbyexample.github.io/gnu-bre-ere-cheatsheet/>GNU BRE/ERE cheatsheet</a></ul><li><a href=https://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html>Why GNU grep is fast</a><li><a href=https://unix.stackexchange.com/q/131535/109046>unix.stackexchange: grep -r vs find+grep</a></ul><h2 id=find><a class=header href=#find>find</a></h2><p>The <code>find</code> command has comprehensive features to narrow down files and directories based on name, size, date and so on. And more importantly, <code>find</code> helps you to perform actions on such filtered files.<h3 id=filenames><a class=header href=#filenames>Filenames</a></h3><p>By default, you'll get every entry (including hidden ones) in the current directory and sub-directories when you use <code>find</code> without any options or paths. To search within specific path(s), they should be immediately mentioned after <code>find</code>, i.e. before any options.<pre><code class=language-bash># change to the 'scripts' directory and source the 'find.sh' script
185+
</code></pre></blockquote><blockquote><p><img alt=warning src=images/warning.svg> <img alt=warning src=images/warning.svg> Do not use <code>xargs -P</code> to combine the output of parallel runs, as you are likely to get a mangled result. The <a href=https://www.gnu.org/software/parallel/>parallel</a> command would be a better option. See <a href=https://unix.stackexchange.com/q/104778/109046>unix.stackexchange: xargs vs parallel</a> for more details. See also <a href=https://unix.stackexchange.com/q/24954/109046>unix.stackexchange: when to use xargs</a>.</blockquote><h3 id=further-reading><a class=header href=#further-reading>Further Reading</a></h3><ul><li>My ebook <a href=https://github.com/learnbyexample/learn_gnugrep_ripgrep>GNU GREP and RIPGREP</a> <ul><li>See also my blog post <a href=https://learnbyexample.github.io/gnu-bre-ere-cheatsheet/>GNU BRE/ERE cheatsheet</a></ul><li><a href=https://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html>Why GNU grep is fast</a><li><a href=https://unix.stackexchange.com/q/131535/109046>unix.stackexchange: grep -r vs find+grep</a></ul><h2 id=find><a class=header href=#find>find</a></h2><p>The <code>find</code> command has comprehensive features to filter files and directories based on their name, size, timestamp and so on. And more importantly, <code>find</code> helps you to perform actions on such filtered files.<h3 id=filenames><a class=header href=#filenames>Filenames</a></h3><p>By default, you'll get every entry (including hidden ones) in the current directory and sub-directories when you use <code>find</code> without any options or paths. To search within specific path(s), they should be immediately mentioned after <code>find</code>, i.e. before any options.<pre><code class=language-bash># change to the 'scripts' directory and source the 'find.sh' script
183186
$ source find.sh
184187
$ ls -F
185188
backups/ hello_world.py* ip.txt report.log todos/
186189
errors.log hi.sh* projects/ scripts@
187190

188191
$ cd projects
189-
# this is same as: find .
192+
# same as: find .
190193
$ find
191194
.
192195
./.venv
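The recursive search behavior described earlier (hidden files included, filenames prefixed) and the `grep -rlZ ... | xargs -r0 ...` pipeline can be tried in a throwaway directory. This is a sketch assuming GNU grep and GNU xargs; the files `.hidden` and `sub/colors 1` are hypothetical and exist only for the demo, with a space in one name to show why the NUL separator matters.

```shell
#!/usr/bin/env bash
# A minimal sketch assuming GNU grep and GNU xargs; files are hypothetical
export LC_ALL=C
cd "$(mktemp -d)" || exit 1
mkdir sub
printf 'violet\nbrown\n' > 'sub/colors 1'
printf 'violet\n' > .hidden

# -r with no path searches the current directory, includes hidden files
# and prefixes each match with its filename
grep -r 'violet' | sort
# .hidden:violet
# sub/colors 1:violet

# -Z and -0 keep the filename with a space intact,
# and -r stops xargs from running grep when there is no input
grep -rlZ 'violet' | xargs -r0 grep -L 'brown'
# .hidden
```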
@@ -201,7 +204,7 @@
201204
todos/books.txt
202205
todos/TRIP.txt
203206
todos/wow.txt
204-
</code></pre><blockquote><p><img alt=info src=./images/info.svg> Note that symbolic links won't be followed by default. You can use <code>-L</code> option for such cases.</blockquote><p>To match filenames based on a particular criteria, you can use wildcards or regular expressions. For wildcards, you can use <code>-name</code> or the case-insensitive version <code>-iname</code>. This will match only the basename, so you'll get a warning if you use <code>/</code> as part of the pattern. You can use <code>-path</code> and <code>-ipath</code> if you need to include <code>/</code> as well in the pattern. Unlike <code>grep</code>, the pattern should always match the entire basename (I think this is because there are no start/end anchors in globs).<pre><code class=language-bash># filenames ending with '.log'
207+
</code></pre><blockquote><p><img alt=info src=./images/info.svg> Note that symbolic links won't be followed by default. You can use the <code>-L</code> option for such cases.</blockquote><p>To match filenames based on particular criteria, you can use wildcards or regular expressions. For wildcards, you can use <code>-name</code> or the case-insensitive version <code>-iname</code>. These will match only the basename, so you'll get a warning if you use <code>/</code> as part of the pattern. You can use <code>-path</code> and <code>-ipath</code> if you need to include <code>/</code> as well in the pattern. Unlike <code>grep</code>, the glob pattern is matched against the entire basename (as there are no start/end anchors in globs).<pre><code class=language-bash># filenames ending with '.log'
205208
# 'find .' indicates current working directory (CWD) as the path to search
206209
$ find . -name '*.log'
207210
./report.log
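A sketch of the `-name` versus `-path` distinction described in this hunk, assuming GNU find and a made-up directory layout created in a temporary directory. It shows that the glob given to `-name` must cover the whole basename, while `-path` is tested against the whole path.

```shell
#!/usr/bin/env bash
# A minimal sketch assuming GNU find; the directory layout is made up
export LC_ALL=C
cd "$(mktemp -d)" || exit 1
mkdir backups
touch report.log backups/jan.log backups/aug.log

# the glob must match the entire basename, so this prints nothing
find . -name 'report'

# adding '*' lets the glob cover the rest of the basename
find . -name 'report*'
# ./report.log

# -path is tested against the whole path, so '/' can be part of the pattern
find . -path './backups/*.log' | sort
# ./backups/aug.log
# ./backups/jan.log
```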
@@ -222,14 +225,14 @@
222225
backups
223226
backups/bookmarks.html
224227
todos/books.txt
225-
</code></pre><p>You can use the <code>-not</code> operator to invert the matching condition:<pre><code class=language-bash># except filenames containing uppercase alphabets
228+
</code></pre><p>You can use the <code>-not</code> (or <code>!</code>) operator to invert the matching condition:<pre><code class=language-bash># same as: find todos ! -name '*[A-Z]*'
226229
$ find todos -not -name '*[A-Z]*'
227230
todos
228231
todos/books.txt
229232
todos/wow.txt
230233
</code></pre><p>You can use <code>-regex</code> and <code>-iregex</code> (case-insensitive) to match filenames based on regular expressions. In this case, the pattern will match the entire path, so use of <code>/</code> is possible without needing to use special options. The default regexp flavor is <code>emacs</code> which you can change by using the <code>-regextype</code> option.<pre><code class=language-bash># filename containing only uppercase alphabets and file extension is '.txt'
231234
# note the use of '.*/' to match the entire file path
232-
$ find -regex '.*/[A-Z]+.txt'
235+
$ find -regex '.*/[A-Z]+\.txt'
233236
./todos/TRIP.txt
234237

235238
# here 'egrep' flavor is being used
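The `-regex` fix in this hunk (escaping the `.` before `txt`) and the `-not`/`!` equivalence can be checked with a sketch like this one, assuming GNU find with its default `emacs` regex flavor. `LC_ALL=C` is set so that `[A-Z]` covers only ASCII uppercase letters; the `TRIPxtxt` file is hypothetical, added only to show what an unescaped `.` lets through.

```shell
#!/usr/bin/env bash
# A minimal sketch assuming GNU find; LC_ALL=C keeps [A-Z] to ASCII uppercase
export LC_ALL=C
cd "$(mktemp -d)" || exit 1
mkdir todos
touch todos/TRIP.txt todos/TRIPxtxt todos/books.txt

# an unescaped '.' matches any character, so 'TRIPxtxt' matches as well
find . -regex '.*/[A-Z]+.txt' | sort
# ./todos/TRIP.txt
# ./todos/TRIPxtxt

# '\.' matches only a literal dot
find . -regex '.*/[A-Z]+\.txt'
# ./todos/TRIP.txt

# -not and ! are equivalent ways to invert a test
find todos -not -name '*[A-Z]*' | sort
find todos ! -name '*[A-Z]*' | sort
# todos
# todos/books.txt
```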
@@ -365,7 +368,7 @@
365368
$ find -type f -not -path './backups/*' -prune -name '*.log'
366369
./report.log
367370
./errors.log
368-
</code></pre><p>Using <code>-not -path '*/.git/*' -prune</code> is a common practice when dealing with Git based version control projects.<h3 id=find-and-xargs><a class=header href=#find-and-xargs>find and xargs</a></h3><p>Similar to <code>grep -Z</code> and <code>xargs -0</code> combination seen earlier, you can use <code>find -print0</code> and <code>xargs -0</code> combination. The <code>-exec</code> option is sufficient for most use cases, but <code>xargs -P</code> (or the <a href=https://www.gnu.org/software/parallel/>parallel</a> command) can be handy if you need parallel execution for performance reasons.<p>Here's an example of passing filtered files to <code>sed</code> (<strong>s</strong>tream <strong>ed</strong>itor, will be discussed in the <a href=./multipurpose-text-processing-tools.html>Multipurpose Text Processing Tools</a> chapter):<pre><code class=language-bash>$ find -name '*.log'
371+
</code></pre><p>Using <code>-not -path '*/.git/*' -prune</code> can be handy when dealing with Git based version control projects.<h3 id=find-and-xargs><a class=header href=#find-and-xargs>find and xargs</a></h3><p>Similar to <code>grep -Z</code> and <code>xargs -0</code> combination seen earlier, you can use <code>find -print0</code> and <code>xargs -0</code> combination. The <code>-exec</code> option is sufficient for most use cases, but <code>xargs -P</code> (or the <a href=https://www.gnu.org/software/parallel/>parallel</a> command) can be handy if you need parallel execution for performance reasons.<p>Here's an example of passing filtered files to <code>sed</code> (<strong>s</strong>tream <strong>ed</strong>itor, will be discussed in the <a href=./multipurpose-text-processing-tools.html>Multipurpose Text Processing Tools</a> chapter):<pre><code class=language-bash>$ find -name '*.log'
369372
./report.log
370373
./backups/aug.log
371374
./backups/jan.log
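A sketch of the `find -print0 | xargs -0` combination next to the `-exec ... {} +` form, assuming GNU find, xargs and grep. The hunk's own `sed` command is not shown in this diff, so `grep -l` stands in for it here; the log files are hypothetical, and one name contains a space to show why the NUL separator matters.

```shell
#!/usr/bin/env bash
# A minimal sketch assuming GNU find, xargs and grep; files are hypothetical
cd "$(mktemp -d)" || exit 1
mkdir backups
printf 'error: disk full\n' > 'report 1.log'
printf 'all good\n' > backups/jan.log

# -print0 and -0 use NUL as the separator, so 'report 1.log' stays one argument
find . -name '*.log' -print0 | xargs -0 grep -l 'error'
# ./report 1.log

# the same filter using only find; '+' passes many files to a single grep call
find . -name '*.log' -exec grep -l 'error' {} +
# ./report 1.log
```

For simple cases like this, `-exec ... {} +` avoids the extra pipeline, while `xargs -P` remains an option when parallel runs are needed.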
