Commit 6fe3db4

updated examples, descriptions and exercises
1 parent 2774945 commit 6fe3db4

7 files changed: 48 additions and 45 deletions

assorted-text-processing-tools.html

Lines changed: 32 additions & 29 deletions
Large diffs are not rendered by default.

comparing-files.html

Lines changed: 2 additions & 2 deletions
@@ -28,7 +28,7 @@
document.getElementById('sidebar').setAttribute('aria-hidden', sidebar !== 'visible');
Array.from(document.querySelectorAll('#sidebar a')).forEach(function(link) {
link.setAttribute('tabIndex', sidebar === 'visible' ? 0 : -1);
-
});</script><div class=content id=content><main><div class=sidetoc><nav class=pagetoc></nav></div><h1 id=comparing-files><a class=header href=#comparing-files>Comparing Files</a></h1><p>In this chapter, you'll learn how to find and report differences between the contents of two files.<blockquote><p><img alt=info src=./images/info.svg> The <a href=https://github.com/learnbyexample/cli-computing/tree/master/example_files>example_files</a> directory has the sample input files used in this chapter.</blockquote><h2 id=cmp><a class=header href=#cmp>cmp</a></h2><p>The <code>cmp</code> command is useful to compare text and binary files. If the two files are same, no output is displayed and exit status is <code>0</code>. If there is a difference, it prints the first difference with details like line number and byte location and exit status will be <code>1</code>.<pre><code class=language-bash>$ mkdir practice_cmp
+
});</script><div class=content id=content><main><div class=sidetoc><nav class=pagetoc></nav></div><h1 id=comparing-files><a class=header href=#comparing-files>Comparing Files</a></h1><p>In this chapter, you'll learn how to find and report differences between the contents of two files.<blockquote><p><img alt=info src=./images/info.svg> The <a href=https://github.com/learnbyexample/cli-computing/tree/master/example_files>example_files</a> directory has the sample input files used in this chapter.</blockquote><h2 id=cmp><a class=header href=#cmp>cmp</a></h2><p>The <code>cmp</code> command is useful to compare text and binary files. If the two files are same, no output is displayed and exit status is <code>0</code>. If there is a difference, it prints the first difference with details like line number and byte location and the exit status will be <code>1</code>.<pre><code class=language-bash>$ mkdir practice_cmp
$ cd practice_cmp
$ echo 'hello' > x1.txt
$ cp x{1,2}.txt
@@ -44,7 +44,7 @@
x1.txt x3.txt differ: byte 6, line 1
$ echo $?
1
-
</code></pre><blockquote><p><img alt=info src=./images/info.svg> Use the <code>-s</code> option to suppress the output when you just need the exit status. The <code>-i</code> option will allow you to skip initial bytes from the input.</blockquote><h2 id=diff><a class=header href=#diff>diff</a></h2><p>Useful to find differences between text files. All the differences are printed, which might not be desirable for long files.<h3 id=common-options><a class=header href=#common-options>Common options</a></h3><ul><li><code>-i</code> ignore case while comparing<li><code>-w</code> ignore white-spaces<li><code>-b</code> ignore changes in the amount of whitespace<li><code>-B</code> ignore only blank lines<li><code>-E</code> ignore changes due to tab expansion<li><code>-z</code> ignore trailing whitespaces at the end of line<li><code>-y</code> two column output<li><code>-r</code> recursively compare files between the two directories specified<li><code>-s</code> convey message when two files are same<li><code>-q</code> report if files differ, not the details of differences</ul><h3 id=default-diff><a class=header href=#default-diff>Default diff</a></h3><p>By default, the <code>diff</code> output shows lines from the first file input starting with <code><</code> and lines from the second file input starts with <code>></code>. Between the two file contents, <code>---</code> is used as the separator. Each difference is prefixed by a command that indicates the differences (these commands are understood by tools like <code>patch</code>).<pre><code class=language-bash># change to the 'example_files/text_files' directory
+
</code></pre><blockquote><p><img alt=info src=./images/info.svg> Use the <code>-s</code> option to suppress the output when you just need the exit status. The <code>-i</code> option will allow you to skip initial bytes from the input.</blockquote><h2 id=diff><a class=header href=#diff>diff</a></h2><p>Useful to find differences between text files. All the differences are printed, which might not be desirable for long files.<h3 id=common-options><a class=header href=#common-options>Common options</a></h3><p>Commonly used options are shown below. Examples will be discussed in later sections.<ul><li><code>-i</code> ignore case while comparing<li><code>-w</code> ignore white-spaces<li><code>-b</code> ignore changes in the amount of whitespace<li><code>-B</code> ignore only blank lines<li><code>-E</code> ignore changes due to tab expansion<li><code>-z</code> ignore trailing whitespaces at the end of line<li><code>-y</code> two column output<li><code>-r</code> recursively compare files between the two directories specified<li><code>-s</code> convey message when two files are same<li><code>-q</code> report if files differ, not the details of differences</ul><h3 id=default-diff><a class=header href=#default-diff>Default diff</a></h3><p>By default, the <code>diff</code> output shows lines from the first file input prefixed with <code><</code> and lines from the second file input prefixed with <code>></code>. A line containing <code>---</code> is used as the group separator. Each difference is prefixed by a command that indicates the differences (these commands are understood by tools like <code>patch</code>).<pre><code class=language-bash># change to the 'example_files/text_files' directory
# side-by-side view of sample input files
$ paste f1.txt f2.txt
1 1
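
The `cmp` paragraph in the hunk above says that identical files give exit status `0` with no output, differing files give `1`, and `-s` suppresses the message. A minimal sketch of using that in a script follows. The scratch directory and the file names `a.txt`, `b.txt` and `c.txt` are hypothetical and not part of the commit.

```shell
# Hypothetical scratch directory and file names, used only for illustration.
tmpdir=$(mktemp -d)
printf 'hello\n' > "$tmpdir/a.txt"
cp "$tmpdir/a.txt" "$tmpdir/b.txt"
printf 'hello world\n' > "$tmpdir/c.txt"

# -s prints nothing, so the decision rests on the exit status alone.
if cmp -s "$tmpdir/a.txt" "$tmpdir/b.txt"; then same_ab=yes; else same_ab=no; fi
if cmp -s "$tmpdir/a.txt" "$tmpdir/c.txt"; then same_ac=yes; else same_ac=no; fi

echo "a.txt vs b.txt identical: $same_ab"
echo "a.txt vs c.txt identical: $same_ac"

# Remove the scratch files.
rm -r "$tmpdir"
```

Testing the exit status inside `if` avoids a separate `echo $?` step and keeps the script quiet when the files match.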

searchindex.js

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

searchindex.json

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.

searching-files-and-filenames.html

Lines changed: 1 addition & 1 deletion
@@ -377,7 +377,7 @@
# for the filtered files, replace all occurrences of 'apple' with 'fig'
# 'sed -i' will edit the files inplace, so no output on the terminal
$ find -name '*.log' -print0 | xargs -r0 -n2 -P2 sed -i 's/apple/fig/g'
-
</code></pre><p>In the above example, <code>-P2</code> is used to allow <code>xargs</code> to run two processes at a time (default is one process). You can use <code>-P0</code> to allow <code>xargs</code> to launch as many processes as possible. The <code>-n2</code> option is used to limit the number of file arguments passed to each <code>sed</code> call to <code>2</code>, otherwise <code>xargs</code> is likely to pass as many arguments as possible and thus reduce/negate the effect of parallelism. Note that the values used for <code>-n</code> and <code>-P</code> in the above illustration are just random examples, you'll have to fine tune them for your particular use case.<h3 id=further-reading-1><a class=header href=#further-reading-1>Further Reading</a></h3><ul><li><a href=https://mywiki.wooledge.org/UsingFind>mywiki.wooledge: using find</a><li><a href=https://unix.stackexchange.com/q/282762/109046>unix.stackexchange: find and tar example</a><li><a href=https://unix.stackexchange.com/q/321697/109046>unix.stackexchange: Why is looping over find's output bad practice?</a></ul><h2 id=locate><a class=header href=#locate>locate</a></h2><p><code>locate</code> is a faster alternative to the <code>find</code> command for searching files by name. It is based on a database, which gets updated by a <a href=https://en.wikipedia.org/wiki/Cron>cron</a> job. So, newer files may be not present in results unless you update the database. Use this command if it is available in your distro (for example, <code>sudo apt install mlocate</code> on Debian-like systems) and you remember some part of filename. 
Very useful if you have to search the entire filesystem in which case <code>find</code> command will take a very long time compared to <code>locate</code>.<p><strong>Examples</strong><ul><li><code>locate 'power'</code> print path of filenames containing <code>power</code> in the whole filesystem <ul><li>implicitly, <code>locate</code> would change the string to <code>*power*</code> as no globbing characters are present in the string specified</ul><li><code>locate -b '\power.log'</code> print path matching the string <code>power.log</code> exactly at the end of the path <ul><li><code>/home/learnbyexample/power.log</code> matches<li><code>/home/learnbyexample/lowpower.log'</code> will not match since there are other characters at the start of the filename<li>use of <code>\</code> prevents the search string from implicitly being replaced by <code>*power.log*</code></ul><li><code>locate -b '\proj_adder'</code> the <code>-b</code> option is also handy to print only the matching directory name, otherwise every file under that folder would also be displayed</ul><blockquote><p><img alt=info src=./images/info.svg> See also <a href=https://unix.stackexchange.com/q/60205/109046>unix.stackexchange: pros and cons of find and locate</a>.</blockquote><h2 id=exercises><a class=header href=#exercises>Exercises</a></h2><blockquote><p><img alt=info src=./images/info.svg> For <code>grep</code> exercises, use <a href=https://github.com/learnbyexample/cli-computing/tree/master/example_files/text_files>example_files/text_files</a> directory for input files, unless otherwise specified.</blockquote><blockquote><p><img alt=info src=./images/info.svg> For <code>find</code> exercises, use the <code>find.sh</code> script, unless otherwise specified.</blockquote><p><strong>1)</strong> Display lines containing <code>an</code> from the input files <code>blocks.txt</code>, <code>ip.txt</code> and <code>uniform.txt</code>. 
Show the results with and without filename prefix.<pre><code class=language-bash># ???
+
</code></pre><p>In the above example, <code>-P2</code> is used to allow <code>xargs</code> to run two processes at a time (default is one process). You can use <code>-P0</code> to allow <code>xargs</code> to launch as many processes as possible. The <code>-n2</code> option is used to limit the number of file arguments passed to each <code>sed</code> call to <code>2</code>, otherwise <code>xargs</code> is likely to pass as many arguments as possible and thus reduce/negate the effect of parallelism. Note that the values used for <code>-n</code> and <code>-P</code> in the above illustration are just random examples, you'll have to fine tune them for your particular use case.<h3 id=further-reading-1><a class=header href=#further-reading-1>Further Reading</a></h3><ul><li><a href=https://mywiki.wooledge.org/UsingFind>mywiki.wooledge: using find</a><li><a href=https://unix.stackexchange.com/q/282762/109046>unix.stackexchange: find and tar example</a><li><a href=https://unix.stackexchange.com/q/321697/109046>unix.stackexchange: Why is looping over find's output bad practice?</a></ul><h2 id=locate><a class=header href=#locate>locate</a></h2><p><code>locate</code> is a faster alternative to the <code>find</code> command for searching files by name. It is based on a database, which gets updated by a <a href=https://en.wikipedia.org/wiki/Cron>cron</a> job. So, newer files may be not present in results unless you update the database. Use this command if it is available in your distro (for example, <code>sudo apt install mlocate</code> on Debian-like systems) and you remember some part of filename. 
Very useful if you have to search the entire filesystem in which case <code>find</code> command will take a very long time compared to <code>locate</code>.<p>Here are some examples:<ul><li><code>locate 'power'</code> print path of filenames containing <code>power</code> in the whole filesystem <ul><li>implicitly, <code>locate</code> would change the string to <code>*power*</code> as no globbing characters are present in the string specified</ul><li><code>locate -b '\power.log'</code> print path matching the string <code>power.log</code> exactly at the end of the path <ul><li><code>/home/learnbyexample/power.log</code> matches<li><code>/home/learnbyexample/lowpower.log'</code> will not match since there are other characters at the start of the filename<li>use of <code>\</code> prevents the search string from implicitly being replaced by <code>*power.log*</code></ul><li><code>locate -b '\proj_adder'</code> the <code>-b</code> option is also handy to print only the matching directory name, otherwise every file under that folder would also be displayed</ul><blockquote><p><img alt=info src=./images/info.svg> See also <a href=https://unix.stackexchange.com/q/60205/109046>unix.stackexchange: pros and cons of find and locate</a>.</blockquote><h2 id=exercises><a class=header href=#exercises>Exercises</a></h2><blockquote><p><img alt=info src=./images/info.svg> For <code>grep</code> exercises, use <a href=https://github.com/learnbyexample/cli-computing/tree/master/example_files/text_files>example_files/text_files</a> directory for input files, unless otherwise specified.</blockquote><blockquote><p><img alt=info src=./images/info.svg> For <code>find</code> exercises, use the <code>find.sh</code> script, unless otherwise specified.</blockquote><p><strong>1)</strong> Display lines containing <code>an</code> from the input files <code>blocks.txt</code>, <code>ip.txt</code> and <code>uniform.txt</code>. 
Show the results with and without filename prefix.<pre><code class=language-bash># ???
blocks.txt:banana
ip.txt:light orange
uniform.txt:mango
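
The `xargs` paragraph in the hunk above explains that `-n2` caps each `sed` call at two file arguments and `-P2` lets two such calls run at once. A self-contained sketch of that pipeline follows, assuming GNU `find`, `xargs` and `sed`. The scratch directory, the four `.log` file names and their contents are hypothetical and not part of the commit.

```shell
# Hypothetical scratch directory with four small .log files.
tmpdir=$(mktemp -d)
for i in 1 2 3 4; do
    printf 'apple pie\nbig apple\n' > "$tmpdir/f$i.log"
done

# NUL-separated names keep unusual filenames intact.
# -n2: at most two files per sed call; -P2: up to two sed calls in parallel.
# With four files this results in two sed invocations.
find "$tmpdir" -name '*.log' -print0 | xargs -r0 -n2 -P2 sed -i 's/apple/fig/g'

# Count the lines that still contain 'apple' and the lines that now contain 'fig'.
remaining=$(cat "$tmpdir"/*.log | grep -c 'apple' || true)
figs=$(cat "$tmpdir"/*.log | grep -c 'fig' || true)
echo "lines still containing apple: $remaining"
echo "lines containing fig: $figs"

# Remove the scratch files.
rm -r "$tmpdir"
```

As the original text notes, the values for `-n` and `-P` are only examples and would need tuning for a real workload.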
