@@ -1,6 +1,29 @@
 .TH topfew
 .PP
 A program that finds and prints out the top few records in which a certain field or combination of fields occurs most frequently.
+.SH Examples
+.PP
+To find the IP address that most commonly hits your web site, given an Apache logfile named \fB\fCaccess_log\fR:
+.PP
+\fB\fCtf \-\-fields 1 access_log\fR
+.PP
+The same effect could be achieved with
+.PP
+\fB\fCawk '{print $1}' access_log | sort | uniq \-c | sort \-rn | head\fR
+.PP
+But \fBtf\fP is usually much faster.
+.PP
+Do the same, but exclude high\-traffic bots (omitting the filename).
+.PP
+\fB\fCtf \-\-fields 1 \-\-vgrep googlebot \-\-vgrep bingbot\fR
+.PP
+Most popular IP addresses from May 2020.
+.PP
+\fB\fCtf \-\-fields 1 \-\-grep '\\[../May/2020'\fR
+.PP
+Most popular hour/minute of the day for retrievals.
+.PP
+\fB\fCtf \-\-fields 4 \-\-sed "\\\\[" "" \-\-sed '^[^:]*:' '' \-\-sed ':..$' ''\fR
 .SH Usage
 .PP
 .RS
@@ -69,29 +92,24 @@ The default is the result of the Go \fB\fCruntime.NumCPU()\fR calls and often pr
 \fB\fC\-h\fR, \fB\fC\-help\fR, \fB\fC\-\-help\fR
 .PP
 Describes the function and options of \fBtf\fP\&.
-.SH Examples
-.PP
-To find the IP address that most commonly hits your web site, given an Apache logfile named \fB\fCaccess_log\fR\&.
-.PP
-\fB\fCtf \-\-fields 1 access_log\fR
+.SH Performance issues
 .PP
-The same effect could be achieved with
+Since the effect of topfew can be exactly duplicated with a combination of \fB\fCawk\fR, \fB\fCgrep\fR, \fB\fCsed\fR, \fB\fCsort\fR and \fB\fCuniq\fR, you wouldn’t be using it if you didn’t care about performance.
+Topfew is quite highly tuned and pushes your computer’s I/O subsystem and the Go runtime hard.
+Therefore, the observed effects of combinations of options can vary dramatically from system to system.
 .PP
-\fB\fCawk '{print $1}' access_log | sort | uniq \-c | sort \-rn | head\fR
+For example, if I want to list the top records containing the string \fB\fCexample\fR from a file named \fB\fCbig\-file\fR, I could do either of the following:
 .PP
-But \fBtf\fP is usually much faster.
-.PP
-Do the same, but exclude high\-traffic bots (omitting the filename).
-.PP
-\fB\fCtf \-fields 1 \-vgrep googlebot \-vgrep bingbot\fR
-.PP
-Most popular IP addresses from May 2020.
-.PP
-\fB\fCtf \-fields 1 \-grep '\\[../May/2020'\fR
+.RS
+.nf
+tf \-g example big\-file
+grep example big\-file | tf
+.fi
+.RE
 .PP
-Most popular hour/minute of the day for retrievals.
+When I benchmark topfew on a modern Apple\-Silicon Mac and an elderly spinning\-rust Linux VPS, I observe that the first option is faster on the Mac and the second on Linux.
 .PP
-\fB\fCtf \-fields 4 \-sed "\\\\[" "" \-sed '^[^:]*:' '' \-sed ':..$' ''\fR
+Only one performance issue is uncomplicated: Topfew will \fBalways\fP run faster on a named file than on a standard\-input stream.
 .SH Credits
 .PP
 Tim Bray created version 0.1 of Topfew, and the path toward 1.0 was based chiefly on ideas stolen from Dirkjan Ochtman and contributed by Simon Fell.