Commit ba69bce

updated man page (autogenerated from README, should probably be part of Makefile)
Signed-off-by: Tim Bray <tbray@textuality.com>
1 parent 43bfe8f commit ba69bce

1 file changed

+36
-18
lines changed

doc/tf.1

+36-18
@@ -1,6 +1,29 @@
 .TH topfew
 .PP
 A program that finds and prints out the top few records in which a certain field or combination of fields occurs most frequently.
+.SH Examples
+.PP
+To find the IP address that most commonly hits your web site, given an Apache logfile named \fB\fCaccess_log\fR\&.
+.PP
+\fB\fCtf \-\-fields 1 access_log\fR
+.PP
+The same effect could be achieved with
+.PP
+\fB\fCawk '{print $1}' access_log | sort | uniq \-c | sort \-rn | head\fR
+.PP
+But \fBtf\fP is usually much faster.
+.PP
+Do the same, but exclude high\-traffic bots (omitting the filename).
+.PP
+\fB\fCtf \-\-fields 1 \-\-vgrep googlebot \-\-vgrep bingbot\fR
+.PP
+Most popular IP addresses from May 2020.
+.PP
+\fB\fCtf \-\-fields 1 \-grep '\\[../May/2020'\fR
+.PP
+Most popular hour/minute of the day for retrievals.
+.PP
+\fB\fCtf \-\-fields 4 \-\-sed "\\\\[" "" \-\-sed '^[^:]*:' '' \-\-sed ':..$' ''\fR
 .SH Usage
 .PP
 .RS
@@ -69,29 +92,24 @@ The default is the result of the Go \fB\fCruntime.NumCPU()\fR calls and often pr
 \fB\fC\-h\fR, \fB\fC\-help\fR, \fB\fC\-\-help\fR
 .PP
 Describes the function and options of \fBtf\fP\&.
-.SH Examples
-.PP
-To find the IP address that most commonly hits your web site, given an Apache logfile named \fB\fCaccess_log\fR\&.
-.PP
-\fB\fCtf \-\-fields 1 access_log\fR
+.SH Performance issues
 .PP
-The same effect could be achieved with
+Since the effect of topfew can be exactly duplicated with a combination of \fB\fCawk\fR, \fB\fCgrep\fR, \fB\fCsed\fR and \fB\fCsort\fR, you wouldn’t be using it if you didn’t care about performance.
+Topfew is quite highly tuned and pushes your computer’s I/O subsystem and Go runtime hard.
+Therefore, the observed effects of combinations of options can vary dramatically from system to system.
 .PP
-\fB\fCawk '{print $1}' access_log | sort | uniq \-c | sort \-rn | head\fR
+For example, if I want to list the top records containing the string \fB\fCexample\fR from a file named \fB\fCbig\-file\fR I could do either of the following:
 .PP
-But \fBtf\fP is usually much faster.
-.PP
-Do the same, but exclude high\-traffic bots (omitting the filename).
-.PP
-\fB\fCtf \-fields 1 \-vgrep googlebot \-vgrep bingbot\fR
-.PP
-Most popular IP addresses from May 2020.
-.PP
-\fB\fCtf \-fields 1 \-grep '\\[../May/2020'\fR
+.RS
+.nf
+tf \-g example big\-file
+grep example big\-file | tf
+.fi
+.RE
 .PP
-Most popular hour/minute of the day for retrievals.
+When I benchmark topfew on a modern Apple\-Silicon Mac and an elderly spinning\-rust Linux VPS, I observe that the first option is faster on Mac, the second on Linux.
 .PP
-\fB\fCtf \-fields 4 \-sed "\\\\[" "" \-sed '^[^:]*:' '' \-sed ':..$' ''\fR
+Only one performance issue is uncomplicated: Topfew will \fBalways\fP run faster on a named file than a standard\-input stream.
 .SH Credits
 .PP
 Tim Bray created version 0.1 of Topfew, and the path toward 1.0 was based chiefly on ideas stolen from Dirkjan Ochtman and contributed by Simon Fell.
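
Reviewer note: the Examples section moved by this commit says that `tf --fields 1 access_log` has the same effect as the `awk | sort | uniq -c | sort -rn | head` pipeline. As a reading aid, here is a minimal Python sketch of that computation: count how often one whitespace-separated field occurs and keep the most frequent values. It is an illustration only and makes no claim about how topfew is implemented or how it formats its output; the function name `top_few`, its parameters, and the sample log lines are hypothetical.

```python
from collections import Counter


def top_few(lines, field=1, few=10):
    """Return the `few` most frequent values of a 1-based, whitespace-separated field.

    Hypothetical helper for illustration; not part of topfew.
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        # Skip lines that are too short to contain the requested field.
        if len(parts) >= field:
            counts[parts[field - 1]] += 1
    # most_common() sorts by descending count, like `sort -rn | head`.
    return counts.most_common(few)


# Made-up Apache-style log lines; field 1 is the client IP address.
sample = [
    '10.0.0.1 - - [12/May/2020:10:01:02 +0000] "GET / HTTP/1.1" 200 512',
    '10.0.0.2 - - [12/May/2020:10:01:03 +0000] "GET /a HTTP/1.1" 200 128',
    '10.0.0.1 - - [12/May/2020:10:01:04 +0000] "GET /b HTTP/1.1" 200 256',
    '10.0.0.3 - - [12/May/2020:10:02:10 +0000] "GET / HTTP/1.1" 200 512',
    '10.0.0.2 - - [12/May/2020:10:02:11 +0000] "GET /c HTTP/1.1" 404 0',
    '10.0.0.1 - - [12/May/2020:10:03:00 +0000] "GET / HTTP/1.1" 200 512',
]

for key, count in top_few(sample, field=1, few=2):
    print(count, key)
```

The sketch holds every distinct key in memory at once, which is acceptable for a demonstration but says nothing about the performance trade-offs the new "Performance issues" section describes.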