This script attempts to find common errors in academic writing. It focuses on academic writing in LaTeX, but most checks should work on any ASCII text. We don't currently attempt any LaTeX parsing (maybe someday).
Currently the script tries to find the following issues:
- passive : Passive voice, colored red by default.
- dups : Duplicate words such as 'the the', including repeats split across two lines, colored purple by default.
- weasel : Weasel words like {various, many}, colored green by default.
- abbr : Wrong abbreviations like i.e and et. al., colored blue by default.
- typography : Common typography errors like a \footnote placed before punctuation, numbers without commas, URLs not typeset with \url, and others, colored yellow by default.
- strunk : Issues that Strunk and White raise in their classic, The Elements of Style. Currently covers only a subset of the words from Chapter IV. Colored cyan by default.
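To make the dups check concrete, here is a minimal Python sketch of how a repeated word can be caught even when the two copies sit on different lines. It is not the script's actual code: the function name find_duplicate_words and the word pattern are assumptions made for illustration, and the sketch ignores punctuation between the two copies.

```python
import re

def find_duplicate_words(lines):
    """Yield (line_number, word) for each word that immediately repeats.

    line_number is 1-based and points at the line holding the second copy.
    Hypothetical helper for illustration; not taken from the script.
    """
    previous = None  # last word seen, carried across line breaks
    for number, line in enumerate(lines, start=1):
        for match in re.finditer(r"[A-Za-z]+", line):
            word = match.group().lower()
            if word == previous:
                yield number, word
            previous = word

sample = ["We ran the", "the experiment on on two machines."]
print(list(find_duplicate_words(sample)))  # [(2, 'the'), (2, 'on')]
```

Carrying the last word over from one line to the next is what lets the check see a 'the the' that is split by a line break.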
The script accepts options via the standard UNIX style:
--no-{option}
where {option} is one of the check names in the list above. The script also skips lines beginning with a % (LaTeX comments). It prints the filename and line number of each offending line, with the issues marked in color.
You can also pass -d to disable all checks; checks then need to be enabled explicitly. Thus -d --abbr will only look for abbr errors.
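The way -d, --{option} and --no-{option} combine can be sketched in a few lines of Python. This only models the behavior described above; the function name enabled_checks is hypothetical, and the assumption that -d applies regardless of where it appears on the command line is mine, not something the script documents.

```python
CHECKS = ("passive", "dups", "weasel", "abbr", "typography", "strunk")

def enabled_checks(args):
    """Return the sorted list of checks left enabled by the given arguments.

    Hypothetical model of the option handling; not the script's own parser.
    """
    # Every check starts enabled unless -d is present (assumed position-independent).
    enabled = set() if "-d" in args else set(CHECKS)
    for arg in args:
        if arg.startswith("--no-") and arg[5:] in CHECKS:
            enabled.discard(arg[5:])
        elif arg.startswith("--") and arg[2:] in CHECKS:
            enabled.add(arg[2:])
    return sorted(enabled)

print(enabled_checks(["-d", "--abbr"]))   # ['abbr']
print(enabled_checks(["--no-passive"]))   # ['abbr', 'dups', 'strunk', 'typography', 'weasel']
```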
Colors can be specified with
--{option}_color={color} --def_color={color}
where {option} is one of the options above, and {color} is one of
('black','red','green','yellow','blue','purple','cyan','white').
You can also prefix the color name with dark to get a darker shade. The def_color option sets the color of unmarked text.
Thus,
--passive_color=darkgreen
will mark passive voice in dark green.
The script can be called in multiple ways:
- ./checkwriting <files>
- ./checkwriting <directory> : In this case the script uses all *.tex and *.bbl files in the directory. If it doesn't find any, then it waits for input from stdin.
- ./checkwriting : With no files, the script waits for diff-style input on stdin. I use it this way often. Say you made some changes to the manuscript: just run git diff | ./checkwriting and you only have to look at the new errors.
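Checking only new text in diff-style input comes down to pulling the added lines out of a unified diff while keeping track of their file and line number. The Python sketch below shows one way to do that. It assumes git's unified diff format (paths prefixed with b/, hunk headers starting with @@), and the name added_lines is hypothetical; the script itself may parse diffs differently.

```python
import re

# Hunk header, e.g. "@@ -3,2 +3,3 @@"; a missing length means 1.
HUNK = re.compile(r"^@@ -\d+(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")

def added_lines(diff_text):
    """Yield (filename, line_number, text) for each line a unified diff adds."""
    filename, number = None, 0
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            if line.startswith("+"):
                yield filename, number, line[1:]
                number += 1
                new_left -= 1
            elif line.startswith("-"):
                old_left -= 1
            elif line.startswith("\\"):
                pass  # "\ No newline at end of file" marker
            else:  # context line, present in both versions
                number += 1
                old_left -= 1
                new_left -= 1
        elif line.startswith("+++ "):
            path = line[4:]
            filename = path[2:] if path.startswith("b/") else path
        else:
            match = HUNK.match(line)
            if match:
                old_left = int(match.group(1) or 1)
                number = int(match.group(2))
                new_left = int(match.group(3) or 1)

diff = "\n".join([
    "diff --git a/paper.tex b/paper.tex",
    "--- a/paper.tex",
    "+++ b/paper.tex",
    "@@ -3,2 +3,3 @@",
    " Intro text.",
    "+The results was obtained by us.",
    " More text.",
])
print(list(added_lines(diff)))  # [('paper.tex', 4, 'The results was obtained by us.')]
```

Counting down the hunk lengths, rather than only looking at the first character, keeps an added line that itself begins with "++ " from being mistaken for a file header.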
Some of the warnings are obvious, some aren't. The non-obvious ones are discussed here.
- The typography warning "add a @" tells LaTeX where a sentence ends. LaTeX assumes that a period ends a sentence unless it follows a capital letter, in which case it assumes the period belongs to an abbreviation. So to let LaTeX know that 'iOS.' really is the end of a sentence, write 'iOS\@.'
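As a rough illustration of that rule, the Python sketch below finds a period that follows a capital letter and looks like the end of a sentence, then inserts LaTeX's \@ marker before it. The pattern and the name suggest_at are assumptions for illustration, not the script's own check, and the heuristic will also flag initials such as 'D. Knuth', so treat its output as a suggestion to review.

```python
import re

# A capital letter, then a period, followed by the end of the line
# or by whitespace and another capital letter (a likely new sentence).
SENTENCE_END_AFTER_CAPITAL = re.compile(r"([A-Z])\.(?=\s+[A-Z]|\s*$)")

def suggest_at(line):
    """Return the line with \\@ inserted before each suspicious period."""
    return SENTENCE_END_AFTER_CAPITAL.sub(lambda m: m.group(1) + "\\@.", line)

print(suggest_at("The app runs on iOS. It also runs on Android."))
# The app runs on iOS\@. It also runs on Android.
```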
The original idea and code for this came from Matt Might's blog.
Here are some other links that might be useful (and might be integrated into awc someday):
- Common Errors in Technical Writing - John Owens, UC Davis
- Effective Scientific Electronic Publishing - Markus Kuhn, Cambridge
- Strunk and White is now free online
- How to write a great research paper - Simon Peyton Jones, MSR Cambridge
- Everyone Can Write Better (and You Are No Exception) - Herbert H. Clark
- How to do Good Research and Get it published
- Writer's Diet Test
And to put it all in perspective, Stephen Fry's monologue on Language
Tip: If you want to pipe the output, less -R is useful to maintain the colors.