CREG

The creg application is a POSIX/GNU regular expression command-line tool for searching text strings or text files with patterns. It implements the functions of the compact-regex.h extensions library (https://github.com/nowca/compact-regex).

  • fast regex testing
  • text replacement function
  • reads large text files (8 MB by default, more with --max-text-size) from a file parameter or a redirected text stream
  • structured and colored display output with filters
  • file write export
  • different output formats and layouts (table, list, plain ASCII, CSV, JSON)
  • options of the regex.h library with extended functionalities
  • runs on Linux, Windows, Mac and all GNU C compatible platforms

How to use

Basic use

user@pc:~$  creg "abc DEF xyz ABC 123" "\d+"
  • find digit string \d+ in the text abc DEF xyz ABC 123

user@pc:~$  creg -t "abc DEF xyz ABC 123" -r "abc" -f i
  • find string -r "abc" in the text -t "abc DEF xyz ABC 123"
  • -f i: flag (insensitive case)

user@pc:~$  creg --text "abc DEF xyz ABC 123" --regex "abc" --option-flags i
  • find string --regex "abc" in the text --text "abc DEF xyz ABC 123"
  • --option-flags i: flag (insensitive case)
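For orientation, a search like the ones above can be sketched directly against the POSIX regex.h API. This is a minimal sketch, not creg's actual implementation: the helper name find_first is hypothetical, and because plain regex.h has no \d shortcode, the digit class is written as [0-9].

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, not part of creg: search `text` for the first match
 * of `pattern`. Returns 0 and fills the start/end byte offsets on success,
 * REG_NOMATCH if nothing matched, or a regcomp() error code otherwise. */
int find_first(const char *text, const char *pattern, int cflags,
               size_t *start, size_t *end)
{
    regex_t re;
    regmatch_t m[1];

    int rc = regcomp(&re, pattern, cflags);
    if (rc != 0)
        return rc;                      /* the pattern did not compile */

    rc = regexec(&re, text, 1, m, 0);
    if (rc == 0) {
        *start = (size_t)m[0].rm_so;    /* offset of the first matched byte */
        *end   = (size_t)m[0].rm_eo;    /* offset just past the match */
    }
    regfree(&re);
    return rc;
}
```

Calling find_first("abc DEF xyz ABC 123", "[0-9]+", REG_EXTENDED, &s, &e) reports the span 16 to 19. Adding REG_ICASE to the flags corresponds to what the i flag describes.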

Print layouts

user@pc:~$  creg -t "abc DEF xyz ABC 123" -r "[\w ]+[^0-9]+" -p plain -d r
  • find string of words without numbers -r "[\w ]+[^0-9]+" in the text -t "abc DEF xyz ABC 123"
  • -d r: display just the results
  • -p plain : just as text

user@pc:~$  creg -t "abc DEF xyz ABC 123" -r "[\w]+" -p json -d r
  • find all words -r "[\w]+" in the text -t "abc DEF xyz ABC 123"
  • -d r: display just the results
  • -p json: in JSON format

Replace

user@pc:~$  creg -t "abc DEF xyz ABC 123" -r "[a-z0-9]+" -x "###"
  • replace every run of lowercase letters or digits -r "[a-z0-9]+" in the text -t "abc DEF xyz ABC 123" with the string -x "###"

user@pc:~$  creg -t "abc DEF xyz ABC 123" -r "(a)(b)(c)" -x "\3\2\1" -f gi
  • replace each occurrence of "abc" -r "(a)(b)(c)" with its letters in reverse order "cba" -x "\3\2\1"
  • -f gi: flags (global, insensitive case)
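The backreference replacement above can be illustrated with plain regex.h. The following is a sketch under assumptions, not the code creg uses: replace_all and append are hypothetical names, and only \1 to \9 are expanded in the replacement string.

```c
#include <regex.h>
#include <string.h>

#define MAX_GROUPS 10   /* whole match plus \1..\9 */

/* Append n bytes to out, keeping room for the terminating NUL. */
static int append(char *out, size_t cap, size_t *len, const char *s, size_t n)
{
    if (*len + n + 1 > cap)
        return -1;
    memcpy(out + *len, s, n);
    *len += n;
    out[*len] = '\0';
    return 0;
}

/* Hypothetical helper: replace every match of pattern in text with repl,
 * expanding \1..\9. Returns the number of replacements, -1 on error. */
int replace_all(const char *text, const char *pattern, const char *repl,
                int cflags, char *out, size_t cap)
{
    regex_t re;
    regmatch_t m[MAX_GROUPS];
    size_t len = 0;
    int count = 0;
    const char *p = text;

    if (cap == 0 || regcomp(&re, pattern, cflags) != 0)
        return -1;
    out[0] = '\0';

    for (;;) {
        /* '^' may only match at the text start or, with REG_NEWLINE,
         * directly after a newline. */
        int bol = p == text || ((cflags & REG_NEWLINE) && p[-1] == '\n');
        if (regexec(&re, p, MAX_GROUPS, m, bol ? 0 : REG_NOTBOL) != 0)
            break;
        /* copy the unmatched text in front of the match */
        if (append(out, cap, &len, p, (size_t)m[0].rm_so) != 0) goto fail;
        /* copy the replacement, expanding backreferences */
        for (const char *r = repl; *r; r++) {
            if (r[0] == '\\' && r[1] >= '1' && r[1] <= '9') {
                int g = r[1] - '0';
                if (m[g].rm_so >= 0 &&
                    append(out, cap, &len, p + m[g].rm_so,
                           (size_t)(m[g].rm_eo - m[g].rm_so)) != 0) goto fail;
                r++;
            } else if (append(out, cap, &len, r, 1) != 0) {
                goto fail;
            }
        }
        count++;
        if (m[0].rm_eo == m[0].rm_so) {         /* empty match: step ahead */
            if (p[m[0].rm_eo] == '\0') { p += m[0].rm_eo; break; }
            if (append(out, cap, &len, p + m[0].rm_eo, 1) != 0) goto fail;
            p += m[0].rm_eo + 1;
        } else {
            p += m[0].rm_eo;
        }
    }
    if (append(out, cap, &len, p, strlen(p)) != 0) goto fail;
    regfree(&re);
    return count;
fail:
    regfree(&re);
    return -1;
}
```

With the pattern "(a)(b)(c)", the replacement "\3\2\1" and REG_EXTENDED | REG_ICASE, the text "abc DEF xyz ABC 123" becomes "cba DEF xyz CBA 123". The loop over all matches plays the role that the g flag describes.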

Examples

Searching for well-known ports from a redirected text stream

user@pc:~$ cat service-names-port-numbers.csv | ./creg -r "(\\d+);(.*UDP.*);(.*mail.*);" -c -f gein -d srp
  • print the contents of service-names-port-numbers.csv with cat and pipe its STDOUT into creg
  • -r: match all UDP-based protocols that contain the word mail, with the options:
  • -c: colored output
  • -f gein: flags (global, extended, insensitive case, newline)
  • -d srp: display statistics, results and index positions

Display words from wordlist

user@pc:~$ ./creg -i ./example-files/oxford-word-list.txt -r "^(Ae.*ion) (.+\.) (.*)$" -p list -f gei -c -d sr
  • -i: read in the file ./example-files/oxford-word-list.txt
  • -r: match all lines (from ^ to $) whose first word starts with Ae and ends with ion, with the options:
  • -c: colored output
  • -p list: list format
  • -f gei: flags (global, extended, insensitive case)
  • -d sr: display statistics and results, without the index positions
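Line-wise matching of this kind can be sketched with regex.h by searching repeatedly and moving an offset past each match. This is an illustration only: collect_matches is a hypothetical helper, the sample word list in the usage note is made up, and REG_NEWLINE is used here so that ^ and $ anchor at line boundaries, which may differ from how creg treats its own flags.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: store up to max non-overlapping matches as absolute
 * byte offsets in out. Returns the number of matches found, or -1 if the
 * pattern does not compile. */
int collect_matches(const char *text, const char *pattern, int cflags,
                    regmatch_t *out, size_t max)
{
    regex_t re;
    regmatch_t m;
    size_t off = 0, n = 0;

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;

    while (n < max) {
        /* '^' may match at the text start or, with REG_NEWLINE, right
         * after a newline; everywhere else REG_NOTBOL is required. */
        int at_line_start = off == 0 ||
            ((cflags & REG_NEWLINE) && text[off - 1] == '\n');
        if (regexec(&re, text + off, 1, &m,
                    at_line_start ? 0 : REG_NOTBOL) != 0)
            break;
        out[n].rm_so = (regoff_t)off + m.rm_so;
        out[n].rm_eo = (regoff_t)off + m.rm_eo;
        n++;
        off += (size_t)m.rm_eo;
        if (m.rm_eo == m.rm_so) {       /* empty match: avoid looping */
            if (text[off] == '\0')
                break;
            off++;
        }
    }
    regfree(&re);
    return (int)n;
}
```

For the made-up text "Aeration n. x\nBook n. y\nAestivation n. z" and the pattern "^Ae.*ion .*$" compiled with REG_EXTENDED | REG_ICASE | REG_NEWLINE, the helper finds two matching lines. The max parameter caps the number of stored matches, similar in spirit to a match limit.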

Searching for ROOT keys in a Windows reg file on Windows

Z:\>creg.exe  /I "example-files\windows-formatted-regfile.reg" /R ".*HKEY-CLASSES_ROOT.*" /D TSR 
  • /I: read in the file example-files\windows-formatted-regfile.reg
  • /R: match all lines that contain the phrase "HKEY-CLASSES_ROOT", with the options:
  • /D TSR: display text, statistics and results, without the index positions

The input can also be piped in from another command in the Windows command prompt:

Z:\>more port-numbers.csv | creg.exe /R "^.*mail.*$" /D sr /F gein /P list 
  • more port-numbers.csv |: show the contents of the file and redirect them with |
  • /R: match all lines that contain the phrase "mail", with the options:
  • /D sr: display statistics and results, without the index positions
  • /F gein: flags (global, extended, insensitive case, newline)
  • /P list: short list format

Installation

The program can be compiled and copied to the /opt/ folder.

Just run:

user@pc:~$ make

and

user@pc:~$ sudo make install

Compilation

Linux

Build the example program by typing in:

user@pc:~$ make

...or compile it directly with the GNU C compiler:

user@pc:~$ gcc -Wall -static creg.c -o creg
  • The GNU extensions with the regex.h library are needed for successful compilation. Make sure the necessary header and library files are included.

  • Use the -m32 flag to compile the program for 32-bit systems.

  • Important note: the program is compiled with the -static flag, which links the libraries into the executable. As a result, valgrind reports some memory leaks. With dynamic linking these errors are suppressed by default. (https://stackoverflow.com/questions/7506134/valgrind-errors-when-linked-with-static-why)


Windows

To compile the program on Windows, you need a compiler distribution that includes the regex.h library from the GNU extensions:

C:\Users\pcuser>gcc.exe -static -IC:\MinGW-W64\mingw32\opt\include creg.c -o creg.exe -LC:\MinGW-W64\mingw32\opt\lib -lregex
  • MinGW-W64 includes the regex.h library in the \opt\include and \opt\lib folders.
  • The header and library paths must be passed with -I and -L, with an additional -lregex parameter at the end of the command.
  • -static can be used to make the program independent of external libraries.
  • The path of gcc.exe must be added to the Windows PATH user variable.

MacOS

To compile the program on macOS (OS X), you need a GCC installation that includes the regex.h library from the GNU extensions.

  • There are several ways to install the GCC development tools on a Mac:

    • Xcode
    • Homebrew
    • MacPorts
    • compilation from source code
    • a graphical package installer such as Bower or MacUpdate

  • For compiler options, see Linux.


Commandline options

  • see -hc or --help to read all the options

Usage:

creg [Commands] [Options]


Commands:

Command:                                     Meaning:
-t <input-text>, --text <input-text>         text input string
-r <expression>, --regex <expression>        regular expression pattern
-x <replace-text>, --replace <replace-text>  replacement text substring
-i <filename>, --input <filename>            filepath to read in file
-o <filename>, --output <filename>           filepath to write out file
-h, --help                                   show help for commands

Options:

Data:
Command: Meaning:
-d <data>, --data <data> show output elements

<data>:

Argument: Meaning:
t input text
s statistics
r results
p match index positions

usage example:

-d tsrp or --data sr


Display Layout:
Command: Meaning:
-p <print-layout>, --print <print-layout> printing or file writing layout

<print-layout>:

Argument:  Meaning:
table      table
list       short list
list-full  full list
plain      plain result data
csv        comma-separated values
json       JavaScript Object Notation

Command: Meaning:
-c, --color display with ANSI colors

Option flags:
Command: Meaning:
-f <options>, --option-flags <options> option-flags for compilation

<options>:

Argument:  Name:      Meaning:
g          global     search for all matches in a text
e          extended   use Extended Regular Expressions (ERE)
i          icase      use case-insensitive matching
m          multiline  search in multiple lines
n          newline    ignore the newline character
p          nosubexp   ignore group matching with subexpressions
q          subexp     match only subexpressions

usage example:

-f ge or --option-flags geinq

default options:

  • global, extended, newline (the default options are deactivated, if an option is set with the -f command)
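How such a flag string might translate into regcomp() compile flags can be sketched as follows. The mapping is an assumption made for illustration: the source does not state which regex.h constants creg uses, parse_flags is a hypothetical name, and the m and q letters are accepted but not modeled because they have no direct regex.h equivalent.

```c
#include <regex.h>

/* Hypothetical mapping from an option-flags string such as "gein" to
 * regcomp() cflags. *global is set to 1 when 'g' is present.
 * Returns the cflags, or -1 for an unknown letter. */
int parse_flags(const char *flags, int *global)
{
    int cflags = 0;

    *global = 0;
    for (; *flags != '\0'; flags++) {
        switch (*flags) {
        case 'g': *global = 1;            break;  /* loop over all matches */
        case 'e': cflags |= REG_EXTENDED; break;  /* ERE syntax */
        case 'i': cflags |= REG_ICASE;    break;  /* ignore case */
        case 'n': cflags |= REG_NEWLINE;  break;  /* assumed equivalent */
        case 'p': cflags |= REG_NOSUB;    break;  /* assumed equivalent */
        case 'm':                                 /* tool-level behavior, */
        case 'q':                         break;  /* not modeled here */
        default:  return -1;
        }
    }
    return cflags;
}
```

Under these assumptions, parse_flags("gein", &g) yields REG_EXTENDED | REG_ICASE | REG_NEWLINE and sets g to 1. The global behavior is kept separate because regexec() itself only ever reports one match per call.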

Memory:
Command:                               Meaning:
-s <length>, --max-text-size <length>  max input-text length in bytes, default: 8388608 bytes (8 MB)
-n <count>, --max-num-matches <count>  max number of matches, default: 8192 matches
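A size limit like the one above can be enforced while reading the input. The sketch below is an assumption about one possible approach, not creg's code: read_text is a hypothetical helper that reads at most max_size bytes from a file or a redirected stream such as stdin.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read at most max_size bytes from fp into a
 * NUL-terminated heap buffer. The caller frees the result.
 * Returns NULL if memory allocation fails. */
char *read_text(FILE *fp, size_t max_size, size_t *len)
{
    size_t cap = 4096, used = 0;
    char *buf = malloc(cap + 1);            /* +1 for the terminating NUL */

    if (buf == NULL)
        return NULL;

    while (used < max_size) {
        if (used == cap) {                  /* buffer full: double it */
            size_t new_cap = cap * 2;
            char *tmp = realloc(buf, new_cap + 1);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }
        size_t want = cap - used;
        if (want > max_size - used)
            want = max_size - used;         /* never exceed the limit */
        size_t got = fread(buf + used, 1, want, fp);
        used += got;
        if (got < want)
            break;                          /* end of input or read error */
    }
    buf[used] = '\0';
    *len = used;
    return buf;
}
```

Passing stdin as fp covers the piped case, and passing 8388608 as max_size mirrors the documented default. Input beyond the limit is simply left unread in this sketch.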

Supported Regular Expression operations

The program supports POSIX compatible Regular Expressions from regex.h with some extended functionalities, like single character classes.

Supported:                         Not supported:
Wildcard .                         Lazy *? +? ??
Character classes \d \D \w \W      Negative Lookahead (?!)
POSIX character classes [:digit:]  Negative Lookbehind (?<!)
Whitespace \s \S                   Positive Lookahead (?=)
Character Sets [abc]               Positive Lookbehind (?<=)
Escaping \
The Asterisk *
The Plus +
The Question Mark ?
Numeric Quantifier {n}
Range Quantifier {n,m}
Alternation |
Anchors ^ $
Capturing Groups a(b)c
Backreferences \1
ASCII and Unicode sequences
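Because lazy quantifiers are not available, POSIX matching is always greedy. A common workaround is a negated character set, shown here in a small regex.h sketch (first_match_len is a hypothetical helper and the sample text is made up):

```c
#include <regex.h>

/* Hypothetical helper: length in bytes of the first match of an ERE
 * pattern in text, or -1 if there is no match or the pattern is invalid. */
int first_match_len(const char *text, const char *pattern)
{
    regex_t re;
    regmatch_t m;
    int len = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, text, 1, &m, 0) == 0)
        len = (int)(m.rm_eo - m.rm_so);
    regfree(&re);
    return len;
}
```

On the text "<a><b>", the greedy pattern "<.*>" matches all 6 characters, while "<[^>]*>" stops at the first closing bracket and matches only the 3 characters of "<a>".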

POSIX Standard

Basic Regular Syntax

Metacharacter Description
^ Matches the starting position within the string. In line-based tools, it matches the starting position of any line.
. Matches any single character (many applications exclude newlines, and exactly which characters are considered newlines is flavor-, character-encoding-, and platform-specific, but it is safe to assume that the line feed character is included). Within POSIX bracket expressions, the dot character matches a literal dot. For example, a.c matches "abc", etc., but [a.c] matches only "a", ".", or "c".
[ ] A bracket expression. Matches a single character that is contained within the brackets. For example, [abc] matches "a", "b", or "c". [a-z] specifies a range which matches any lowercase letter from "a" to "z". These forms can be mixed: [abcx-z] matches "a", "b", "c", "x", "y", or "z", as does [a-cx-z]. The - character is treated as a literal character if it is the last or the first (after the ^, if present) character within the brackets: [abc-], [-abc], [^-abc]. Backslash escapes are not allowed. The ] character can be included in a bracket expression if it is the first (after the ^, if present) character: []abc], [^]abc].
[^ ] Matches a single character that is not contained within the brackets. For example, [^abc] matches any character other than "a", "b", or "c". [^a-z] matches any single character that is not a lowercase letter from "a" to "z". Likewise, literal characters and ranges can be mixed.
$ Matches the ending position of the string or the position just before a string-ending newline. In line-based tools, it matches the ending position of any line.
( ) Defines a marked subexpression, also called a capturing group, which is essential for extracting the desired part of the text (see also the next entry, \n). BRE mode requires \( \).
\n Matches what the nth marked subexpression matched, where n is a digit from 1 to 9. This construct is defined in the POSIX standard. Some tools allow referencing more than nine capturing groups. Also known as a back-reference, this feature is supported in BRE mode.
* Matches the preceding element zero or more times. For example, ab*c matches "ac", "abc", "abbbc", etc. [xyz]* matches "", "x", "y", "z", "zx", "zyx", "xyzzy", and so on. (ab)* matches "", "ab", "abab", "ababab", and so on.
{m,n} Matches the preceding element at least m and not more than n times. For example, a{3,5} matches only "aaa", "aaaa", and "aaaaa". This is not found in a few older instances of regexes. BRE mode requires \{m,n\}.

Extended Regular Syntax

Metacharacter Description
? Matches the preceding element zero or one time. For example, ab?c matches only "ac" or "abc".
+ Matches the preceding element one or more times. For example, ab+c matches "abc", "abbc", "abbbc", and so on, but not "ac".
| The choice (also known as alternation or set union) operator matches either the expression before or the expression after the operator. For example, abc|def matches "abc" or "def".

Source: https://en.wikipedia.org/wiki/Regular_expression#POSIX_basic_and_extended
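The difference between the basic and the extended syntax can be tried out with regex.h, where REG_EXTENDED selects ERE and its absence selects BRE. A minimal sketch (the helper name matches is hypothetical):

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 if pattern matches somewhere in text, 0 if it
 * does not, -1 if the pattern fails to compile. Pass REG_EXTENDED in
 * cflags for ERE, or 0 for BRE. */
int matches(const char *text, const char *pattern, int cflags)
{
    regex_t re;

    /* REG_NOSUB: only the yes/no answer is needed, no match offsets. */
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

In BRE the interval and the group are written with backslashes, as in "^a\{3,5\}$" and "^\(ab\)*$". In ERE the same patterns read "^a{3,5}$" and "^(ab)*$", and alternation such as "abc|def" becomes available.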


Character classes

Description                       POSIX      Shortcode     ASCII
ASCII characters                             \x[Bytecode]
Alphanumeric characters           [:alnum:]                [A-Za-z0-9]
Alphanumeric characters plus "_"             \w            [A-Za-z0-9_]
Non-word characters                          \W            [^A-Za-z0-9_]
Alphabetic characters             [:alpha:]  \a            [A-Za-z]
Space and tab                     [:space:]  \s
                                  [:blank:]  \t
Non-whitespace characters                    \S            [^ ]
Word boundaries                              \b
Non-word boundaries                          \B
Digits                            [:digit:]  \d            [0-9]
Non-digits                                   \D            [^0-9]
Lowercase letters                 [:lower:]  \l            [a-z]
Uppercase letters                 [:upper:]  \u            [A-Z]
Visible characters                [:print:]  \p            [\x20-\x7E]
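Plain regex.h does not define shortcodes such as \d or \w, so a tool has to provide them itself. One conceivable approach, shown purely as an assumption and not as the method compact-regex.h uses, is to rewrite the shortcodes into bracket expressions before calling regcomp(). The helper expand_shortcodes is hypothetical and covers only \d \D \w \W \s \S.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical preprocessing step: rewrite \d \D \w \W \s \S into POSIX
 * bracket expressions. Other escapes (for example \. or \\) are copied
 * unchanged. Returns 0 on success, -1 if out is too small. */
int expand_shortcodes(const char *in, char *out, size_t cap)
{
    size_t len = 0;

    while (*in != '\0') {
        const char *rep = NULL;

        if (in[0] == '\\') {
            switch (in[1]) {
            case 'd': rep = "[0-9]";         break;
            case 'D': rep = "[^0-9]";        break;
            case 'w': rep = "[A-Za-z0-9_]";  break;
            case 'W': rep = "[^A-Za-z0-9_]"; break;
            case 's': rep = "[[:space:]]";   break;
            case 'S': rep = "[^[:space:]]";  break;
            }
        }
        /* Bytes to emit: the replacement, an escape pair, or one char. */
        size_t n = rep ? strlen(rep) : (in[0] == '\\' && in[1] ? 2 : 1);
        if (len + n + 1 > cap)
            return -1;
        memcpy(out + len, rep ? rep : in, n);
        len += n;
        in += rep ? 2 : n;              /* a shortcode consumes two bytes */
    }
    if (cap == 0)
        return -1;
    out[len] = '\0';
    return 0;
}
```

With this sketch "\d+" becomes "[0-9]+". It deliberately ignores shortcodes inside bracket expressions, so a pattern like "[\w ]+" would need extra handling that is left out here.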
