CREG

The creg application is a POSIX/GNU regular expression command-line tool for searching text strings or text files with patterns. It implements the functions of the compact-regex.h extensions library (https://github.com/nowca/compact-regex).

- fast regex testing
- text replacement function
- reads large text files (8 MB by default, adjustable with -s) from a file parameter or a redirected text stream
- structured and colored display output with filters
- export of the results to a file
- different output formats and layouts (table, list, plain ASCII, CSV, JSON)
- options of the regex.h library with extended functionalities
- runs on Linux, Windows, Mac and all GNU C compatible platforms

How to use

Basic use

user@pc:~$  creg "abc DEF xyz ABC 123" "\d+"

find digit string \d+ in the text abc DEF xyz ABC 123

user@pc:~$  creg -t "abc DEF xyz ABC 123" -r "abc" -f i

find string -r "abc" in the text -t "abc DEF xyz ABC 123"
-f i: flag (insensitive case)

user@pc:~$  creg --text "abc DEF xyz ABC 123" --regex "abc" --option-flags i

find string --regex "abc" in the text --text "abc DEF xyz ABC 123"
--option-flags i: flag (insensitive case)
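To illustrate what a case-insensitive search like the one above amounts to at the regex.h level, here is a minimal sketch. It is not taken from creg.c: the helper name find_first and its parameters are hypothetical, and it assumes that the i flag corresponds to the standard REG_ICASE compile flag.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of creg): find the first match of an
 * extended regular expression in text. Returns 1 and stores the byte
 * offsets of the match on success, 0 if nothing matches, and -1 if the
 * pattern does not compile. */
int find_first(const char *text, const char *pattern, int icase,
               size_t *start, size_t *end)
{
    assert(text != NULL && pattern != NULL && start != NULL && end != NULL);

    regex_t re;
    /* Assumption: the "i" option maps to REG_ICASE. */
    int cflags = REG_EXTENDED | (icase ? REG_ICASE : 0);
    if (regcomp(&re, pattern, cflags) != 0)
        return -1;

    regmatch_t m[1];
    int rc = regexec(&re, text, 1, m, 0);
    regfree(&re);
    if (rc != 0)
        return 0;

    *start = (size_t)m[0].rm_so; /* offset of the first matched byte */
    *end = (size_t)m[0].rm_eo;   /* offset one past the last matched byte */
    return 1;
}
```

Called as find_first("abc DEF xyz ABC 123", "def", 1, &start, &end), the sketch reports the span 4 to 7, which covers DEF. With icase set to 0 the same pattern does not match.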

Print layouts

user@pc:~$  creg -t "abc DEF xyz ABC 123" -r "[\w ]+[^0-9]+" -p plain -d r

find string of words without numbers -r "[\w ]+[^0-9]+" in the text -t "abc DEF xyz ABC 123"
-d r: display just the results
-p plain : just as text

user@pc:~$  creg -t "abc DEF xyz ABC 123" -r "[\w]+" -p json -d r

find all words -r "[\w]+" in the text -t "abc DEF xyz ABC 123"
-d r: display just the results
-p json : in JSON format

Replace

user@pc:~$  creg -t "abc DEF xyz ABC 123" -r "[a-z0-9]+" -x "###"

replace every run of lowercase letters or digits -r "[a-z0-9]+" in the text -t "abc DEF xyz ABC 123" with the string -x "###"

user@pc:~$  creg -t "abc DEF xyz ABC 123" -r "(a)(b)(c)" -x "\3\2\1" -f gi

replace each occurrence of "abc" -r "(a)(b)(c)" in the text with its letters in reverse order "cba" -x "\3\2\1"
-f gi: flags (global, insensitive case)
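The replacement with backreferences can be sketched with plain regex.h calls. The following is an illustration under stated assumptions, not the code used by creg or compact-regex.h: the names replace_all, append and MAX_GROUPS are hypothetical, the pattern is always compiled as an extended expression, and only \1 to \9 are expanded in the replacement string.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

#define MAX_GROUPS 10 /* whole match plus the groups \1 .. \9 */

/* Append n bytes of src to out at *len; returns 0 if they do not fit. */
static int append(char *out, size_t out_size, size_t *len,
                  const char *src, size_t n)
{
    if (*len + n + 1 > out_size)
        return 0;
    memcpy(out + *len, src, n);
    *len += n;
    out[*len] = '\0';
    return 1;
}

/* Hypothetical helper: replace every match of pattern in text with repl,
 * expanding \1..\9 to the matching groups. Returns the number of
 * replacements, or -1 on a compile error or if out is too small. */
int replace_all(const char *text, const char *pattern, const char *repl,
                int icase, char *out, size_t out_size)
{
    assert(text && pattern && repl && out && out_size > 0);

    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | (icase ? REG_ICASE : 0)) != 0)
        return -1;

    regmatch_t m[MAX_GROUPS];
    size_t len = 0;
    int count = 0, ok = 1;
    const char *p = text;
    out[0] = '\0';

    while (ok && *p != '\0' &&
           regexec(&re, p, MAX_GROUPS, m, p == text ? 0 : REG_NOTBOL) == 0) {
        /* Copy the unmatched text in front of the match. */
        ok = append(out, out_size, &len, p, (size_t)m[0].rm_so);
        /* Copy the replacement, expanding backreferences. */
        for (const char *r = repl; ok && *r != '\0'; r++) {
            if (r[0] == '\\' && r[1] >= '1' && r[1] <= '9') {
                int g = r[1] - '0';
                if (m[g].rm_so >= 0) /* group took part in the match */
                    ok = append(out, out_size, &len, p + m[g].rm_so,
                                (size_t)(m[g].rm_eo - m[g].rm_so));
                r++; /* skip the digit */
            } else {
                ok = append(out, out_size, &len, r, 1);
            }
        }
        count++;
        p += m[0].rm_eo;
        if (m[0].rm_eo == m[0].rm_so) {
            /* Empty match: copy one byte and step over it to advance. */
            if (*p == '\0')
                break;
            ok = ok && append(out, out_size, &len, p, 1);
            p++;
        }
    }
    regfree(&re);
    /* Copy whatever follows the last match. */
    if (!ok || !append(out, out_size, &len, p, strlen(p)))
        return -1;
    return count;
}
```

With the text "abc DEF xyz ABC 123", the pattern "(a)(b)(c)", the replacement "\3\2\1" and case-insensitive matching, this sketch writes "cba DEF xyz CBA 123" into the output buffer and returns 2. The fixed-size output buffer is a simplification; a real tool would more likely grow the buffer dynamically.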

Examples

Searching for well-known ports from a redirected text stream

user@pc:~$ cat service-names-port-numbers.csv | ./creg -r "(\\d+);(.*UDP.*);(.*mail.*);" -c -f gein -d srp

display the file contents of service-names-port-numbers.csv with cat and read STDOUT with piping redirection
-r: match all UDP-based protocols that contain the word mail, with the options:
-c: colored output
-f gein: flags (global, extended, insensitive case, newline)
-d srp: display statistics, results and index positions

Display words from wordlist

user@pc:~$ ./creg -i ./example-files/oxford-word-list.txt -r "^(Ae.*ion) (.+\.) (.*)$" -p list -f gei -c -d sr

-i: read in the file ./example-files/oxford-word-list.txt
-r: match all lines (from ^ to $) whose word starts with Ae and ends with ion, with the options:
-c: colored output
-p list: list format
-f gei: flags (global, extended, insensitive case)
-d sr: display statistics and results, without the index positions

Searching for ROOT keys in a windows reg file on Windows

Z:\>creg.exe  /I "example-files\windows-formatted-regfile.reg" /R ".*HKEY-CLASSES_ROOT.*" /D TSR

/I: read in the file example-files\windows-formatted-regfile.reg
/R: match all lines that contain the phrase "HKEY-CLASSES_ROOT", with the options:
/D TSR: display text, statistics and results, without the index positions

The input file can also be piped in on the Windows command line:

Z:\>more port-numbers.csv | creg.exe /R "^.*mail.*$" /D sr /F gein /P list

more port-numbers.csv |: show the contents of the file and redirect them with |
/R: match all lines that contain the phrase "mail", with the options:
/D sr: display statistics and results, without the index positions
/F gein: flags (global, extended, insensitive case, newline)
/P list: short list format

Installation

The program can be compiled and copied to the /opt/ folder.

Just run:

user@pc:~$ make

and

user@pc:~$ sudo make install

Compilation

Linux

Build the example program by typing in:

user@pc:~$ make

...or compile it directly with the GNU C compiler:

user@pc:~$ gcc -Wall -static creg.c -o creg

The GNU extensions with the regex.h library are needed for successful compilation. Make sure the necessary header and library files are included.
Use the -m32 flag to compile the program for 32-bit systems.
Important note: The program is compiled with the -static flag to link the libraries into the executable. With static linking, valgrind reports some memory leaks. With dynamic linking these reports are suppressed by default. (https://stackoverflow.com/questions/7506134/valgrind-errors-when-linked-with-static-why)

Windows

To compile the program on Windows, you need a compiler that ships the regex.h library from the GNU extensions:

C:\Users\pcuser>gcc.exe -static -IC:\MinGW-W64\mingw32\opt\include creg.c -o creg.exe -LC:\MinGW-W64\mingw32\opt\lib -lregex

MinGW-W64 includes the regex.h library in the \opt\include and \opt\lib folders.
The header and library paths must be passed with -I and -L, followed by the -lregex parameter at the end of the command.
-static can be used to make the executable independent of shared libraries.
The path of gcc.exe must be added to the Windows PATH user variable.

MacOS

To compile the program on macOS or OS X, you need a GCC installation with the regex.h library (GNU extensions).

There are several ways to install the GCC development tools on your Mac:
- Xcode
- Homebrew
- MacPorts
- source code compilation
- graphical package installer like Bower or MacUpdate

For compiler options see Linux.

Commandline options

See `-h` or `--help` to read all the options.

Usage:

creg [Commands] [Options]

Commands:

Command:	Meaning:
`-t <input-text>, --text <input-text>`	text input string
`-r <expression>, --regex <expression>`	regular expression pattern
`-x <replace-text>, --replace <replace-text>`	replacement text substring
`-i <filename>, --input <filename>`	filepath to read in file
`-o <filename>, --output <filename>`	filepath to write out file
`-h, --help`	show help for commands

Options:

Data:

Command:	Meaning:
`-d <data>, --data <data>`	show output elements

<data>:

Argument:	Meaning:
`t`	input text
`s`	statistics
`r`	results
`p`	match index positions

usage example:

-d tsrp or --data sr

Display Layout:

Command:	Meaning:
`-p <print-layout>, --print <print-layout>`	printing or file writing layout

<print-layout>:

Argument:	Meaning:
`table`	table
`list`	short list
`list-full`	full list
`plain`	plain result data
`csv`	comma-separated values
`json`	JavaScript Object Notation

Command:	Meaning:
`-c, --color`	display with ANSI colors

Option flags:

Command:	Meaning:
`-f <options>, --option-flags <options>`	option flags for regex compilation

<options>:

Argument:	Meaning:
`g`: global	search for all matches in a text
`e`: extended	use Extended Regular Expressions (ERE)
`i`: icase	use insensitive case matching
`m`: multiline	search in multiple lines
`n`: newline	ignore the newline character
`p`: nosubexp	ignore group matching with subexpressions
`q`: subexp	match only subexpressions

usage example:

-f ge or --option-flags geinq

default options:

global, extended, newline (the default options are deactivated if any option is set with the -f command)
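Several of these option letters have an obvious counterpart among the standard regcomp() compile flags. The sketch below shows one plausible translation. It is an assumption for illustration, not the mapping used in creg.c: the function name letters_to_cflags is hypothetical, and the letters g, m and q are accepted but left untranslated because regex.h has no single compile flag for them.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical mapping from option letters to regcomp() flags.
 * Returns the combined flags, or -1 if an unknown letter is found. */
int letters_to_cflags(const char *letters)
{
    assert(letters != NULL);

    int cflags = 0;
    for (const char *p = letters; *p != '\0'; p++) {
        switch (*p) {
        case 'e': cflags |= REG_EXTENDED; break; /* ERE syntax */
        case 'i': cflags |= REG_ICASE;    break; /* ignore case */
        case 'n': cflags |= REG_NEWLINE;  break; /* newline-sensitive matching */
        case 'p': cflags |= REG_NOSUB;    break; /* no subexpression offsets */
        case 'g': case 'm': case 'q':
            break; /* assumed to be handled outside regcomp() */
        default:
            return -1;
        }
    }
    return cflags;
}
```

For example, letters_to_cflags("gein") yields REG_EXTENDED | REG_ICASE | REG_NEWLINE, and an unknown letter such as "z" yields -1.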

Memory:

Command:	Meaning:
`-s <length>, --max-text-size <length>`	max input-text length in bytes, default: 8388608 bytes (8 MB)
`-n <count>, --max-num-matches <count>`	max number of matches, default: 8192 matches
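A limit on the number of matches fits naturally into a global search loop. The sketch below shows the general idea with plain regex.h calls. The names span_t and find_all are hypothetical, the pattern is compiled as an extended expression, and nothing here is taken from the creg sources.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Byte offsets of one match, relative to the start of the text. */
typedef struct {
    size_t start;
    size_t end;
} span_t;

/* Hypothetical helper: collect at most max_matches matches of pattern
 * in text. Returns the number of matches stored, or -1 if the pattern
 * does not compile. */
int find_all(const char *text, const char *pattern,
             span_t *spans, size_t max_matches)
{
    assert(text != NULL && pattern != NULL && spans != NULL);

    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    regmatch_t m[1];
    size_t pos = 0;
    int count = 0;

    /* Search again from the end of the previous match until the limit
     * is reached or no further match is found. */
    while ((size_t)count < max_matches && text[pos] != '\0' &&
           regexec(&re, text + pos, 1, m, pos == 0 ? 0 : REG_NOTBOL) == 0) {
        spans[count].start = pos + (size_t)m[0].rm_so;
        spans[count].end = pos + (size_t)m[0].rm_eo;
        count++;

        pos += (size_t)m[0].rm_eo;
        if (m[0].rm_eo == m[0].rm_so) {
            /* Empty match: step one byte forward so the loop advances. */
            if (text[pos] == '\0')
                break;
            pos++;
        }
    }
    regfree(&re);
    return count;
}
```

On the text "abc DEF xyz ABC 123" with the pattern "[A-Za-z]+" and a limit of 8, the sketch stores four spans. With a limit of 2 it stops after the first two.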

Supported Regular Expression operations

The program supports POSIX compatible Regular Expressions from regex.h with some extended functionalities, like single character classes.

Supported:	Not supported:
Wildcard `.`	Lazy `*?` `+?` `??`
Character classes `\d` `\D` `\w` `\W`	Negative Lookahead `(?!)`
POSIX character classes `[:digit:]`	Negative Lookbehind `(?<!)`
Whitespace `\s` `\S`	Positive Lookahead `(?=)`
Character Sets `[abc]`	Positive Lookbehind `(?<=)`
Escaping `\`
The Asterisk `*`
The Plus `+`
The Question Mark `?`
Numeric Quantifier `{n}`
Range Quantifier `{n,m}`
Alternation `|`
Anchors `^` `$`
Capturing Groups `a(b)c`
Backreferences `\1`
ASCII and Unicode sequences

POSIX Standard

Basic Regular Syntax

Metacharacter	Description
^	Matches the starting position within the string. In line-based tools, it matches the starting position of any line.
.	Matches any single character (many applications exclude newlines, and exactly which characters are considered newlines is flavor-, character-encoding-, and platform-specific, but it is safe to assume that the line feed character is included). Within POSIX bracket expressions, the dot character matches a literal dot. For example, a.c matches "abc", etc., but [a.c] matches only "a", ".", or "c".
[ ]	A bracket expression. Matches a single character that is contained within the brackets. For example, [abc] matches "a", "b", or "c". [a-z] specifies a range which matches any lowercase letter from "a" to "z". These forms can be mixed: [abcx-z] matches "a", "b", "c", "x", "y", or "z", as does [a-cx-z]. The - character is treated as a literal character if it is the last or the first (after the ^, if present) character within the brackets: [abc-], [-abc], [^-abc]. Backslash escapes are not allowed. The ] character can be included in a bracket expression if it is the first (after the ^, if present) character: []abc], [^]abc].
[^ ]	Matches a single character that is not contained within the brackets. For example, [^abc] matches any character other than "a", "b", or "c". [^a-z] matches any single character that is not a lowercase letter from "a" to "z". Likewise, literal characters and ranges can be mixed.
$	Matches the ending position of the string or the position just before a string-ending newline. In line-based tools, it matches the ending position of any line.
( )	Defines a marked subexpression, also called a capturing group, which is essential for extracting the desired part of the text (see also the next entry, \n). BRE mode requires \( \).
\n	Matches what the nth marked subexpression matched, where n is a digit from 1 to 9. This construct is defined in the POSIX standard. Some tools allow referencing more than nine capturing groups. Also known as a back-reference, this feature is supported in BRE mode.
*	Matches the preceding element zero or more times. For example, ab*c matches "ac", "abc", "abbbc", etc. [xyz]* matches "", "x", "y", "z", "zx", "zyx", "xyzzy", and so on. (ab)* matches "", "ab", "abab", "ababab", and so on.
{m,n}	Matches the preceding element at least m and not more than n times. For example, a{3,5} matches only "aaa", "aaaa", and "aaaaa". This is not found in a few older instances of regexes. BRE mode requires \{m,n\}.

Extended Regular Syntax

Metacharacter	Description
?	Matches the preceding element zero or one time. For example, ab?c matches only "ac" or "abc".
+	Matches the preceding element one or more times. For example, ab+c matches "abc", "abbc", "abbbc", and so on, but not "ac".
|	The choice (also known as alternation or set union) operator matches either the expression before or the expression after the operator. For example, abc|def matches "abc" or "def".

Source: https://en.wikipedia.org/wiki/Regular_expression#POSIX_basic_and_extended
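The difference between the basic and the extended syntax can be tried out directly with regcomp(), which compiles a basic regular expression unless REG_EXTENDED is passed. The helper syntax_match below is a hypothetical name used only for this sketch.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if pattern matches somewhere in text,
 * 0 if it does not, and -1 if the pattern does not compile.
 * extended != 0 selects ERE syntax, extended == 0 selects BRE syntax. */
int syntax_match(const char *pattern, const char *text, int extended)
{
    assert(pattern != NULL && text != NULL);

    regex_t re;
    int cflags = (extended ? REG_EXTENDED : 0) | REG_NOSUB;
    if (regcomp(&re, pattern, cflags) != 0)
        return -1;

    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

In BRE mode the interval has to be written with backslashes, so syntax_match("^a\\{3,5\\}$", "aaaa", 0) matches, while the unescaped form "^a{3,5}$" only matches when extended is nonzero. The same holds for groups: "^\\(ab\\)*$" in BRE mode corresponds to "^(ab)*$" in ERE mode.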

Character classes

Description	POSIX	Shortcode	ASCII
ASCII characters		\x[Bytecode]
Alphanumeric characters	[:alnum:]		[A-Za-z0-9]
Alphanumeric characters plus "_"		\w	[A-Za-z0-9_]
Non-word characters		\W	[^A-Za-z0-9_]
Alphabetic characters	[:alpha:]	\a	[A-Za-z]
Space and tab	[:space:]	\s
	[:blank:]	\t
Non-whitespace characters		\S	[^ ]
Word boundaries		\b
Non-word boundaries		\B
Digits	[:digit:]	\d	[0-9]
Non-digits		\D	[^0-9]
Lowercase letters	[:lower:]	\l	[a-z]
Uppercase letters	[:upper:]	\u	[A-Z]
Visible characters	[:print:]	\p	[\x20-\x7E]
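In plain regex.h, the POSIX class names are only valid inside a bracket expression, so [:digit:] is written as [[:digit:]] in a pattern. The following sketch checks whether a whole string consists of characters from one class. The helper name all_in_class is hypothetical and the code is an illustration, not part of creg.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if text is non-empty and consists only
 * of characters from the named POSIX class (for example "digit"),
 * 0 if not, and -1 if the pattern cannot be built or compiled. */
int all_in_class(const char *class_name, const char *text)
{
    assert(class_name != NULL && text != NULL);

    /* Build a pattern such as ^[[:digit:]]+$ */
    char pattern[64];
    int n = snprintf(pattern, sizeof pattern, "^[[:%s:]]+$", class_name);
    if (n < 0 || (size_t)n >= sizeof pattern)
        return -1;

    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

For example, all_in_class("digit", "123") returns 1 and all_in_class("digit", "12a") returns 0. The result for letters depends on the active locale. In the default C locale the classes correspond to the ASCII ranges listed in the table.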
