Skip to content

CH_CALLBACK_SKIP_PATTERN causes infinite loop in ch_scan() with patterns of empty matches #360

@liweitianux

Description

@liweitianux

Hi, I recently noticed an infinite loop in our program and found the issue to be related with CH_CALLBACK_SKIP_PATTERN.

When there are patterns that may match empty strings, the CH_CALLBACK_SKIP_PATTERN return value from the callback would cause an infinite loop in ch_scan(), specifically in catchupPcre(). I've reproduced the issue with Hyperscan 5.1.1 and 5.4.0.

Below is a test program:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <hs/ch.h>


static int
handler(unsigned int id,
	unsigned long long from, unsigned long long to,
	unsigned int flags, unsigned int size,
	const ch_capture_t *captured, void *ctx)
{
	(void)flags;
	(void)size;
	(void)captured;
	(void)ctx;

	printf("id=%u, from=%llu, to=%llu\n", id, from, to);

	// XXX!
	// With a regex matching an *empty* string, CH_CALLBACK_SKIP_PATTERN
	// causes an infinite loop in catchupPcre().
	// However, using CH_CALLBACK_CONTINUE is okay.
	return CH_CALLBACK_SKIP_PATTERN;
}


int
main(void)
{
	const char *input = "foo 0123 0 bar 39483 n34jfhlqekrcoi3q4";

	// test regex matching an *empty* string !
	const char *regexes[] = { "[a-z]+", "[0-9]", "0|" };
	unsigned int flags[] = { CH_FLAG_CASELESS, CH_FLAG_CASELESS, CH_FLAG_CASELESS };
	unsigned int ids[] = { 1, 2, 3 };
	unsigned int nelem = 3;

	ch_database_t *db = NULL;
	ch_scratch_t *scratch = NULL;
	ch_compile_error_t *cerr = NULL;
	ch_error_t err;

	err = ch_compile_multi(regexes, flags, ids, nelem,
			CH_MODE_NOGROUPS, NULL /* platform */,
			&db, &cerr);
	if (err != CH_SUCCESS) {
		printf("ch_compile_multi() failed\n");
		exit(1);
	}

	err = ch_alloc_scratch(db, &scratch);
	if (err != CH_SUCCESS) {
		printf("ch_alloc_scratch() failed\n");
		exit(1);
	}

	err = ch_scan(db, input, strlen(input), 0 /* flags */,
			scratch, handler, NULL /* onError */, NULL);
	printf("err = %d\n", (int)err);

	return 0;
}

And here is the output log:

id=3, from=0, to=0
id=1, from=0, to=3
id=2, from=4, to=5
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
...

So it just keeps repeating the last id=3, from=38, to=38 ...

If return CH_CALLBACK_CONTINUE from the callback, ch_scan() works correctly.

Please have a look and hope the above test program helps.

Thank you.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions