Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Low-hanging fruit from the R repository #106

Merged
merged 15 commits into from
Aug 16, 2024

Conversation

dag-erling
Copy link
Collaborator

As pointed out repeatedly, e.g. in #13, R includes TRE and has a number of fixes for it. This PR includes some of the simpler ones, plus a few additional fixes inspired by changes found in R. It also raises the warning level, which would not have been possible without some of these fixes.

@dag-erling dag-erling force-pushed the des/r-fixes branch 2 times, most recently from f4b1163 to f9dc4b0 Compare July 25, 2024 22:02
@dag-erling dag-erling linked an issue Jul 25, 2024 that may be closed by this pull request
When not using `alloca()`, we need to free `tags` before returning.

Obtained from the R repository.
This avoids having to deal with overflow, which can't happen but would
be undefined behavior if it did, and some compilers flag it.

Obtained from the R repository.
Like `num_submatches`, it is never negative, so this lets us not have to
worry about overflow, which would invoke undefined behavior.

Inspired by the R repository.
Obtained from the R repository.
When not using `alloca()`, `xafree(ptr)` expanded to just a comment,
which sometimes triggered compiler warnings.

Inspired by the R repository.
The R repository has a null pointer check at the top of the case for
`PARSE_MARK_FOR_SUBMATCH` in the parser loop.  However, it is not
possible to get to this point with `result` being null, even with an
empty input.  Assert this.

Inspired by the R repository.
Obtained from the R repository.
* Use `uintptr_t` instead of `unsigned long`, as the latter is not
  necessarily the right size.
* Make the hash itself unsigned, and use only unsgined arithmetic to
  compute it, to avoid the possibility of signed overflow, which would
  invoke undefined behavior.

Inspired by the R repository.
When using GCC or Clang, switch from just `-Wall` to `-Wall -Wextra`.
* Rename `cur_max` to `mb_cur_max` to avoid confusion with `curr_max`.
* Adjust some comments, drop reference to R bugzilla ticket.
* Fix one case where we were still checking the static `TRE_MB_CUR_MAX`
  instead of the dynamic `mb_cur_max`.
* Drop an invalid assertion.

Inspired by the R repository.

Fixes: 8e84229
Inspired by the R repository.
On Windows, `iswprint(L'\t')` (but not `isprint(L'\t')`) improperly
returns true.  Work around this by explicitly checking for L'\t' before
calling `iswprint()`.

This fixes laurikari#64.

Inspired by the R repository.
Obtained from the R repository.
During AST expansion, when copying a literal node, we'd copy its minimum
and maximum codepoint, but not its parameters or classes.

Both R and musl have incomplete fixes for this issue.

This fixes laurikari#52.
@dag-erling dag-erling linked an issue Jul 30, 2024 that may be closed by this pull request
@dag-erling dag-erling merged commit c504ac3 into laurikari:master Aug 16, 2024
2 checks passed
@dag-erling dag-erling deleted the des/r-fixes branch August 16, 2024 11:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tab treated as printable character on Windows Null-pointer dereference in agrep for expression "{+}{7}"
1 participant