MS Windows 32bit binary does not work properly for files bigger than 4 gigabyte #1911

Closed
datatraveller1 opened this issue Jun 26, 2021 · 8 comments

Comments

@datatraveller1

datatraveller1 commented Jun 26, 2021

What version of ripgrep are you using?

ripgrep 13.0.0 (rev af6b6c5)
-SIMD -AVX (compiled)

How did you install ripgrep?

I use the MS Windows binary rg.exe from ripgrep-13.0.0-i686-pc-windows-msvc.zip

What operating system are you using ripgrep on?

MS Windows 10 32bit

Describe your bug.

For files bigger than 4 gigabytes, result lines are missing.
Only result lines from the upper part of the file are printed, or no lines at all.

What are the steps to reproduce the behavior?

Search, e.g., a 5 gigabyte file for the content of its last line.

What is the actual behavior?

rg "string_in_last_line_of_big_file" bigfile.txt

no output

What is the expected behavior?

The last line of bigfile.txt containing "string_in_last_line_of_big_file" should be printed.
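
To reproduce the report, a test file larger than 4 GiB with a unique last line is needed. The following is a minimal sketch of one way to generate it; the function name `write_test_file` and the filler text are illustrative assumptions, not part of ripgrep or of the original report. The byte counter is a `u64` so that it does not itself overflow when the generator is built for a 32-bit target.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Writes filler lines until at least `min_bytes` bytes have been written,
/// then appends `last_line` as the final line of the file.
/// Returns the total number of bytes written.
/// (Hypothetical helper for illustration only.)
pub fn write_test_file(path: &Path, min_bytes: u64, last_line: &str) -> io::Result<u64> {
    let mut out = BufWriter::new(File::create(path)?);
    let filler = b"filler line without the needle\n";
    // u64 on purpose: a usize counter would wrap on 32-bit targets.
    let mut written: u64 = 0;
    while written < min_bytes {
        out.write_all(filler)?;
        written += filler.len() as u64;
    }
    out.write_all(last_line.as_bytes())?;
    out.write_all(b"\n")?;
    written += last_line.len() as u64 + 1;
    out.flush()?;
    Ok(written)
}
```

Calling it with `min_bytes` set to `5 * 1024 * 1024 * 1024` and `last_line` set to "string_in_last_line_of_big_file" should produce a file comparable to the bigfile.txt described above.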

@BurntSushi
Owner

Does --no-mmap resolve this?

@datatraveller1
Author

Yes, rg --no-mmap "string_in_last_line_of_big_file" bigfile.txt solves the issue with the 32bit version!
The missing result lines are printed with the --no-mmap option.

However, the previous 32bit version ripgrep-12.1.1-i686-pc-windows-msvc.zip works correctly without the --no-mmap option.
Some interesting hints about this observation:
ripgrep 12.1.1 (rev 7cb2113)
rg "string_in_last_line_of_big_file" bigfile.txt --debug
=>
DEBUG|grep_regex::literal|crates\regex\src\literal.rs:58: literal prefixes detected: Literals { lits: [Complete(string_in_last_line_of_big_file)], limit_size: 250, limit_class: 10 }
DEBUG|globset|crates\globset\src\lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|grep_searcher::searcher::mmap|crates\searcher\src\searcher\mmap.rs:86: bigfile.txt: failed to open memory map: memory map length overflows usize

ripgrep 13.0.0 (rev af6b6c5)
rg "string_in_last_line_of_big_file" bigfile.txt --debug
=>
DEBUG|grep_regex::literal|crates\regex\src\literal.rs:58: literal prefixes detected: Literals { lits: [Complete(string_in_last_line_of_big_file)], limit_size: 250, limit_class: 10 }
DEBUG|grep_cli::decompress|crates\cli\src\decompress.rs:482: gzip: could not find executable in PATH
DEBUG|grep_cli::decompress|crates\cli\src\decompress.rs:482: gzip: could not find executable in PATH
DEBUG|grep_cli::decompress|crates\cli\src\decompress.rs:482: bzip2: could not find executable in PATH
DEBUG|grep_cli::decompress|crates\cli\src\decompress.rs:482: bzip2: could not find executable in PATH
DEBUG|grep_cli::decompress|crates\cli\src\decompress.rs:482: xz: could not find executable in PATH
DEBUG|grep_cli::decompress|crates\cli\src\decompress.rs:482: xz: could not find executable in PATH
DEBUG|grep_cli::decompress|crates\cli\src\decompress.rs:482: lz4: could not find executable in PATH
DEBUG|grep_cli::decompress|crates\cli\src\decompress.rs:482: xz: could not find executable in PATH
DEBUG|grep_cli::decompress|crates\cli\src\decompress.rs:482: brotli: could not find executable in PATH
DEBUG|grep_cli::decompress|crates\cli\src\decompress.rs:482: zstd: could not find executable in PATH
DEBUG|grep_cli::decompress|crates\cli\src\decompress.rs:482: zstd: could not find executable in PATH
DEBUG|grep_cli::decompress|crates\cli\src\decompress.rs:482: uncompress: could not find executable in PATH

I somehow assume --no-mmap should be the default for the 32bit version to avoid wrong results.
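
The 12.1.1 debug message "memory map length overflows usize" suggests that the file length (a 64-bit value) was checked before being narrowed to the platform's pointer-sized integer. The sketch below illustrates the difference between a checked and an unchecked narrowing, using `u32` to stand in for a 32-bit `usize`. The function names are hypothetical, and whether the 13.0.0 binary actually truncated the length in this way is an assumption; the thread does not say.

```rust
use std::convert::TryFrom;

/// Checked narrowing: returns None when `len` does not fit in 32 bits,
/// which is the situation a "length overflows usize" error reports.
/// (Hypothetical helper; u32 stands in for usize on a 32-bit target.)
pub fn checked_len_32(len: u64) -> Option<u32> {
    u32::try_from(len).ok()
}

/// Unchecked narrowing: an `as` cast silently keeps only the low 32 bits,
/// so a 5 GiB length would be seen as 1 GiB.
pub fn truncated_len_32(len: u64) -> u32 {
    len as u32
}
```

With a 5 GiB length, the checked version reports failure, so a caller can fall back to ordinary reads. The unchecked version yields 1 GiB, which would be consistent with seeing matches only from the upper part of the file.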

@BurntSushi
Owner

Yup. The next release will have mmap disabled in all non-64-bit systems.

I don't know why this used to work. The only semi-related change in ripgrep 13 that I can think of is that it is now statically linking vcruntime: #1613

I'm not sure why or how that would change things here, but it's simple enough to just disable mmap on 32-bit.

BurntSushi added a commit that referenced this issue Jun 26, 2021
@datatraveller1
Author

Thank you very much! I'll add a few more comments just to ensure there are no misunderstandings with the fix:
I plan to use the MS Windows 32bit binary release both for my PC (32bit) and my laptop (64bit) like it used to work with version 12.1.1.
As on Windows 32bit, the current 13.0.0 32bit binary release does not work correctly on Windows 64bit unless the --no-mmap option is used. Only the current 13.0.0 64bit binary release works correctly on Windows 64bit for big files.
So, the patch (disable mmap) should depend on the target the Rust binary was built for (32bit) and not on the Windows version the user is currently running (32bit or 64bit). I'm not a Rust programmer but it seems to me the fix takes care of that.

@BurntSushi
Owner

Thank you for copying your comment from email. :-) I'll copy my response as well:

The patch I pushed should impact the target used to build the binary.
A cfg! is conditional compilation.

I don't use Windows, so it's a bit annoying to test this patch myself. I have a Windows laptop somewhere in my house, but it's always a big production to pull it out because Windows insists on spending hours running Windows Update.

If anyone has an easy way to build ripgrep's master branch for 32-bit Windows and test it on a >4GB file with a match at the end, that would be most appreciated!
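
The point about `cfg!` can be illustrated with a minimal sketch. The function name below is hypothetical and this is not the actual ripgrep patch; it only shows that `cfg!(target_pointer_width = "64")` is resolved at compile time for the build target, so a 32-bit binary gets the same answer whether it runs on 32-bit or 64-bit Windows.

```rust
/// Returns true only in binaries compiled for a 64-bit target.
/// The value is fixed at compile time; it does not inspect the
/// operating system the binary happens to be running on.
/// (Hypothetical helper for illustration only.)
pub fn mmap_enabled_by_default() -> bool {
    cfg!(target_pointer_width = "64")
}
```

A build for i686-pc-windows-msvc would therefore return `false` here even when executed on a 64-bit machine.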

@ghost

ghost commented Sep 22, 2021

I think the underlying reason is RazrFalcon/memmap2-rs@5e27122, which was introduced in memmap2 version 0.3.0 (the version used here, as of version = "0.3.0") and fixed by RazrFalcon/memmap2-rs@9aa838a, which is part of version 0.3.1 (but version 0.5.0 is current).

@ghost

ghost commented Sep 22, 2021

> If anyone has an easy way to build ripgrep's master branch for 32-bit Windows and test it on a >4GB file with a match at the end, that would be most appreciated!

I am not sure if that would have sufficed to catch this, but when working on the memmap2 code, I have repeatedly resorted to cross building using MinGW and then testing the result under Wine. I am wary of the testing power of the memmap2 to Win32 to POSIX API conversion though...

@adamreichold
Contributor

> I am not sure if that would have sufficed to catch this, but when working on the memmap2 code, I have repeatedly resorted to cross building using MinGW and then testing the result under Wine. I am wary of the testing power of the memmap2 to Win32 to POSIX API conversion though...

It seems it would suffice, but there is a small wrinkle when cross building for 32-bit Windows: LLVM/Rust assumes SEH-style unwinding, while the 32-bit MinGW toolchains of most Linux distributions use SjLj-style unwinding, so one will run into linker errors related to _Unwind_Resume, cf. rust-lang/rust#12859. Some distributions, like Fedora since version 32, however ship cross toolchains using SEH-style unwinding, and using such a distribution seems to work as expected, cf. #2000

BurntSushi pushed a commit that referenced this issue May 19, 2023
memmap2 v0.3.0 introduced a regression when trying to map files larger than 4GB
on 32-bit architectures[1] which was subsequently fixed in v0.3.1[2].

This commit bumps the locked version of the memmap2 dependency to the current
v0.5.0 and reverts fdfc418 to re-enable mmap on 32-bit architectures as a
different approach to fixing [3].

This was tested to report matches from the end of a 5GB file using MinGW and Wine.

Ref #1911, PR #2000

[1] RazrFalcon/memmap2-rs@5e27122
[2] RazrFalcon/memmap2-rs@9aa838a
[3] #1911
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Nov 28, 2023
14.0.2 (2023-11-27)
===================
This is a patch release with a few small bug fixes.

Bug fixes:

* [BUG #2654](BurntSushi/ripgrep#2654):
  Fix `deb` release sha256 sum file.
* [BUG #2658](BurntSushi/ripgrep#2658):
  Fix partial regression in the behavior of `--null-data --line-regexp`.
* [BUG #2659](BurntSushi/ripgrep#2659):
  Fix Fish shell completions.
* [BUG #2662](BurntSushi/ripgrep#2662):
  Fix typo in documentation for `-i/--ignore-case`.


14.0.1 (2023-11-26)
===================
This is a patch release meant to fix `cargo install ripgrep` on Windows.

Bug fixes:

* [BUG #2653](BurntSushi/ripgrep#2653):
  Include `pkg/windows/Manifest.xml` in crate package.


14.0.0 (2023-11-26)
===================
ripgrep 14 is a new major version release of ripgrep that has some new
features, performance improvements and a lot of bug fixes.

The headlining feature in this release is hyperlink support. In this release,
they are an opt-in feature but may change to an opt-out feature in the future.
To enable them, try passing `--hyperlink-format default`. If you use [VS Code],
then try passing `--hyperlink-format vscode`. Please [report your experience
with hyperlinks][report-hyperlinks], positive or negative.

[VS Code]: https://code.visualstudio.com/
[report-hyperlinks]: BurntSushi/ripgrep#2611

Another headlining development in this release is that it contains a rewrite
of its regex engine. You generally shouldn't notice any changes, except that
some searches may get faster. You can read more about the [regex engine rewrite
on my blog][regex-internals]. Please [report any performance improvements or
regressions that you notice][report-perf].

[report-perf]: BurntSushi/ripgrep#2652

Finally, ripgrep switched the library it uses for argument parsing. Users
should not notice a difference in most cases (error messages have changed
somewhat), but flag overrides should generally be more consistent. For example,
things like `--no-ignore --ignore-vcs` work as one would expect (disables all
filtering related to ignore rules except for rules found in version control
systems such as `git`).

[regex-internals]: https://blog.burntsushi.net/regex-internals/

**BREAKING CHANGES**:

* `rg -C1 -A2` used to be equivalent to `rg -A2`, but now it is equivalent to
  `rg -B1 -A2`. That is, `-A` and `-B` no longer completely override `-C`.
  Instead, they only partially override `-C`.

Build process changes:

* ripgrep's shell completions and man page are now created by running ripgrep
  with a new `--generate` flag. For example, `rg --generate man` will write a
  man page in `roff` format on stdout. The release archives have not changed.
* The optional build dependency on `asciidoc` or `asciidoctor` has been
  dropped. Previously, it was used to produce ripgrep's man page. ripgrep now
  owns this process itself by writing `roff` directly.

Performance improvements:

* [PERF #1746](BurntSushi/ripgrep#1746):
  Make some cases with inner literals faster.
* [PERF #1760](BurntSushi/ripgrep#1760):
  Make most searches with `\b` look-arounds (among others) much faster.
* [PERF #2591](BurntSushi/ripgrep#2591):
  Parallel directory traversal now uses work stealing for faster searches.
* [PERF #2642](BurntSushi/ripgrep#2642):
  Parallel directory traversal has some contention reduced.

Feature enhancements:

* Added or improved file type filtering for Ada, DITA, Elixir, Fuchsia, Gentoo,
  Gradle, GraphQL, Markdown, Prolog, Raku, TypeScript, USD, V
* [FEATURE #665](BurntSushi/ripgrep#665):
  Add a new `--hyperlink-format` flag that turns file paths into hyperlinks.
* [FEATURE #1709](BurntSushi/ripgrep#1709):
  Improve documentation of ripgrep's behavior when stdout is a tty.
* [FEATURE #1737](BurntSushi/ripgrep#1737):
  Provide binaries for Apple silicon.
* [FEATURE #1790](BurntSushi/ripgrep#1790):
  Add new `--stop-on-nonmatch` flag.
* [FEATURE #1814](BurntSushi/ripgrep#1814):
  Flags are now categorized in `-h/--help` output and ripgrep's man page.
* [FEATURE #1838](BurntSushi/ripgrep#1838):
  An error is shown when searching for NUL bytes with binary detection enabled.
* [FEATURE #2195](BurntSushi/ripgrep#2195):
  When `extra-verbose` mode is enabled in zsh, show extra file type info.
* [FEATURE #2298](BurntSushi/ripgrep#2298):
  Add instructions for installing ripgrep using `cargo binstall`.
* [FEATURE #2409](BurntSushi/ripgrep#2409):
  Added installation instructions for `winget`.
* [FEATURE #2425](BurntSushi/ripgrep#2425):
  Shell completions (and man page) can be created via `rg --generate`.
* [FEATURE #2524](BurntSushi/ripgrep#2524):
  The `--debug` flag now indicates whether stdin or `./` is being searched.
* [FEATURE #2643](BurntSushi/ripgrep#2643):
  Make `-d` a short flag for `--max-depth`.
* [FEATURE #2645](BurntSushi/ripgrep#2645):
  The `--version` output will now also contain PCRE2 availability information.

Bug fixes:

* [BUG #884](BurntSushi/ripgrep#884):
  Don't error when `-v/--invert-match` is used multiple times.
* [BUG #1275](BurntSushi/ripgrep#1275):
  Fix bug with `\b` assertion in the regex engine.
* [BUG #1376](BurntSushi/ripgrep#1376):
  Using `--no-ignore --ignore-vcs` now works as one would expect.
* [BUG #1622](BurntSushi/ripgrep#1622):
  Add note about error messages to `-z/--search-zip` documentation.
* [BUG #1648](BurntSushi/ripgrep#1648):
  Fix bug where sometimes short flags with values, e.g., `-M 900`, would fail.
* [BUG #1701](BurntSushi/ripgrep#1701):
  Fix bug where some flags could not be repeated.
* [BUG #1757](BurntSushi/ripgrep#1757):
  Fix bug when searching a sub-directory didn't have ignores applied correctly.
* [BUG #1891](BurntSushi/ripgrep#1891):
  Fix bug when using `-w` with a regex that can match the empty string.
* [BUG #1911](BurntSushi/ripgrep#1911):
  Disable mmap searching in all non-64-bit environments.
* [BUG #1966](BurntSushi/ripgrep#1966):
  Fix bug where ripgrep can panic when printing to stderr.
* [BUG #2046](BurntSushi/ripgrep#2046):
  Clarify that `--pre` can accept any kind of path in the documentation.
* [BUG #2108](BurntSushi/ripgrep#2108):
  Improve docs for `-r/--replace` syntax.
* [BUG #2198](BurntSushi/ripgrep#2198):
  Fix bug where `--no-ignore-dot` would not ignore `.rgignore`.
* [BUG #2201](BurntSushi/ripgrep#2201):
  Improve docs for `-r/--replace` flag.
* [BUG #2288](BurntSushi/ripgrep#2288):
  `-A` and `-B` now only each partially override `-C`.
* [BUG #2236](BurntSushi/ripgrep#2236):
  Fix gitignore parsing bug where a trailing `\/` resulted in an error.
* [BUG #2243](BurntSushi/ripgrep#2243):
  Fix `--sort` flag for values other than `path`.
* [BUG #2246](BurntSushi/ripgrep#2246):
  Add note in `--debug` logs when binary files are ignored.
* [BUG #2337](BurntSushi/ripgrep#2337):
  Improve docs to mention that `--stats` is always implied by `--json`.
* [BUG #2381](BurntSushi/ripgrep#2381):
  Make `-p/--pretty` override flags like `--no-line-number`.
* [BUG #2392](BurntSushi/ripgrep#2392):
  Improve global git config parsing of the `excludesFile` field.
* [BUG #2418](BurntSushi/ripgrep#2418):
  Clarify sorting semantics of `--sort=path`.
* [BUG #2458](BurntSushi/ripgrep#2458):
  Make `--trim` run before `-M/--max-columns` takes effect.
* [BUG #2479](BurntSushi/ripgrep#2479):
  Add documentation about `.ignore`/`.rgignore` files in parent directories.
* [BUG #2480](BurntSushi/ripgrep#2480):
  Fix bug when using inline regex flags with `-e/--regexp`.
* [BUG #2505](BurntSushi/ripgrep#2505):
  Improve docs for `--vimgrep` by mentioning footguns and some work-arounds.
* [BUG #2519](BurntSushi/ripgrep#2519):
  Fix incorrect default value in documentation for `--field-match-separator`.
* [BUG #2523](BurntSushi/ripgrep#2523):
  Make executable searching take `.com` into account on Windows.
* [BUG #2574](BurntSushi/ripgrep#2574):
  Fix bug in `-w/--word-regexp` that would result in incorrect match offsets.
* [BUG #2623](BurntSushi/ripgrep#2623):
  Fix a number of bugs with the `-w/--word-regexp` flag.
* [BUG #2636](BurntSushi/ripgrep#2636):
  Strip release binaries for macOS.