Ignore behavior differs from git and ag #118

Closed
bard opened this issue Apr 23, 2021 · 13 comments
Labels: enhancement (New feature or request), question (A question that has or needs further clarification)

Comments

@bard

bard commented Apr 23, 2021

Possibly a bug in my expectations rather than the code.

Given this directory:

$ tree -a -I .git
.
├── foo
│   ├── bar
│   │   └── world.txt
│   └── world.txt
└── .gitignore

2 directories, 3 files

With these files:

$ cat foo/world.txt
hello
$ cat foo/bar/world.txt
hello

And this .gitignore:

$ cat .gitignore
bar

Git ignores bar:

$ git status --ignored
On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	.gitignore
	foo/

Ignored files:
  (use "git add -f <file>..." to include in what will be committed)
	foo/bar/

nothing added to commit but untracked files present (use "git add" to track)

And so does ag:

$ ag hello
foo/world.txt

But ugrep doesn't:

$ ugrep --ignore-files hello
foo/world.txt:hello
foo/bar/world.txt:hello

Is it intended?

Version:

$ ugrep --version
ugrep 3.1.11 x86_64-pc-linux-gnu +avx2 +pcre2_jit +zlib +bzip2 +lzma +lz4
License BSD-3-Clause: <https://opensource.org/licenses/BSD-3-Clause>
Written by Robert van Engelen and others: <https://github.com/Genivia/ugrep>
@genivia-inc
Member

$ cat .gitignore
bar

The bar entry in .gitignore can be specified as a directory:

$ cat .gitignore
bar/

The git documentation points out that directories are specified with a trailing slash /: The pattern foo/ will match a directory foo and paths underneath it, but will not match a regular file or a symbolic link foo (this is consistent with the way how pathspec works in general in Git)

Ugrep distinguishes files from directories more strictly, which gives you more power to ignore files and directories separately. This should cause fewer unpleasant surprises where entire directories are ignored because their names happen to match file globs.

This distinction in ugrep comes from option -g (--include and --exclude), see the ugrep manual. For example, -g^bar/ ignores directories bar, whereas -g^bar ignores files bar.

We could change ugrep to ignore bar directories AND bar files when bar is specified in a .gitignore as ag does, but I really like it the way the current ugrep version makes a clear distinction between the two! This is also consistent with options -g and --include and --exclude and so easier to memorize IMHO.
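
The two interpretations discussed here can be compared in a small Python sketch. This is a toy model, not ugrep's or git's actual implementation: the function name `matches` and its `style` parameter are made up for illustration, and only basename globs (no `/` inside the pattern) are handled.

```python
from fnmatch import fnmatchcase

def matches(glob: str, name: str, is_dir: bool, style: str) -> bool:
    """Toy model: does a basename glob ignore the entry `name`?

    style="ugrep": a plain glob matches files only; a glob ending in '/'
                   matches directories only (as described in this thread).
    style="git":   a plain glob matches files and directories; a glob
                   ending in '/' matches directories only.
    """
    dir_only = glob.endswith("/")
    pattern = glob.rstrip("/")
    if not fnmatchcase(name, pattern):
        return False
    if dir_only:
        return is_dir
    if style == "ugrep":
        return not is_dir
    return True  # git style: files and directories alike

# The directory foo/bar from the report, tested against both spellings.
for glob in ("bar", "bar/"):
    for style in ("ugrep", "git"):
        print(glob, style, matches(glob, "bar", is_dir=True, style=style))
```

Under these assumptions, the only combination that leaves the directory unignored is the plain `bar` glob with the ugrep-style rule, which is the situation in the original report.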

@genivia-inc genivia-inc added the question A question that has or needs further clarification label Apr 23, 2021
@bard
Author

bard commented Apr 24, 2021

Thanks @genivia-inc, the reasoning makes sense to me. I don't have an opinion about which behavior is better, and I'm fine with changing .gitignore files if I adopt ugrep. However, in the example above, even after replacing bar with bar/ in .gitignore, I still get foo/bar/world.txt among the results. Do I need any additional flags?

@genivia-inc
Member

$ tree
.
└── foo
    ├── bar
    │   └── world.txt
    └── world.txt

2 directories, 2 files
$ cat .gitignore 
bar/
$ ugrep --ignore-files hello
foo/world.txt:hello
$ ugrep hello
foo/world.txt:hello
foo/bar/world.txt:hello

Works like a charm.

@bard
Author

bard commented Apr 24, 2021

Ha, you're right. I hadn't provided --ignore-files last time. Muscle memory is a stubborn thing.

@ericonr
Contributor

ericonr commented Jul 7, 2021

I'm not sure I agree with the reading of the gitignore man page, since it contains this line as well:

If there is a separator at the end of the pattern then the pattern will only match directories, otherwise the pattern can match both files and directories.

This means that ignoring build* where I want it to match both build directories and symlinks to build directories stored elsewhere, so that both ugrep and git respect it, requires a config file with two entries:

# enough for git
foo*
# necessary for ugrep
foo*/

I'd say documenting this divergence (or offering a command line option to support gitignore format fully) is reasonable. I understand if you find the current behavior more reasonable as a default.

I can open another issue if you'd like, but this one seemed recent enough for me to bring it up.

@genivia-inc
Member

It is not that I personally find the current behavior more reasonable; it is a matter of safety (nothing accidentally goes missing from the results) and POLS (principle of least surprise). You'll hear people say "I'm surprised that simple patterns like *.user in a .gitignore file seem to match files and folder names", see e.g. How to gitignore only files?

Judging from these comments and other more serious issues, git made a mistake in ignoring both files and directories with a plain glob pattern:

1) Ignoring only the files that match a glob pattern requires a second !pattern/ entry so that matching directories are still traversed and nothing is lost from the results.
2) Stuff not showing up (false negatives) is worse than extra stuff showing up (false positives). Missing results lead to a lot of guessing about why, not to mention the frustration of trying to figure it out. Extra search results can be visually ignored or, better, the gitignore rules can be updated, e.g. build*/ instead of build*, so that both ugrep and git match more accurately.
3) git contradicts GNU/BSD grep, where --exclude applies to files only and --exclude-dir to directories only. The same interpretation applies to GNU grep --exclude-from=FILE, which is similar to a .gitignore file but does not exclude both files and directories.

Sometimes we should trust older, well-established design judgments (grep) over newer "easier" design judgments (git).

This means that ignoring build* where I want it to match both build directories and symlinks to build directories stored elsewhere, so that both ugrep and git respect it, requires a config file with two entries:

That is incorrect. Symlinked directories are directories, not files. So if build*/ are directories and symlinked directories, then both are ignored as expected.

We need to be careful. Keeping things this way feels a whole lot safer and more predictable and is still compatible with git, since git understands build*/ too. Anyway, when I have time I'll think about some kind of strict gitignore-files versus ignore-files option or something like that.

@ericonr
Contributor

ericonr commented Jul 8, 2021

That is incorrect. Symlinked directories are directories, not files. So if build*/ are directories and symlinked directories, then both are ignored as expected.

I might not have been clear: it's git that requires the two entries once I add the trailing slash. It treats symlinks as opaque.

We need to be careful. Keeping things this way feels a whole lot safer and more predictable and is still compatible with git, since git understands build*/ too. Anyway, when I have time I'll think about some kind of strict gitignore-files versus ignore-files option or something like that.

Though I don't agree with it entirely, I do understand your reasoning here. I'd suggest that documenting the limitation is enough and adds fewer moving pieces to ugrep, if you want to avoid that.
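
The disagreement about symlinks comes down to whether a link is followed before asking "is this a directory?". A minimal Python sketch of the two views, assuming a platform where creating symlinks is permitted; the names `elsewhere` and `build-link` are made up, and this does not claim to show what either git or ugrep calls internally:

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as root:
    real = os.path.join(root, "elsewhere")
    link = os.path.join(root, "build-link")
    os.mkdir(real)
    os.symlink(real, link, target_is_directory=True)

    # Following the link (stat): the entry is classified as a directory.
    follows = os.path.isdir(link)
    # Not following the link (lstat): the entry is a symlink, not a directory.
    opaque = stat.S_ISDIR(os.lstat(link).st_mode)

    print("directory when the link is followed:", follows)
    print("directory when the link is not followed:", opaque)
```

A tool that classifies entries the first way would apply a `build*/` glob to the link, while a tool that classifies them the second way would only apply a plain `build*` glob to it.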

@genivia-inc
Member

A man page update like this should help:

    --ignore-files[=FILE]
            Ignore files and directories matching the globs in each FILE that
            is encountered in recursive searches.  The default FILE is
            `.gitignore'.  Matching files and directories located in the
            directory of a FILE's location and in directories below are ignored
            by temporarily overriding the --exclude and --exclude-dir globs,
            as if --exclude-from=FILE is locally enforced.  Globbing is the
            same as --exclude-from=FILE and supports gitignore syntax, but
            directories are not automatically excluded from searches (use a
            glob ending with a `/' to identify directories to ignore, same as
            git).  Files and directories explicitly specified as command line
            arguments are never ignored.  This option may be repeated.

and also:

    --exclude-from=FILE
            Read the globs from FILE and skip files and directories whose name
            matches one or more globs.  A glob can use **, *, ?, and [...] as
            wildcards, and \ to quote a wildcard or backslash character
            literally.  When a glob contains a `/', full pathnames are matched.
            Otherwise basenames are matched.  When a glob ends with a `/',
            directories are excluded as if --exclude-dir is specified.
            Otherwise files are excluded.  A glob starting with a `!' overrides
            previously-specified exclusions by including matching files.  Lines
            starting with a `#' and empty lines in FILE are ignored.  When FILE
            is a `-', standard input is read.  This option may be repeated.

Likewise, --include-from=FILE is the dual of --exclude-from=FILE, just swap exclude for include. Both --exclude-from=FILE and --include-from=FILE are compatible with GNU grep which uses the simpler glob patterns (files only).
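
As a rough illustration of the --exclude-from rules quoted above, here is a Python sketch that classifies each line of such a file. The `Rule` class and `parse_globs` function are hypothetical names, backslash quoting is not handled, and treating a trailing `/` as not counting toward "contains a `/`" is an assumption (it is consistent with the earlier demo, where `bar/` ignored foo/bar).

```python
from dataclasses import dataclass

@dataclass
class Rule:
    glob: str        # pattern with any leading '!' and trailing '/' removed
    negated: bool    # True when the line started with '!'
    dir_only: bool   # True when the glob ended with '/'
    full_path: bool  # True when the remaining glob contains '/'

def parse_globs(text: str) -> list[Rule]:
    """Classify the lines of an exclude file (sketch, no escape handling)."""
    rules = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # empty lines and comment lines are ignored
        negated = line.startswith("!")
        if negated:
            line = line[1:]
        dir_only = line.endswith("/")
        if dir_only:
            line = line[:-1]
        # Assumption: only a '/' other than the trailing one triggers
        # full pathname matching.
        full_path = "/" in line
        rules.append(Rule(line, negated, dir_only, full_path))
    return rules

# The two entries from the earlier comment, plus a made-up negated entry.
sample = """\
# enough for git
foo*
# necessary for ugrep
foo*/
!keep.txt
"""
rules = parse_globs(sample)
for rule in rules:
    print(rule)
```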

The man page section GLOBBING has one more sentence:

GLOBBING
       Globbing is used by options -g, --include, --include-dir,
       --include-from, --exclude, --exclude-dir, --exclude-from and
       --ignore-files to match pathnames and basenames in recursive
       searches.  Glob arguments for these options should be quoted to
       prevent shell globbing.

       Globbing supports gitignore syntax and the corresponding matching
       rules, except that a glob normally matches files but not
       directories.  If a glob ends in a path separator `/', then it
       matches directories but not files, as if --include-dir or
       --exclude-dir is specified.  When a glob contains a path separator
       `/', the full pathname is matched.  Otherwise the basename of a
       file or directory is matched.  For example, *.h matches foo.h and
       bar/foo.h.  bar/*.h matches bar/foo.h but not foo.h and not
       bar/bar/foo.h.  Use a leading `/' to force /*.h to match foo.h but
       not bar/foo.h.
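
The examples in this GLOBBING text can be checked against a small Python model. This is only a sketch: `glob_to_regex` and `glob_matches` are made-up names, only `*` and `?` are translated (no `**`, `[...]` or backslash quoting), and the assumption that `*` never crosses a `/` is taken from gitignore semantics rather than from ugrep's source.

```python
import re

def glob_to_regex(glob: str) -> str:
    """Translate a glob that uses only * and ? into a regex string."""
    out = []
    for ch in glob:
        if ch == "*":
            out.append("[^/]*")  # assumption: * does not cross a path separator
        elif ch == "?":
            out.append("[^/]")
        else:
            out.append(re.escape(ch))
    return "".join(out)

def glob_matches(glob: str, path: str) -> bool:
    """Match a file glob against a path relative to the search root."""
    if "/" in glob:
        # Full pathname match; a leading '/' only marks the glob as anchored.
        return re.fullmatch(glob_to_regex(glob.lstrip("/")), path) is not None
    # No '/': compare against the basename only.
    basename = path.rsplit("/", 1)[-1]
    return re.fullmatch(glob_to_regex(glob), basename) is not None

cases = [
    ("*.h", "foo.h"), ("*.h", "bar/foo.h"),
    ("bar/*.h", "bar/foo.h"), ("bar/*.h", "foo.h"), ("bar/*.h", "bar/bar/foo.h"),
    ("/*.h", "foo.h"), ("/*.h", "bar/foo.h"),
]
for glob, path in cases:
    print(f"{glob!r:10} {path!r:16} {glob_matches(glob, path)}")
```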

@ericonr
Contributor

ericonr commented Jul 14, 2021

Looks good to me, thanks!

@cglogic

cglogic commented Sep 5, 2023

I just tried to use ugrep instead of the long-abandoned ag and found this.

My point is pretty simple.
ugrep can use any rules for ignoring files.
But if it supports .gitignore it should interpret it in the same way git does.

I simply can't update .gitignore in every project I touch just to be able to use ugrep.

Feel free to ignore this comment. And thanks for the great program.

@genivia-inc
Member

genivia-inc commented Sep 10, 2023

I have been thinking about this for a while now.

I tend to agree it's better to replicate .gitignore behavior since that's what is expected. But only for these files. So we can keep the grep-based --exclude-from rules intact for compatibility with GNU grep. One can keep arguing for or against the ugly decision that git took here to match both files and directories, but alas the world isn't perfect when humans make up the rules 😉

A change will make the documentation a bit more convoluted perhaps, since there will be two sets of rules.

PS. Or should the rules be applied "gitignore-style" to all --ignore-files files? Like .gitignore by default, but also .ignore and whatever else the user wants? That makes sense I suppose.

@genivia-inc genivia-inc added the enhancement New feature or request label Sep 10, 2023
@genivia-inc
Member

The change would be reflected in the man page and help:

--ignore-files[=FILE]
        Ignore files and directories matching the globs in each FILE that
        is encountered in recursive searches.  The default FILE is
        `.gitignore'.  Matching files and directories located in the
        directory of the FILE and in subdirectories below are ignored.
        Globbing syntax is the same as the --exclude-from=FILE gitignore
        syntax, but files and directories are excluded instead of only
        files.  Directories are specifically excluded when the glob ends in
        a `/'.  Files and directories explicitly specified as command line
        arguments are never ignored.  This option may be repeated to
        specify additional files.

where it suffices to say that "Globbing syntax is the same as the --exclude-from=FILE gitignore syntax, but files and directories are excluded instead of only files".
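
To see what the proposed wording would mean for the tree from the original report, here is a Python sketch that walks a temporary copy of that tree and applies a plain `bar` glob to both files and directories. The helper names `is_ignored` and `search_candidates` are hypothetical, only basename globs are handled, `!` negation is left out, and this is not how ugrep itself is implemented.

```python
import os
import tempfile
from fnmatch import fnmatchcase

def is_ignored(name: str, is_dir: bool, globs: list[str]) -> bool:
    """Gitignore-style rule: a plain glob hits files and directories,
    a glob ending in '/' hits directories only."""
    for glob in globs:
        dir_only = glob.endswith("/")
        if fnmatchcase(name, glob.rstrip("/")) and (is_dir or not dir_only):
            return True
    return False

def search_candidates(root: str, globs: list[str]) -> list[str]:
    """List the files under root that would still be searched."""
    kept = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if not is_ignored(d, True, globs)]
        for name in filenames:
            if not is_ignored(name, False, globs):
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                kept.append(rel.replace(os.sep, "/"))
    return sorted(kept)

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "foo", "bar"))
    for rel in ("foo/world.txt", "foo/bar/world.txt"):
        with open(os.path.join(root, *rel.split("/")), "w") as fh:
            fh.write("hello\n")
    result = search_candidates(root, ["bar"])
    print(result)
```

With the plain `bar` glob, only foo/world.txt remains, which is the result git and ag gave in the original report.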

@genivia-inc genivia-inc reopened this Sep 10, 2023
@cglogic

cglogic commented Sep 11, 2023

PS. Or should the rules be applied "gitignore-style" to all --ignore-files files? Like .gitignore by default, but also .ignore and whatever else the user wants? That makes sense I suppose.

I have no strong opinion about other .ignore files. But if .gitignore is just the default .ignore file, then replicating the same behavior for other .ignore files looks reasonable and expected.

Thank you.
