Skip to content

New sparse-checkout builtin and "cone" mode #316

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
19 commits
Select commit Hold shift + click to select a range
81a1526
sparse-checkout: create builtin with 'list' subcommand
derrickstolee Aug 18, 2019
fcfe477
sparse-checkout: create 'init' subcommand
derrickstolee Aug 18, 2019
7ef5f2e
clone: add --sparse mode
derrickstolee Aug 18, 2019
248fc17
sparse-checkout: 'set' subcommand
derrickstolee Aug 19, 2019
7517982
sparse-checkout: add '--stdin' option to set subcommand
derrickstolee Sep 17, 2019
6431141
sparse-checkout: create 'disable' subcommand
derrickstolee Aug 20, 2019
0bc87c1
trace2: add region in clear_ce_flags
jeffhostetler Apr 30, 2019
3b9e11d
sparse-checkout: add 'cone' mode
derrickstolee Aug 19, 2019
eca0bbf
sparse-checkout: use hashmaps for cone patterns
derrickstolee Aug 19, 2019
951e622
sparse-checkout: init and set in cone mode
derrickstolee Aug 20, 2019
745910b
unpack-trees: hash less in cone mode
derrickstolee Aug 22, 2019
3d0f951
unpack-trees: add progress to clear_ce_flags()
derrickstolee Sep 24, 2019
2703b34
sparse-checkout: sanitize for nested folders
derrickstolee Oct 1, 2019
63e01c2
sparse-checkout: update working directory in-process
derrickstolee Oct 1, 2019
cc6773e
sparse-checkout: use in-process update for disable subcommand
derrickstolee Nov 21, 2019
a0dca00
sparse-checkout: write using lockfile
derrickstolee Oct 3, 2019
4097d8f
sparse-checkout: cone mode should not interact with .gitignore
derrickstolee Oct 7, 2019
c62d975
sparse-checkout: update working directory in-process for 'init'
derrickstolee Nov 21, 2019
7577ffc
sparse-checkout: check for dirty status
derrickstolee Nov 21, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -158,6 +158,7 @@
/git-show-branch
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Mon, Oct 7, 2019 at 1:08 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> +SPARSE CHECKOUT
> +----------------
> +
> +"Sparse checkout" allows populating the working directory sparsely.
> +It uses the skip-worktree bit (see linkgit:git-update-index[1]) to tell
> +Git whether a file in the working directory is worth looking at. If
> +the skip-worktree bit is set, then the file is ignored in the working
> +directory. Git will not populate the contents of those files, which
> +makes a sparse checkout helpful when working in a repository with many
> +files, but only a few are important to the current user.
> +
> +The `$GIT_DIR/info/sparse-checkout` file is used to define the
> +skip-worktree reference bitmap. When Git updates the working
> +directory, it updates the skip-worktree bits in the index based
> +ont this file. The files matching the patterns in the file will

s/ont/on/

> +appear in the working directory, and the rest will not.
> +
> +## FULL PATTERN SET
> +
> +By default, the sparse-checkout file uses the same syntax as `.gitignore`
> +files.
> +
> +While `$GIT_DIR/info/sparse-checkout` is usually used to specify what
> +files are included, you can also specify what files are _not_ included,
> +using negative patterns. For example, to remove the file `unwanted`:
> +
> +----------------
> +/*
> +!unwanted
> +----------------
> +
> +Another tricky thing is fully repopulating the working directory when you
> +no longer want sparse checkout. You cannot just disable "sparse
> +checkout" because skip-worktree bits are still in the index and your working
> +directory is still sparsely populated. You should re-populate the working
> +directory with the `$GIT_DIR/info/sparse-checkout` file content as
> +follows:
> +
> +----------------
> +/*
> +----------------
> +
> +Then you can disable sparse checkout. Sparse checkout support in 'git
> +checkout' and similar commands is disabled by default. You need to
> +set `core.sparseCheckout` to `true` in order to have sparse checkout
> +support.

Looks like these disappear by the end of the series, so no need to
comment on them.  Thanks for all the fixes, other than the trivial
typo above, this patch looks good.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Tue, Oct 15, 2019 at 6:56 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> +DESCRIPTION
> +-----------
> +
> +Initialize and modify the sparse-checkout configuration, which reduces
> +the checkout to a set of directories given by a list of prefixes.
> +
> +THIS COMMAND IS EXPERIMENTAL. THE BEHAVIOR MAY CHANGE.

I think the wording needs to be a bit more detailed; you copied the
wording from git-switch.txt, but usage of git-switch is not expected
to modify the behavior of other commands.  sparse-checkout, by
contrast, is designed to affect other commands: at the very least
checkout & switch, and likely will affect grep, diff, log, and a host
of others.  Perhaps something like:

THIS COMMAND IS EXPERIMENTAL.  ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER
COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN
THE FUTURE.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 10/16/2019 3:00 PM, Elijah Newren wrote:
> On Tue, Oct 15, 2019 at 6:56 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> +DESCRIPTION
>> +-----------
>> +
>> +Initialize and modify the sparse-checkout configuration, which reduces
>> +the checkout to a set of directories given by a list of prefixes.
>> +
>> +THIS COMMAND IS EXPERIMENTAL. THE BEHAVIOR MAY CHANGE.
> 
> I think the wording needs to be a bit more detailed; you copied the
> wording from git-switch.txt, but usage of git-switch is not expected
> to modify the behavior of other commands.  sparse-checkout, by
> contrast, is designed to affect other commands: at the very least
> checkout & switch, and likely will affect grep, diff, log, and a host
> of others.  Perhaps something like:
> 
> THIS COMMAND IS EXPERIMENTAL.  ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER
> COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN
> THE FUTURE.

Thanks. I was taking your comment to mean "this builtin is experimental"
but what you really mean is "the sparse-checkout feature is (still)
experimental".

-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, SZEDER Gábor wrote (reply to this):

On Tue, Oct 15, 2019 at 01:55:48PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> The sparse-checkout feature is mostly hidden to users, as its
> only documentation is supplementary information in the docs for
> 'git read-tree'. In addition, users need to know how to edit the
> .git/info/sparse-checkout file with the right patterns, then run
> the appropriate 'git read-tree -mu HEAD' command. Keeping the
> working directory in sync with the sparse-checkout file requires
> care.
> 
> Begin an effort to make the sparse-checkout feature a porcelain
> feature by creating a new 'git sparse-checkout' builtin. This
> builtin will be the preferred mechanism for manipulating the
> sparse-checkout file and syncing the working directory.
> 
> The documentation provided is adapted from the "git read-tree"
> documentation with a few edits for clarity in the new context.
> Extra sections are added to hint toward a future change to
> a more restricted pattern set.
> 
> Helped-by: Elijah Newren <newren@gmail.com>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  .gitignore                            |  1 +
>  Documentation/git-read-tree.txt       |  2 +-
>  Documentation/git-sparse-checkout.txt | 87 +++++++++++++++++++++++++++
>  Makefile                              |  1 +
>  builtin.h                             |  1 +
>  builtin/sparse-checkout.c             | 86 ++++++++++++++++++++++++++
>  git.c                                 |  1 +
>  t/t1091-sparse-checkout-builtin.sh    | 50 +++++++++++++++

You need to add an entry for the new command to 'command-list.txt' as
well, so it will be included in 'git help -a' and completion (since
it's intended to be a porcelain), etc.

>  8 files changed, 228 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/git-sparse-checkout.txt
>  create mode 100644 builtin/sparse-checkout.c
>  create mode 100755 t/t1091-sparse-checkout-builtin.sh


> diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
> new file mode 100644
> index 0000000000..46d3dc3cb1
> --- /dev/null
> +++ b/Documentation/git-sparse-checkout.txt
> @@ -0,0 +1,87 @@
> +git-sparse-checkout(1)
> +=======================

The ==== line is one character longer than the title; I think their
length should match.  The other day the Documentation CI job failed
because the length of these lines didn't match in one of the guides.
Strangely, it doesn't complain about this one.

> +
> +NAME
> +----
> +git-sparse-checkout - Initialize and modify the sparse-checkout
> +configuration, which reduces the checkout to a set of directories
> +given by a list of prefixes.
> +
> +
> +SYNOPSIS
> +--------
> +[verse]
> +'git sparse-checkout <subcommand> [options]'
> +
> +
> +DESCRIPTION
> +-----------
> +
> +Initialize and modify the sparse-checkout configuration, which reduces
> +the checkout to a set of directories given by a list of prefixes.
> +
> +THIS COMMAND IS EXPERIMENTAL. THE BEHAVIOR MAY CHANGE.
> +
> +
> +COMMANDS
> +--------
> +'list'::
> +	Provide a list of the contents in the sparse-checkout file.
> +
> +
> +SPARSE CHECKOUT
> +----------------

This is longer as well.

> +
> +"Sparse checkout" allows populating the working directory sparsely.
> +It uses the skip-worktree bit (see linkgit:git-update-index[1]) to tell
> +Git whether a file in the working directory is worth looking at. If
> +the skip-worktree bit is set, then the file is ignored in the working
> +directory. Git will not populate the contents of those files, which
> +makes a sparse checkout helpful when working in a repository with many
> +files, but only a few are important to the current user.
> +
> +The `$GIT_DIR/info/sparse-checkout` file is used to define the
> +skip-worktree reference bitmap. When Git updates the working
> +directory, it updates the skip-worktree bits in the index based
> +on this file. The files matching the patterns in the file will
> +appear in the working directory, and the rest will not.
> +
> +## FULL PATTERN SET
> +
> +By default, the sparse-checkout file uses the same syntax as `.gitignore`
> +files.
> +
> +While `$GIT_DIR/info/sparse-checkout` is usually used to specify what
> +files are included, you can also specify what files are _not_ included,
> +using negative patterns. For example, to remove the file `unwanted`:
> +
> +----------------
> +/*
> +!unwanted
> +----------------
> +
> +Another tricky thing is fully repopulating the working directory when you
> +no longer want sparse checkout. You cannot just disable "sparse
> +checkout" because skip-worktree bits are still in the index and your working
> +directory is still sparsely populated. You should re-populate the working
> +directory with the `$GIT_DIR/info/sparse-checkout` file content as
> +follows:
> +
> +----------------
> +/*
> +----------------
> +
> +Then you can disable sparse checkout. Sparse checkout support in 'git
> +checkout' and similar commands is disabled by default. You need to
> +set `core.sparseCheckout` to `true` in order to have sparse checkout
> +support.
> +
> +SEE ALSO
> +--------
> +
> +linkgit:git-read-tree[1]
> +linkgit:gitignore[5]
> +
> +GIT
> +---
> +Part of the linkgit:git[1] suite

> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
> new file mode 100755
> index 0000000000..a9b04b1a88
> --- /dev/null
> +++ b/t/t1091-sparse-checkout-builtin.sh
> @@ -0,0 +1,50 @@
> +#!/bin/sh
> +
> +test_description='sparse checkout builtin tests'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'setup' '
> +	git init repo &&
> +	(
> +		cd repo &&
> +		echo "initial" >a &&
> +		mkdir folder1 folder2 deep &&
> +		mkdir deep/deeper1 deep/deeper2 &&
> +		mkdir deep/deeper1/deepest &&
> +		cp a folder1 &&
> +		cp a folder2 &&
> +		cp a deep &&
> +		cp a deep/deeper1 &&
> +		cp a deep/deeper2 &&
> +		cp a deep/deeper1/deepest &&

There are six 'cp' invocations here.  Curious: does this mean that
it's important that all those files are the same?

> +		git add . &&
> +		git commit -m "initial commit"
> +	)
> +'
> +
> +test_expect_success 'git sparse-checkout list (empty)' '
> +	git -C repo sparse-checkout list >list 2>err &&
> +	test_line_count = 0 list &&

Nit: test_must_be_empty might be a bit more idiomatic, dunno.

> +	test_i18ngrep "this worktree is not sparse (sparse-checkout file may not exist)" err
> +'
> +
> +test_expect_success 'git sparse-checkout list (populated)' '
> +	test_when_finished rm -f repo/.git/info/sparse-checkout &&
> +	cat >repo/.git/info/sparse-checkout <<-EOF &&
> +		/folder1/*
> +		/deep/
> +		**/a
> +		!*bin*
> +	EOF
> +	git -C repo sparse-checkout list >list &&
> +	cat >expect <<-EOF &&
> +		/folder1/*
> +		/deep/
> +		**/a
> +		!*bin*
> +	EOF

OTOH, here the content of the 'sparse-checkout' file and 'expect' must
be the same, but one has to carefully look through four lines of
patterns to realize that they are indeed the same.  I think in this
case the explicitness of 'cp' would be better.

> +	test_cmp expect list
> +'
> +
> +test_done
> -- 
> gitgitgadget
> 

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 10/18/2019 12:07 PM, SZEDER Gábor wrote:
> On Tue, Oct 15, 2019 at 01:55:48PM +0000, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> The sparse-checkout feature is mostly hidden to users, as its
>> only documentation is supplementary information in the docs for
>> 'git read-tree'. In addition, users need to know how to edit the
>> .git/info/sparse-checkout file with the right patterns, then run
>> the appropriate 'git read-tree -mu HEAD' command. Keeping the
>> working directory in sync with the sparse-checkout file requires
>> care.
>>
>> Begin an effort to make the sparse-checkout feature a porcelain
>> feature by creating a new 'git sparse-checkout' builtin. This
>> builtin will be the preferred mechanism for manipulating the
>> sparse-checkout file and syncing the working directory.
>>
>> The documentation provided is adapted from the "git read-tree"
>> documentation with a few edits for clarity in the new context.
>> Extra sections are added to hint toward a future change to
>> a more restricted pattern set.
>>
>> Helped-by: Elijah Newren <newren@gmail.com>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>  .gitignore                            |  1 +
>>  Documentation/git-read-tree.txt       |  2 +-
>>  Documentation/git-sparse-checkout.txt | 87 +++++++++++++++++++++++++++
>>  Makefile                              |  1 +
>>  builtin.h                             |  1 +
>>  builtin/sparse-checkout.c             | 86 ++++++++++++++++++++++++++
>>  git.c                                 |  1 +
>>  t/t1091-sparse-checkout-builtin.sh    | 50 +++++++++++++++
> 
> You need to add an entry for the new command to 'command-list.txt' as
> well, so it will be included in 'git help -a' and completion (since
> it's intended to be a porcelain), etc.
> 
>>  8 files changed, 228 insertions(+), 1 deletion(-)
>>  create mode 100644 Documentation/git-sparse-checkout.txt
>>  create mode 100644 builtin/sparse-checkout.c
>>  create mode 100755 t/t1091-sparse-checkout-builtin.sh
> 
> 
>> diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
>> new file mode 100644
>> index 0000000000..46d3dc3cb1
>> --- /dev/null
>> +++ b/Documentation/git-sparse-checkout.txt
>> @@ -0,0 +1,87 @@
>> +git-sparse-checkout(1)
>> +=======================
> 
> The ==== line is one character longer than the title; I think their
> length should match.  The other day the Documentation CI job failed
> because the length of these lines didn't match in one of the guides.
> Strangely, it doesn't complain about this one.

Interesting. GitGitGadget runs a documentation build, and it has never
failed for me. Will fix.

>> +
>> +NAME
>> +----
>> +git-sparse-checkout - Initialize and modify the sparse-checkout
>> +configuration, which reduces the checkout to a set of directories
>> +given by a list of prefixes.
>> +
>> +
>> +SYNOPSIS
>> +--------
>> +[verse]
>> +'git sparse-checkout <subcommand> [options]'
>> +
>> +
>> +DESCRIPTION
>> +-----------
>> +
>> +Initialize and modify the sparse-checkout configuration, which reduces
>> +the checkout to a set of directories given by a list of prefixes.
>> +
>> +THIS COMMAND IS EXPERIMENTAL. THE BEHAVIOR MAY CHANGE.
>> +
>> +
>> +COMMANDS
>> +--------
>> +'list'::
>> +	Provide a list of the contents in the sparse-checkout file.
>> +
>> +
>> +SPARSE CHECKOUT
>> +----------------
> 
> This is longer as well.
> 
>> +
>> +"Sparse checkout" allows populating the working directory sparsely.
>> +It uses the skip-worktree bit (see linkgit:git-update-index[1]) to tell
>> +Git whether a file in the working directory is worth looking at. If
>> +the skip-worktree bit is set, then the file is ignored in the working
>> +directory. Git will not populate the contents of those files, which
>> +makes a sparse checkout helpful when working in a repository with many
>> +files, but only a few are important to the current user.
>> +
>> +The `$GIT_DIR/info/sparse-checkout` file is used to define the
>> +skip-worktree reference bitmap. When Git updates the working
>> +directory, it updates the skip-worktree bits in the index based
>> +on this file. The files matching the patterns in the file will
>> +appear in the working directory, and the rest will not.
>> +
>> +## FULL PATTERN SET
>> +
>> +By default, the sparse-checkout file uses the same syntax as `.gitignore`
>> +files.
>> +
>> +While `$GIT_DIR/info/sparse-checkout` is usually used to specify what
>> +files are included, you can also specify what files are _not_ included,
>> +using negative patterns. For example, to remove the file `unwanted`:
>> +
>> +----------------
>> +/*
>> +!unwanted
>> +----------------
>> +
>> +Another tricky thing is fully repopulating the working directory when you
>> +no longer want sparse checkout. You cannot just disable "sparse
>> +checkout" because skip-worktree bits are still in the index and your working
>> +directory is still sparsely populated. You should re-populate the working
>> +directory with the `$GIT_DIR/info/sparse-checkout` file content as
>> +follows:
>> +
>> +----------------
>> +/*
>> +----------------
>> +
>> +Then you can disable sparse checkout. Sparse checkout support in 'git
>> +checkout' and similar commands is disabled by default. You need to
>> +set `core.sparseCheckout` to `true` in order to have sparse checkout
>> +support.
>> +
>> +SEE ALSO
>> +--------
>> +
>> +linkgit:git-read-tree[1]
>> +linkgit:gitignore[5]
>> +
>> +GIT
>> +---
>> +Part of the linkgit:git[1] suite
> 
>> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
>> new file mode 100755
>> index 0000000000..a9b04b1a88
>> --- /dev/null
>> +++ b/t/t1091-sparse-checkout-builtin.sh
>> @@ -0,0 +1,50 @@
>> +#!/bin/sh
>> +
>> +test_description='sparse checkout builtin tests'
>> +
>> +. ./test-lib.sh
>> +
>> +test_expect_success 'setup' '
>> +	git init repo &&
>> +	(
>> +		cd repo &&
>> +		echo "initial" >a &&
>> +		mkdir folder1 folder2 deep &&
>> +		mkdir deep/deeper1 deep/deeper2 &&
>> +		mkdir deep/deeper1/deepest &&
>> +		cp a folder1 &&
>> +		cp a folder2 &&
>> +		cp a deep &&
>> +		cp a deep/deeper1 &&
>> +		cp a deep/deeper2 &&
>> +		cp a deep/deeper1/deepest &&
> 
> There are six 'cp' invocations here.  Curious: does this mean that
> it's important that all those files are the same?

The content of the "a" file is not important at all to the
tests. It only matters that there is a blob in each of these
trees.

> 
>> +		git add . &&
>> +		git commit -m "initial commit"
>> +	)
>> +'
>> +
>> +test_expect_success 'git sparse-checkout list (empty)' '
>> +	git -C repo sparse-checkout list >list 2>err &&
>> +	test_line_count = 0 list &&
> 
> Nit: test_must_be_empty might be a bit more idiomatic, dunno.
> 
>> +	test_i18ngrep "this worktree is not sparse (sparse-checkout file may not exist)" err
>> +'
>> +
>> +test_expect_success 'git sparse-checkout list (populated)' '
>> +	test_when_finished rm -f repo/.git/info/sparse-checkout &&
>> +	cat >repo/.git/info/sparse-checkout <<-EOF &&
>> +		/folder1/*
>> +		/deep/
>> +		**/a
>> +		!*bin*
>> +	EOF
>> +	git -C repo sparse-checkout list >list &&
>> +	cat >expect <<-EOF &&
>> +		/folder1/*
>> +		/deep/
>> +		**/a
>> +		!*bin*
>> +	EOF
> 
> OTOH, here the content of the 'sparse-checkout' file and 'expect' must
> be the same, but one has to carefully look through four lines of
> patterns to realize that they are indeed the same.  I think in this
> case the explicitness of 'cp' would be better.

That's fine. We explicitly set the contents in the line above.

>> +	test_cmp expect list
>> +'
>> +
>> +test_done
>> -- 
>> gitgitgadget
>>

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, SZEDER Gábor wrote (reply to this):

On Mon, Oct 21, 2019 at 01:56:10PM +0000, Derrick Stolee via GitGitGadget wrote:
> diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
> new file mode 100644
> index 0000000000..9d6ca22917
> --- /dev/null
> +++ b/Documentation/git-sparse-checkout.txt
> @@ -0,0 +1,89 @@
> +git-sparse-checkout(1)
> +======================
> +
> +NAME
> +----
> +git-sparse-checkout - Initialize and modify the sparse-checkout
> +configuration, which reduces the checkout to a set of directories

s/directories/paths/

> +given by a list of prefixes.

s/prefixes/patterns/

E.g. 'README' is a file, not a directory, and it's not interpreted as
a prefix:

  $ git sparse-checkout set README
  $ find | grep README
  ./README
  ./Documentation/ABI/README
  ./Documentation/virt/kvm/devices/README
  ./arch/m68k/fpsp040/README
  [...]


> +SYNOPSIS
> +--------
> +[verse]
> +'git sparse-checkout <subcommand> [options]'
> +
> +
> +DESCRIPTION
> +-----------
> +
> +Initialize and modify the sparse-checkout configuration, which reduces
> +the checkout to a set of directories given by a list of prefixes.

Likewise.

/git-show-index
/git-show-ref
/git-sparse-checkout
/git-stage
/git-stash
/git-status
Expand Down
10 changes: 8 additions & 2 deletions Documentation/config/core.txt
Original file line number Diff line number Diff line change
Expand Up @@ -593,8 +593,14 @@ core.multiPackIndex::
multi-pack-index design document].
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, SZEDER Gábor wrote (reply to this):

On Mon, Oct 21, 2019 at 01:56:17PM +0000, Derrick Stolee via GitGitGadget wrote:
> diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
> index f794d4797a..3931e4f2ad 100644
> --- a/Documentation/git-sparse-checkout.txt
> +++ b/Documentation/git-sparse-checkout.txt
> @@ -92,6 +92,56 @@ using negative patterns. For example, to remove the file `unwanted`:
>  ----------------
>  
>  
> +## CONE PATTERN SET
> +
> +The full pattern set allows for arbitrary pattern matches and complicated
> +inclusion/exclusion rules. These can result in O(N*M) pattern matches when
> +updating the index, where N is the number of patterns and M is the number
> +of paths in the index. To combat this performance issue, a more restricted
> +pattern set is allowed when `core.spareCheckoutCone` is enabled.
> +
> +The accepted patterns in the cone pattern set are:
> +
> +1. *Recursive:* All paths inside a directory are included.
> +
> +2. *Parent:* All files immediately inside a directory are included.
> +
> +In addition to the above two patterns, we also expect that all files in the
> +root directory are included. If a recursive pattern is added, then all
> +leading directories are added as parent patterns.
> +
> +By default, when running `git sparse-checkout init`, the root directory is
> +added as a parent pattern. At this point, the sparse-checkout file contains
> +the following patterns:
> +
> +```
> +/*
> +!/*/
> +```
> +
> +This says "include everything in root, but nothing two levels below root."
> +If we then add the folder `A/B/C` as a recursive pattern, the folders `A` and
> +`A/B` are added as parent patterns. The resulting sparse-checkout file is
> +now
> +
> +```
> +/*
> +!/*/
> +/A/
> +!/A/*/
> +/A/B/
> +!/A/B/*/
> +/A/B/C/
> +```

This is a man page, not markdown.  Those ## and ``` don't look as
expected in the generated documentation.


core.sparseCheckout::
Enable "sparse checkout" feature. See section "Sparse checkout" in
linkgit:git-read-tree[1] for more information.
Enable "sparse checkout" feature. See linkgit:git-sparse-checkout[1]
for more information.

core.sparseCheckoutCone::
Enables the "cone mode" of the sparse checkout feature. When the
sparse-checkout file contains a limited set of patterns, then this
mode provides significant performance advantages. See
linkgit:git-sparse-checkout[1] for more information.

core.abbrev::
Set the length object names are abbreviated to. If
Expand Down
8 changes: 7 additions & 1 deletion Documentation/git-clone.txt
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ SYNOPSIS
[--dissociate] [--separate-git-dir <git dir>]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Mon, Oct 7, 2019 at 1:08 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> During the 'git sparse-checkout init' call, we must first look
> to see if HEAD is valid, since 'git clone' does not have a valid
> HEAD.

...does not have a valid HEAD by the time git_sparse_checkout_init() is called?

> The first checkout will create the HEAD ref and update the
> working directory correctly.

Is this checkout you reference a manual-initiated user checkout after
the clone, or the checkout performed as part of the clone?  (I'm
almost certain it's the latter, but your wording makes me question.)

[--depth <depth>] [--[no-]single-branch] [--no-tags]
[--recurse-submodules[=<pathspec>]] [--[no-]shallow-submodules]
[--[no-]remote-submodules] [--jobs <n>] [--] <repository>
[--[no-]remote-submodules] [--jobs <n>] [--sparse] [--] <repository>
[<directory>]

DESCRIPTION
Expand Down Expand Up @@ -156,6 +156,12 @@ objects from the source repository into a pack in the cloned repository.
used, neither remote-tracking branches nor the related
configuration variables are created.

--sparse::
Initialize the sparse-checkout file so the working
directory starts with only the files in the root
of the repository. The sparse-checkout file can be
modified to grow the working directory as needed.

--mirror::
Set up a mirror of the source repository. This implies `--bare`.
Compared to `--bare`, `--mirror` not only maps local branches of the
Expand Down
2 changes: 1 addition & 1 deletion Documentation/git-read-tree.txt
Original file line number Diff line number Diff line change
Expand Up @@ -436,7 +436,7 @@ support.
SEE ALSO
--------
linkgit:git-write-tree[1]; linkgit:git-ls-files[1];
linkgit:gitignore[5]
linkgit:gitignore[5]; linkgit:git-sparse-checkout[1];

GIT
---
Expand Down
161 changes: 161 additions & 0 deletions Documentation/git-sparse-checkout.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,161 @@
git-sparse-checkout(1)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Mon, Oct 7, 2019 at 1:08 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> ++
> +The init subcommand also enables the 'extensions.worktreeConfig' setting
> +and sets the `core.sparseCheckout` setting in the worktree-specific config
> +file. This prevents the sparse-checkout feature from interfering with other
> +worktrees.

I'm afraid that might be mis-parsed by future readers.  Perhaps something like:

The init subcommand also enables the `core.sparseCheckout` setting.
To avoid interfering with other worktrees, it first enables the
`extensions.worktreeConfig` setting and makes sure to set the
`core.sparseCheckout` setting in the worktree-specific config file.

> +enum sparse_checkout_mode {
> +       MODE_NONE = 0,
> +       MODE_FULL = 1,
> +};

So MODE_FULL is "true" and MODE_NONE is "false".  MODE_NONE seems
confusing to me, but let's keep reading...

> +
> +static int sc_set_config(enum sparse_checkout_mode mode)
> +{
> +       struct argv_array argv = ARGV_ARRAY_INIT;
> +
> +       if (git_config_set_gently("extensions.worktreeConfig", "true")) {
> +               error(_("failed to set extensions.worktreeConfig setting"));
> +               return 1;
> +       }
> +
> +       argv_array_pushl(&argv, "config", "--worktree", "core.sparseCheckout", NULL);
> +
> +       if (mode)
> +               argv_array_pushl(&argv, "true", NULL);
> +       else
> +               argv_array_pushl(&argv, "false", NULL);

Wait, what?  MODE_FULL is used to specify that you want a sparse
checkout, and MODE_NONE is used to denote that you want a full (i.e.
non-sparse) checkout?  These are *very* confusing names.


> +static int sparse_checkout_init(int argc, const char **argv)
> +{
> +       struct pattern_list pl;
> +       char *sparse_filename;
> +       FILE *fp;
> +       int res;
> +
> +       if (sc_set_config(MODE_FULL))
> +               return 1;

Seems confusing here too.


Everything else in the patch looks good, though.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 10/11/2019 6:14 PM, Elijah Newren wrote:
> On Mon, Oct 7, 2019 at 1:08 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> ++
>> +The init subcommand also enables the 'extensions.worktreeConfig' setting
>> +and sets the `core.sparseCheckout` setting in the worktree-specific config
>> +file. This prevents the sparse-checkout feature from interfering with other
>> +worktrees.
> 
> I'm afraid that might be mis-parsed by future readers.  Perhaps something like:
> 
> The init subcommand also enables the `core.sparseCheckout` setting.

I like the paragraph below, but the sentence above is repeated from
the earlier paragraph.

> To avoid interfering with other worktrees, it first enables the
> `extensions.worktreeConfig` setting and makes sure to set the
> `core.sparseCheckout` setting in the worktree-specific config file.
> 
>> +enum sparse_checkout_mode {
>> +       MODE_NONE = 0,
>> +       MODE_FULL = 1,
>> +};
> 
> So MODE_FULL is "true" and MODE_NONE is "false".  MODE_NONE seems
> confusing to me, but let's keep reading...
> 
>> +
>> +static int sc_set_config(enum sparse_checkout_mode mode)
>> +{
>> +       struct argv_array argv = ARGV_ARRAY_INIT;
>> +
>> +       if (git_config_set_gently("extensions.worktreeConfig", "true")) {
>> +               error(_("failed to set extensions.worktreeConfig setting"));
>> +               return 1;
>> +       }
>> +
>> +       argv_array_pushl(&argv, "config", "--worktree", "core.sparseCheckout", NULL);
>> +
>> +       if (mode)
>> +               argv_array_pushl(&argv, "true", NULL);
>> +       else
>> +               argv_array_pushl(&argv, "false", NULL);
> 
> Wait, what?  MODE_FULL is used to specify that you want a sparse
> checkout, and MODE_NONE is used to denote that you want a full (i.e.
> non-sparse) checkout?  These are *very* confusing names.

I understand they are confusing, hopefully it makes more sense with
the cone mode later.

* NONE == "No patterns at all"
* FULL == "all patterns allowed"
* CONE == "only cone patterns" (appears later)

Since this is just an internal detail, what if I switched it to

* MODE_NO_PATTERNS
* MODE_ALL_PATTERNS
* MODE_CONE_PATTERNS

Would that make more sense?

>> +static int sparse_checkout_init(int argc, const char **argv)
>> +{
>> +       struct pattern_list pl;
>> +       char *sparse_filename;
>> +       FILE *fp;
>> +       int res;
>> +
>> +       if (sc_set_config(MODE_FULL))
>> +               return 1;
> 
> Seems confusing here too.
> 
> 
> Everything else in the patch looks good, though.

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Mon, Oct 7, 2019 at 1:08 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The 'git sparse-checkout set' subcommand takes a list of patterns
> as arguments and writes them to the sparse-checkout file. Then, it
> updates the working directory using 'git read-tree -mu HEAD'.
>
> The 'set' subcommand will replace the entire contents of the
> sparse-checkout file. The write_patterns_and_update() method is
> extracted from cmd_sparse_checkout() to make it easier to implement
> 'add' and/or 'remove' subcommands in the future.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/git-sparse-checkout.txt |  5 ++++
>  builtin/sparse-checkout.c             | 35 ++++++++++++++++++++++++++-
>  t/t1091-sparse-checkout-builtin.sh    | 19 +++++++++++++++
>  3 files changed, 58 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
> index e095c4a98b..f4bd951550 100644
> --- a/Documentation/git-sparse-checkout.txt
> +++ b/Documentation/git-sparse-checkout.txt
> @@ -39,6 +39,11 @@ and sets the `core.sparseCheckout` setting in the worktree-specific config
>  file. This prevents the sparse-checkout feature from interfering with other
>  worktrees.
>
> +'set'::
> +       Write a set of patterns to the sparse-checkout file, as given as
> +       a list of arguments following the 'set' subcommand. Update the
> +       working directory to match the new patterns.
> +
>  SPARSE CHECKOUT
>  ----------------
>
> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index 3ecb7ac2e7..52d4f832f3 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -8,7 +8,7 @@
>  #include "strbuf.h"
>
>  static char const * const builtin_sparse_checkout_usage[] = {
> -       N_("git sparse-checkout [init|list]"),
> +       N_("git sparse-checkout [init|list|set] <options>"),
>         NULL
>  };
>
> @@ -140,6 +140,37 @@ static int sparse_checkout_init(int argc, const char **argv)
>         return update_working_directory();
>  }
>
> +static int write_patterns_and_update(struct pattern_list *pl)
> +{
> +       char *sparse_filename;
> +       FILE *fp;
> +
> +       sparse_filename = get_sparse_checkout_filename();
> +       fp = fopen(sparse_filename, "w");
> +       write_patterns_to_file(fp, pl);
> +       fclose(fp);
> +       free(sparse_filename);
> +
> +       return update_working_directory();
> +}
> +
> +static int sparse_checkout_set(int argc, const char **argv, const char *prefix)
> +{
> +       static const char *empty_base = "";
> +       int i;
> +       struct pattern_list pl;
> +       int result;
> +       memset(&pl, 0, sizeof(pl));
> +
> +       for (i = 1; i < argc; i++)
> +               add_pattern(argv[i], empty_base, 0, &pl, 0);
> +
> +       result = write_patterns_and_update(&pl);
> +
> +       clear_pattern_list(&pl);
> +       return result;
> +}
> +
>  int cmd_sparse_checkout(int argc, const char **argv, const char *prefix)
>  {
>         static struct option builtin_sparse_checkout_options[] = {
> @@ -162,6 +193,8 @@ int cmd_sparse_checkout(int argc, const char **argv, const char *prefix)
>                         return sparse_checkout_list(argc, argv);
>                 if (!strcmp(argv[0], "init"))
>                         return sparse_checkout_init(argc, argv);
> +               if (!strcmp(argv[0], "set"))
> +                       return sparse_checkout_set(argc, argv, prefix);
>         }
>
>         usage_with_options(builtin_sparse_checkout_usage,
> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
> index d4c145a3af..19e8673c6b 100755
> --- a/t/t1091-sparse-checkout-builtin.sh
> +++ b/t/t1091-sparse-checkout-builtin.sh
> @@ -101,4 +101,23 @@ test_expect_success 'clone --sparse' '
>         test_cmp expect dir
>  '
>
> +test_expect_success 'set sparse-checkout using builtin' '
> +       git -C repo sparse-checkout set "/*" "!/*/" "*folder*" &&
> +       cat >expect <<-EOF &&
> +               /*
> +               !/*/
> +               *folder*
> +       EOF
> +       git -C repo sparse-checkout list >actual &&
> +       test_cmp expect actual &&
> +       test_cmp expect repo/.git/info/sparse-checkout &&
> +       ls repo >dir  &&
> +       cat >expect <<-EOF &&
> +               a
> +               folder1
> +               folder2
> +       EOF
> +       test_cmp expect dir
> +'
> +
>  test_done
> --

Looks good, thanks for the fixes.  I'm still slightly worried about
folks not looking at the docs and calling sparse-checkout set without
calling init, and then being negatively surprised.  It's a minor
issue, but a warning might be helpful.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Fri, Oct 11, 2019 at 3:26 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Mon, Oct 7, 2019 at 1:08 PM Derrick Stolee via GitGitGadget

> Looks good, thanks for the fixes.  I'm still slightly worried about
> folks not looking at the docs and calling sparse-checkout set without
> calling init, and then being negatively surprised.  It's a minor
> issue, but a warning might be helpful.

Looks like you added that to patch 5, so nevermind.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, SZEDER Gábor wrote (reply to this):

On Tue, Oct 15, 2019 at 01:55:53PM +0000, Derrick Stolee via GitGitGadget wrote:
> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index b747b78d34..78a80ce119 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -8,7 +8,7 @@
>  #include "strbuf.h"
>  
>  static char const * const builtin_sparse_checkout_usage[] = {
> -	N_("git sparse-checkout [init|list|set] <options>"),
> +	N_("git sparse-checkout [init|list|set|disable] <options>"),

In the synopsis [] means optional, but I think the subcommands are not
optional; however, the options are:

  git sparse-checkout (init|list|set|disable) [<options>]

Perhaps it would be even better if each subcommand had its own
synopsis line in the short help, see e.g. 'git notes -h' or 'git
remote -h'.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, SZEDER Gábor wrote (reply to this):

On Mon, Oct 21, 2019 at 01:56:11PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> Getting started with a sparse-checkout file can be daunting. Help
> users start their sparse enlistment using 'git sparse-checkout init'.
> This will set 'core.sparseCheckout=true' in their config, write
> an initial set of patterns to the sparse-checkout file, and update
> their working directory.

Reading this I was wandering what those "initial set of patterns"
might be ...

> Make sure to use the `extensions.worktreeConfig` setting and write
> the sparse checkout config to the worktree-specific config file.
> This avoids confusing interactions with other worktrees.
> 
> The use of running another process for 'git read-tree' is sub-
> optimal. This will be removed in a later change.
> 
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/git-sparse-checkout.txt | 11 ++++
>  builtin/sparse-checkout.c             | 79 ++++++++++++++++++++++++++-
>  t/t1091-sparse-checkout-builtin.sh    | 41 ++++++++++++++
>  3 files changed, 130 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
> index 9d6ca22917..930a361567 100644
> --- a/Documentation/git-sparse-checkout.txt
> +++ b/Documentation/git-sparse-checkout.txt
> @@ -30,6 +30,17 @@ COMMANDS
>  'list'::
>  	Provide a list of the contents in the sparse-checkout file.
>  
> +'init'::
> +	Enable the `core.sparseCheckout` setting. If the
> +	sparse-checkout file does not exist, then populate it with
> +	patterns that match every file in the root directory and
> +	no other directories, then will remove all directories tracked
> +	by Git. Add patterns to the sparse-checkout file to
> +	repopulate the working directory.

... and then reading this I was wandering why these are those "initial
set of patterns".

> ++
> +To avoid interfering with other worktrees, it first enables the
> +`extensions.worktreeConfig` setting and makes sure to set the
> +`core.sparseCheckout` setting in the worktree-specific config file.
>  
>  SPARSE CHECKOUT
>  ---------------
> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index 5717c9b2cb..77aa52ca01 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -8,7 +8,7 @@
>  #include "strbuf.h"
>  
>  static char const * const builtin_sparse_checkout_usage[] = {
> -	N_("git sparse-checkout list"),
> +	N_("git sparse-checkout (init|list)"),
>  	NULL
>  };
>  
> @@ -59,6 +59,81 @@ static int sparse_checkout_list(int argc, const char **argv)
>  	return 0;
>  }
>  
> +static int update_working_directory(void)
> +{
> +	struct argv_array argv = ARGV_ARRAY_INIT;
> +	int result = 0;
> +	argv_array_pushl(&argv, "read-tree", "-m", "-u", "HEAD", NULL);
> +
> +	if (run_command_v_opt(argv.argv, RUN_GIT_CMD)) {
> +		error(_("failed to update index with new sparse-checkout paths"));
> +		result = 1;
> +	}
> +
> +	argv_array_clear(&argv);
> +	return result;
> +}
> +
> +enum sparse_checkout_mode {
> +	MODE_NO_PATTERNS = 0,
> +	MODE_ALL_PATTERNS = 1,
> +};
> +
> +static int sc_set_config(enum sparse_checkout_mode mode)

Nit: s/sc_//, perhaps?  I suppose that "sc" prefix stands for "sparse
checkout", but this is a static function in
'builtin/sparse-checkout.c', so it doesn't need a distinguising
prefix.  Even at the end of this patch series no other functions have
this "sc" prefix.

> +{
> +	struct argv_array argv = ARGV_ARRAY_INIT;

This 'argv_array' is not cleared at the end of the function, but...

> +	if (git_config_set_gently("extensions.worktreeConfig", "true")) {
> +		error(_("failed to set extensions.worktreeConfig setting"));
> +		return 1;
> +	}
> +
> +	argv_array_pushl(&argv, "config", "--worktree", "core.sparseCheckout", NULL);
> +
> +	if (mode)
> +		argv_array_pushl(&argv, "true", NULL);
> +	else
> +		argv_array_pushl(&argv, "false", NULL);
> +
> +	if (run_command_v_opt(argv.argv, RUN_GIT_CMD)) {
> +		error(_("failed to enable core.sparseCheckout"));
> +		return 1;
> +	}

Why the external 'git config' invocation?

  git_config_set_in_file_gently(git_path("config.worktree"),
                                "core.sparseCheckout",
                                mode ? "true" : "false")

> +
> +	return 0;
> +}
> +
> +static int sparse_checkout_init(int argc, const char **argv)
> +{
> +	struct pattern_list pl;
> +	char *sparse_filename;
> +	FILE *fp;
> +	int res;
> +
> +	if (sc_set_config(MODE_ALL_PATTERNS))
> +		return 1;
> +
> +	memset(&pl, 0, sizeof(pl));
> +
> +	sparse_filename = get_sparse_checkout_filename();
> +	res = add_patterns_from_file_to_list(sparse_filename, "", 0, &pl, NULL);
> +
> +	/* If we already have a sparse-checkout file, use it. */
> +	if (res >= 0) {
> +		free(sparse_filename);
> +		goto reset_dir;
> +	}
> +
> +	/* initial mode: all blobs at root */
> +	fp = xfopen(sparse_filename, "w");
> +	free(sparse_filename);
> +	fprintf(fp, "/*\n!/*/\n");

What if this fprintf() call were to fail?

> +	fclose(fp);
> +
> +reset_dir:
> +	return update_working_directory();
> +}
> +
>  int cmd_sparse_checkout(int argc, const char **argv, const char *prefix)
>  {
>  	static struct option builtin_sparse_checkout_options[] = {
> @@ -79,6 +154,8 @@ int cmd_sparse_checkout(int argc, const char **argv, const char *prefix)
>  	if (argc > 0) {
>  		if (!strcmp(argv[0], "list"))
>  			return sparse_checkout_list(argc, argv);
> +		if (!strcmp(argv[0], "init"))
> +			return sparse_checkout_init(argc, argv);
>  	}
>  
>  	usage_with_options(builtin_sparse_checkout_usage,
> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
> index 9b73d44907..cd56cc384b 100755
> --- a/t/t1091-sparse-checkout-builtin.sh
> +++ b/t/t1091-sparse-checkout-builtin.sh
> @@ -42,4 +42,45 @@ test_expect_success 'git sparse-checkout list (populated)' '
>  	test_cmp expect list
>  '
>  
> +test_expect_success 'git sparse-checkout init' '
> +	git -C repo sparse-checkout init &&
> +	cat >expect <<-EOF &&
> +		/*
> +		!/*/
> +	EOF
> +	test_cmp expect repo/.git/info/sparse-checkout &&
> +	git -C repo config --list >config &&
> +	test_i18ngrep "core.sparsecheckout=true" config &&

We have the 'test_cmp_config' helper function to check the expected
value of configuration variables.

> +	ls repo >dir  &&
> +	echo a >expect &&
> +	test_cmp expect dir
> +'
> +
> +test_expect_success 'git sparse-checkout list after init' '
> +	git -C repo sparse-checkout list >actual &&
> +	cat >expect <<-EOF &&
> +		/*
> +		!/*/
> +	EOF
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'init with existing sparse-checkout' '
> +	echo "*folder*" >> repo/.git/info/sparse-checkout &&
> +	git -C repo sparse-checkout init &&
> +	cat >expect <<-EOF &&
> +		/*
> +		!/*/
> +		*folder*
> +	EOF
> +	test_cmp expect repo/.git/info/sparse-checkout &&
> +	ls repo >dir  &&
> +	cat >expect <<-EOF &&
> +		a
> +		folder1
> +		folder2
> +	EOF
> +	test_cmp expect dir
> +'
> +
>  test_done
> -- 
> gitgitgadget
> 

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, SZEDER Gábor wrote (reply to this):

On Tue, Nov 19, 2019 at 03:33:08PM +0100, SZEDER Gábor wrote:
> > +	argv_array_pushl(&argv, "config", "--worktree", "core.sparseCheckout", NULL);
> > +
> > +	if (mode)
> > +		argv_array_pushl(&argv, "true", NULL);
> > +	else
> > +		argv_array_pushl(&argv, "false", NULL);
> > +
> > +	if (run_command_v_opt(argv.argv, RUN_GIT_CMD)) {
> > +		error(_("failed to enable core.sparseCheckout"));
> > +		return 1;
> > +	}
> 
> Why the external 'git config' invocation?
> 
>   git_config_set_in_file_gently(git_path("config.worktree"),
>                                 "core.sparseCheckout",
>                                 mode ? "true" : "false")

Having slept on it, I think it would be even better to pass NULL
instead of "false".  "false", well, just sets the configuration
variable to false, but a NULL would properly unset it.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, SZEDER Gábor wrote (reply to this):

On Mon, Oct 21, 2019 at 01:56:13PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> The 'git sparse-checkout set' subcommand takes a list of patterns
> as arguments and writes them to the sparse-checkout file. Then, it
> updates the working directory using 'git read-tree -mu HEAD'.
> 
> The 'set' subcommand will replace the entire contents of the
> sparse-checkout file. The write_patterns_and_update() method is
> extracted from cmd_sparse_checkout() to make it easier to implement
> 'add' and/or 'remove' subcommands in the future.

Yeah, an 'add' subcommand would be great, because 'set' throwing away
all the existing patterns can lead to some frustration:

  # Let's clone with sparse checkout
  $ git clone --sparse ...
  $ cd clone
  $ less README
  $ git sparse-checkout '/some/dir/*'
  # Erm, what was the next step?
  $ less README
  README: No such file or directory
  # Uhoh.

Having said that, being forced to use 'git sparse-checkout set' and
specify all patterns at once does have its own benefits: I needed like
6 subdirectories to build that subset of a big project that I was
interested in, and now all the necessary patterns are saved in my Bash
history, and I will be able to easily dig out the complete command in
the future.  That wouldn't work with using the 'add' subcommand to add
one pattern at a time.

> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
> index cb74715ca6..bf2dc55bb1 100755
> --- a/t/t1091-sparse-checkout-builtin.sh
> +++ b/t/t1091-sparse-checkout-builtin.sh
> @@ -96,4 +96,36 @@ test_expect_success 'clone --sparse' '
>  	test_cmp expect dir
>  '
>  
> +test_expect_success 'set enables config' '
> +	git init empty-config &&
> +	(
> +		cd empty-config &&
> +		test_commit test file &&
> +		test_path_is_missing .git/config.worktree &&
> +		test_must_fail git sparse-checkout set nothing &&

This command prints the same error message twice:

  + test_must_fail git sparse-checkout set nothing
  error: Sparse checkout leaves no entry on working directory
  error: Sparse checkout leaves no entry on working directory


> +		test_i18ngrep "sparseCheckout = false" .git/config.worktree &&

The contents of a configuration file surely can't ever be translated,
can it?!

Please use 'test_cmp_config'.

> +		git sparse-checkout set "/*" &&
> +		test_i18ngrep "sparseCheckout = true" .git/config.worktree
> +	)
> +'
> +
> +test_expect_success 'set sparse-checkout using builtin' '
> +	git -C repo sparse-checkout set "/*" "!/*/" "*folder*" &&
> +	cat >expect <<-EOF &&
> +		/*
> +		!/*/
> +		*folder*
> +	EOF
> +	git -C repo sparse-checkout list >actual &&
> +	test_cmp expect actual &&
> +	test_cmp expect repo/.git/info/sparse-checkout &&
> +	ls repo >dir  &&
> +	cat >expect <<-EOF &&
> +		a
> +		folder1
> +		folder2
> +	EOF
> +	test_cmp expect dir
> +'
> +
>  test_done
> -- 
> gitgitgadget
> 

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 11/19/2019 10:46 AM, SZEDER Gábor wrote:
> On Mon, Oct 21, 2019 at 01:56:13PM +0000, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> The 'git sparse-checkout set' subcommand takes a list of patterns
>> as arguments and writes them to the sparse-checkout file. Then, it
>> updates the working directory using 'git read-tree -mu HEAD'.
>>
>> The 'set' subcommand will replace the entire contents of the
>> sparse-checkout file. The write_patterns_and_update() method is
>> extracted from cmd_sparse_checkout() to make it easier to implement
>> 'add' and/or 'remove' subcommands in the future.
> 
> Yeah, an 'add' subcommand would be great, because 'set' throwing away
> all the existing patterns can lead to some frustration:

I plan to extend this feature when this series lands.

> Having said that, being forced to use 'git sparse-checkout set' and
> specify all patterns at once does have its own benefits: I needed like
> 6 subdirectories to build that subset of a big project that I was
> interested in, and now all the necessary patterns are saved in my Bash
> history, and I will be able to easily dig out the complete command in
> the future.  That wouldn't work with using the 'add' subcommand to add
> one pattern at a time.

In general, this "set" command is created first because tools can interact
with it more easily than "add" and "remove". Users would probably prefer the
"add" and "remove" way to interact.
 
>> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
>> index cb74715ca6..bf2dc55bb1 100755
>> --- a/t/t1091-sparse-checkout-builtin.sh
>> +++ b/t/t1091-sparse-checkout-builtin.sh
>> @@ -96,4 +96,36 @@ test_expect_success 'clone --sparse' '
>>  	test_cmp expect dir
>>  '
>>  
>> +test_expect_success 'set enables config' '
>> +	git init empty-config &&
>> +	(
>> +		cd empty-config &&
>> +		test_commit test file &&
>> +		test_path_is_missing .git/config.worktree &&
>> +		test_must_fail git sparse-checkout set nothing &&
> 
> This command prints the same error message twice:
> 
>   + test_must_fail git sparse-checkout set nothing
>   error: Sparse checkout leaves no entry on working directory
>   error: Sparse checkout leaves no entry on working directory

At this commit, I do not see two identical lines, but instead the second
line says

	error: failed to update index with new sparse-checkout paths

(that "paths" should probably be "patterns")

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, SZEDER Gábor wrote (reply to this):

On Thu, Nov 21, 2019 at 08:04:35AM -0500, Derrick Stolee wrote:
> On 11/19/2019 10:46 AM, SZEDER Gábor wrote:
> > On Mon, Oct 21, 2019 at 01:56:13PM +0000, Derrick Stolee via GitGitGadget wrote:
> >> From: Derrick Stolee <dstolee@microsoft.com>
> >>
> >> The 'git sparse-checkout set' subcommand takes a list of patterns
> >> as arguments and writes them to the sparse-checkout file. Then, it
> >> updates the working directory using 'git read-tree -mu HEAD'.
> >>
> >> The 'set' subcommand will replace the entire contents of the
> >> sparse-checkout file. The write_patterns_and_update() method is
> >> extracted from cmd_sparse_checkout() to make it easier to implement
> >> 'add' and/or 'remove' subcommands in the future.
> > 
> > Yeah, an 'add' subcommand would be great, because 'set' throwing away
> > all the existing patterns can lead to some frustration:
> 
> I plan to extend this feature when this series lands.
> 
> > Having said that, being forced to use 'git sparse-checkout set' and
> > specify all patterns at once does have its own benefits: I needed like
> > 6 subdirectories to build that subset of a big project that I was
> > interested in, and now all the necessary patterns are saved in my Bash
> > history, and I will be able to easily dig out the complete command in
> > the future.  That wouldn't work with using the 'add' subcommand to add
> > one pattern at a time.
> 
> In general, this "set" command is created first because tools can interact
> with it more easily than "add" and "remove". Users would probably prefer the
> "add" and "remove" way to interact.

Makes sense.

However, I'd like to add that in the meantime I got to like the 'set'
subcommand more and more.  I enabled-disabled sparse checkout a lot of
times while testing and trying to poke holes, and to be able to set up
everything with only one single command that I can easily recall from
the shell's history was a great help.

> >> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
> >> index cb74715ca6..bf2dc55bb1 100755
> >> --- a/t/t1091-sparse-checkout-builtin.sh
> >> +++ b/t/t1091-sparse-checkout-builtin.sh
> >> @@ -96,4 +96,36 @@ test_expect_success 'clone --sparse' '
> >>  	test_cmp expect dir
> >>  '
> >>  
> >> +test_expect_success 'set enables config' '
> >> +	git init empty-config &&
> >> +	(
> >> +		cd empty-config &&
> >> +		test_commit test file &&
> >> +		test_path_is_missing .git/config.worktree &&
> >> +		test_must_fail git sparse-checkout set nothing &&
> > 
> > This command prints the same error message twice:
> > 
> >   + test_must_fail git sparse-checkout set nothing
> >   error: Sparse checkout leaves no entry on working directory
> >   error: Sparse checkout leaves no entry on working directory
> 
> At this commit, I do not see two identical lines, but instead the second
> line says
> 
> 	error: failed to update index with new sparse-checkout paths

You're right, I must have looked at and copied the results from a test
run of the full patch series or even from 'next'.  The error message
changes with commit efd9c53b6d (sparse-checkout: update working
directory in-process, 2019-10-21) to what I shown above.

Interestingly, at one point in my testing I managed to get a different
error three times:

  $ ~/src/git/git sparse-checkout set foo
  error: Entry 'dir/bar' not uptodate. Cannot update sparse checkout.
  error: Entry 'dir/bar' not uptodate. Cannot update sparse checkout.
  error: Entry 'dir/bar' not uptodate. Cannot update sparse checkout.

However, when I later re-run the same sequence of commands that lead
to that three errors I got that error only twice, and couldn't
reproduce it since.

Style nit, seeing "error: Sparse..." and "error: Entry...": error
messages should start with a lower-case letter.

> (that "paths" should probably be "patterns")

That would be better indeed, but that error message goes away by the
end of the series anyway...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, SZEDER Gábor wrote (reply to this):

On Mon, Oct 21, 2019 at 01:56:15PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> The instructions for disabling a sparse-checkout to a full
> working directory are complicated and non-intuitive. Add a
> subcommand, 'git sparse-checkout disable', to perform those
> steps for the user.
> 
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/git-sparse-checkout.txt | 27 ++++++++++++---------------
>  builtin/sparse-checkout.c             | 26 +++++++++++++++++++++++++-
>  t/t1091-sparse-checkout-builtin.sh    | 15 +++++++++++++++
>  3 files changed, 52 insertions(+), 16 deletions(-)
> 
> diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
> index b933043b3d..f794d4797a 100644
> --- a/Documentation/git-sparse-checkout.txt
> +++ b/Documentation/git-sparse-checkout.txt
> @@ -48,6 +48,10 @@ To avoid interfering with other worktrees, it first enables the
>  	working directory to match the new patterns. Enable the
>  	core.sparseCheckout config setting if it is not already enabled.
>  
> +'disable'::
> +	Remove the sparse-checkout file, set `core.sparseCheckout` to
> +	`false`, and restore the working directory to include all files.

I think it would make sense to leave the sparse-checkout file behind
as-is, and only update the coonfiguration and the working tree.  That
way users could quickly go back to their last sparse checkout setup by
simply running 'git sparse-checkout init'.

  
> +static int sparse_checkout_disable(int argc, const char **argv)
> +{
> +	char *sparse_filename;
> +	FILE *fp;
> +
> +	if (sc_set_config(MODE_ALL_PATTERNS))
> +		die(_("failed to change config"));
> +
> +	sparse_filename = get_sparse_checkout_filename();
> +	fp = xfopen(sparse_filename, "w");
> +	fprintf(fp, "/*\n");
> +	fclose(fp);
> +
> +	if (update_working_directory())
> +		die(_("error while refreshing working directory"));
> +
> +	unlink(sparse_filename);
> +	free(sparse_filename);
> +
> +	return sc_set_config(MODE_NO_PATTERNS);

Hrm.  So disabling the sparse-checkout calls the same helper function
to write the configuration as initializing or setting, but with a
different parameter.  However, the error message in that function is:

  error(_("failed to enable core.sparseCheckout"));

So if something goes wrong while disabling the sparse-checkout, the
user might get an error saying "error: failed to enable
core.sparseCheckout".  That doesn't look quite right, does it?

> +}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 11/19/2019 11:15 AM, SZEDER Gábor wrote:
> On Mon, Oct 21, 2019 at 01:56:15PM +0000, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> The instructions for disabling a sparse-checkout to a full
>> working directory are complicated and non-intuitive. Add a
>> subcommand, 'git sparse-checkout disable', to perform those
>> steps for the user.
>>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>  Documentation/git-sparse-checkout.txt | 27 ++++++++++++---------------
>>  builtin/sparse-checkout.c             | 26 +++++++++++++++++++++++++-
>>  t/t1091-sparse-checkout-builtin.sh    | 15 +++++++++++++++
>>  3 files changed, 52 insertions(+), 16 deletions(-)
>>
>> diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
>> index b933043b3d..f794d4797a 100644
>> --- a/Documentation/git-sparse-checkout.txt
>> +++ b/Documentation/git-sparse-checkout.txt
>> @@ -48,6 +48,10 @@ To avoid interfering with other worktrees, it first enables the
>>  	working directory to match the new patterns. Enable the
>>  	core.sparseCheckout config setting if it is not already enabled.
>>  
>> +'disable'::
>> +	Remove the sparse-checkout file, set `core.sparseCheckout` to
>> +	`false`, and restore the working directory to include all files.
> 
> I think it would make sense to leave the sparse-checkout file behind
> as-is, and only update the coonfiguration and the working tree.  That
> way users could quickly go back to their last sparse checkout setup by
> simply running 'git sparse-checkout init'.
> 
>   
>> +static int sparse_checkout_disable(int argc, const char **argv)
>> +{
>> +	char *sparse_filename;
>> +	FILE *fp;
>> +
>> +	if (sc_set_config(MODE_ALL_PATTERNS))
>> +		die(_("failed to change config"));
>> +
>> +	sparse_filename = get_sparse_checkout_filename();
>> +	fp = xfopen(sparse_filename, "w");
>> +	fprintf(fp, "/*\n");
>> +	fclose(fp);
>> +
>> +	if (update_working_directory())
>> +		die(_("error while refreshing working directory"));
>> +
>> +	unlink(sparse_filename);
>> +	free(sparse_filename);
>> +
>> +	return sc_set_config(MODE_NO_PATTERNS);
> 
> Hrm.  So disabling the sparse-checkout calls the same helper function
> to write the configuration as initializing or setting, but with a
> different parameter.  However, the error message in that function is:
> 
>   error(_("failed to enable core.sparseCheckout"));
> 
> So if something goes wrong while disabling the sparse-checkout, the
> user might get an error saying "error: failed to enable
> core.sparseCheckout".  That doesn't look quite right, does it?

Both of your comments are valid, but will be easier to implement
later in the series, after "sparse-checkout: update working directory
in-process" which previously did not interact with the "disable"
command. I'll add a new commit that adds that integration point
and should resolve your concerns raised here.

-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, SZEDER Gábor wrote (reply to this):

On Mon, Oct 21, 2019 at 01:56:11PM +0000, Derrick Stolee via GitGitGadget wrote:
> Getting started with a sparse-checkout file can be daunting. Help
> users start their sparse enlistment using 'git sparse-checkout init'.
> This will set 'core.sparseCheckout=true' in their config, write
> an initial set of patterns to the sparse-checkout file, and update
> their working directory.

Enabling sparse-checkout can remove modified files:

  $ mkdir dir
  $ touch foo dir/bar
  $ git add .
  $ git commit -m Initial
  [master (root-commit) ecc81bd] Initial
   2 files changed, 0 insertions(+), 0 deletions(-)
   create mode 100644 dir/bar
   create mode 100644 foo
  $ echo changes >dir/bar
  $ ~/src/git/git sparse-checkout init
  error: Entry 'dir/bar' not uptodate. Cannot update sparse checkout.
  error: failed to update index with new sparse-checkout paths
  $ cat dir/bar 
  changes

So far so good, my changes are still there.  Unfortunately, however:

  $ git add -u
  $ ~/src/git/git sparse-checkout init
  $ echo $?
  0
  $ ls
  foo

Uh-oh, my changes are gone.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, SZEDER Gábor wrote (reply to this):

On Thu, Nov 21, 2019 at 12:49:36PM +0100, SZEDER Gábor wrote:
> On Mon, Oct 21, 2019 at 01:56:11PM +0000, Derrick Stolee via GitGitGadget wrote:
> > Getting started with a sparse-checkout file can be daunting. Help
> > users start their sparse enlistment using 'git sparse-checkout init'.
> > This will set 'core.sparseCheckout=true' in their config, write
> > an initial set of patterns to the sparse-checkout file, and update
> > their working directory.
> 
> Enabling sparse-checkout can remove modified files:
> 
>   $ mkdir dir
>   $ touch foo dir/bar
>   $ git add .
>   $ git commit -m Initial
>   [master (root-commit) ecc81bd] Initial
>    2 files changed, 0 insertions(+), 0 deletions(-)
>    create mode 100644 dir/bar
>    create mode 100644 foo
>   $ echo changes >dir/bar
>   $ ~/src/git/git sparse-checkout init
>   error: Entry 'dir/bar' not uptodate. Cannot update sparse checkout.
>   error: failed to update index with new sparse-checkout paths

And after this it leaves the 'sparse-checkout' file behind, which may
or may not be desired:

  $ cat .git/info/sparse-checkout 
  /*
  !/*/

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 11/21/2019 6:49 AM, SZEDER Gábor wrote:
> On Mon, Oct 21, 2019 at 01:56:11PM +0000, Derrick Stolee via GitGitGadget wrote:
>> Getting started with a sparse-checkout file can be daunting. Help
>> users start their sparse enlistment using 'git sparse-checkout init'.
>> This will set 'core.sparseCheckout=true' in their config, write
>> an initial set of patterns to the sparse-checkout file, and update
>> their working directory.
> 
> Enabling sparse-checkout can remove modified files:
> 
>   $ mkdir dir
>   $ touch foo dir/bar
>   $ git add .
>   $ git commit -m Initial
>   [master (root-commit) ecc81bd] Initial
>    2 files changed, 0 insertions(+), 0 deletions(-)
>    create mode 100644 dir/bar
>    create mode 100644 foo
>   $ echo changes >dir/bar
>   $ ~/src/git/git sparse-checkout init
>   error: Entry 'dir/bar' not uptodate. Cannot update sparse checkout.
>   error: failed to update index with new sparse-checkout paths
>   $ cat dir/bar 
>   changes
> 
> So far so good, my changes are still there.  Unfortunately, however:
> 
>   $ git add -u
>   $ ~/src/git/git sparse-checkout init
>   $ echo $?
>   0
>   $ ls
>   foo
> 
> Uh-oh, my changes are gone.

Here, your changes are not "lost", they are staged, correct? A "git status"
should report that your changes are ready for committing. This seems to be
the expected behavior.

As to your other message about leaving the sparse-checkout file even when
the 'init' command fails, I'll fix that by using the in-process mechanism.

-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, SZEDER Gábor wrote (reply to this):

On Thu, Nov 21, 2019 at 09:28:59AM -0500, Derrick Stolee wrote:
> On 11/21/2019 6:49 AM, SZEDER Gábor wrote:
> > On Mon, Oct 21, 2019 at 01:56:11PM +0000, Derrick Stolee via GitGitGadget wrote:
> >> Getting started with a sparse-checkout file can be daunting. Help
> >> users start their sparse enlistment using 'git sparse-checkout init'.
> >> This will set 'core.sparseCheckout=true' in their config, write
> >> an initial set of patterns to the sparse-checkout file, and update
> >> their working directory.
> > 
> > Enabling sparse-checkout can remove modified files:
> > 
> >   $ mkdir dir
> >   $ touch foo dir/bar
> >   $ git add .
> >   $ git commit -m Initial
> >   [master (root-commit) ecc81bd] Initial
> >    2 files changed, 0 insertions(+), 0 deletions(-)
> >    create mode 100644 dir/bar
> >    create mode 100644 foo
> >   $ echo changes >dir/bar
> >   $ ~/src/git/git sparse-checkout init
> >   error: Entry 'dir/bar' not uptodate. Cannot update sparse checkout.
> >   error: failed to update index with new sparse-checkout paths
> >   $ cat dir/bar 
> >   changes
> > 
> > So far so good, my changes are still there.  Unfortunately, however:
> > 
> >   $ git add -u
> >   $ ~/src/git/git sparse-checkout init
> >   $ echo $?
> >   0
> >   $ ls
> >   foo
> > 
> > Uh-oh, my changes are gone.
> 
> Here, your changes are not "lost", they are staged, correct? 

Well, yes, the changes were staged, so they must be in the object
database somewhere, the user just has to go through the dangling
objects reported by 'git fsck' to find and restore it...  So from the
perspective of an ordinary user they are lost.

> A "git status"
> should report that your changes are ready for committing. This seems to be
> the expected behavior.

No, unfortunately enabling sparse-checkout did throw the staged
changes away:

  ~/test (master +)$ ~/src/git/git sparse-checkout init
  ~/test (master)$ git status 
  On branch master
  nothing to commit, working tree clean

Note also the shell prompt going from "you have staged changes" to
"working tree is clean".

But wait, I thought that only changes to files that are excluded from
the sparse-checkout are thrown away, but as it turns out it throws
away changes to files that are included in the sparse-checkout:

  ~/test (master #)$ echo original >foo
  ~/test (master #%)$ git add .
  ~/test (master +)$ git commit -m Initial
  [master (root-commit) 04cb2a2] Initial
   1 file changed, 1 insertion(+)
   create mode 100644 foo
  ~/test (master)$ echo changes >foo 
  ~/test (master *)$ ~/src/git/git sparse-checkout init
  ~/test (master *)$ cat foo 
  changes

So far so good, but:

  ~/test (master *)$ ~/src/git/git sparse-checkout disable
  ~/test (master *)$ git add .
  ~/test (master +)$ ~/src/git/git sparse-checkout init
  ~/test (master)$ cat foo 
  original

This is not really sparse-checkout-specific, because this is what 'git
read-tree -um HEAD' always does on its own:

  ~/test (master)$ echo changes >foo 
  ~/test (master *)$ git read-tree -um HEAD
  ~/test (master *)$ cat foo 
  changes
  ~/test (master *)$ git add -u
  ~/test (master +)$ git read-tree -um HEAD
  ~/test (master)$ cat foo 
  original

These issues are present at the end of the patch series as well, but
that is sort-of expected, because the later commit message states
that:

    Remove this extra process call by creating a direct call to
    unpack_trees() in the same way 'git read-tree -mu HEAD' does.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 11/21/2019 10:27 AM, SZEDER Gábor wrote:
> On Thu, Nov 21, 2019 at 09:28:59AM -0500, Derrick Stolee wrote:
>> On 11/21/2019 6:49 AM, SZEDER Gábor wrote:
>>> On Mon, Oct 21, 2019 at 01:56:11PM +0000, Derrick Stolee via GitGitGadget wrote:
>>>> Getting started with a sparse-checkout file can be daunting. Help
>>>> users start their sparse enlistment using 'git sparse-checkout init'.
>>>> This will set 'core.sparseCheckout=true' in their config, write
>>>> an initial set of patterns to the sparse-checkout file, and update
>>>> their working directory.
>>>
>>> Enabling sparse-checkout can remove modified files:
>>>
>>>   $ mkdir dir
>>>   $ touch foo dir/bar
>>>   $ git add .
>>>   $ git commit -m Initial
>>>   [master (root-commit) ecc81bd] Initial
>>>    2 files changed, 0 insertions(+), 0 deletions(-)
>>>    create mode 100644 dir/bar
>>>    create mode 100644 foo
>>>   $ echo changes >dir/bar
>>>   $ ~/src/git/git sparse-checkout init
>>>   error: Entry 'dir/bar' not uptodate. Cannot update sparse checkout.
>>>   error: failed to update index with new sparse-checkout paths
>>>   $ cat dir/bar 
>>>   changes
>>>
>>> So far so good, my changes are still there.  Unfortunately, however:
>>>
>>>   $ git add -u
>>>   $ ~/src/git/git sparse-checkout init
>>>   $ echo $?
>>>   0
>>>   $ ls
>>>   foo
>>>
>>> Uh-oh, my changes are gone.
>>
>> Here, your changes are not "lost", they are staged, correct? 
> 
> Well, yes, the changes were staged, so they must be in the object
> database somewhere, the user just has to go through the dangling
> objects reported by 'git fsck' to find and restore it...  So from the
> perspective of an ordinary user they are lost.
> 
>> A "git status"
>> should report that your changes are ready for committing. This seems to be
>> the expected behavior.
> 
> No, unfortunately enabling sparse-checkout did throw the staged
> changes away:
> 
>   ~/test (master +)$ ~/src/git/git sparse-checkout init
>   ~/test (master)$ git status 
>   On branch master
>   nothing to commit, working tree clean
> 
> Note also the shell prompt going from "you have staged changes" to
> "working tree is clean".
> 
> But wait, I thought that only changes to files that are excluded from
> the sparse-checkout are thrown away, but as it turns out it throws
> away changes to files that are included in the sparse-checkout:
> 
>   ~/test (master #)$ echo original >foo
>   ~/test (master #%)$ git add .
>   ~/test (master +)$ git commit -m Initial
>   [master (root-commit) 04cb2a2] Initial
>    1 file changed, 1 insertion(+)
>    create mode 100644 foo
>   ~/test (master)$ echo changes >foo 
>   ~/test (master *)$ ~/src/git/git sparse-checkout init
>   ~/test (master *)$ cat foo 
>   changes
> 
> So far so good, but:
> 
>   ~/test (master *)$ ~/src/git/git sparse-checkout disable
>   ~/test (master *)$ git add .
>   ~/test (master +)$ ~/src/git/git sparse-checkout init
>   ~/test (master)$ cat foo 
>   original
> 
> This is not really sparse-checkout-specific, because this is what 'git
> read-tree -um HEAD' always does on its own:
> 
>   ~/test (master)$ echo changes >foo 
>   ~/test (master *)$ git read-tree -um HEAD
>   ~/test (master *)$ cat foo 
>   changes
>   ~/test (master *)$ git add -u
>   ~/test (master +)$ git read-tree -um HEAD
>   ~/test (master)$ cat foo 
>   original
> 
> These issues are present at the end of the patch series as well, but
> that is sort-of expected, because the later commit message states
> that:
> 
>     Remove this extra process call by creating a direct call to
>     unpack_trees() in the same way 'git read-tree -mu HEAD' does.

Thanks for the additional details.

This series intended to make the existing sparse-checkout behavior
more useful to users by not requiring manual edits of the sparse-checkout
file followed by 'git read-tree' commands. However, there do appear
to be some serious improvements that we can make in the future.

Keeping staged changes seems important, and we can address that in
the near future.

-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, SZEDER Gábor wrote (reply to this):

On Thu, Nov 21, 2019 at 10:36:10AM -0500, Derrick Stolee wrote:
> > But wait, I thought that only changes to files that are excluded from
> > the sparse-checkout are thrown away, but as it turns out it throws
> > away changes to files that are included in the sparse-checkout:

For completeness, 'git sparse-checkout disable' throws away staged
changes as well, as it, too, runs 'git read-tree -um HEAD' (or its
equivalent).

> Thanks for the additional details.
> 
> This series intended to make the existing sparse-checkout behavior
> more useful to users by not requiring manual edits of the sparse-checkout
> file followed by 'git read-tree' commands. However, there do appear
> to be some serious improvements that we can make in the future.
> 
> Keeping staged changes seems important, and we can address that in
> the near future.

I think that at least for now 'git sparse-checkout' should flat out
refuse to init/set/disable if the working tree is not clean (but still
allow 'list', as that's a read-only operation), like the patch below.
Yeah, that way it wouldn't work in cases that appear to be safe
(unstaged changes), but it would prevent the data loss until we
carefully consider the circumstances under which these operations can
be safely allowed.


  -- >8 --

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 76f65d8edd..4b05625c4c 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -12,6 +12,7 @@
 #include "lockfile.h"
 #include "resolve-undo.h"
 #include "unpack-trees.h"
+#include "wt-status.h"
 
 static char const * const builtin_sparse_checkout_usage[] = {
 	N_("git sparse-checkout (init|list|set|disable) <options>"),
@@ -188,6 +189,10 @@ static int sparse_checkout_init(int argc, const char **argv)
 			     builtin_sparse_checkout_init_options,
 			     builtin_sparse_checkout_init_usage, 0);
 
+	repo_read_index(the_repository);
+	require_clean_work_tree(the_repository,
+				N_("initialize sparse-checkout"), NULL, 1, 0);
+
 	if (init_opts.cone_mode) {
 		mode = MODE_CONE_PATTERNS;
 		core_sparse_checkout_cone = 1;
@@ -386,6 +391,10 @@ static int sparse_checkout_set(int argc, const char **argv, const char *prefix)
 			     builtin_sparse_checkout_set_usage,
 			     PARSE_OPT_KEEP_UNKNOWN);
 
+	repo_read_index(the_repository);
+	require_clean_work_tree(the_repository,
+				N_("set up sparse-checkout"), NULL, 1, 0);
+
 	if (core_sparse_checkout_cone) {
 		struct strbuf line = STRBUF_INIT;
 		hashmap_init(&pl.recursive_hashmap, pl_hashmap_cmp, NULL, 0);
@@ -437,6 +446,10 @@ static int sparse_checkout_disable(int argc, const char **argv)
 	char *sparse_filename;
 	FILE *fp;
 
+	repo_read_index(the_repository);
+	require_clean_work_tree(the_repository,
+				N_("disable sparse-checkout"), NULL, 1, 0);
+
 	if (sc_set_config(MODE_ALL_PATTERNS))
 		die(_("failed to change config"));
 

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Thu, Nov 21, 2019 at 8:37 AM SZEDER G=C3=A1bor <szeder.dev@gmail.com> wr=
ote:
>
> On Thu, Nov 21, 2019 at 10:36:10AM -0500, Derrick Stolee wrote:
> > > But wait, I thought that only changes to files that are excluded from
> > > the sparse-checkout are thrown away, but as it turns out it throws
> > > away changes to files that are included in the sparse-checkout:
>
> For completeness, 'git sparse-checkout disable' throws away staged
> changes as well, as it, too, runs 'git read-tree -um HEAD' (or its
> equivalent).
>
> > Thanks for the additional details.
> >
> > This series intended to make the existing sparse-checkout behavior
> > more useful to users by not requiring manual edits of the sparse-checko=
ut
> > file followed by 'git read-tree' commands. However, there do appear
> > to be some serious improvements that we can make in the future.
> >
> > Keeping staged changes seems important, and we can address that in
> > the near future.
>
> I think that at least for now 'git sparse-checkout' should flat out
> refuse to init/set/disable if the working tree is not clean (but still
> allow 'list', as that's a read-only operation), like the patch below.
> Yeah, that way it wouldn't work in cases that appear to be safe
> (unstaged changes), but it would prevent the data loss until we
> carefully consider the circumstances under which these operations can
> be safely allowed.

A big +1 for this from me.

We had an unfortunately large number of mis-merging and dataloss bugs
in the merge machinery that were slowly fixed over the course of more
than a decade[1], due to the fact that builtin/merge required
index=3D=3DHEAD and did so by placing a comment in the code notifying
folks that the individual merge strategies were responsible to enforce
it -- and, in practice, they *all* forgot to do so unless and until we
discovered bugs.  So, count me as a strongly in favor of just
preventatively enforcing safe conditions and then later relaxing them
in special conditions if it can be proven safe.

[1] https://public-inbox.org/git/20190725174611.14802-4-newren@gmail.com/

>   -- >8 --
>
> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index 76f65d8edd..4b05625c4c 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -12,6 +12,7 @@
>  #include "lockfile.h"
>  #include "resolve-undo.h"
>  #include "unpack-trees.h"
> +#include "wt-status.h"
>
>  static char const * const builtin_sparse_checkout_usage[] =3D {
>         N_("git sparse-checkout (init|list|set|disable) <options>"),
> @@ -188,6 +189,10 @@ static int sparse_checkout_init(int argc, const char=
 **argv)
>                              builtin_sparse_checkout_init_options,
>                              builtin_sparse_checkout_init_usage, 0);
>
> +       repo_read_index(the_repository);
> +       require_clean_work_tree(the_repository,
> +                               N_("initialize sparse-checkout"), NULL, 1=
, 0);
> +
>         if (init_opts.cone_mode) {
>                 mode =3D MODE_CONE_PATTERNS;
>                 core_sparse_checkout_cone =3D 1;
> @@ -386,6 +391,10 @@ static int sparse_checkout_set(int argc, const char =
**argv, const char *prefix)
>                              builtin_sparse_checkout_set_usage,
>                              PARSE_OPT_KEEP_UNKNOWN);
>
> +       repo_read_index(the_repository);
> +       require_clean_work_tree(the_repository,
> +                               N_("set up sparse-checkout"), NULL, 1, 0)=
;
> +
>         if (core_sparse_checkout_cone) {
>                 struct strbuf line =3D STRBUF_INIT;
>                 hashmap_init(&pl.recursive_hashmap, pl_hashmap_cmp, NULL,=
 0);
> @@ -437,6 +446,10 @@ static int sparse_checkout_disable(int argc, const c=
har **argv)
>         char *sparse_filename;
>         FILE *fp;
>
> +       repo_read_index(the_repository);
> +       require_clean_work_tree(the_repository,
> +                               N_("disable sparse-checkout"), NULL, 1, 0=
);
> +
>         if (sc_set_config(MODE_ALL_PATTERNS))
>                 die(_("failed to change config"));

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 11/21/2019 12:01 PM, Elijah Newren wrote:
> On Thu, Nov 21, 2019 at 8:37 AM SZEDER Gábor <szeder.dev@gmail.com> wrote:
>>
>> On Thu, Nov 21, 2019 at 10:36:10AM -0500, Derrick Stolee wrote:
>>>> But wait, I thought that only changes to files that are excluded from
>>>> the sparse-checkout are thrown away, but as it turns out it throws
>>>> away changes to files that are included in the sparse-checkout:
>>
>> For completeness, 'git sparse-checkout disable' throws away staged
>> changes as well, as it, too, runs 'git read-tree -um HEAD' (or its
>> equivalent).
>>
>>> Thanks for the additional details.
>>>
>>> This series intended to make the existing sparse-checkout behavior
>>> more useful to users by not requiring manual edits of the sparse-checkout
>>> file followed by 'git read-tree' commands. However, there do appear
>>> to be some serious improvements that we can make in the future.
>>>
>>> Keeping staged changes seems important, and we can address that in
>>> the near future.
>>
>> I think that at least for now 'git sparse-checkout' should flat out
>> refuse to init/set/disable if the working tree is not clean (but still
>> allow 'list', as that's a read-only operation), like the patch below.
>> Yeah, that way it wouldn't work in cases that appear to be safe
>> (unstaged changes), but it would prevent the data loss until we
>> carefully consider the circumstances under which these operations can
>> be safely allowed.
> 
> A big +1 for this from me.
> 
> We had an unfortunately large number of mis-merging and dataloss bugs
> in the merge machinery that were slowly fixed over the course of more
> than a decade[1], due to the fact that builtin/merge required
> index==HEAD and did so by placing a comment in the code notifying
> folks that the individual merge strategies were responsible to enforce
> it -- and, in practice, they *all* forgot to do so unless and until we
> discovered bugs.  So, count me as a strongly in favor of just
> preventatively enforcing safe conditions and then later relaxing them
> in special conditions if it can be proven safe.
> 
> [1] https://public-inbox.org/git/20190725174611.14802-4-newren@gmail.com/

Sounds good. Thanks, both.

-Stolee

======================

NAME
----
git-sparse-checkout - Initialize and modify the sparse-checkout
configuration, which reduces the checkout to a set of paths
given by a list of atterns.


SYNOPSIS
--------
[verse]
'git sparse-checkout <subcommand> [options]'


DESCRIPTION
-----------

Initialize and modify the sparse-checkout configuration, which reduces
the checkout to a set of paths given by a list of patterns.

THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER
COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN
THE FUTURE.


COMMANDS
--------
'list'::
Provide a list of the contents in the sparse-checkout file.

'init'::
Enable the `core.sparseCheckout` setting. If the
sparse-checkout file does not exist, then populate it with
patterns that match every file in the root directory and
no other directories, then will remove all directories tracked
by Git. Add patterns to the sparse-checkout file to
repopulate the working directory.
+
To avoid interfering with other worktrees, it first enables the
`extensions.worktreeConfig` setting and makes sure to set the
`core.sparseCheckout` setting in the worktree-specific config file.

'set'::
Write a set of patterns to the sparse-checkout file, as given as
a list of arguments following the 'set' subcommand. Update the
working directory to match the new patterns. Enable the
core.sparseCheckout config setting if it is not already enabled.
+
When the `--stdin` option is provided, the patterns are read from
standard in as a newline-delimited list instead of from the arguments.

'disable'::
Disable the `core.sparseCheckout` config setting, and restore the
working directory to include all files. Leaves the sparse-checkout
file intact so a later 'git sparse-checkout init' command may
return the working directory to the same state.

SPARSE CHECKOUT
---------------

"Sparse checkout" allows populating the working directory sparsely.
It uses the skip-worktree bit (see linkgit:git-update-index[1]) to tell
Git whether a file in the working directory is worth looking at. If
the skip-worktree bit is set, then the file is ignored in the working
directory. Git will not populate the contents of those files, which
makes a sparse checkout helpful when working in a repository with many
files, but only a few are important to the current user.

The `$GIT_DIR/info/sparse-checkout` file is used to define the
skip-worktree reference bitmap. When Git updates the working
directory, it updates the skip-worktree bits in the index based
on this file. The files matching the patterns in the file will
appear in the working directory, and the rest will not.

To enable the sparse-checkout feature, run `git sparse-checkout init` to
initialize a simple sparse-checkout file and enable the `core.sparseCheckout`
config setting. Then, run `git sparse-checkout set` to modify the patterns in
the sparse-checkout file.

To repopulate the working directory with all files, use the
`git sparse-checkout disable` command.


FULL PATTERN SET
----------------

By default, the sparse-checkout file uses the same syntax as `.gitignore`
files.

While `$GIT_DIR/info/sparse-checkout` is usually used to specify what
files are included, you can also specify what files are _not_ included,
using negative patterns. For example, to remove the file `unwanted`:

----------------
/*
!unwanted
----------------


CONE PATTERN SET
----------------

The full pattern set allows for arbitrary pattern matches and complicated
inclusion/exclusion rules. These can result in O(N*M) pattern matches when
updating the index, where N is the number of patterns and M is the number
of paths in the index. To combat this performance issue, a more restricted
pattern set is allowed when `core.spareCheckoutCone` is enabled.

The accepted patterns in the cone pattern set are:

1. *Recursive:* All paths inside a directory are included.

2. *Parent:* All files immediately inside a directory are included.

In addition to the above two patterns, we also expect that all files in the
root directory are included. If a recursive pattern is added, then all
leading directories are added as parent patterns.

By default, when running `git sparse-checkout init`, the root directory is
added as a parent pattern. At this point, the sparse-checkout file contains
the following patterns:

----------------
/*
!/*/
----------------

This says "include everything in root, but nothing two levels below root."
If we then add the folder `A/B/C` as a recursive pattern, the folders `A` and
`A/B` are added as parent patterns. The resulting sparse-checkout file is
now

----------------
/*
!/*/
/A/
!/A/*/
/A/B/
!/A/B/*/
/A/B/C/
----------------

Here, order matters, so the negative patterns are overridden by the positive
patterns that appear lower in the file.

If `core.sparseCheckoutCone=true`, then Git will parse the sparse-checkout file
expecting patterns of these types. Git will warn if the patterns do not match.
If the patterns do match the expected format, then Git will use faster hash-
based algorithms to compute inclusion in the sparse-checkout.

SEE ALSO
--------

linkgit:git-read-tree[1]
linkgit:gitignore[5]

GIT
---
Part of the linkgit:git[1] suite
1 change: 1 addition & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -1125,6 +1125,7 @@ BUILTIN_OBJS += builtin/shortlog.o
BUILTIN_OBJS += builtin/show-branch.o
BUILTIN_OBJS += builtin/show-index.o
BUILTIN_OBJS += builtin/show-ref.o
BUILTIN_OBJS += builtin/sparse-checkout.o
BUILTIN_OBJS += builtin/stash.o
BUILTIN_OBJS += builtin/stripspace.o
BUILTIN_OBJS += builtin/submodule--helper.o
Expand Down
1 change: 1 addition & 0 deletions builtin.h
Original file line number Diff line number Diff line change
Expand Up @@ -225,6 +225,7 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix);
int cmd_show(int argc, const char **argv, const char *prefix);
int cmd_show_branch(int argc, const char **argv, const char *prefix);
int cmd_show_index(int argc, const char **argv, const char *prefix);
int cmd_sparse_checkout(int argc, const char **argv, const char *prefix);
int cmd_status(int argc, const char **argv, const char *prefix);
int cmd_stash(int argc, const char **argv, const char *prefix);
int cmd_stripspace(int argc, const char **argv, const char *prefix);
Expand Down
27 changes: 27 additions & 0 deletions builtin/clone.c
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,7 @@ static const char *real_git_dir;
static char *option_upload_pack = "git-upload-pack";
static int option_verbosity;
static int option_progress = -1;
static int option_sparse_checkout;
static enum transport_family family;
static struct string_list option_config = STRING_LIST_INIT_NODUP;
static struct string_list option_required_reference = STRING_LIST_INIT_NODUP;
Expand Down Expand Up @@ -146,6 +147,8 @@ static struct option builtin_clone_options[] = {
OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
OPT_BOOL(0, "remote-submodules", &option_remote_submodules,
N_("any cloned submodules will use their remote-tracking branch")),
OPT_BOOL(0, "sparse", &option_sparse_checkout,
N_("initialize sparse-checkout file to include only files at root")),
OPT_END()
};

Expand Down Expand Up @@ -733,6 +736,27 @@ static void update_head(const struct ref *our, const struct ref *remote,
}
}

static int git_sparse_checkout_init(const char *repo)
{
struct argv_array argv = ARGV_ARRAY_INIT;
int result = 0;
argv_array_pushl(&argv, "-C", repo, "sparse-checkout", "init", NULL);

/*
* We must apply the setting in the current process
* for the later checkout to use the sparse-checkout file.
*/
core_apply_sparse_checkout = 1;

if (run_command_v_opt(argv.argv, RUN_GIT_CMD)) {
error(_("failed to initialize sparse-checkout"));
result = 1;
}

argv_array_clear(&argv);
return result;
}

static int checkout(int submodule_progress)
{
struct object_id oid;
Expand Down Expand Up @@ -1106,6 +1130,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
if (option_required_reference.nr || option_optional_reference.nr)
setup_reference();

if (option_sparse_checkout && git_sparse_checkout_init(repo))
return 1;

remote = remote_get(option_origin);

strbuf_addf(&default_refspec, "+%s*:%s*", src_ref_prefix,
Expand Down
2 changes: 1 addition & 1 deletion builtin/read-tree.c
Original file line number Diff line number Diff line change
Expand Up @@ -185,7 +185,7 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)

if (opts.reset || opts.merge || opts.prefix) {
if (read_cache_unmerged() && (opts.prefix || opts.merge))
die("You need to resolve your current index first");
die(_("You need to resolve your current index first"));
stage = opts.merge = 1;
}
resolve_undo_clear();
Expand Down
Loading