Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hide sensitive content on admin panel progress monitor #19218

Merged
merged 9 commits into from
Mar 27, 2022

Conversation

lunny
Copy link
Member

@lunny lunny commented Mar 26, 2022

No description provided.

@lunny lunny added type/bug topic/security Something leaks user information or is otherwise vulnerable. Should be fixed! skip-changelog This PR is irrelevant for the (next) changelog, for example bug fixes for unreleased features. backport/v1.16 labels Mar 26, 2022
@lunny lunny added this to the 1.17.0 milestone Mar 26, 2022
modules/git/command.go Outdated Show resolved Hide resolved
@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Mar 26, 2022
@GiteaBot GiteaBot added lgtm/need 1 This PR needs approval from one additional maintainer to be merged. and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels Mar 26, 2022
@zeripath
Copy link
Contributor

Honestly I am suspicious that the correct thing to do here is to simply provide the correct description for these processes.

There are only three of them and we should be fine from there on.

@wxiaoguang
Copy link
Contributor

Honestly I am suspicious that the correct thing to do here is to simply provide the correct description for these processes.

There are only three of them and we should be fine from there on.

@zeripath To keep the backport simple and avoid conflicts with other RunInDir refactoring, I am sure we do not need the AddURLArgument.

The only change should be done in RunWithContext:

	desc := c.desc
	if desc == "" {
		args := c.args
		var argSensitiveUrlIndices []int
		for i, arg := range c.args {
			if strings.Index(arg, "://") != -1 && strings.IndexByte(arg, '@') != -1 {
				argSensitiveUrlIndices = append(argSensitiveUrlIndices, i)
			}
		}
		if len(argSensitiveUrlIndices) > 0 {
			args = make([]string, len(c.args))
			copy(args, c.args)
			for _, urlArgIndex := range argSensitiveUrlIndices {
				args[urlArgIndex] = util.NewStringURLSanitizer(args[urlArgIndex], true).Replace(args[urlArgIndex])
			}
		}
		desc = fmt.Sprintf("%s %s [repo_path: %s]", c.name, strings.Join(args, " "), rc.Dir)
	}

@lunny For 1.16 backport, the code above is enough, that's the minimal change.

For 1.17 , I think we can use a more general SanitizeCredentialURLs, which can be used to process all URLs, expand the details, including tests and demo. Then remove most legacy NewStringURLSanitizeXxx, which is too complex.

Why it's faster: it doesn't do the heavy url.Parse, and it only allocates one strings.Builder{} at most.

Expand below ⬇️

func SanitizeCredentialURLs(msg string) string {
	schemeSepPos := strings.Index(msg, "://")
	if schemeSepPos == -1 && strings.IndexByte(msg, '@') == -1 {
		return msg // fast return if there is no URL scheme and no userinfo
	}
	bd := strings.Builder{}
	for schemeSepPos != -1 {
		schemeSepPos += 3 // skip the "://"
		sepAtPos := -1    // the possible '@' position: "https://foo@[^here]host"
		sepEndPos := -1   // the possible end position: "The https://host[^here] in log for test"
	sepLoop:
		for sepEndPos = schemeSepPos; sepEndPos < len(msg); sepEndPos++ {
			c := msg[sepEndPos]
			if ('A' <= c && c <= 'Z') || ('a' <= c && c <= 'z') || ('0' <= c && c <= '9') {
				continue
			}
			switch c {
			case '@':
				sepAtPos = sepEndPos
			case '-', '.', '_', ':', '~', '!', '$', '&', '\'', '(', ')', '*', '+', ',', ';', '=', '%':
				continue // due to RFC 3986, userinfo can contain - . _ ~ ! $ & ' ( ) * + , ; = : and any percent-encoded chars
			default:
				break sepLoop
			}
		}
		if sepAtPos != -1 {
			bd.WriteString(msg[:schemeSepPos])
			bd.WriteString(userPlaceholder)
			bd.WriteString(msg[sepAtPos:sepEndPos])
		} else {
			bd.WriteString(msg[:sepEndPos])
		}
		msg = msg[sepEndPos:]
		schemeSepPos = strings.Index(msg, "://")
	}
	bd.WriteString(msg)
	return bd.String()
}


func TestSanitizeCredentialURLs(t *testing.T) {
	cases := []struct {
		input    string
		expected string
	}{
		{
			"https://github.com/go-gitea/test_repo.git",
			"https://github.com/go-gitea/test_repo.git",
		},
		{
			"https://mytoken@github.com/go-gitea/test_repo.git",
			"https://" + userPlaceholder + "@github.com/go-gitea/test_repo.git",
		},
		{
			"https://user:password@github.com/go-gitea/test_repo.git",
			"https://" + userPlaceholder + "@github.com/go-gitea/test_repo.git",
		},
		{
			"URLs in log https://u:b@h and https://u:b@h:80/, with https://h.com and u@h.com",
			"URLs in log https://" + userPlaceholder + "@h and https://" + userPlaceholder + "@h:80/, with https://h.com and u@h.com",
		},
	}

	for n, c := range cases {
		result := SanitizeCredentialURLs(c.input)
		assert.Equal(t, c.expected, result, "case %d: error should match", n)
	}
}

@lunny lunny force-pushed the lunny/hide_process_sensitive branch from e81accd to 2dff616 Compare March 27, 2022 07:41
@lunny
Copy link
Member Author

lunny commented Mar 27, 2022

@wxiaoguang @zeripath both done.

modules/git/command.go Outdated Show resolved Hide resolved
Copy link
Contributor

@wxiaoguang wxiaoguang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM.

I still worry that these changes to RunInDir will conflict with an other PR a a lot, while we might have a chance to avoid these conflicts at the moment. (Just an opinion, either is fine)

@GiteaBot GiteaBot added lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. and removed lgtm/need 1 This PR needs approval from one additional maintainer to be merged. labels Mar 27, 2022
@zeripath
Copy link
Contributor

zeripath commented Mar 27, 2022

@wxiaoguang that sanitizer function looks potentially interesting but I think you could improve further.

  1. String builders still cause a lot of allocations and still have the final alloc to make the string. We have an actual max size here and so can actually get away with 2 allocations. Create an initial []byte len(input)+Len(placeholder) then at the end string it.
  2. I have a feeling that all of those string slices also create new strings - if you use unsafe to cast the inputs to []byte initially and read them only from there on you can totally avoid that. (util.StringToReadOnlyBytes from goldmark util)
  3. If possible avoid the iteration byte-by-byte through the whole string by using bytes.indexof. Counterintuitively this can be much quicker than iterating character by character. I think this is because bytes.indexof can often avoid the related bounds checks. A well considered regexp maybe able to do so similarly. Running bytes.indexof twice can still be quicker than iteration.

Signed-off-by: Andrew Thornton <art27@cantab.net>
@zeripath
Copy link
Contributor

I've fixed the conflicts and lint.

@zeripath
Copy link
Contributor

I think there's still potential for an argument to be incorrectly assessed as an URL when it is not an URL here - this would then lead to that argument being turned into an unprocessable url - but... this is just a label that is supposed to be helpful to explain what this process is about and some mangling should not be a problem.

Copy link
Contributor

@zeripath zeripath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just need to adjust the descs. Do not merge yet.

modules/git/repo.go Outdated Show resolved Hide resolved
modules/git/repo.go Outdated Show resolved Hide resolved
Signed-off-by: Andrew Thornton <art27@cantab.net>
modules/git/repo.go Outdated Show resolved Hide resolved
@zeripath zeripath merged commit c29fbc6 into go-gitea:main Mar 27, 2022
@zeripath
Copy link
Contributor

Please send backport

@lunny lunny deleted the lunny/hide_process_sensitive branch March 27, 2022 12:13
lunny added a commit to lunny/gitea that referenced this pull request Mar 27, 2022
Sanitize urls within git process descriptions.

Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
Co-authored-by: Andrew Thornton <art27@cantab.net>
@lunny lunny added the backport/done All backports for this PR have been created label Mar 27, 2022
zeripath added a commit that referenced this pull request Mar 27, 2022
…) (#19231)

* Hide sensitive content on admin panel progress monitor (#19218)

Sanitize urls within git process descriptions.

Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
Co-authored-by: Andrew Thornton <art27@cantab.net>

* Do not include global arguments in process manager (#19226)

Backport #19226

The git command by default adds a number of global arguments. These are not
helpful to be displayed in the process manager and so should be skipped for
default process descriptions.

Signed-off-by: Andrew Thornton <art27@cantab.net>

Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
Co-authored-by: Andrew Thornton <art27@cantab.net>
@wxiaoguang
Copy link
Contributor

@wxiaoguang that sanitizer function looks potentially interesting but I think you could improve further.

1. String builders still cause a lot of allocations and still have the final alloc to make the string. We have an actual max size here and so can actually get away with 2 allocations. Create an initial []byte len(input)+Len(placeholder) then at the end string it.

2. I have a feeling that all of those string slices also create new strings - if you use unsafe to cast the inputs to []byte initially and read them only from there on you can totally avoid that. (util.StringToReadOnlyBytes from goldmark util)

3. If possible avoid the iteration byte-by-byte through the whole string by using bytes.indexof. Counterintuitively this can be much quicker than iterating character by character. I think this is because bytes.indexof can often avoid the related bounds checks. A well considered regexp maybe able to do so similarly. Running bytes.indexof twice can still be quicker than iteration.

Most done in

Chianina pushed a commit to Chianina/gitea that referenced this pull request Mar 28, 2022
Sanitize urls within git process descriptions.

Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
Co-authored-by: Andrew Thornton <art27@cantab.net>
@go-gitea go-gitea locked and limited conversation to collaborators Apr 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
backport/done All backports for this PR have been created lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. skip-changelog This PR is irrelevant for the (next) changelog, for example bug fixes for unreleased features. topic/security Something leaks user information or is otherwise vulnerable. Should be fixed! type/bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants