Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added new agent capability AllowWorkDirectoryRepositories so repositories can be checked out below the work directory level too on self hosted agents #3475

Conversation

echalone
Copy link
Contributor

@echalone echalone commented Jul 29, 2021

Added new agent capability AllowWorkDirectoryRepositories. Can be activated by passing "--allowWorkDirectoryRepositories" when configuring the agent, the default behavior is for the capability to not be activated.

The feature request for this new capability can be found here: #3951
The pull request for the documentation of this new feature, which has to be approved after the feature has been implemented, can be found here: MicrosoftDocs/azure-devops-docs#11154

If the allowWorkDirectoryRepositories capability was activated then repositories can be checked out below the work directory level too, and not just below the build directory level, by adding "../" at the beginning of the checkout path. So "../Repositories/RepoName" would then be a valid repository checkout path (but not any higher, so "../../Repositories/RepoName" is still not allowed). This is to potentially save enormous amounts of agent drive space on self hosted agents with large repositories which are shared between pipelines.

The .agent configuration file for an agent with this capability activated would for example then look like this:

{
  "agentId": 1,
  "agentName": "TestAgent",
  "poolId": 1,
  "poolName": "default",
  "serverUrl": "https://dev.azure.com/testurl/",
  "workFolder": "_work",
  "allowWorkDirectoryRepositories": true,
}

This setting is also visible in the agent capability page as setting "AllowWorkDirectoryRepositories".

A valid yaml checkout step checking out the repository below the working directory would for exmample then look like this (provided allowWorkDirectoryRepositories was enabled in the .agent configuration file):

- checkout: self
  path: ../Repositories/MyRepo

This allows for agent scoped repositories instead of just pipeline scoped repositories and can potentially save a lot of space on self hosted agents with multiple pipelines using the same large repository/repositories. For example: We have one 50 GB and an additional 10 GB repository which we would have to check out and use drive space for each pipeline we use. It's just not possible to use pipelines the way they are intended then and we have to work with tricks to not fill up our drives immediatly (like triggering the same pipeline with differnt settings through other pipelines). This however would finally provide a clean way to save enormous amounts of agent drive space, and in the end also money, on self hosted agents (just imagine having to check out 60 GB for every pipeline or just once per agent). It would also allow us to finally use pipelines as intended and create one for every needed setting we have. So we really need this feature, that's why I implemented it as an option and really hope you'll incorporate into the next version(s).

The default behavior remains allowing only pipeline scoped repositories (below the build directory level) and it is highly recommended to not specifically allow work directory repositories for public agents as build pipelines could interfere with each other and it could pose a security risk by allowing users with access to yaml pipelines defintions to change any files below the work directory. I also added UnitTests for this functionality when the capability is activated and for the default behaviour (capability is not activated) which should be to continue to not allow work directory repositories.

… in .agent config file to true or false, the default behavior is false.

If AllowWorkDirectoryRepositories capability is set to true then repositories can be checked out below the work directory level too, and not just below the build directory level, by adding "../" at the beginning of the checkout path. So "../Repositories/RepoName" would then be a valid repository checkout path (but not any higher, so "../../Repositories/RepoName" is still not allowed). This allows for agent scoped repositories instead of just pipeline scoped repositories and can potentially save a lot of space on self hosted agents with multiple pipelines using the same large repository/repositories. The default behavior remains allowing only pipeline scoped repositories (below the build directory level) and it is highly recommended to not specifically allow work directory repositories for public agents as build pipelines could interfere with each other and it could pose a security risk by being able to change any files below the work directory. Also added UnitTests for this functionality.
…oot directory (build or working directory, depending on whether AllowWorkingDirectoryRepositories capability is activated) cannot be used as the checkout path for a repository, only directories below them (this is already tested in other UnitTests)
@echalone echalone changed the title User/echalone/work folder repositories capability Added new agent capability AllowWorkDirectoryRepositories so repositories can be checked out below the work directory level too on self hosted agents Jul 29, 2021
…lity_MasterConflictFix

# Conflicts:
#	src/Agent.Plugins/RepositoryPlugin.cs
#	src/Microsoft.VisualStudio.Services.Agent/ConfigurationStore.cs
@echalone
Copy link
Contributor Author

echalone commented Jan 26, 2022

Fixed merge conflicts, pls maybe get to this at least when PR #3473 has been closed. It's also already lying around half a year and is a much wanted capability. Again, if it helps (with budgets or something like that), we're a Microsoft Gold Partner. If you need any information about our Gold Partnership for accounting or whatnot so you can invest a bit of time just let me know what you need.
Marking @anatolybolshakov and @DaniilShmelev just so it's noticed.

@anatolybolshakov
Copy link
Contributor

@echalone thanks! Let us review and test this PR

@AndreyIvanov42
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@@ -18,6 +18,9 @@ public class ConfigureAgent : ConfigureOrRemoveBase
[Option(Constants.Agent.CommandLine.Flags.AddMachineGroupTags)]
public bool AddMachineGroupTags { get; set; }

[Option(Constants.Agent.CommandLine.Flags.AllowWorkDirectoryRepositories)]
public bool AllowWorkDirectoryRepositories { get; set; }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really necessary to have this setting? Since to my understanding the (public) hosted agents will create a "fresh" agent each time, so there is no risk of having anyone else's code on you agent. If the hosted agents might still have code from another organisation on it this would be a very serious security issue.

@PaulVrugt
Copy link

Do we really need to "shared" repository to be a setting? Since a build agent can only process a single job at a time, there are no concurrency issues. When my assumption about the hosted agents having a fresh server each build there is no risk of accessing other's code. If this is the case there is an argument to be made to have repositories shared over different pipelines by default

@anatolybolshakov
Copy link
Contributor

Could it probably make sense to save repositories in some specific folder inside of _work folder in this case (for example '_sharedRepos')? This should help to avoid conflicts between repos names and agent internals.

@AndreyIvanov42
Copy link
Contributor

AndreyIvanov42 commented May 20, 2022

This changes can potentially break the agent's work and imposes a restriction on the naming of the path parameter for the checkout task (By example 1, SourceRootMapping, _task, etc - these are the names of the directories that the agent uses in its work). The following scheme will be invalid:

image

@AndreyIvanov42
Copy link
Contributor

AndreyIvanov42 commented May 20, 2022

Could it probably make sense to save repositories in some specific folder inside of _work folder in this case (for example '_sharedRepos')? This should help to avoid conflicts between repos names and agent internals.

In this case, it makes sense to rename this setting to make it more transparent to users.

@echalone
Copy link
Contributor Author

It's true that you'd have to be careful not to use one of the potentially used directories, though I would have left this in the responsibility of the user... if you wanna use this option you gotta know what you do so to say.
If I thought about this correctly then we would need another additional parameter during the checkout step that would tell us that this is a shared directory. This would pose a problem as the Yaml definition for pipelines is strictly defined and would need extension first (and this option value would need to be forwarded to the agent then)... there would also be no ability to use this feature for us then until a new Azure DevOps Server On-Premis version with this new Yaml definition would then be available. Currently there hasn't been a new Azure DevOps Server On-Premis release since 2020 for example.

@PaulVrugt
Copy link

So what's next for this? I guess we need someone to decide if we want to actually build this functionality, and if so, how it should be implemented. @AndreyIvanov42 could you weigh in here and tell us if and how we should build this?

@echalone
Copy link
Contributor Author

echalone commented Jun 1, 2022

Right, so, in my opinion there are three options, in any case you would have to enable this option when installing the agent, so it won't be a breaking change or impact Microsoft hosted agents:

  1. Go with it as I wrote it, in this case the user has to know not to use a subdirectory that's in use by the agent (like 1, 2, 3, _work, etc.), but we could simply add this to the documentation (personally I think that'd be good enough)
  2. Change the programming so that the path has to be at least "..\_shared\" or some other fixed name for the first subdirectory. Though the user could just make sure they use it this way themselves in the first option of course. This would need a bit of reworking and rewriting of the UnitTests. I don't know when I'd find time for this or if you'd want to do this yourselfes.
  3. Add an aditional option to the yaml checkout step that would tell us it's a shared directory and would automatically put it in the _shared subdirectory or something like that. However this would need the most additional effort as well as a change to the yaml checkout step definition which would not arrive anytime soon in an on-premises Azure DevOps Server which we're using for example. It would also need aditional coordination with whatever team is maintining the yaml checkout step definition and it would need a complete rework of the functionality. So I'd rather not got with this option ^^

@PaulVrugt
Copy link

What if we simply go with moving the path to which to repo is checked out to a "shared" location like _repos in the agent folder. In what scenario would this break any existing functionality? If we also change the $(Build.SourcesDirectory) to point to this folder, the change should be transparent, except when people would use the $(Build.SourcesDirectory) like $(Build.SourcesDirectory)\..\SomeOtherFolder which doesn't really make sense.

The hosted agents should also be fine, because a fresh agent is used for each job, and this fresh agent doesn't have any source from prior builds.

This way we wouldn't even need a setting at all

@echalone
Copy link
Contributor Author

echalone commented Jun 1, 2022

I think it could potentially break the agents of other people who do not use a fresh agent every time and may have two repositories with the same name but which are not actually the same repos. But that's for you guys to decide if that's acceptable. If so, then yes that's another way.
Question is how this would impact the currently already wrong primary repository detection and its fix which I'd be fixing with this PR: #3473
And how it would impact the potential option to set another repository as the default working directory which would be this PR (though this isn't too important as its just some additional feature that's not really needed by us): #3479
However, this option would need a complete rewrite too I guess and therefore a lot of new effort.

@echalone
Copy link
Contributor Author

I can now additional confirm that some extensions, for example the SonarQube extension, will have problems when the sources are not checked out at the working directory of a build, which is per default either the path to the default repository or _work\1\s in a multi-checkout scenario, since SonarQube expects the sources to be below the working directory. So any change in the direction of shared repositories by default would need to consider this. This PR which would integrate a function to select a repository as the default working directory would come in handy here anyways for multi-checkouts, and it would also determine System.DefaultWorkingDirectory to be used as the variable for working directory of extensions instead of Build.SourcesDirectory (which currently always have the same values): #3479

Also, the primary repository detection, which could interfere with correct working directory detection, has an error which will be fixed with this PR: #3473

@echalone
Copy link
Contributor Author

echalone commented Sep 1, 2022

Changed link to issue since it was wrongfully closed by a stale-bot

@aleksandrlevochkin
Copy link
Contributor

Hi @echalone thank you for your work! We have decided to implement basically what you wrote, with the difference that it is activated by an agent knob instead of a capability, as this approach seems simpler and more flexible. Here is the PR: #4423 Feel free to participate if you don't mind
As for your other PRs, we'll also take a look at them once there's no higher-prioritized tasks.

@echalone
Copy link
Contributor Author

echalone commented Sep 7, 2023

Hi @echalone thank you for your work! We have decided to implement basically what you wrote, with the difference that it is activated by an agent knob instead of a capability, as this approach seems simpler and more flexible. Here is the PR: #4423 Feel free to participate if you don't mind As for your other PRs, we'll also take a look at them once there's no higher-prioritized tasks.

Thank you :)

@echalone
Copy link
Contributor Author

PR #4423 that has been forked from this one has been merged with master, so closing this PR now since it's done its duty

@echalone echalone closed this Oct 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants