RFD 0169: Automatic Updates for Agents #40190
Where will the code for this tool live? I'm strongly of the opinion that if it's not versioned with Teleport then it shouldn't be in this repo or teleport.e; rather, it should be in its own repo.
The `teleport-updater` binary would also ship in the Teleport tarball, so it could be versioned with Teleport.
That said, it might be nice if the `teleport-upgrader` package was versioned separately from Teleport, since the package version won't match the version of Teleport installed (or the version of the updater that is updated by the updater, after the initial Teleport update).
I could see this working either way. No strong opinion.
I'm confused here - the RFD says:
But here you've said:
and
Is the tool going to be versioned with Teleport or not? If not, and if it'll be used to install Teleport, then why would it ship inside a Teleport tarball?
I think we should version the updater with Teleport. That's how we version all assets today. Let's stick to that scheme.
@russjones is there a benefit to keeping this versioned with Teleport? This has caused problems in the past (in fact I spent hours today working on a problem caused by this), and there is precedent for not versioning customer-facing tools/products with Teleport (see shared-workflows tools, the projects we've forked, and TAG).
Here are the benefits I see of pulling this code out of teleport/teleport.e:
Hey folks this discussion somehow disappeared, so I'm commenting here in hopes that it shows back up.
This shouldn't be under `/usr/local` - the FHS specifically states that the contents of `/usr/local` "needs to be safe from being overwritten when the system software is updated". A more appropriate directory would be `/usr/bin` or `/usr/sbin`, depending on whether or not we consider these binaries to be "used exclusively by the system administrator".
I read "system software" as software installed and upgraded via an OS-provided tool like a package manager. I view these updater-managed versions of Teleport as installed by a user who invoked `teleport-updater enable` to create a local installation of Teleport.
For this RFD, updating system software using the OS package manager will not touch `/usr/local`.
I selected `/usr/local` over `/usr` to conform with:
Seems better not to conflict at a file-level with the non-auto-updating `teleport` package, which is system software that has a binary in `/usr/bin`.
Not strongly opinionated, but the OS won't be tracking these files via the package DB, so `/usr/local/bin` seems closer to the spec to me. Also, it's not unheard of for `/usr` to be read-only outside of system updates, and `/usr/local` to remain read-write.
https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch04s09.html
I agree. However this whole tool/endpoints/backing infra (on our end) is a package manager system.
Right - which is exactly why we should place it there. This RFD is for part of a package management system, and one of the functions of this system is to provide Teleport upgrades.
IMO in this case we should either:
If customers complain about this then we could provide an override for installation directory via the config file. I haven't seen a ton of cases (not that they don't exist) of `/usr` being read only with `/usr/local` still writable.
It might make sense to poll some customers and get them to weigh in on this (and possibly some other updater implementation details).
The prefix at `/usr` is traditionally under the exclusive control of the system package manager, with `/usr/local` being for manually installed programs and `/opt/<something>` for non-distro packages of various kinds. I would not expect a third party package manager to put anything in `/usr/bin` or `/usr/lib`.
Hey folks, this customer ticket just got added to Internal Tools goals for next quarter. Given that this shows at least one customer making the same argument that I am, I'm reopening this for discussion. I'd like to apply the same decision (one way or the other) to both this RFD and the customer's ticket.
The customer is arguing that the teleport package should install the `teleport` binary to `/usr/bin` instead of `/usr/local/bin`, which I agree better conforms to the LSB spec.
Analogously, the upgrader installed by the upgrader package should live in `/usr/bin/teleport-updater`.
However, the teleport binaries that are installed by the user invoking `teleport-updater` commands are not managed by the system package manager, so it would not match the spec for them to land in `/usr/bin`.
It doesn't matter whether a package comes from a package manager provided by the distro, or one that we write. To quote this customer's ticket:
We've established that Teleport's auto updates system is a package management system. `teleport-updater` is the local package manager. When the command is invoked, either manually or automatically (as is the primary intention of this work), it installs, upgrades, and removes `teleport` packages. This is not the user building Teleport themselves and `cp`ing the local build to a bin directory - this is our software managing, end to end, the lifecycle of our packages. Therefore, the binaries should be placed under `/usr/bin`.
On top of all of that, the linked ticket shows that customers expect `teleport` to be in `/usr/bin`. It shouldn't matter to our customers how they install Teleport on their OS. The behavior (including the location of files) should be the same, regardless of installation method (for a given OS). Customers are telling us that they don't want `teleport` installed to this path, so we shouldn't be installing it to this path regardless of installation method.
I disagree that the FHS intends to suggest that any software that installs binaries counts as a package manager. By this definition, `make` would be a package manager, and `make install` should also install into `/usr/bin`.
The linked ticket is referring to the Teleport system package, not a user-configured updater that installs Teleport from tarballs.
I'm in agreement about this. FHS makes no mention of package managers at all. However by any reasonable definition of a package manager, `teleport-updater` is one, as a part of Teleport's auto updates package management system.
Here are several definitions of a "package manager" from several independent sources:
By any of these definitions the auto updates system is a package management system, and the `teleport-updater` tool is a package manager.
Of course you're correct here - it is in reference to the current Teleport package, and could not be in reference to a new system that has yet to be developed or deployed to customers. If you replace `RPM(\/| and )DEB` with `teleport-updater`, then it's clear that this issue will apply when the `teleport-updater` package manager is updating/installing `teleport` as well.
So, we still need a way to bootstrap the first installation.
Something that adds teleport repo public key, the repo metadata and then installs and runs the updater.
Is that going to be a new script that will be used everywhere we install teleport?
We already have the oneoff script which could serve this purpose (with some little modifications).
For a soft-onboarding, I would assume it would be possible for us to host our own "version" server to return this payload?
i.e:
This once again would allow us to opt into updates for certain instances whilst not updating others (until we have the ability to bucket instances)
@thameezb For your use-case, I would recommend sticking with your current self-managed flow. The only change I would recommend is using `tctl autoupdate watch` to watch for updates and push that version out to your repositories.
You can switch over to the new system once we have bucketed rollout.
What do you think?
Would the new teleport packages still be published to the existing teleport `stable/cloud` repos? As currently we use the standard teleport-ent-updater (except that it points to our own version server and not teleport's version server or our CP)
If possible it'd be great to not publish/reduce how much we publish to that component. It would reduce publishing time quite a bit.
Then our current flow would break (as the current version of teleport-ent-updater uses `stable/cloud` to install packages).
There would need to be a better migration path until bucketing is supported.
@fheinecke not sure why comments are not being added to the thread above, but we would require the packages to exist on `stable/cloud` as that is what `teleport-ent-updater` uses
Would it be possible to allow cloud customers to configure this? It would allow us to safely opt into versions at our own pace (until we can bucket instances into PROD/PREPROD etc)
Cloud performs cluster upgrades, and we'd like to ensure that the cluster and agent versions are always extensively tested and compatible. Uncoordinated cluster and agent upgrades could lead to incompatible versions and lost access to resources.
It sounds like you want to rollout new agent versions across the same cluster in buckets. Is this correct?
If so, I do have an idea for bucketed agent upgrades. It would look something like this:
`teleport-updater enable --bucket 2`
`tctl` in `cluster_maintenance_config`
`/v1/webapi/ping` would return separate times for each bucket
Would this work for your use case?
I may open a separate RFD to add this in the future (depending on user feedback)
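The bucket idea above could look roughly like the following Python sketch. The payload shape, the `bucket_update_times` and `agent_version` fields, and the `update_due` helper are all assumptions made for illustration; none of them are defined in the RFD or in this thread.

```python
from datetime import datetime, timezone

# Hypothetical ping payload: one scheduled start time per bucket.
# The field names here are illustrative and not taken from the RFD.
EXAMPLE_PING = {
    "agent_version": "15.1.0",
    "bucket_update_times": {
        "1": "2024-05-01T02:00:00+00:00",
        "2": "2024-05-03T02:00:00+00:00",
    },
}


def update_due(ping: dict, bucket: int, now: datetime) -> bool:
    """Return True once the start time for this agent's bucket has passed."""
    start = ping.get("bucket_update_times", {}).get(str(bucket))
    if start is None:
        # Unknown bucket: do not update rather than guess a schedule.
        return False
    return now >= datetime.fromisoformat(start)


if __name__ == "__main__":
    now = datetime(2024, 5, 2, tzinfo=timezone.utc)
    for bucket in (1, 2):
        print(bucket, update_due(EXAMPLE_PING, bucket, now))
```

In this sketch an agent in a bucket with no schedule simply does not update, which seems like the safer default for a staged rollout.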
That is correct yes.
It can work (`stable` and `latest` buckets would be sufficient for us TBH), where all of our PREPROD agents are enrolled into `latest` and our PROD into `stable`. This is currently the system we use with our custom Teleport version server.
However due to past issues with Teleport agent auto-upgrading, we would not be able to onboard onto this new approach unless we have the ability to control the roll-out of updates to our agents. Either as defined above, or something akin to #40190 (comment), while the approach defined above allows for bucketing agent updates
Can you create another section called `### Installer`?
One of the biggest problems I saw during our push to get customers to switch to automatic updates was the number of different ways we have to install Teleport.
I'd like us to consolidate on a single way to install agents that 90% of our customers use. Ideally anything we build should work for self-hosted and Cloud similar to client tools updates.
The good news is I think you have the answer.
This works for Cloud, but would also work for self-hosted customers. We just have to work through the edge cases.
Otherwise, what's left?
`curl | bash` on our downloads page. Please enumerate all the scripts/locations that will need to be updated.
`teleport-updater` package can support OSS, Enterprise, and Enterprise FIPS installations.
`teleport-updater` is available for all the different architectures we support (I think this is already the case).
Let's make sure that's in-scope for this RFD.
I think we can put this information into the `ping` endpoint. Then you no longer have to worry about which binary you need to install (OSS, Enterprise, FIPS Enterprise), you just install the `teleport-updater` and it downloads and runs the right version.
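As a rough Python sketch of letting the `ping` response decide which build to fetch: the `server_edition` field name comes from this thread, but the edition values (`oss`, `ent`, `fips`), the `agent_version` field, and the artifact name patterns below are assumptions chosen for illustration only.

```python
# Hypothetical edition values and artifact name patterns; the real values
# would be whatever the RFD and release tooling settle on.
ARTIFACT_PATTERNS = {
    "oss": "teleport-v{version}-linux-{arch}-bin.tar.gz",
    "ent": "teleport-ent-v{version}-linux-{arch}-bin.tar.gz",
    "fips": "teleport-ent-v{version}-linux-{arch}-fips-bin.tar.gz",
}


def artifact_name(ping: dict, arch: str) -> str:
    """Pick the tarball name from the edition and version the cluster advertises."""
    edition = ping["server_edition"]
    if edition not in ARTIFACT_PATTERNS:
        # Refuse to guess: installing the wrong edition is worse than failing.
        raise ValueError(f"unknown server edition: {edition!r}")
    return ARTIFACT_PATTERNS[edition].format(version=ping["agent_version"], arch=arch)


if __name__ == "__main__":
    ping = {"server_edition": "ent", "agent_version": "15.1.0"}
    print(artifact_name(ping, "amd64"))
```

With this shape, the person installing the updater never chooses an edition; the cluster they point it at does.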
Something we should consider: the `apt-get install && teleport-updater enable` approach works fine if you are running the commands on the host (or running something like Ansible), but what would we recommend to someone who builds an AMI using this new install method?
I think I've covered everything above:
`server_edition` field is added to `ping`, along with associated changes to "Runtime" section logic
Let me know if this looks good.
If we make this new installer the default for everything (which I think is a great approach, we must reduce complexity) I would suggest going further by making the new binary the only recommended way of installing teleport, even without Automatic Updates.
We could name it `teleport-version-manager` or whatever name that doesn't imply it requires AUs and expose the following commands:
`tvm update` as already described in the RFD
`tvm follow`/`tvm unfollow` to follow updates from a proxy, this is the current RFD `enable`/`disable`.
`tvm install <X.Y.Z|teleport.example.com>` would be an additional command not enabling AUs, but installing a static version (given, or from a proxy).
I think this would answer
We would recommend "install tvm, and run `tvm install teleport.example.com` (for non-AU setups) or `tvm follow teleport.example.com` (for AU) on startup".
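To make the proposed command surface concrete, here is a minimal Python `argparse` sketch. The `tvm` name and its subcommands are only the proposal from the comment above, not an existing tool, and the argument names (`proxy`, `target`) are assumptions added for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the hypothetical `tvm` command proposed in this thread."""
    parser = argparse.ArgumentParser(prog="tvm")
    sub = parser.add_subparsers(dest="command", required=True)

    sub.add_parser("update", help="as already described in the RFD")

    follow = sub.add_parser("follow", help="follow updates from a proxy (AU)")
    follow.add_argument("proxy")

    sub.add_parser("unfollow", help="stop following updates from a proxy")

    install = sub.add_parser("install", help="install a static version, without AU")
    install.add_argument("target", help="X.Y.Z or a proxy address")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["install", "teleport.example.com"])
    print(args.command, args.target)
```

The point of the split is visible in the parser: `install` and `follow` take the same kind of argument but differ in whether the host keeps tracking the proxy afterwards.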
`tvm use` sounds better to me, but I don't have a strong opinion.
Happy to rename. I'll wait for a few others to chime in, and change it when we have some consensus -- names can be contentious 🙂
Even though I really want to refactor our installation process, looking at our capacity for Q2, I don't think we can.
So I am in-favor of keeping it simple and solving the issue for Cloud customers first then coming back in Q3 and fixing our installation problems as a whole.
Here is what I am thinking.
`teleport-updater` to not introduce too many changes at once.
What do you guys think?
👍 Works for me. Maybe there are two phases here:
If we postpone the simplification we must absolutely ensure we do it the next quarter.
The installation methods are already a mess and we're adding new methods, instructions and scripts. This is not maintainable and self-hosted customers are already complaining about how hard it is to just install teleport. This is not an issue we can solve with docs. We've tried for the last year and the installation docs only became worse and more obscure since I joined Teleport 2 years ago.
I'm against adding new ways without removing the old ones but I fear that we'll do it anyway.
Why did we scope out self-hosted installations?
Is there extra work for those scenarios? Except for docs around the `autoupdate_version` resource?
Adding more complexity to distinguish between cloud/self-hosted in the installer scripts seems like we are shifting the complexity to something which is way harder to test/maintain.
Could we build some kind of observability around this and use this as a success criteria?
I think we already have some kind of environment variable that teleport can pick up to report on which kind of environment it's running in, maybe we could extend this mechanism and have the unified updater set metadata.
Additional telemetry notes: we have an existing env var set by the updater and reported by the agent in the heartbeats. This allows detecting which agent is enrolled into AUs and raise alerts if some agents are not. The new updater must continue setting this variable, or we need to have the agent check for `updater.yaml` and report if `enabled` is true.
Will add a section on telemetry. What is the env var?
After implementing the updater we should spend some time to test extensively how the upgrade from the existing APT setups works.
This list should also describe the cloud rollout strategy for existing customers (with or without AUs) so we don't cause the same tenant fragmentation as the last time.