[Contribution] qubes-updates-cache #1957

Open
andrewdavidwong opened this issue May 5, 2016 · 85 comments
Labels
bounty - This issue has a public bounty associated with it.
C: contrib package
C: updates
community dev - This is being developed by a member of the community rather than a core Qubes developer.
P: major - Priority: major. Between "default" and "critical" in severity.
S: partial - Status: partial. Work on this issue is partially complete, but it is not actively being worked on.
T: enhancement - Type: enhancement. A new feature that does not yet exist or improvement of existing functionality.

Comments

@andrewdavidwong
Member

andrewdavidwong commented May 5, 2016

Community Dev: @rustybird
PoC: https://github.com/rustybird/qubes-updates-cache


It's common for users to have multiple TemplateVMs that download many of the same packages when being individually updated. Caching these packages (e.g., in the UpdateVM) would allow us to download a package only once, then make it available to all the TemplateVMs which need it (and perhaps even to dom0), thereby saving bandwidth.

This has come up on the mailing lists several times over the years:

Here's a blog post about setting up a squid caching proxy for DNF updates on baremetal Fedora:

@andrewdavidwong added the T: enhancement, C: core, and P: major labels on May 5, 2016
@andrewdavidwong changed the title from "Cache package updates" to "Cache updates" on May 5, 2016
@ghost

ghost commented May 5, 2016

It's indeed a common problem when deploying Fedora VMs/containers, or with server farms. Debian has apt-cacher(ng), but Fedora doesn't have anything similar.

Solutions that came up:

Anyway, instead of having specific tools for each distro, it would be wiser to have a generic solution.
So, all in all, the Squid solution may be the best one, with the cache miss rate being something to investigate.

@marmarek
Member

marmarek commented May 5, 2016

Actually apt-cacher-ng works for Fedora too :)
Maybe we can simply use it instead of tinyproxy as the update proxy?
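
For reference, a hedged sketch of what that swap might involve on the UpdateVM side: apt-cacher-ng's own acng.conf can be told to answer on the address/port that the Qubes updates proxy currently forwards to tinyproxy (8082 on existing setups; verify against your qubes-updates-proxy configuration before relying on it).

```
# /etc/apt-cacher-ng/acng.conf (sketch; port taken from the current
# tinyproxy-based updates proxy, adjust if yours differs)
BindAddress: 127.0.0.1
Port: 8082
```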

@ghost

ghost commented May 5, 2016

apt-cacher-ng works on Fedora for mirroring Debian stuff, but does it really work for mirroring (d)rpms/metadata downloaded with yum/dnf?

From the doc [1]: "6.3 Fedora Core - Attempts to add apt-cacher-ng support ended up in pain and the author lost any motivation in further research on this subject. "

[1] https://www.unix-ag.uni-kl.de/~bloch/acng/html/distinstructions.html#hints-fccore

@marmarek
Member

marmarek commented May 5, 2016

Yes, I've seen this. But in practice it works. The only problem is
dynamic mirror selection - it may make caching difficult (when a
different mirror is selected each time).


@adrelanos
Member

Marek Marczykowski-Górecki:

Actually apt-cacher works for Fedora too :)
Maybe we can simply use it instead of tinyproxy as update proxy?

Can it also let through non-apt traffic? Specifically I am wondering
about tb-updater.

@marmarek
Member

marmarek commented May 5, 2016

Can it also let through non-apt traffic? Specifically I am wondering
about tb-updater.

That's an interesting question - if you have an apt-cacher-ng instance handy,
it's worth a try. Anyway, it has quite a flexible configuration, so it's
probably doable.


@adrelanos
Member

I don't think there is a generic solution that works equally well for
both Debian- and Fedora-based templates. Why do we need a generic
all-at-once solution anyhow? Here is what I suggest:

  • Let's keep tinyproxy as-is, as a fallback and for misc traffic
    (tb-updater, custom user stuff, and whatnot).
  • Let's install apt-cacher-ng and a Fedora caching proxy by default in
    the UpdateVM.
  • Let's configure Debian-based VMs to use apt-cacher-ng.
  • Let's configure Fedora-based VMs to use the Fedora caching proxy.

What do you think?
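
A sketch of the client-side knobs this split would touch; 127.0.0.1:8082 is the address templates already use for the Qubes updates proxy, so adjust it if the per-distro proxies end up listening elsewhere, and the file names are only illustrative.

```
# Debian-based templates, e.g. /etc/apt/apt.conf.d/01cacher:
Acquire::http::Proxy "http://127.0.0.1:8082/";

# Fedora-based templates, /etc/dnf/dnf.conf (or /etc/yum.conf on older releases):
[main]
proxy=http://127.0.0.1:8082/
```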

@marmarek

Can it also let through non-apt traffic? Specifically I am wondering
about tb-updater.

That's interesting question - if you have apt-cacher-ng instance handy,
it worth a try. Anyway it has quite flexible configuration, so probably
doable.

I've read all the config and tried; it does not seem possible, but never
mind, as per my suggestion above.

@marmarek
Member

marmarek commented May 7, 2016

It will require more resources (memory), somewhat wasted when one uses, for example, only Debian templates. But maybe it is possible to activate those services on demand (socket activation comes to mind). It would be even easier for a qrexec-based updates proxy.
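
One possible shape of that on-demand activation, assuming apt-cacher-ng does not accept a socket from systemd directly: let systemd own the listening socket and hand the first connection to systemd-socket-proxyd, which pulls in the real service. Unit names and ports are illustrative, not an existing Qubes package.

```
# updates-cache.socket
[Socket]
ListenStream=127.0.0.1:8082

[Install]
WantedBy=sockets.target

# updates-cache.service
[Unit]
Requires=apt-cacher-ng.service
After=apt-cacher-ng.service

[Service]
# Forward the activated connections to apt-cacher-ng on its own port.
ExecStart=/usr/lib/systemd/systemd-socket-proxyd 127.0.0.1:3142
```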

@ghost

ghost commented May 7, 2016

@adrelanos

Why do we need a generic all at once solution anyhow

I'm all for a 100% caching success rate with a specific mechanism for each distro, but do Qubes developers/contributors have time to develop/support that feature?
If yes, that's cool; otherwise, a solution like Squid would be easy to implement, and since it's distro-agnostic it would help not only the supported distros (Fedora, Debian, Arch?), but also other distributions that users install in HVMs (even Windows, then). The problems/unknowns with Squid are the cache miss rate, the cache disk usage needed to minimize it, and the use of different mirrors with yum (although I find that I usually connect to the same one).

@qjoo

qjoo commented May 7, 2016

I'm using a polipo proxy => Tor to cache updates. I also modified the repo configuration to use one specific update server instead of dynamically selecting it. I'm planning to document my setup and will post a link here.

@kalkin
Member

kalkin commented May 7, 2016

Just wanted to throw in https://github.com/yevmel/squid-rpm-cache. I planned to set up a dedicated Squid VM and use the above-mentioned config/plugin to cache RPMs, but never found the time for it.

The problems/unknowns with Squid are the cache miss rate, the cache disk usage needed to minimize it, and the use of different mirrors with yum (although I find that I usually connect to the same one).

Currently I just use my NAS, which has a "normal" Squid running as a caching proxy. I have an Ansible script which generates my templates. In the templates I replaced the metalink parameter with a baseurl pointing to the nearest Fedora mirror, in /etc/yum.repos.d/fedora.repo. In /etc/yum.conf I replaced the proxy option with my NAS proxy and allowed TemplateVMs to connect to it.
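
Roughly what those template changes look like; the mirror URL and proxy address below are placeholders, not kalkin's actual values.

```
# /etc/yum.repos.d/fedora.repo -- pin one mirror instead of the metalink:
[fedora]
name=Fedora $releasever - $basearch
baseurl=http://mirror.example.com/fedora/linux/releases/$releasever/Everything/$basearch/os/
#metalink=https://mirrors.fedoraproject.org/metalink?repo=fedora-$releasever&arch=$basearch
gpgcheck=1

# /etc/yum.conf -- send everything through the caching proxy on the NAS:
proxy=http://nas.example.lan:3128/
```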

@marmarek
Member

marmarek commented May 7, 2016

My experience with Squid is horrible in terms of resources (RAM, I/O usage) for small setups. It looks like overkill when just a few templates download updates from time to time.

@adrelanos
Member

I don't like saying this, but we should also consider making this an additional, non-default option, or even wontfix. I like apt-cacher-ng very much and use it myself. However, introducing it by default into Qubes would lead to new issues: more users having trouble upgrading due to the added technical complexity. There are corner cases where apt-cacher-ng introduces new problems, such as Hash Sum mismatch errors during apt-get update.

@ghost

ghost commented May 10, 2016

@marmarek

FWIW, I have Squid installed on an embedded router (RB450g) for a 25+ person office and it's been running for literally ages without any problem. There's strict bandwidth control (delay pools), which is usually the biggest offender in terms of resources, but Squid's memory usage has constantly been < 20 MB and the highest CPU usage < 6%. Granted, the office's uplink speed is low - in the megabits/s range - but the resources available to the UpdateVM are in another league compared to the embedded hardware, and the setup (caching only) is not fancy.

tl;dr: Squid is not as bad as it used to be years ago.

@adrelanos

The issues you mention reinforce my concern that it will be too time-consuming for Qubes devs to support distro-specific solutions. A simple generic one, even if not optimal, is still better than nothing at all, rather than "wontfix".
Plus, users kalkin and qjoo seem to have working solutions; why not try those?

Just my 2c - not pushing for anything, you guys are doing great work!

@andrewdavidwong
Member Author

At the very least, we should provide some documentation (or suggestions or pointers in the documentation) regarding something like @taradiddles's router solution. Qubes users are more likely than the average Linux user to have multiple machines (in this case, virtual) downloading exactly the same updates.

@Rudd-O

Rudd-O commented May 10, 2016

Looks like what you want is Squid with an adaptive disk cache size (for storing packages in the volatile /var/cache/squid directory), configured with no memory cache. Since the config file can be in a different place and the unit file can be overridden to point at a Qubes-specific config file, it may work very well for this purpose. Squid is goddamn good these days, and it supports regex-based filters (plus you can block methods other than GET, and you can proxy-cache FTP sites).
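
A hedged squid.conf fragment along those lines (standard Squid directives; the sizes and paths are illustrative, not a tested Qubes config):

```
cache_mem 0 MB                       # no in-memory object cache
maximum_object_size 512 MB           # allow large packages
cache_dir ufs /var/cache/squid 5000 16 256
acl GETonly method GET
http_access deny !GETonly            # refuse anything but GET
# Treat package files as long-lived so repeat downloads hit the cache:
refresh_pattern -i \.(rpm|drpm|deb)$ 129600 100% 129600 refresh-ims
```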

OTOH, it's always a security-footprint issue to run a larger codebase for a cache. Also, Squid caching can be ineffective if multiple VMs download files from different mirrors (remember that the decision of which mirror to use is left practically at random to the VM calling on the Squid proxy to do its job).

For those reasons, it may be wise to investigate solutions that do a better job of proxy caching using a content-addressable store, or matching file names.

@Rudd-O

Rudd-O commented May 10, 2016

Perhaps a custom Go-based (to prevent security vulns) cache that listens for requests using the net/http package and proxies them on behalf of the VMs? This has the potential to be a very efficient solution too, as a Go program would have a minuscule memory footprint.
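
A minimal sketch of that idea, assuming GET-only, proxy-style requests and ignoring headers, expiry, error responses, and concurrent writers; the port and cache path are illustrative:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"io"
	"log"
	"net/http"
	"os"
	"path/filepath"
)

const cacheDir = "/var/cache/updates-cache" // illustrative location

// cachePath maps a request URL to a file name inside cacheDir.
func cachePath(rawURL string) string {
	sum := sha256.Sum256([]byte(rawURL))
	return filepath.Join(cacheDir, hex.EncodeToString(sum[:]))
}

func handler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet {
		http.Error(w, "only GET is proxied", http.StatusMethodNotAllowed)
		return
	}
	path := cachePath(r.URL.String())

	// Cache hit: serve the stored file.
	if f, err := os.Open(path); err == nil {
		defer f.Close()
		io.Copy(w, f)
		return
	}

	// Cache miss: fetch upstream and stream to both client and cache file.
	resp, err := http.Get(r.URL.String())
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()

	tmp, err := os.CreateTemp(cacheDir, "dl-*")
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	defer tmp.Close()

	io.Copy(io.MultiWriter(w, tmp), resp.Body)
	os.Rename(tmp.Name(), path) // publish into the cache atomically
}

func main() {
	os.MkdirAll(cacheDir, 0o755)
	log.Fatal(http.ListenAndServe("127.0.0.1:8082", http.HandlerFunc(handler)))
}
```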

@kalkin
Member

kalkin commented May 11, 2016

@Rudd-O Have a look at this https://github.com/mojaves/yumreproxyd

@Rudd-O

Rudd-O commented May 12, 2016

Looking. Note we need something like that for Debian as well.

@Rudd-O

Rudd-O commented May 12, 2016

The code is not idiomatic Go and there are some warts that I would fix before including it anywhere. Just as a small example, at https://github.com/mojaves/yumreproxyd/blob/master/yumreproxy/yumreproxy.go#L33 you can see he is using a nil value as a sort of bool. That is not correct -- the return type should be (struct, bool).
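
For illustration, the comma-ok idiom being described; the types here are hypothetical, not taken from yumreproxyd:

```go
package cache

// Entry is a cached payload (hypothetical type for illustration).
type Entry struct {
	Body []byte
}

// Cache maps request URLs to entries.
type Cache struct {
	entries map[string]Entry
}

// Get returns the entry plus an explicit "found" flag instead of
// overloading a nil value to signal a cache miss.
func (c *Cache) Get(url string) (Entry, bool) {
	e, ok := c.entries[url]
	return e, ok
}
```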

@Rudd-O

Rudd-O commented May 12, 2016

https://github.com/mojaves/yumreproxyd/blob/master/yumreproxy/yumreproxy.go#L73 <- also problematic. A "TODO: path sanitization" is not what you want in secure software.

But the BIGGEST problem is that the program appears not to give a shit about concurrency. Save-into-cache and serve-from-cache can race, and no locking is performed, nor are channels used there. Big fat red flag. The right way to do it is by communicating with the Cache aspect of the application through channels -- send a request to the Cache, await the response; if the file is not available, download it, send it to the Cache for storage, and await the response.

Also, all content types returned are application/rpm. That's wrong in many cases.

BUT, that only means the project can be extended or rewritten, and it should not be very difficult to do so.
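
A sketch of the channel-based structure described above: one goroutine owns the cache map, and lookups/stores go over channels, so the save/serve race disappears. Names are illustrative.

```go
package cache

// getReq asks for a cached body; reply receives nil on a miss.
type getReq struct {
	url   string
	reply chan []byte
}

// putReq stores a downloaded body under its URL.
type putReq struct {
	url  string
	body []byte
}

// run owns the map exclusively, serializing all access through channels.
func run(gets <-chan getReq, puts <-chan putReq) {
	store := make(map[string][]byte)
	for {
		select {
		case g := <-gets:
			g.reply <- store[g.url]
		case p := <-puts:
			store[p.url] = p.body
		}
	}
}
```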

andrewdavidwong added a commit that referenced this issue May 31, 2016
@rustybird

rustybird commented Jun 6, 2016

I just uploaded the Squid-based https://github.com/rustybird/qubes-updates-cache (posted to qubes-devel too)

andrewdavidwong added a commit that referenced this issue Jun 6, 2016
@rustybird

The latest commit (-57 lines, woo) reworks qubes-updates-cache to act as a drop-in replacement for qubes-updates-proxy. No changes to the client templates are needed at all now.

andrewdavidwong added a commit that referenced this issue Jun 8, 2016
@fepitre
Member

fepitre commented Jan 13, 2021

I can try to propose to resurrect it on the Fedora side. I'm not sure I currently have the bandwidth for it, but I could give it a try if it's worth doing.

@fepitre
Member

fepitre commented Jan 16, 2021

Just to let you know that I've updated apt-cacher-ng, and as it was orphaned for 8+ weeks, I've requested a re-review: https://bugzilla.redhat.com/show_bug.cgi?id=1916884. On the Fedora devel list there is already one user who is pretty enthusiastic about this resurrection. You can already test it by using my COPR repository, fepitre/fedora. I'm currently using it with success in a Fedora 32 AppVM.

@unman
Member

unman commented Jan 16, 2021 via email

@fepitre
Member

fepitre commented Jan 16, 2021

When you say "with success" do you mean "with success for Fedora"? Can you post your acng.conf file, because Fedora remains a PITA. Are you rewriting the sources files in the templates?

Yes, success with Fedora 32. I'm currently running the build provided in my COPR repository. I'm using an almost-default conf: https://gist.github.com/fepitre/fd490e04fe92bd023f77f0e03984b05c and I'm also having success caching Debian repositories. FYI, I'm not using qubes-updates-cache. I've simply set up apt-cacher-ng in an AppVM, then run qvm-connect-tcp :aptcachervm:3142 from another Debian AppVM and set the proxy to localhost:3142.
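
For anyone reproducing this, the steps boil down to roughly the following (the qube name matches the comment above, and the qubes.ConnectTCP policy has to allow the call):

```
# In the client qube: bind local port 3142 to the cacher qube's apt-cacher-ng
qvm-connect-tcp :aptcachervm:3142

# Then point apt at it, e.g. in /etc/apt/apt.conf.d/00cacher:
Acquire::http::Proxy "http://127.0.0.1:3142/";
```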

@unman
Member

unman commented Jan 17, 2021 via email

@SvenSemmler

@unman how would I check whether Fedora packages are cached? I am using apt-cacher-ng in a Debian-based qube, based on your notes, and it appears to work. The download speed of the packages in the Fedora templates and the presence of the mirror sites as subdirectories in /var/cache make me think it works, but how can I be sure? ... and would that be unexpected? The discussion in this issue makes me doubt it.

@unman
Member

unman commented Jan 18, 2021 via email

@SvenSemmler

SvenSemmler commented Jan 18, 2021 via email

@tlaurion
Contributor

tlaurion commented Aug 16, 2022

I think apt-cacher-ng as bundled by @unman deserves way more attention.

Even more so since the shaker project (which salts otherwise complex use cases) includes this caching proxy.

https://forum.qubes-os.org/t/simple-set-up-of-new-qubes-and-software/13064

https://qubes.3isec.org/tasks.html

@marmarek: @unman now releases the package under his own repo, alongside other spec files that package salt scripts and apply them as post-install scripts when the packages are installed from his repo.

cacher is one of those packaged salt scripts.
This idea, and his qubes-task project to install packages for specific use cases, deserves attention. @fepitre?

Spec file under the main project: https://github.com/unman/shaker/blob/main/cacher.spec

As noted in closed issue unman/shaker#5 (comment), what is missing, without Qubes integration, is for the Qubes updater to reapply the wildcard sls so that repositories entered as https are transformed to be cached when applying updates on cloned templates on the next run of the Qubes updater.

There is no problem on a vanilla install if a user installs a repo and software on a single template with https links; it will pass. But a hook is missing from the Qubes updater so that, on the next update iteration, the links are transformed on the templates by applying qubesctl prior to running updates: https://github.com/unman/shaker/blob/main/cacher/change_templates.sls
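
For context, the rewrite that sls performs is essentially of this shape (an illustrative sed equivalent; the authoritative rule is in the linked change_templates.sls), turning an https source into a form apt-cacher-ng can actually see and cache:

```
# deb https://deb.debian.org/debian bookworm main
# becomes
# deb http://HTTPS///deb.debian.org/debian bookworm main
sed -i 's,https://,http://HTTPS///,g' /etc/apt/sources.list /etc/apt/sources.list.d/*.list
```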

@marmarek: is there any chance of some collaboration happening between the Qubes and shaker projects?

This idea (packaged salt recipes, applied at install) is a life changer.

And this issue (among the top 10 most-commented open issues) shows that this cacher is needed by a lot of people. And it now exists in "just works" (tm) mode.

https://qubes.3isec.org/rpm/r4.1/current/dom0/fc32/

@ben-grande

I like the apt-cacher-ng approach in general. It sounds much cleaner at the architecture level than intercepting connections transparently, which can work only for non-TLS connections. Currently I see two issues with it, hopefully easy to solve:

  • requirement to modify repository config (different URL, especially for TLS)
  • risk of caching broken updates (normally, when dnf downloads a package and it doesn't match the checksum, it retries from a different mirror - I guess with apt-cacher-ng it would simply re-download the same thing from the cache...)

As for the repository config modification, it should be possible to do it in the salt formula used for the update (update.qubes-vm). But there is still an issue with the config being updated by the package manager then - if we modify the file, any updates to the config shipped via deb/rpm will need to be applied manually (using .rpmnew or .dpkg-new files). This doesn't sound like a big issue, but it's still something to worry about - if we ignore it, we risk using outdated repository configs. Perhaps we should have some check for that and issue a warning to the user like "manual config file update is required"?

How do you imagine this notification? Is it a package manager hook that runs last and notifies in the DomU, or does the qube send a feature request to Dom0?
