ansible fails with too many open files on Fedora #280
Comments
I guess this is happening on the target machine? If you set "mitogen_task_isolation=fork" I presume it will abate. Just setting up a CentOS VM now. Would love an example task that led to ^ and e.g. how many times it was executed.
It looks like either Ansible's dnf.py isn't calling a finalization method of dnf, or dnf isn't calling a finalization method of the underlying rpm library. DNF doesn't appear to directly manipulate those solvx files, it's (seemingly?) handled only by the RPM library.
Ansible's dnf.py is definitely missing a call to dnf.Base.close(). It's relying on __del__ to call it, which is wrong to begin with, but easily defeated if /anything/ else holds a reference to the Base instance. 99% sure mitogen_task_isolation=fork on all your dnf tasks will cure this, but that's not a real fix.
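To illustrate the point about finalizers, here is a minimal sketch in plain Python. `FakeBase`, `leaky` and `explicit` are hypothetical names made up for this example and are not part of dnf or Ansible; the sketch only assumes an object with a `close()` method and CPython's reference counting.

```python
import contextlib


class FakeBase:
    """Hypothetical stand-in for an object that owns file descriptors."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __del__(self):
        # Finalizer-based cleanup: only runs once no references remain.
        self.close()


def leaky(registry):
    base = FakeBase()
    # Something else keeps a reference, so dropping the caller's name
    # is not enough to trigger __del__.
    registry.append(base)
    return base


def explicit():
    # contextlib.closing() calls close() on exit, regardless of who
    # else still holds a reference to the object.
    with contextlib.closing(FakeBase()) as base:
        pass  # work with base here
    return base
```

In the `leaky` case the object stays open for as long as the registry lives, which in a long-running interpreter can be indefinitely. The `explicit` variant does not depend on reference counts at all.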
Yeah, it happens on the target machine. I suspect something like:
Would do the trick to reproduce it. I suspect we're initializing a new dnf state of some kind each time, and getting a new set of the above open files. Target machines are Fedora 27 and 28. Worth reporting to ansible, or do they not care about module cleanup at this point?
Okay, I added a base.close() in dnf.py before every module.{exit,fail}_json() call and that helps. Still leaking some file descriptors, but not as bad:
Lots of modules involved above...
hawkey.log looks like it belongs to https://github.com/rpm-software-management/hawkey which is a binding for managing .solv files. Looks like DNF itself might be a little suspect.
Filed ansible/ansible#41810
Looking at I'm going to have another pass to find where those references leak, but even if we find them and can work around it, there is no promise they won't sneak back later -- this library's design is somewhat lacking in places. A real fix to this likely requires API changes in hawkey, meanwhile forking will need to be enabled automatically for existing deployed versions when this module is executed.
You've dug much deeper than I was willing to. FWIW - I've filed https://bugzilla.redhat.com/show_bug.cgi?id=1594016 against dnf if you want to pass along your findings.
It just generates far too much spam, and its final decision is obvious since a followup load_module() will exist for positive matches.
Thanks for all your help with this! As it impacts existing versions of Ansible, the solution I've committed will be needed until the relevant versions of hawkey and DNF disappear (i.e. circa 2028 ;)). It's the first case where always-fork is definitely required; I've been expecting it for some time.
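For anyone wondering why forking sidesteps the leak: descriptors opened in a child process are released by the kernel when the child exits, so nothing accumulates in the long-running parent. The following is a rough sketch of that idea only, not Mitogen's implementation. `count_open_fds`, `leak_descriptors` and `run_in_fork` are hypothetical helpers, and the `/proc/self/fd` listing assumes Linux.

```python
import os


def count_open_fds():
    # Linux-specific: each entry in /proc/self/fd is one open descriptor.
    return len(os.listdir("/proc/self/fd"))


def leak_descriptors(n):
    # Deliberately open descriptors and never close them (simulated leak).
    return [os.open(os.devnull, os.O_RDONLY) for _ in range(n)]


def run_in_fork(func, *args):
    pid = os.fork()
    if pid == 0:
        # Child: run the leaky work, then exit without returning to the caller.
        try:
            func(*args)
        finally:
            os._exit(0)
    # Parent: wait for the child; its descriptors are released on exit.
    _, status = os.waitpid(pid, 0)
    return status
```

Calling `run_in_fork(leak_descriptors, 20)` should leave the parent's descriptor count unchanged, whereas calling `leak_descriptors(20)` directly raises it by 20.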
adding new server configuration files for El Salvador
Not sure where the ultimate fault lies - possibly in Ansible or dnf - but because mitogen produces a long-running process, unclosed file descriptors build up. I'm seeing lots of:
Now to try to figure out a simple reproducer...
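One possible starting point for a reproducer is to snapshot the process's open descriptors before and after calling the suspect code a few times, then diff the two. This is a sketch under assumptions: it relies on Linux's `/proc/self/fd`, and `open_fd_targets` and `leaked_by` are hypothetical helper names, not part of Ansible, Mitogen or dnf.

```python
import os


def open_fd_targets():
    """Map each open descriptor of this process to its target (Linux only)."""
    targets = {}
    for name in os.listdir("/proc/self/fd"):
        try:
            targets[int(name)] = os.readlink(f"/proc/self/fd/{name}")
        except OSError:
            # The descriptor used for the listing itself is already closed.
            continue
    return targets


def leaked_by(func, repeat=5):
    """Return descriptors that exist after calling func() but not before."""
    before = set(open_fd_targets())
    for _ in range(repeat):
        func()
    after = open_fd_targets()
    return {fd: path for fd, path in after.items() if fd not in before}
```

Passing the suspect call as `func` gives the descriptor numbers and paths left behind after `repeat` calls, which makes it easier to see which files pile up per invocation.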