(Capacity) Agent upgrade hangs more than 1 hour (makeself chown -R on agent_storage) #1815

Closed
guymguym opened this issue Aug 23, 2016 · 7 comments · Fixed by #3421
Labels
Comp-Capacity High system capacity & capacity reporting Severity 2 Product is usable but severely limited. Task is unable to operate or caused other sw to fail Type:Bug

Comments

@guymguym
Member

Environment info

  • Version: 0.5.0-f6eff7a
  • Deployment: GCloud
  • Customer: QA

Actual behaviour

  1. Upgrade agent with 1.3TB in /usr/local/noobaa/agent_storage
  2. After unpacking the files, the upgrade got stuck running a recursive chown on the entire /usr/local/noobaa, which includes agent_storage:
[root@agent-instance-for-146-148-102-481469438071228 ~]# ps -ef | grep 2130
root      2130  1911  0 18:36 ?        00:00:00 /bin/sh ./noobaa-installer --keep --target /usr/local/noobaa
root      2132  2130  4 18:36 ?        00:01:38 chown -R 0 .

Here is the explanation from the docs https://github.com/megastep/makeself

--nochown : 
  By default, a "chown -R" command is run on the target directory after extraction,
  so that all files belong to the current user. 
  This is mostly needed if you are running as root,
  as tar will then try to recreate the initial user ownerships.
  You may disable this behavior with this flag.

Expected behavior

  1. Upgrade should be as fast as possible

Steps to reproduce

  1. Upgrade capacity system

Screenshots or Logs or other output that would be helpful

(If large, please upload as attachment)

@guymguym
Member Author

BTW it actually called chown -R 0, which took ~1 hour,
and then also chgrp -R 0, which took another ~1 hour.
Not sure what the effect of using the --nochown flag with makeself would be.

I think I would prefer a more robust fix, such as moving the agent_storage dir away from the code location to the root of the mount point. For the / mount point that would be /.noobaa_storage, so that it will not interfere with any software upgrades.
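
A minimal sketch of how the proposed storage location could be derived for any path. The directory name `.noobaa_storage` comes from the comment above; the helper names `find_mount_point` and `storage_dir_for` are hypothetical and not part of the agent code.

```python
import os

# Directory name proposed in the comment above.
STORAGE_DIR_NAME = ".noobaa_storage"


def find_mount_point(path):
    """Walk up from path until a mount point is reached (hypothetical helper)."""
    current = os.path.realpath(path)
    while not os.path.ismount(current):
        current = os.path.dirname(current)
    return current


def storage_dir_for(path):
    """Return the proposed storage directory for the drive that holds path."""
    return os.path.join(find_mount_point(path), STORAGE_DIR_NAME)
```

With this layout the storage directory sits outside the code directory, so a recursive operation on the code location never touches the stored data.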

We also seem to only extract the new package and overwrite the existing files, so files that were removed from the package are never cleaned up. Upgrading the agent should be done by extracting to a new folder and then replacing the old folder completely, just like in the server.
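
To illustrate the leftover-files problem, here is a rough sketch that lists files present in an installed tree but absent from a freshly extracted package. The function names and the two directory arguments are assumptions made for illustration; the agent has no such check as far as this thread shows.

```python
import os


def list_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def stale_files(installed_root, new_package_root):
    """Files in the installed tree that the new package no longer ships."""
    return list_files(installed_root) - list_files(new_package_root)
```

An overwrite-only upgrade leaves every path returned by `stale_files` in place, which is why replacing the whole folder is the cleaner option.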

@nimrod-becker nimrod-becker added this to the 0.5.1 milestone Aug 23, 2016
@nimrod-becker nimrod-becker modified the milestones: 0.5.1, 0.5.2 Sep 22, 2016
@nimrod-becker nimrod-becker added Comp-Capacity High system capacity & capacity reporting GA-Dev labels Oct 9, 2016
@nimrod-becker nimrod-becker modified the milestones: 0.5.4, 0.6.1 Oct 31, 2016
@tamireran
Contributor

@guymguym do we know why it took so much time?

@tamireran
Contributor

@dannyzaken check this one, we may find it reasonable to use the nochown megastep/makeself#66

@guymguym
Member Author

@tamireran
The time is somewhat reasonable for going over 3 million files and doing chown on each of them.
For comparison, in #1650 just stat took 3 minutes, so that's a ratio of 20x between chown and stat.
chown is rather expensive as it needs to read and update all these inodes and write them to the disk and is pretty much serial, while stat can fetch a lot of inodes together from the disk and cache them.
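
Since stat is described above as much cheaper than chown, one possible mitigation is to check ownership first and write only where it differs, while skipping the storage directory entirely. The sketch below is an assumption-laden illustration, not what makeself does: `chown_where_needed` and its `exclude` and `chown` parameters are hypothetical, and the root directory itself is left untouched.

```python
import os


def chown_where_needed(root, uid, gid, exclude=(), chown=os.lchown):
    """Set ownership under root only for entries not already owned by uid:gid.

    exclude lists directories (relative to root) that are skipped entirely.
    chown is injectable so the walk can be dry-run without privileges.
    Returns the number of entries that were changed.
    """
    excluded = {os.path.normpath(os.path.join(root, e)) for e in exclude}
    changed = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = [
            d for d in dirnames
            if os.path.normpath(os.path.join(dirpath, d)) not in excluded
        ]
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # read-only check, does not follow symlinks
            if st.st_uid != uid or st.st_gid != gid:
                chown(path, uid, gid)
                changed += 1
    return changed
```

A call such as `chown_where_needed("/usr/local/noobaa", 0, 0, exclude=["agent_storage"])` would leave the stored blocks alone. Without the exclusion it still has to stat every entry, so on a tree of this size it would cost at least the stat time quoted above.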

I still don't think that the right solution is to use --nochown.

I think we should move the agent_storage to the root of the drive, like we do on any other drive.

@tamireran
Contributor

@guymguym both sound like reasonable solutions. --nochown sounds like an immediate solution

@guymguym
Member Author

True.
But notice that our agent upgrade only overwrites files.
So it gets dirty as time goes by - see #2040 - which is a ticking bomb.
Moving the agent_storage to the drive root will allow us to perform the agent upgrade simply by extracting to a temp folder and then renaming the new code folder to become the current one.
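
The extract-then-rename flow described here could look roughly like the following. This is a sketch under assumptions: `swap_in_new_code`, the `extract` callback (a stand-in for running the installer against a target directory), and the `.old` suffix are all hypothetical names, and the code directory is assumed to hold no storage data.

```python
import os
import shutil
import tempfile


def swap_in_new_code(current_dir, extract):
    """Replace current_dir with a freshly extracted tree using renames.

    extract is a callable that receives an empty directory and fills it
    with the contents of the new package.
    """
    current_dir = os.path.abspath(current_dir)
    parent = os.path.dirname(current_dir)
    # Extract next to the current folder so the renames stay on one filesystem.
    new_dir = tempfile.mkdtemp(prefix=".upgrade-new-", dir=parent)
    old_dir = current_dir + ".old"
    try:
        extract(new_dir)
    except Exception:
        shutil.rmtree(new_dir, ignore_errors=True)
        raise
    os.chmod(new_dir, 0o755)  # mkdtemp creates the directory with mode 0700
    shutil.rmtree(old_dir, ignore_errors=True)  # leftover from an earlier run
    if os.path.isdir(current_dir):
        os.rename(current_dir, old_dir)
    os.rename(new_dir, current_dir)
    shutil.rmtree(old_dir, ignore_errors=True)
```

The two renames are not atomic as a pair, so there is a brief window in which `current_dir` does not exist. Files dropped from the package disappear with the old folder, which addresses the cleanup concern raised earlier in the thread.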

@tamireran
Contributor

+1

@nimrod-becker nimrod-becker assigned guymguym and unassigned dannyzaken Dec 26, 2016
@nimrod-becker nimrod-becker modified the milestones: 0.7, GA Jan 4, 2017

4 participants