Bug: Packer waits exactly 1hr, and during this period there are no debugging logs emitted, then build starts and succeeds. #10981

Open
adamhorden opened this issue Apr 29, 2021 · 8 comments
Labels: bug, hcl2-dag, sync to jira (for issues that need to be imported to Packer internal JIRA backlog)

Comments

@adamhorden

adamhorden commented Apr 29, 2021

Packer Version

packer -v
1.7.3

Built from source.

We have called this bug HeisenBug 🐛 internally because Packer will fail randomly, while subsequent runs on CI will build an image successfully. We built this HCL module and it worked perfectly with v1.6.6, but it fails on v1.7.0. It is being called from Terraform. I have moved to using packer init.

The issue here is time taken.

2021/04/28 23:56:13
2021/04/29 00:51:42

What is Packer doing during this time? This is on CICD and we noticed builds just waiting around an hour.

Log is here:

https://gist.github.com/adamhorden/95ac55c86621d0f061f5f36c798e911c

(Please note that this log is from a debugging build. Waiting 1hr before getting a configuration failure is less than ideal, but this was the only log file I had to hand. They all wait exactly 1hr.)

Nothing is emitted in the debug logs during this time, and eventually we do get a build. What I do not understand is why Packer waits this long. This happens with multiple builders; I have tested with qemu and vsphere-iso.

Hopefully the log above will help.

As per:

#10818

I am using the same set of HCL for testing.

Thanks for all of the hard work that has gone into bringing HCL into Packer 😁 .

@adamhorden adamhorden added the bug label Apr 29, 2021
@SwampDragons
Contributor

Can you share your template?

@SwampDragons
Contributor

@pearkes pearkes assigned pearkes and SwampDragons and unassigned pearkes May 12, 2021
@SwampDragons
Contributor

This template repository is really hard to read; the logs show that the build is freezing after trying to launch the vsphere-iso builder but I can't find a vsphere source in your template directory. It looks like it's failing somewhere in the Prepare but given that I can't figure out how the source is being configured it's hard for me to come up with a repro case.

Can you please share a simple source config for the vsphere builder that is able to reproduce the issue?

@sylviamoss any ideas here?

@sylviamoss
Contributor

@adamhorden does this happen for every build?
I'm with @SwampDragons here and I'd like to see a simplified template that reproduces the issue. I can't reproduce it with the vsphere template I have.

@sylviamoss
Contributor

So, Packer takes rounds and rounds to evaluate all of the locals that depend on another local. I noticed you have a very complex locals configuration where some locals depend on one or more other locals.
I added some logs to the locals evaluation to see if packer gets "stuck" there. Could you run the build again with PACKER_LOG=1 set using the packer binaries from the link below?

https://app.circleci.com/pipelines/github/hashicorp/packer/10240/workflows/a18cc279-39a9-4416-9550-4fdeec68789c/jobs/124365/artifacts
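
To illustrate the kind of chained locals described above, here is a minimal sketch. All names and values are hypothetical and are not taken from the reporter's template; the point is only that each local refers to another one, so the last can be resolved only after the ones it depends on.

```hcl
# Hypothetical example: every name and value here is made up for illustration.
locals {
  # No dependencies: can be evaluated immediately.
  base_name = "example"

  # Depends on local.base_name.
  image_name = "${local.base_name}-image"

  # Depends on local.image_name, and so indirectly on local.base_name.
  image_tags = {
    Name = local.image_name
  }
}
```

If the entries are visited in an order where `image_tags` comes before `image_name`, it presumably cannot be resolved on that pass and has to be retried, which is the "rounds and rounds" behaviour mentioned above. The real template nests far more deeply than this.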

@adamhorden
Author

adamhorden commented May 18, 2021

Thanks for all the help with this 😁 .

I have a smaller reproducible test case that I will commit to help with debugging. This happens with any builder, be it Amazon, qemu, or vSphere, so it is not dependent on, say, vSphere. I tried to show that in my original HCL here by using Amazon and qemu:

https://github.com/hordenengineering/hashicorp_packer_bug_001

My feeling is that this is due to local expansion, as @sylviamoss has pointed out. The locals are complex because they pass around a complex data structure, which gives us a nice integration with Terraform: it maps what Terraform manages for running a build onto what Packer needs. We do the same in Terraform, and my thinking was that if Packer supports HCL now, I should be able to use the same approach.

This does not happen on all builds. It happens at random. During this 1hr wait we noticed CICD eating as much CPU and memory as possible. We call this HeisenBug 🐛 internally because Packer will fail randomly, but subsequent runs of Packer will produce a build.

I will run a build with:

https://app.circleci.com/pipelines/github/hashicorp/packer/10240/workflows/a18cc279-39a9-4416-9550-4fdeec68789c/jobs/124365/artifacts

shortly, to help debugging.

Thanks again for all the hard work with Packer.

Adam Horden

@ghost ghost removed stage/waiting-reply labels May 18, 2021
@sylviamoss
Contributor

Hey there, after discussing internally with the team we came to a conclusion and hopefully that will help you.

There's not much we can do right now to improve the performance when evaluating the locals block. HCL reads the block as a map, which is not ordered, and that explains why this is a Heisenbug. Terraform handles the same configuration perfectly because it has a dependency graph.

While Packer doesn't have a dependency graph, I suggest you write your locals using the singular local block instead. You should be able to write the same configuration, but in this case, Packer will read the variable as a single block into an ordered slice, respecting the order of your configuration.
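
As a rough sketch of that suggestion, a chain of dependent locals could be written with singular `local` blocks, declared in dependency order. The names and values below are hypothetical and not taken from the template in this issue; the `local "name" { expression = ... }` form is the one described in the Packer HCL documentation.

```hcl
# Hypothetical example: names and values are made up for illustration.
# Each local gets its own block, declared before anything that uses it.

local "base_name" {
  expression = "example"
}

local "image_name" {
  # Refers to the local declared above.
  expression = "${local.base_name}-image"
}

local "image_tags" {
  # Refers to image_name, which is already declared by this point.
  expression = {
    Name = local.image_name
  }
}
```

The values are still referenced as `local.<name>` elsewhere in the template, so sources and build blocks that use them should not need to change.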

Could you try this out and let me know the results?

Thanks for your patience!

@nywilken nywilken added sync to jira For issues that need to be imported to Packer internal JIRA backlog and removed stage/waiting-reply labels Sep 30, 2022
@github-actions

This issue has been synced to JIRA for planning.

JIRA ID: HPR-753

7 participants
@adamhorden @pearkes @SwampDragons @azr @nywilken @sylviamoss and others