build-azure-win2008r2-x64 / build-softlayer-win2012r2-x64 nightly curl failures #1169

Closed
M-Davies opened this issue Feb 24, 2020 · 23 comments

@M-Davies

M-Davies commented Feb 24, 2020

origin	https://github.com/adoptopenjdk/openjdk-jdk8u.git (fetch)
origin	https://github.com/adoptopenjdk/openjdk-jdk8u.git (push)
jdk8
origin	https://github.com/adoptopenjdk/openjdk-jdk8u.git (fetch)
Resetting the git openjdk source repository at /tmp/openjdk-jdk8u-windows-x86-32-hotspot/workspace/build/src in 10 seconds...
Pulling latest changes from git openjdk source repository
error: RPC failed; curl 18 transfer closed with outstanding read data remaining
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed

This happens on build-azure-win2008r2-x64-2 and build-azure-win2008r2-x64-1.

@M-Davies
Author

Also fails for jdk8/windows-x64/hotspot builds

@Willsparker
Contributor

This appears to be a Git issue: https://stackoverflow.com/questions/38618885/error-rpc-failed-curl-transfer-closed-with-outstanding-read-data-remaining

It may be due to a slow internet connection on the machine overrunning a default Git timeout, which would explain its intermittent nature.

@M-Davies
Author

Thanks @Willsparker. In that case, increasing the Git timeout settings on the machine might resolve the issue (the default timeout is around 5 minutes): https://www.git-scm.com/docs/git-config/1.7.8#git-config-httplowSpeedLimithttplowSpeedTime

@sxa555 do you have access to this machine to try increasing the timeout to 10 minutes?

@Willsparker
Contributor

Alternatively, if we can reproduce the error outside of the build script (e.g. with git clone https://github.com/adoptopenjdk/openjdk-jdk8u.git), we can set these environment variables to get a better idea of what's breaking:

GIT_TRACE=1 GIT_CURL_VERBOSE=1
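
A minimal sketch of how those variables could be applied to a single command, assuming a POSIX shell (Cygwin's bash on the Windows machines would do). The `trace_git` helper and the `clone-trace.log` file name are hypothetical and not part of the build scripts. With both variables set to 1, Git writes its trace and curl's verbose HTTP output to stderr, which the helper redirects into a log file.

```shell
# Hypothetical helper: run one git command with tracing switched on and
# capture everything written to stderr (the trace output) in a log file.
#   $1 = log file, remaining arguments = the git command to run
trace_git() {
  log_file="$1"
  shift
  GIT_TRACE=1 GIT_CURL_VERBOSE=1 git "$@" 2>"$log_file"
}

# Example (needs network access, so it is left commented out):
# trace_git clone-trace.log clone https://github.com/adoptopenjdk/openjdk-jdk8u.git
```

The log can then be searched for the point at which the transfer stops.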

@Willsparker self-assigned this on Feb 25, 2020
@Willsparker
Contributor

Right, I've set http.lowSpeedTime to 600 seconds (using git config --global https.lowSpeedTime 600). I timed git clone https://github.com/adoptopenjdk/openjdk-jdk8u.git and it took about 4 minutes 15 seconds, which could end up being over 5 minutes with the overhead of running the whole build scripts under Jenkins. We'll see whether this has fixed the issue; unfortunately, I was unable to reproduce it on the machine.
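
For reference, a hedged sketch of the same tuning written with the `http.` prefix. As far as the git-config documentation describes, the low-speed settings are `http.lowSpeedLimit` and `http.lowSpeedTime` (the `http.` section also covers HTTPS remotes), and the time limit only has an effect when a speed limit is set alongside it. The 1000 bytes/second threshold is an assumed value for illustration, and the commands write to a scratch file rather than the real global config; replacing `--file "$cfg"` with `--global` would apply them for the current user.

```shell
# Scratch config file so the sketch does not touch ~/.gitconfig.
cfg="$(mktemp)"

# Abort a transfer only if it stays below 1000 bytes/second (assumed
# threshold) for 600 consecutive seconds.
git config --file "$cfg" http.lowSpeedLimit 1000
git config --file "$cfg" http.lowSpeedTime 600

# Show what was stored under the http section.
git config --file "$cfg" --get-regexp '^http\.'
```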

@sxa added this to the February 2020 milestone on Feb 25, 2020
@Willsparker
Contributor

Last night's build failed on -1 as well ... great. I'd like to first ensure the fix for -2 fixed the issue before putting it onto -1 as well.

@Willsparker
Contributor

On the bright side, I've managed to recreate the issue by going to /tmp/openjdk-jdk8u-windows-x64-hotspot/workspace/src and running git pull.

The -v option on git pull doesn't add any additional info.
I've found that it only errors on git pull, not when cloning the repo, so it may be worth removing the repos and seeing whether a fresh clone has the git pull error as well.

@Willsparker
Contributor

Willsparker commented Feb 27, 2020

Okay, so I've followed the suggestion made by many online resources and set both https.postBuffer = 524288000 and http.postBuffer = 524288000. git fetch now works on both of them locally, but as this is an intermittent issue, I don't know yet whether this has actually fixed it.

If the issue carries on, I'll make sure to remove those configs as well, as I don't want to mess with default values if I can help it.
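
A small sketch of how such a setting could be applied and later rolled back, again against a scratch file (an assumption made so the example is safe to run; the thread does not say which config scope was used). Only `http.postBuffer` is shown, since Git documents its HTTP options under the `http.` section. The git-config documentation describes this option as the buffer used when POSTing data to the remote, so its effect on fetches may be limited.

```shell
# Scratch config file so the sketch does not touch ~/.gitconfig.
cfg="$(mktemp)"

# Raise the smart-HTTP POST buffer to 500 MiB (524288000 bytes).
git config --file "$cfg" http.postBuffer 524288000
git config --file "$cfg" --get http.postBuffer

# Roll the change back so Git falls back to its built-in default.
git config --file "$cfg" --unset http.postBuffer
```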

@Willsparker
Contributor

Willsparker commented Mar 2, 2020

Unfortunately, setting those config values hasn't worked, so I've unset them.

Ref : adoptium/temurin-build#1236

I've renamed /tmp/openjdk-jdk8u-windows-x64-hotspot/workspace/build/src to ../src-1169 and cloned a new copy of the openjdk-jdk8u repo, on both machines. If that works, I'll delete the old copies of the repos later on.
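
The move-aside-and-reclone step could be scripted roughly as follows. This is a sketch, not what was run on the machines: the `reclone_aside` helper name and its arguments are hypothetical, and it keeps the old copy next to the new one under a suffixed name rather than in the parent directory.

```shell
# Hypothetical helper: keep the suspect checkout under a new name and
# clone a fresh copy in its place.
#   $1 = existing checkout, $2 = repository URL, $3 = suffix for the old copy
reclone_aside() {
  src_dir="$1"
  repo_url="$2"
  suffix="$3"
  mv "$src_dir" "${src_dir}-${suffix}" || return 1
  git clone "$repo_url" "$src_dir"
}

# Example (needs network access, so it is left commented out):
# reclone_aside /tmp/openjdk-jdk8u-windows-x64-hotspot/workspace/build/src \
#   https://github.com/adoptopenjdk/openjdk-jdk8u.git 1169
```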

@M-Davies
Author

M-Davies commented Mar 3, 2020

@Willsparker FYI, build-azure-win2008r2-x64-1 needs the same fix as -2. It failed last night with the same issue: https://ci.adoptopenjdk.net/view/Failing%20Builds/job/build-scripts/job/jobs/job/jdk8u/job/jdk8u-windows-x64-hotspot/509/

@M-Davies changed the title from "build-azure-win2008r2-x64-2 nightly curl failure" to "build-azure-win2008r2-x64 nightly curl failures" on Mar 3, 2020
@Willsparker
Contributor

Willsparker commented Mar 3, 2020

I renamed the directories on both, so I suppose that didn't fix it.

@Willsparker
Contributor

https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk8u/job/jdk8u-windows-x64-hotspot/514/

Upgrading Git from 2.17 to 2.21 has fixed the issue 👍

@karianna modified the milestones: February 2020, March 2020 on Mar 6, 2020
@M-Davies
Author

M-Davies commented Mar 6, 2020

Thanks @Willsparker ❤️

@Willsparker
Contributor

@M-Davies Sorry... Issue's recurring. Here's what I can gather:

  • build-azure-win2008r2-x64-1 appears to only be failing when running the jdk8u-windows-x86-32-hotspot job, not the jdk8u-windows-x64-hotspot job. I've already removed the x86_32 repo.

  • build-azure-win2008r2-x64-2 seems to not be connecting to the Jenkins agent.

@Willsparker reopened this on Mar 9, 2020
@M-Davies
Author

M-Davies commented Mar 9, 2020

  • build-azure-win2008r2-x64-1 appears to only be failing when running the jdk8u-windows-x86-32-hotspot job, not the jdk8u-windows-x64-hotspot job. I've already removed the x86_32 repo.

It appears to pull the repository fine within Grinder (https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/2465/console). Since you removed the x86_32 repo, I take it it was fetching, not pulling?

  • build-azure-win2008r2-x64-2 seems to not be connecting to the Jenkins agent.

That must be a recent problem. It connected fine last week https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk8u/job/jdk8u-windows-x86-32-hotspot/498/console

@Willsparker
Contributor

Connection issue is fixed - the Jenkins service didn't want to restart. Disabling and enabling the machine on ci.adoptopenjdk.net and then starting the service got it going again.

The build scripts say it's fetching, but both git fetch and git pull within a failing repository will error with the same issue.

@karianna
Contributor

@Willsparker - so this can be closed now?

@Willsparker
Contributor

Sadly not. I've been looking at using Git for Windows instead of Cygwin's Git on the machines; however, that causes its own problems:

ERROR: Error fetching remote repo 'origin'
hudson.plugins.git.GitException: Failed to fetch from https://github.com/AdoptOpenJDK/openjdk-build.git
	at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:909)
	at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1131)
	at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1167)

@Willsparker
Contributor

https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk8u/job/jdk8u-windows-x64-hotspot/527/
https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk8u/job/jdk8u-windows-x64-hotspot/528/

It seems the hotspot-x64 repository has worked a couple of times in a row. I'll keep an eye on it. The hotspot-x86_32 repository, however, has still been consistently failing.

@Willsparker changed the title from "build-azure-win2008r2-x64 nightly curl failures" to "build-azure-win2008r2-x64 / build-softlayer-win2012r2-x64 nightly curl failures" on Mar 30, 2020
@sxa
Member

sxa commented Mar 31, 2020

Can you summarise the current situation on this one please, @Willsparker? We need this to be reliable for the quarterly release in a couple of weeks. Can it be replicated easily, or is it still random enough that it can't be progressed? Do we have a solution, or any other ideas for progressing this?

@Willsparker
Contributor

Okay:
The build-azure-win2008r2-x64 machines are currently affected, but they're soon to be replaced, so I'm not so concerned about those. The concerning part is that build-softlayer-win2012r2-x64 was displaying this issue intermittently. I just checked, and that no longer seems to be the case. I wasn't able to recreate it locally on that machine, but on the 2008 machines it occurred when running git fetch / git fetch --tags or git pull on the source repos at /tmp/openjdk-jdk8u-windows-*/workspace/build/src.
So whilst nothing has been done specifically to fix it, it hasn't affected us in the last few days, given that adoptium/temurin-build#1638 has been merged. I'm going to close this issue on that basis; however, if this affects us in the future, I can see 2 easy-ish workarounds:

  1. Change the build scripts to use ssh to run the builds ( ref: https://adoptopenjdk.slack.com/archives/C53GHCXL4/p1585572400036400?thread_ts=1585568882.031500&cid=C53GHCXL4 ), and change the build machines (and playbooks) to accommodate this.
  2. Change the build scripts to reclone the source repositories at the beginning of each build (ref: https://github.com/AdoptOpenJDK/openjdk-build/issues/1641 ). In my experience the repos in question wouldn't fail when being re-cloned, however I couldn't tell you why.

Neither of these will fix the underlying issue, which I believe is due to networking problems with the boxes, according to the Stack Overflow entries mentioned earlier.
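
As an illustration of the second workaround, here is a rough sketch that retries the fetch a few times and only falls back to a fresh clone if every attempt fails. It is an assumption about how such logic might look, not code from the build scripts: the `fetch_or_reclone` helper, its arguments, and the default of three attempts are all hypothetical.

```shell
# Hypothetical helper: try "git fetch --tags" a few times; if every attempt
# fails, move the checkout aside and clone the repository again.
#   $1 = checkout directory, $2 = repository URL, $3 = attempts (default 3)
fetch_or_reclone() {
  repo_dir="$1"
  repo_url="$2"
  attempts="${3:-3}"
  try=1
  while [ "$try" -le "$attempts" ]; do
    if git -C "$repo_dir" fetch --tags origin; then
      return 0
    fi
    echo "fetch attempt $try of $attempts failed" >&2
    try=$((try + 1))
  done
  # Keep the failing checkout for later inspection instead of deleting it.
  if [ -e "$repo_dir" ]; then
    mv "$repo_dir" "${repo_dir}.failed.$$" || return 1
  fi
  git clone "$repo_url" "$repo_dir"
}
```

Keeping the old checkout around costs disk space, but it leaves a failing repository available for reproducing the error by hand.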
