Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

General CI discussion #707

Open
real-or-random opened this issue Jan 6, 2020 · 27 comments
Open

General CI discussion #707

real-or-random opened this issue Jan 6, 2020 · 27 comments

Comments

@real-or-random
Copy link
Contributor

real-or-random commented Jan 6, 2020

When thinking about #705 I realized it's maybe a good idea to take a step back and look at this in the context of a general CI revamp. There are multiple things that would be nice and I think some of these probably have a higher priority than coverage reports:

Is there anything else you would like to see?

If we want to work on this, this is a good time to think about the CI platform. I don't mind Travis and it's reasonably stable in the past months but if we think that a different (or additional) provider is a better fit, it's better to discuss this before we start working on these issues.

By the way, Bitcoin is currently looking into GitHub Actions for Windows at least:
bitcoin/bitcoin#17803

@gmaxwell
Copy link
Contributor

gmaxwell commented Jan 7, 2020

I think BE is basically solved in travis since there is s390 (lol). It sounds like the arm is arm64 which is useful but less interesting than 32bit arm.

The big gaps I see in travis:

  • You cannot usefully benchmark, so you can't watch for performance regressions-- the systems it runs on are wildly different in performance and wildly inconsistent in performance because they're VPSes which are concurrently running other loads.

  • Getting work product output not supported (seems like a workaround would be possible), meaning you can't get CI coverage analysis, or static analysis reports... except via integrating with other services which creates an unnecessary disparity between the free software tools the developers can use locally and the (usually commercial) services that the project ends up beholden to.

  • Travis CPU time is extremely limited (the fact that the travis executors are slow disguises how limited it is, e.g. it takes minutes for what would be tens of seconds locally). This means that stuff like automated fuzz testing at a scale which is likely to find interesting bugs is essentially off the table, at least as part of CI tests. This also clearly distorts testing priorities, with stuff like not using valgrind or cutting down tests just to fit under travis' limits.

To me the obvious thing to do is to run a jenkins instance. Jenkins is extremely simple to operate (in my experience much easier to use than travis, because you're less subject to trial and error-- particularly because if you're the group running it you can just ssh in and try the tests locally until they work). Jenkins supports remote executors (and you can confine jobs to only run on particular executors), so its trivial to have an executor that runs on dedicated hardware that allows benchmarking... or on fast systems.

The big challenge I see is that the travis ecosystem has created an expectation that PRs from random people will get CI run against them. And having random people's code run on your own hosts/networks is super scary. One option would be to just run the jenkins on PRs by whitelisted people and on merged commits (this is apparently a standard feature of the jenkins PR integration module, which I've never used-- my experience has been w/ just running it on merged commits). I can offer to run some executors (and can do arm7, big endian systems, whatever) on an isolated network that just vpns back to the jenkins install and has no network access. But the limitation there is that if I get hit by a bus you might have issues finding someone else willing to volunteer a half rack of assorted old hardware.

An extension of the "cant benchmark" is that the library really should have regression tests for sidechannel behaviour. Right now someone could flip a compiler flag, and the cmovs some being constant and you get an enormous sidechannel and no one would notice until it shows up in an attack (or hopefully, attack paper). There are a couple different ways of dealing with that which would allow CI integration of a sidechannel test, but most of them require being able to use test on bizarre hardware (e.g. with a jenkins remote executor). The one option that wouldn't would be running tests in a cycle accurate cpu simulator-- essentially like running in valgrind but could get cycle counts and dram access traces as a side effect which could be compared across different secrets to be sure they were constant, but when I went down that path a couple years ago none that I could find simulated dram. @tdaede might have more recent suggestions.

@gmaxwell
Copy link
Contributor

gmaxwell commented Feb 4, 2020

Travis hiding the output of ./tests failing really stinks, did it used to do that?

@real-or-random
Copy link
Contributor Author

We cat the log files in the end but you need to "unfold" output by clicking the arrow.
https://travis-ci.org/bitcoin-core/secp256k1/jobs/641332511
Or are talking about something else?

@gmaxwell
Copy link
Contributor

gmaxwell commented Feb 4, 2020

Ah, I missed the 'unfold' Thanks.

@fanquake
Copy link
Member

fanquake commented Feb 4, 2020

By the way, Bitcoin is currently looking into GitHub Actions for Windows at least:
bitcoin/bitcoin#17803

FYI. We've removed the GitHub Actions workflow from our repository. See discussion in that issue and bitcoin/bitcoin#18031.

@gmaxwell
Copy link
Contributor

The CI should probably do a build on a system without libgmp to catch accidental header dependencies.

The CI should probably have a makefileless build (e.g. just a big GCC commandline that sets all the required defines)

@real-or-random
Copy link
Contributor Author

related: bitcoin/bitcoin#18659

@real-or-random
Copy link
Contributor Author

We may want to look into Azure Pipelines: https://azure.microsoft.com/de-de/services/devops/pipelines/

@real-or-random
Copy link
Contributor Author

We have the ubsan sanitizer now on Travis (#839). Has anyone tried other sanitizers that we may want to enable?

@sipa
Copy link
Contributor

sipa commented Nov 4, 2020

asan/lsan is generally as easy as ubsan to enable (and they can all be enabled simultaneously) though has a higher performance impact. It can catch out-of-bound accesses (also on the stack), and memory leaks.

tsan isn't useful for us as there is no threading.

msan is a pain to integrate (as it requires building libc and other dependencies with msan as well), and we already have valgrind which largely detects the same issues.

@elichai
Copy link
Contributor

elichai commented Nov 4, 2020

msan is a pain to integrate (as it requires building libc and other dependencies with msan as well), and we already have valgrind which largely detects the same issues.

This isn't really correct for C, msan contains shims for the common libc uses, so there shouldn't be a need to rebuild libc to run msan here.

@real-or-random
Copy link
Contributor Author

real-or-random commented Nov 11, 2020

Travis is effectively shutting down for open source:
bitcoin/bitcoin#17802

Options:

  • Be lazy and pay them: but I don't think if we're happy with Travis in general)
  • migrate to Cirrus CI: Cirrus would work. We need to move s390x to qemu but native s390x is unreliable on Travis anyway
  • migrate to Shippable: I was playing around with this a few weeks ago (https://app.shippable.com/github/real-or-random/secp256k1/dashboard) This looked good on first glance and builds are fast. But it seems that the project is dead soon (no activity in their github repo, not even activity or replies on twitter). So I don't think we'll be happy with this for long.
  • do our own thing (e.g., Jenkins run by @gmaxwell )
  • look into other CIs, e.g., Azure Pipelines

Non-options:

edit: Azure is not good either:
Screenshot from 2020-11-11 11-39-24

@gmaxwell
Copy link
Contributor

Any public "free" solution is going to either be short lived or extremely stingy with cpu time for obvious economic reasons-- or, likely, both. Any "custom rolled" solution has the life limits from their sponsors not having time for maintenance, hardware failures, etc. but presumably wouldn't be cpu resources limited. Custom rolled can also guarantee uncontested access to systems, allowing for real benchmarking.

I suggest regardless: All testing should be setup as generic shell scripts (and/or python if really needed) that could be invoked from any CI system-- they're pretty much there already except for the travis matrixing. Then it becomes easier to move things around.

The project isn't just limited to just using one thing. If some free hosted thing works then it would probably good to run it in parallel with whatever else gets setup.

[I would already have a jenkins setup running but for ongoing power usage limitations here which are finally almost resolved.]

We need to move s390x to qemu but native s390x is unreliable on Travis anyway

I don't think having s390x is particularly important, esp since it's slightly broken for reasons out of our hands-- but having a BE target is nice. Debian on MIPS BE (with valgrind too) and Sparc run fine in qemu-system though both quite slow, and I assume BE PPC works but I haven't tried it. ...and each of these is way more important than s390x: They're all found in really cool spacecraft. I don't know where s390x machines are found, but certainly not in space. Armonk, New York, probably. Space is much better than Armonk: For one thing, space doesn't have any computer crackers.

@real-or-random
Copy link
Contributor Author

I don't think having s390x is particularly important, esp since it's slightly broken for reasons out of our hands-- but having a BE target is nice. Debian on MIPS BE (with valgrind too) and Sparc run fine in qemu-system though both quite slow, and I assume BE PPC works but I haven't tried it. ...and each of these is way more important than s390x:

Yes, sure. I don't care about s390x in particular. We should just have some BE architecture. If it's just for BE testing, we should probably just take the one which is fastest on qemu. Of course, having valgrind is better than not having it, and the probability that the Sparc build is relevant for some users is larger than for s390x.

@gmaxwell
Copy link
Contributor

Well I know mips with valgrind works, and I successfully got debian in sparc working but there is no valgrind. (which is too bad, sparc otherwise tends to expose some issues due to register windowing).

@real-or-random
Copy link
Contributor Author

The project isn't just limited to just using one thing. If some free hosted thing works then it would probably good to run it in parallel with whatever else gets setup.

Indeed.

More on the free thing:
I changed my mind on Github Actions. I think it's acceptable, see bitcoin/bitcoin#17803 (comment). But Marko seems to disagree and he has some valid points.

I think Cirrus CI is the best free/"cloud" option for now.

@jonasschnelli
Copy link
Contributor

There are plans to increase the stability and performance (as well as supported architectures) on bitcoinbuilds.org. There is also a GitHub integration (running only here for now). The CI is completely custom built and easy to extend. I'm currently trying to setup a team of devs/admins.

Ideally it would support running tests on hardware provided by authenticated developers/users.
Probably not what you are looking for today but could be something for the long run.

@real-or-random
Copy link
Contributor Author

Probably not what you are looking for today but could be something for the long run.

Thanks for letting us know. This sounds great and is certainly an option for the long run.

@apoelstra
Copy link
Contributor

I am working on building a new x86 system with many cores and am happy to replicate any @gmaxwell script infrastructure, to reduce the bus number. But I don't have any oddball hardware, nor do I have plans to amass any.

@JeremyRand
Copy link

edit: Azure is not good either:

@real-or-random The Azure docs claim that it can be used with generic Git repos (not requiring GitHub). Would that mode be suitable? I assume it wouldn't demand problematic GitHub permissions since it has no knowledge of GitHub in that mode, but I may be missing something.

@kroggen
Copy link

kroggen commented Dec 16, 2020

This is how I am building my fork (with VRF support) for Windows using GitHub Actions and Visual C: here

@real-or-random
Copy link
Contributor Author

I'm heavily working on Cirrus CI integration in https://github.com/real-or-random/secp256k1/tree/202012-cirrusci . It looks promising so far but I want to continue to hack on this a little bit and clean it up before I open a PR here. (And anyway, a PR wouldn't be build currently because Cirrus is not yet enabled.)

@real-or-random
Copy link
Contributor Author

notes from irc:

20:38 <real_or_random> I didn't know it's so easy to run tests in valgrind: make check LOG_COMPILER=valgrind
20:38 <real_or_random> or if we add benchmarks, this would also work: make check LOG_COMPILER='./libtool --mode=execute valgrind'
20:39 <real_or_random> nice thing: it runs exactly like "make check", e.g., it uses multiple threads
20:41 <real_or_random> I believe this will be very helpful for simplifying the ci shell script

@real-or-random
Copy link
Contributor Author

real-or-random commented Mar 8, 2021

By the way, I still don't the like way the results are shown in Cirrus. We run make check which runs ./tests and ./exhaustive_tests in parallel and write their outputs in log files. That means, when ./tests fails, the make check thing is red and then you click on it but you don't see any real error message. Instead, the real error is in a green section below. (See downstream example: https://cirrus-ci.com/task/6253397255913472?command=cat_tests_log#L4) Not a huge deal but simply not great for people who don't know what's going on.

We could simply run the tests without make check but then we don't test the automake part of make check. Maybe we can grep the log files for PASS or FAIL and make the "cat" script fail depending on this.

@real-or-random
Copy link
Contributor Author

Maybe the clang static analyzer (scan-build) is a nice tool we could include in the CI: #959 (comment)

@real-or-random
Copy link
Contributor Author

Cirrus CI now has ARM64 hardware, see cirruslabs/cirrus-ci-docs#905 and bitcoin/bitcoin#22749

@real-or-random
Copy link
Contributor Author

We could add a few more jobs with -O3 to increase the coverage of CI tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants