Skip to content

Conversation

@edwintorok
Copy link
Contributor

@edwintorok edwintorok commented Mar 20, 2023

Draft, still need to make and test the specfiles changes and a chainbuild.
Explanations in commit messages.

Not vTPM specific, but the changes were made to make vtpm/etcd development easier, so test it out here first (and once it works we can push this part to master fairly quickly)

@edwintorok edwintorok changed the base branch from master to feature/vtpm March 20, 2023 18:33
@edwintorok
Copy link
Contributor Author

This depends on having Dune 3.7 in xs-opam, which I thought we did, but looks like that is waiting on the 4.14 branch to be merged:on this PR here

@robhoes
Copy link
Member

robhoes commented Mar 21, 2023

Several of the public libs probably needed to be so before Specsavers removed the barriers. Some may still be needed, so a chainbuild is definitely required to confirm that these changes are right.

@psafont
Copy link
Member

psafont commented Mar 21, 2023

This PR depends on xapi-project/xs-opam#634

@psafont
Copy link
Member

psafont commented Mar 22, 2023

Not sure why the ocaml tests fail, when I run the tests locally, they all pass

@psafont psafont force-pushed the private/edvint/maintenance branch 3 times, most recently from adbb0fd to a39e8ef Compare March 22, 2023 14:05
@psafont
Copy link
Member

psafont commented Mar 22, 2023

Fatal error: exception Unix.Unix_error(Unix.ECONNRESET, "read", "")
File "ocaml/forkexecd/test/dune", line 6, characters 0-130:
 6 | (rule
 7 |  (alias runtest)
 8 |  (package xapi-forkexecd)
 9 |  (deps fe_test.sh fe_test.exe ../src/fe_main.exe)
10 |  (action
11 |   (run ./fe_test.sh)))
(cd _build/default/ocaml/forkexecd/test && ./fe_test.sh)
Command exited with code 2.

@edwintorok
Copy link
Contributor Author

edwintorok commented Mar 23, 2023

pushed some changes to the script to output a backtrace on failure, and also tweaked how dune runtest works in the CI so we get the output from the failed tests separated from the successful ones.
I can't repro the failure locally either, although I did find a pre-existing bug in the wait for socket code, which I fixed.

@edwintorok edwintorok force-pushed the private/edvint/maintenance branch 5 times, most recently from e6c3c4d to 3f7900c Compare May 31, 2023 12:38
@edwintorok edwintorok changed the base branch from feature/vtpm to master May 31, 2023 17:21
@edwintorok edwintorok marked this pull request as ready for review May 31, 2023 17:21
@edwintorok edwintorok force-pushed the private/edvint/maintenance branch from 3f7900c to 9791840 Compare June 5, 2023 12:46
@edwintorok
Copy link
Contributor Author

There is an ECONNRESET error here from forkexecd, I'll take a look

@edwintorok
Copy link
Contributor Author

There is a race condition in 'fe_test.sh' (that is more apparent now that I reduced the sleeps, but the problem was there before): the test script only checks for the presence of the socket and then starts a client, but the server obviously has to create the socket first and then start listening on it, which leaves a short window where the socket exists, nothing listens on it and clients would get ECONNREFUSED.

This is usually fixed by systemd daemon notify (when run within systemd), or by the parent process existing only once it has 'listen()'-ed on the socket, but 'forkexecd' does neither.

@edwintorok
Copy link
Contributor Author

Pushed a fix for the forkexecd test race condition by creating a 2nd file after we know the server is listening, and we check for both the presence of the socket and this 2nd file in the testcase.

@psafont
Copy link
Member

psafont commented Jun 5, 2023

httplib and xml-light 2 can't be made private, they're used by v6d

@edwintorok
Copy link
Contributor Author

edwintorok commented Jun 5, 2023

Hmm, I got a successful build of v6d and I indeed dropped the commits that made those libraries private, looks like I force pushed an outdated branch here. Let me check again against the branch that got tested.
Ah yes I have 2 revertts on my tested branch, but they're after the merge commit:

commit 0431abde5717cfd0fa9d04b7040d141e2058462a
Author: Edwin Török <edvin.torok@citrix.com>
Date:   Wed May 31 12:15:45 2023 +0000

    Revert "[maintenance]: http-lib should not be a public library"

    This reverts commit 8d97ffb5f9c68faa626405815c9e33a35dc94bd4.

commit 788704013085109d478b969a433289689d98cc2b
Author: Edwin Török <edvin.torok@citrix.com>
Date:   Wed May 31 11:50:52 2023 +0000

    Revert "[maintenance]: make xml-light2 library internal"

    This reverts commit 08f46c64d256f2e72a1ee79e37282c30ed38c7f6.

Add a `-skip-xapi` flag that avoids connecting to XAPI.
Useful if you just want to query the list of tests,
or execute a specific test that relies on external programs
(and thus cannot be a unit test)
, but doesn't require XAPI.

This will be useful for running etcd tests with 'dune exec'.

Also add a '--' flag to be able to pass flags to alcotest directly.

E.g. `dune exec ./quicktest.exe -- -skip-xapi -- list`
The first '--' separates 'dune' arguments from 'quicktest.exe' arguments,
and the 2nd one separates legacy quicktest CLI arguments from alcotest
arguments.

To ensure this keeps working properly add a runtest that just list the tests.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
…perties

Duplicate keys are not valid, make it obvious in the type.

Fix quoting according to https://www.freedesktop.org/software/systemd/man/systemd.syntax.html#Quoting

Neither Filename.quote or String.escaped would produce the correct result.
E.g. Filename.quote on "'" would produce "'\''" which is not correct according
to the above.

No functional change since these arguments are not yet used anywhere,
and quoting is usually not needed.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
sleep does work with non-integer values.
Avoid sleeping a full second every time.
(it'd be better if this relied on inotify, but inotify-wait may not be
available)

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
@edwintorok edwintorok force-pushed the private/edvint/maintenance branch from 69a6776 to 63708d0 Compare June 5, 2023 15:58
Turn it into an internal library: it is only used by executables.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
edwintorok and others added 14 commits June 5, 2023 17:17
They're all internal.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
With systemd it is not necessary to daemonize, we can run in the foreground
with service type 'simple' or 'exec' ('exec' does more validation).
This means we don't need to link the Unix module.

This reduces (stripped) binary size to more than half.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
This forces us to fully declare the dependencies of our code,
and not rely on libraries that are brought in only as transitive dependencies
of other libraries we happen to link to.

E.g. if our module A depends on library X, which itself depends on library Y,
then currently by linking X we also get Y linked and accessible from A
directly.
If code in module A uses both module X and Y *directly* then it needs to
declare a dependency on both when implicit transitive deps are off or it gets a
link failure (typically an error about a module or type being abstract).
If the code in module A only uses module X then no change is needed (X can
still use Y and the final executable will link both, it is just a question of
what is visible and callable from A directly).

This is especially useful when writing new code to get dependencies correct
from the beginning.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Reduces the amount of work the build has to do if we don't need to preprocess
everything, but only the few modules that actually using [@@deriving].

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Server.ml takes a while to compile, but most unit tests don't actually need it.
Reorganize Api_server into Api_server+Api_server_common, where the latter
suffices for unit tests.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
This signature is completely unused.
We could instead generate a client.mli, but that is more complicated,
currently the client.mli it'd generate wouldn't be polymorphic enough.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Remove most sleeps, and reduce test duration from 5 seconds to 1.
(If we do want to run a performance test we can increase these again)

Run just a basic test for 0.1 seconds instead of a performance test for 5s by
default.
(can still be tweaked by overriding SECS)

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Instead of every 5s. Speeds up testing, and may speed up startup somewhat.
And a connection try once every 0.5s won't create a lot of load on the system.
(If needed we could implement some form of exponential backoff here).

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
E.g with 'ulimit -n' 524288 this test is very slow because forkexecd closes
all potentially open file descriptors.

The test assumes a max fds of at least 1024, so set that.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Edwin Török <edvin.torok@citrix.com>
@edwintorok edwintorok force-pushed the private/edvint/maintenance branch from 63708d0 to 869853c Compare June 5, 2023 16:18
@edwintorok
Copy link
Contributor Author

Hmm now we have ECONNRESET bugs in the forkexecd tests instead of ECONNREFUSED. I could move the forkexecd changes out to a separate pull request, but I'll take a look first on whether I can reproduce and track down what goes wrong here (would be good to make these tests more reliable).

@edwintorok edwintorok force-pushed the private/edvint/maintenance branch from 9bd50e1 to 23a4b44 Compare June 6, 2023 09:34
@edwintorok edwintorok marked this pull request as draft June 6, 2023 12:20
There is a race condition between 'bind' and 'listen' where the socket exists,
but is not yet accepting connections (this can be tested by artificially
inserting a sleep, or running many times until the race condition is hit like
in the CI).
Create a 2nd file after we called Unix.listen that guarantees that we are now
listening on the socket (we also unlink the file prior to recreating the socket).

Tested by artificially inserting a 'sleep 5' between bind+listen, which
previously caused ECONNECT in the testcase, and after this change it works.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
The CI kept failing with:
```
Fatal error: exception Unix.Unix_error(Unix.ECONNRESET, "read", "")
Raised by primitive operation at Unix.read in file "unix.ml", line 258, characters 7-33
Called from Xapi_stdext_unix__Unixext.really_read in file "lib/xapi-stdext-unix/unixext.ml", line 425, characters 12-37
Called from Fecomms.read_raw_rpc in file "ocaml/forkexecd/lib/fecomms.ml", line 28, characters 2-38
```

ECONNRESET is quite unusual for a 'read', the Linux manpage doesn't mention
that as a possibility but the POSIX manpage does.
Forkexecd (the server) writes a response and immediately closes the socket, but
the manpage of 'close' says that 'shutdown' should be used before closing a
socket.

Use 'shutdown' before 'close' to avoid clients getting an ECONNRESET when
reading the last reply on the socket.

Only use 'shutdown' in the child because the same socket is used both in the
child and in the parent, and expected to be kept open in the child after the
parent has closed it.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
@edwintorok edwintorok force-pushed the private/edvint/maintenance branch from 23a4b44 to f17dd60 Compare June 6, 2023 13:54
edwintorok added a commit to edwintorok/xen-api that referenced this pull request Jan 22, 2024
Use the simplest solution: copy *.opam to *.opam.template so that we get exactly the same opam files as before,
and only declare the names in dune-project.

This will allow us to merge subprojects that already use dune-project to generate opam files (such as xapi-stdext),
and we can gradually move opam files to be generated by dune.
However first we should drop the opam files that we do not need,
which will be partially done in xapi-project#4939

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
@edwintorok edwintorok changed the base branch from master to feature/perf February 15, 2024 14:21
@edwintorok edwintorok changed the title Various build system and testing improvements CP-48838: Various build system and testing improvements Apr 15, 2024
@edwintorok
Copy link
Contributor Author

This needs to be rebased, and potentially the forkexec changes separated out if they expose pre-existing bugs.
For now closing the PR, but I'll try to get back to this fairly soon as it reduces build times and fixes a few other issues with the build/tests that improves developer efficiency.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants