Add SecureDrop installation test #25
base: main
tests/securedrop/install.pm (outdated)
```perl
assert_and_click("menu-vm-xterm");

assert_script_run('gpg --keyserver hkps://keys.openpgp.org --recv-key "2359 E653 8C06 13E6 5295 5E6C 188E DD3B 7B22 E6A3"');
```
assert_script_run depends on seeing serial console output, but the serial console of the "work" VM isn't directly connected to the host's. For this to work you either need to run something like tail -F /var/log/xen/console/guest-work.log >> /dev/hvc0 in dom0 (we do that here), or do all of that from dom0's terminal via qvm-run.
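A minimal sketch of the second option, written as a dom0 shell helper. It assumes that qvm-run with -p (--pass-io) returns the exit status of the command run inside the qube, which is what lets a failure surface in dom0. The helper name run_in_vm and the QVM_RUN override (there only so the logic can be exercised outside Qubes) are hypothetical.

```shell
# Hypothetical dom0 helper: run a shell command inside a qube and return
# its exit status, so a dom0-side assert_script_run can notice failures.
# Assumes qvm-run -p propagates the remote command's exit code.
# QVM_RUN may name a stub for testing outside Qubes (hypothetical).
run_in_vm() {
    vm="$1"
    shift
    # --no-gui: no GUI session needed; -p: pass stdin/stdout through
    "${QVM_RUN:-qvm-run}" --no-gui -p "$vm" "$*"
}

# Example (in dom0), mirroring the gpg step of the test:
# run_in_vm work 'gpg --keyserver hkps://keys.openpgp.org --recv-key "2359 E653 8C06 13E6 5295 5E6C 188E DD3B 7B22 E6A3"'
```

Going through dom0 keeps the original command text intact while still turning a failing step into a failing test.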
I see. Would type_string and then "ret" work as well? I'm trying not to deviate too much from the original instructions so it's easy to update.
Yes, that would work, but your test wouldn't detect if any of those commands fails (other than possibly some later step in dom0 failing).
Fair point 😔. I'll just go ahead and use qvm-run, then.
Force-pushed from 3502fd7 to 2008e3f.
Hint: add |
Force-pushed from 5c8b79c to 3a2149a.
Thanks for the tip. I had seen that in some places and was wondering about its purpose. I'll add it in the next round.
Force-pushed from 3a2149a to deebce7.
tests/securedrop/install.pm (outdated)
```perl
assert_script_run('curl https://raw.githubusercontent.com/freedomofpress/securedrop/d91dc67/securedrop/tests/files/test_journalist_key.sec.no_passphrase | sudo tee /usr/share/securedrop-workstation-dom0-config/sd-journalist.sec');
assert_script_run('sdw-admin --validate');

assert_script_run('xfce4-power-manager -q'); # disable screen blanking during long command
```
@marmarek there's a command which takes quite a while, and in the meantime the screen blanks. I don't think it's xscreensaver, because I think that's killed at the beginning of the test. Then I tried to disable XFCE's power management, but that didn't help.
Have you encountered this before?
My notes have this line:
x11_start_program('env xset s off', valid => 0);
but I'm not sure if that was enough either.
Thanks! I had to combine it with env xset -dpms for this to fully work.
And FYI, I noticed that with just env xset s off it still blanked for a lot of the slow command (sdw-admin --apply), but oddly enough the screen showed up just during the logs upload command (video). No idea what went on there.
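The two xset settings from this exchange can be grouped into one helper for the test VM's X session. A minimal sketch, assuming it runs in a shell where DISPLAY points at that session; the function name disable_blanking is hypothetical, and it skips (returning 1) when no display or no xset binary is available.

```shell
# Hypothetical helper: keep X from blanking the screen during long,
# unattended commands such as sdw-admin --apply.
disable_blanking() {
    if [ -z "${DISPLAY:-}" ] || ! command -v xset >/dev/null 2>&1; then
        echo "disable_blanking: no X display or xset, skipping" >&2
        return 1
    fi
    xset s off   # turn off the X screen saver timeout
    xset -dpms   # turn off DPMS power saving, which can blank on its own
}
```

Both settings are needed because, as observed above, disabling only the screen saver still left DPMS free to blank the display.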
It unblanked on the key press.
Oh! I totally forgot that it was literally typing each letter. That's why, then.
I recall that the above options were still not working perfectly (the screen was still blanking at some point). What seems to have solved it is enabling presentation mode. I haven't looked at what it's doing under the hood, but it seems to work. And because the setting is persistent, I think all the xscreensaver exits shouldn't be needed anymore.
Anyway:
So, longer timeout? This is running virtualized, so it runs slower than native. Also, I recommend collecting and uploading logs. For example, wrap it with
Force-pushed from bf7f90e to fb294a2.
Fair point. I have added a timeout. Now I am running into another issue. I have created a needle through the web interface added for this step an
Have you restarted the test after adding the needle? Or did you add it via developer mode?
I thought I had restarted it afterwards, but I will try again. It definitely wasn't via developer mode. Let's see if it now finds the needle.
I see the issue: you haven't added the
OK. Makes sense. I was afraid to create new tags. Where can I edit the needle? Or should I create a new one?
For this one I just edited it manually.
Force-pushed from fb294a2 to ff78699.
Force-pushed from de42ab9 to 4860126.
I see you're increasing the timeout for upload_logs; are you uploading some truly huge logs? Otherwise, I think hitting a timeout there is a symptom of some other issue...
Force-pushed from eafce98 to 752f9d7.
I think so as well. I was trying both approaches: investigating the conditions under which the slow upload happens, and probing the limits of the timeout. So far I've found that uploading packages at the beginning of the test works just fine. However, when it runs at the end of everything, the upload speed is very slow and it times out. And I don't think the issue is the new templates having so many new packages, because it even fails while uploading dom0. I think at this point I'll need your help figuring this out. I'm also fine with just adding a
One thing that could be triggering this is that I'm running
It shouldn't harm, it just defines
Is there some sys-net change? Idk, some firewall, or maybe some weird routing of everything via Tor? Or maybe it fails on reading the input file for some reason? Maybe some shell startup script (.bashrc etc.) that could swallow/redirect all input? My next idea is to debug it via developer mode, let me try.
As far as I'm aware, it doesn't route updates through Tor, and it doesn't modify sys-net in a way that would affect this. From skimming through the code, all the sys-net impact should be:
But I can ask the team when the sun rises on the other side of the ocean.
Thanks! I haven't played around with developer mode yet. After entering that mode, one needs to use VNC, correct? Or is it through the web interface, in some menu I haven't found?
In this case, yes (which isn't exposed to the internet directly). The other use case for developer mode is to create needles interactively.
OK, this is really weird: it looks like some network packets are lost... My current hypothesis is that it doesn't like a sys-net restart (kernel or QEMU bug). I'll try to get a smaller reproducer.
Interesting. Thanks for digging into it.
Will do!
There is one instance where it succeeded and another where it failed on
One thing that may also help here: this morning I was checking whether running the package logs upload at the beginning would work, and it passed that part; this was also on salmon. So in reality we have only two tests where the uploads succeeded: one at the beginning of the test run and another (very rare) at the end.
I just tried to reproduce in a fresh job before SD gets installed, and it always works, also after a sys-net restart, a qubes restart, and a qubes+sys-net restart...
In other words, SD is probably interfering with sys-net in some way. Either that, or the VM's network bandwidth is sensitive to long-running tests. My day is full of meetings, but I can try locally to see if I get similar packet drops. I doubt it, though, since I'm using the same system for browsing and it works fine (although TCP could be making packet drops unnoticeable, which I doubt is the case).
Time is unlikely to be the factor here; we have jobs running for 3h+ that upload a lot of logs (including a multi-MB tarball of the whole /var/log) at the end.
So, I have no idea what is going on... It's probably some QEMU or e1000e driver issue, but I don't really know why it happens, and why only here. I tested all of that by running curl in sys-net directly, to eliminate a few moving parts. I'll test some workarounds like adding
Details
This is how it looks in tcpdump from the host side (packets coming from QEMU appear as coming from localhost):
and no further packets for this connection (not even retransmissions); it looks like QEMU isn't sending them. When looking from the sys-net side, all are sent, and there are retransmissions for the missing ones. When it works, it looks like this:
As for a temporary band-aid, try to add something like this to your branch:

```diff
diff --git a/lib/networking.pm b/lib/networking.pm
index c0053f2..c1d4215 100644
--- a/lib/networking.pm
+++ b/lib/networking.pm
@@ -102,7 +102,7 @@ curl() {
 fi
 if [ -n "\$inputfile" ]; then
- qvm-run --no-gui -p sys-net "curl \$allargs" <\$inputfile
+ qvm-run --no-gui -p sys-net "curl --max-time 10 \$allargs" <\$inputfile
 else
 qvm-run --no-gui -p sys-net "curl \$allargs"
 fi
```

It won't fix uploading, but at least it won't block the whole thing on this issue. And when we manage to fix uploading, it should automatically start working again without a test change.
This is strange indeed. Thanks for investigating, in any case. I have some implementation questions:
Will do. Should I also add it to the
Do you mean adding the
Not sure. The issue we observe is only about sending data (POST requests with some payload), not GET requests. I have checked that 10s is more than enough for upload calls, but I haven't inspected logs for download calls. I don't remember fetching big files with this function, but it's probably safer to skip the timeout on downloads for now (see that fetching curl-wrapper works).
I'd prefer it only in the securedrop test, so add the function parameter.
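The agreed design (limit only uploads, and only when the caller asks for it) can be sketched as a shell function that builds the command string sent to sys-net. This is an illustration, not the actual lib/networking.pm change: the name sysnet_curl_cmd and its argument order are hypothetical. curl exits with status 28 when --max-time expires, so a stalled upload fails quickly with a recognizable code instead of hanging.

```shell
# Hypothetical sketch: build the curl command line to run in sys-net.
# Arguments: max_time (empty = no limit), inputfile (empty = not an
# upload), then the remaining curl arguments.
sysnet_curl_cmd() {
    max_time="$1"
    inputfile="$2"
    shift 2
    cmd="curl"
    # Only uploads (data read from an input file) get a time limit;
    # downloads keep curl's default of no overall timeout.
    if [ -n "$inputfile" ] && [ -n "$max_time" ]; then
        cmd="$cmd --max-time $max_time"
    fi
    printf '%s %s\n' "$cmd" "$*"
}

# Possible use, mirroring the wrapper in the diff above:
# qvm-run --no-gui -p sys-net "$(sysnet_curl_cmd 10 "$inputfile" "$allargs")" <"$inputfile"
```

Keeping the limit opt-in means other tests calling the same wrapper see no behavior change.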
In a specific set of circumstances [1] dropped packets were causing the log uploads to take a long time and eventually fail. Even though the client would send retries, they would not reach the host. Adding a timeout limit ensures early failures.

[1]: QubesOS#25 (comment)

Co-Authored-By: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Force-pushed from 752f9d7 to 3da1f31.
Force-pushed from 3da1f31 to d8b192b.
First attempt at adding a test for SecureDrop.