Docker networking by hostnames inside linux-sandbox #5869

Closed
talya opened this issue Aug 13, 2018 · 8 comments

Comments

talya commented Aug 13, 2018

Description of the problem / feature request:

Networking to a docker container via hostname while running with linux-sandbox is not possible.
Is there some way to whitelist other networks (other than eth0)?
Is it possible that the -N flag in the linux-sandbox code is relevant? There doesn't seem to be a matching cli flag for turning this on.

Feature requests: what underlying problem are you trying to solve with this feature?

We're having issues with docker networking by hostnames inside linux-sandbox strategy.

The use case we're seeing this in is running an end-to-end test that includes a jvm and a docker container, where the jvm needs to communicate with the container.
On RBE we managed to get the test green, by using --network=$HOST_NETWORK_NAME.
The fact that all containers are on the same network allows us to address by hostname (container name).

However, for our local development (Mac as well as Linux), where we also run Bazel from within a Docker container, we cannot address containers by name because of sandboxing.
When we try accessing the container (even via curl), we get an 'unknown host' error.

e.g.
In our test runner, assume HOST_NETWORK_NAME is set to the network of the container running the test.
We start a container:
# docker run --name cassandra -p 9040 --network=$HOST_NETWORK_NAME cassandra:3.9
Then we run:
# curl cassandra:9040
Unknown host: cassandra

Without sandboxing, it works.
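
To narrow down a failure like this, it can help to run the same name lookup from inside and outside the sandbox. The following is a minimal sketch, not part of the original report; the helper name `can_resolve` is hypothetical, and it only checks name resolution, not whether the port is reachable.

```python
import socket


def can_resolve(hostname: str) -> bool:
    """Return True if `hostname` resolves from the current network namespace."""
    try:
        # getaddrinfo consults the same resolver configuration that curl would use.
        socket.getaddrinfo(hostname, None)
        return True
    except OSError:
        # socket.gaierror (a subclass of OSError) is raised for unknown hosts.
        return False
```

Calling `can_resolve("cassandra")` from the test process, once with the sandbox and once without, would show whether the lookup itself is what the sandbox breaks.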

In theory we could decide to use --network=container:$HOST_CONTAINER_NAME + change our code to use 'localhost' addressing, but this has 2 downsides:
a - we force our developers to never use named networks.
b - it will cause port collisions on localhost and so won't allow us to run locally with concurrent tests.

We need to solve this to allow local development, and also so that these tests won't break on RBE once network sandboxing is enabled there.

Contributor

dslomov commented Sep 10, 2018

@philwo ping

Member

philwo commented Sep 10, 2018

Hi,

I'm not sure I fully understand what you're doing. Could you please give me an example from a BUILD file that shows the test you're trying to run, along with the actual Bazel command line that fails (including all flags that are set in a bazelrc file)?

I'll also describe the network blocking behavior of Bazel:

By default, Bazel does not block network access for running actions, even when they run in the linux-sandbox. This behavior can be tuned by various mechanisms.

  • If you specify the flag --java_debug, Bazel will always allow network access for actions, because otherwise you couldn't connect to the remote debugging port of the JVM of your Java test.
  • If you specify the tag "block-network" for a test in the BUILD file (e.g. java_test(..., tags = ["block-network"])), then Bazel will ask the sandbox to prevent the test from accessing the network.
  • If you specify the tag "requires-network" for a test in the BUILD file, then Bazel will ask the sandbox to allow the test to access the network.
  • If none of these tags are specified, the default behavior is defined by the flag --experimental_sandbox_default_allow_network, which defaults to "true". "True" means that tests can access the network by default.
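
The rules in the list above can be sketched as a small decision function. This is an illustration only, not Bazel's actual implementation: the function name `network_allowed` and its parameters are hypothetical, and the precedence when both tags are present (here "block-network" wins) is an assumption that the list does not state.

```python
BLOCK_NETWORK = "block-network"
REQUIRES_NETWORK = "requires-network"


def network_allowed(tags, java_debug=False, default_allow_network=True):
    """Decide whether a sandboxed test may access the network.

    `default_allow_network` stands in for the value of
    --experimental_sandbox_default_allow_network.
    """
    if java_debug:
        # --java_debug always allows network access (remote debugging port).
        return True
    if BLOCK_NETWORK in tags:
        # Assumption: an explicit block takes precedence over requires-network.
        return False
    if REQUIRES_NETWORK in tags:
        return True
    # No relevant tag: fall back to the flag's value.
    return default_allow_network
```

For example, `network_allowed(["block-network"])` returns False, while `network_allowed([])` returns True with the default flag value.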

Now, what does "prevent the test from accessing the network" even mean? The exact meaning is platform-dependent. On Linux, the test runs in a completely isolated network that has its own instance of localhost and no access to anything else. On macOS, the test can interact with the normal localhost of the machine, but not with other hosts. This behavior is not configurable at the moment - you can only switch it on or off.

I hope this helps!

Author

talya commented Sep 14, 2018

@philwo thanks for the reply and the great explanation of sandboxing!

What we are trying to do is support a common integration testing pattern we have, which makes use of docker containers for either collaborator services (mysql, cassandra, redis, ...), or for the service-under-test, or both.
(Giving an example is a bit complicated right now since we use our own homegrown docker testkit code.)

In both cases, the tests cannot be run with linux-sandbox, since as you described this gives an isolated network.

The technical root cause here seems to be that the linux-sandbox code uses the CLONE_NEWNET flag, while the docker containers we spin up via this test live in their own network namespaces (controlled by the docker daemon), thereby blocking all communication between linux-sandbox <--> container.
(this means our theoretical solution from above of using --network=container:$HOST_CONTAINER_NAME also doesn't work, we now know this)
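
One way to confirm that two processes really are in different network namespaces is to compare their /proc/<pid>/ns/net links, which on Linux look like `net:[4026531992]`. The sketch below is an illustration under that assumption; the helper names are hypothetical, and reading another process's namespace link usually requires sufficient privileges.

```python
import os
import re

# On Linux the symlink target has the form "net:[<inode number>]".
_NS_LINK = re.compile(r"^net:\[(\d+)\]$")


def parse_netns_link(link: str) -> int:
    """Extract the namespace inode number from a /proc/<pid>/ns/net link target."""
    match = _NS_LINK.match(link)
    if match is None:
        raise ValueError(f"not a network namespace link: {link!r}")
    return int(match.group(1))


def netns_id(pid="self") -> int:
    """Return the network namespace inode for a process (Linux only)."""
    return parse_netns_link(os.readlink(f"/proc/{pid}/ns/net"))


def same_netns(pid_a, pid_b) -> bool:
    """True if both processes share one network namespace."""
    return netns_id(pid_a) == netns_id(pid_b)
```

Comparing the sandboxed test process with the container's main process (both PIDs as seen from the root PID namespace) would show whether CLONE_NEWNET placed them in separate namespaces.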

We are investigating several directions for now to understand what's feasible, here's a quick list for now:

  • Docker to allow spawning containers inside existing namespaces (or somehow move generated containers to existing namespaces)
  • linux-sandbox to somehow allow docker networking while blocking outside world
  • Run a docker daemon within the sandbox and spawn containers there (DinD if feasible?)
  • Use docker strategy for bazel sandbox (instead of linux-sandbox)

Member

philwo commented Sep 24, 2018

Hi @talya,

I agree - it sounds like this currently just won't work. Using the Docker strategy to run your tests sounds like the best option to me, because the caveats (due to the overhead, it's slower than using linux-sandbox) probably won't apply to running rather large integration tests.

If you need additional features for that (e.g. being able to specify a --network?), let me know and I'll see that we can get them in :)

Edit: Thinking about it, I'm still not sure why linux-sandbox puts your tests in a new network namespace - according to the list in my last post, unless you explicitly specify "block-network" or use --experimental_sandbox_default_allow_network=false, it shouldn't do so and the tests should have unlimited access to the network. Isn't this the case for you?

jin added the team-Local-Exec label (Issues and PRs for the Execution (Local) team) and removed the team-Execution label on Jan 14, 2019
Member

jin commented Jan 14, 2019

@philwo could you add a priority to this issue, please?

Author

talya commented Jan 15, 2019

Small update from our side: we have a PoC of a new direction that looks promising. In a nutshell, the solution is to start a "routing" container that implements a docker network driver, and to have the containers our tests start connect to that network.
We use the namespace of the bazel linux-sandbox process (as it appears in the root pid namespace), and the routing container then allows network access only to other docker containers, not to the internet.

Member

philwo commented Jan 16, 2019

@talya Sounds like a nice solution for your use case! :)

Is there anything we should add to Bazel or can I close this issue considering that you found a way?

At the moment, I'm not sure what the exact feature request is, because I've never used these named networks and communication between Docker containers using a shared network... if there's still something to do here, can you please try to explain it again, what you would like Bazel or the linux-sandbox to do?

Author

talya commented Jan 16, 2019

Nothing needed from bazel, will close :)
Thank you.


5 participants