-
Notifications
You must be signed in to change notification settings - Fork 65
[ML] Add system call restrictions to the ML processes #98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Yes, I think also |
} | ||
|
||
} else { | ||
LOG_DEBUG(<< "Seccomp BPF not available"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could imagine this being info as it might be useful for customers to know
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure. Upgrading this would mean customers running on CentOS 6 would be spammed every time an ML process starts up. And normalize
can run pretty frequently...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It started as info level but as Dave points out every time normalize starts the message is logged and this can spam the log files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually we could log this message in autodetect main only. It is a useful message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remember ES itself will have logged this once at startup. Logging in autodetect will still log 100 times if 100 jobs are started.
// In production this needs to match the setting of java.io.tmpdir. We rely | ||
// on the JVM that spawns our controller daemon setting TMPDIR in the | ||
// environment of the spawned process. | ||
const char* tmpDir(::getenv("TMPDIR")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider using boost for that, it handles some more cases afaik and can be easily replaced with std once we switch to C++17
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean is there a boost version of getenv? I'm using the temp dir set by elasticsearch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean boost::filesystem::temp_directory_path()
std::string getTempRulesFilename() { | ||
std::string tempDir = getTempDir(); | ||
// randomise the sandbox rules filename | ||
char * tempName = ::tempnam(tempDir.c_str(), "ml"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider boost::filesystem::unique_path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mkstemps
is the safest way to do this as it avoids a race condition between creating and opening the file. I pushed a change to do this, unfortunately it uses strlcpy
and strlcat
and file descriptors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left quite a few comments about low level things but the fundamental structure of this is exactly right in my opinion.
One more thing is that some of the formatting looks a little different to what clang-format would choose, so please run dev-tools/clang-format.sh
over the whole codebase.
include/seccomp/CSystemCallFilter.h
Outdated
//! | ||
class CSystemCallFilter { | ||
public: | ||
CSystemCallFilter(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should use a static method like installSystemCallFilter()
rather than the constructor. Using the constructor implies an RAII idiom where the destructor reverses what the constructor has done. But that's impossible for system call filters.
Similar classes would be CCrashHandler
and CProcessPriority
and they use a static method rather than the constructor.
include/seccomp/CSystemCallFilter.h
Outdated
//! Installs Sandbox/sys call filters | ||
//! | ||
//! DESCRIPTION:\n | ||
//! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could say the intention is to prevent starting new processes and, where possible, preventing network access.
|
||
namespace ml { | ||
namespace seccomp { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything from here down to the end of canUseSeccompBpf()
should be in an unnamed namespace to enforce internal linkage.
unsigned int SECCOMP_DATA_ARCH_OFFSET = 0x04; | ||
|
||
// Copied from seccomp.h | ||
// seccomp.h cannot be included as it was added in Linux kernal 3.17 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: kernal -> kernel
|
||
// The old x32 ABI always has bit 30 set in the sys call numbers. | ||
// The x64 architecture should fail these calls | ||
unsigned int UPPER_NR_LIMIT = 0x3FFFFFFF; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this and the two below should be const std::uint32_t
(plus #include <cstdint>
). The struct member they're used with is explicitly 32 bits, and whilst unsigned int
is 32 bits on all the compilers we're currently using it's best practice to specify the size exactly where the consuming code does.
CPPUNIT_ASSERT(semVersion.tokenise(release, tokens)); | ||
// Seccomp is available in kernals since 3.5 | ||
if (std::stoi(tokens[0]) < 3 || std::stoi(tokens[1]) < 5) { | ||
LOG_INFO(<< "Cannot test seccomp on linux kernals before 3.5"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: kernals -> kernels
// system call filters are applied | ||
CPPUNIT_ASSERT(systemCall()); | ||
|
||
// // Install the filter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: double //
lib/seccomp/unittest/Makefile
Outdated
TARGET=ml_test$(EXE_EXT) | ||
|
||
USE_BOOST=1 | ||
USE_BOOST_REGEX_LIBS=1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't need this as lib/core
has it and only lib/core
is using the Boost regex lib directly. (If you do need it then it's a sign something isn't working as expected.)
lib/seccomp/unittest/Makefile
Outdated
USE_BOOST_REGEX_LIBS=1 | ||
|
||
LIBS:=$(LIB_ML_SECCOMP) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency the linker flags should be added here too, i.e. LDFLAGS:=$(ML_SECCOMP_LDFLAGS)
mk/stdcpptest.mk
Outdated
@@ -15,7 +15,7 @@ REQUIRED_LIBS=$(LIB_ML_TEST) $(LIB_ML_CORE) $(LIB_ML_VER) $(CPPUNITLIBS) | |||
LIBS:=$(LOCALLIBS) $(filter-out $(REQUIRED_LIBS), $(LIBS)) $(REQUIRED_LIBS) | |||
|
|||
CPPFLAGS+=$(INCLUDE_PATH) | |||
LDFLAGS:=$(UTLDFLAGS) $(LDFLAGS) $(LIB_PATH) $(ML_VER_LDFLAGS) | |||
LDFLAGS:=$(UTLDFLAGS) $(LDFLAGS) $(LIB_PATH) $(ML_VER_LDFLAGS) $(ML_SECCOMP_LDFLAGS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're not linking the seccomp library to every unit test program I think the linker flags should also be added just in the one unit test program that tests it.
docs/CHANGELOG.asciidoc
Outdated
@@ -35,6 +35,10 @@ Explicit change point detection and modelling ({pull}92[#92]) | |||
Forecasting of Machine Learning job time series is now supported for large jobs by temporarily storing | |||
model state on disk ({pull}89[#89]) | |||
|
|||
Secure the ML processes by preventing system calls such as fork and exec. The Linux implemenation uses | |||
Seccomp BPF to intercept system calls and is available in kernels since 3.5. On Windows Job Objects prevent | |||
new proceses being created and macOs uses the sandbox functionality ({pull}98[#98]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
proceses -> processes
char * tempName = ::tempnam(tempDir.c_str(), "ml"); | ||
if (tempName != nullptr) { | ||
return std::string(tempName) + ".sb"; | ||
::free(tempName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
free after return? But goes away with boost anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few more minor comments/suggestions.
docs/CHANGELOG.asciidoc
Outdated
@@ -35,6 +35,10 @@ Explicit change point detection and modelling ({pull}92[#92]) | |||
Forecasting of Machine Learning job time series is now supported for large jobs by temporarily storing | |||
model state on disk ({pull}89[#89]) | |||
|
|||
Secure the ML processes by preventing system calls such as fork and exec. The Linux implemenation uses | |||
Seccomp BPF to intercept system calls and is available in kernels since 3.5. On Windows Job Objects prevent | |||
new processes being created and macOs uses the sandbox functionality ({pull}98[#98]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think the official name is "macOS" with a captial S at the end
(allow file-write-data)"); | ||
|
||
// mkstemps will replace the Xs with random characters | ||
static const char FILE_NAME_TEMPLATE[] = {"ml.XXXXXX.sb"}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not necessary to declare variables static
within an unnamed namespace - simply being inside the unnamed namespace gives them internal linkage. (I can see a few files where I've unnecessarily added static
in this situation in the past, so I'll remove those in a follow-up.)
|
||
// Create and open a temporary file with a random name | ||
// templateBuff is updated with the new filename. | ||
// 3 is the size of the suffix ".sb" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be nicer if the 3
was a symbolic constant (e.g. FILE_NAME_TEMPLATE_SUFFIX_LEN
) defined adjacent to FILE_NAME_TEMPLATE
. Then anyone changing FILE_NAME_TEMPLATE
would be more likely to see they also needed to change the 3
.
// Create and open a temporary file with a random name | ||
// templateBuff is updated with the new filename. | ||
// 3 is the size of the suffix ".sb" | ||
int fd = mkstemps(templateBuff.get(), 3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could avoid the C-style string functions by doing something like this:
std::string profileFilename = getTempDir() + FILE_NAME_TEMPLATE;
int fd = mkstemps(&profileFilename[0], FILE_NAME_TEMPLATE_SUFFIX_LEN);
Also, the documentation for mkstemps()
states that it can fail and set errno
, so maybe log the error and return an empty string in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that safe? Is &profileFilename[0]
null terminated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is in C++11 and above, but not previous standards.
https://stackoverflow.com/a/11752722 summarizes the situation.
|
||
namespace { | ||
|
||
struct CheckedHandle { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be SCheckedHandle
as our structs begin with S
include/seccomp/CSystemCallFilter.h
Outdated
//! | ||
class CSystemCallFilter { | ||
public: | ||
CSystemCallFilter(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should use a static method like installSystemCallFilter()
rather than the constructor. Using the constructor implies an RAII idiom where the destructor reverses what the constructor has done. But that's impossible for system call filters.
CCrashHandler
and CProcessPriority
are conceptually similar and they use a static method rather than the constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making all those changes.
LGTM
const char* TEST_WRITE_PIPE_NAME = "\\\\.\\pipe\\testwritepipe"; | ||
#else | ||
const char* TEST_READ_PIPE_NAME = "testreadpipe"; | ||
const char* TEST_WRITE_PIPE_NAME = "testwritepipe"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To have internal linkage these 4 const char*
constants should really be const char* const
.
(allow file-write-data)"); | ||
|
||
// mkstemps will replace the Xs with random characters | ||
const char* FILE_NAME_TEMPLATE = "ml.XXXXXX.sb"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could make this a const std::string
, as that's what we usually use for string constants unless there's some horrible circular dependency in linkage or static initialization.
But if you stick with a const char*
then it really should be const char* const
to result in internal linkage.
BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_exit, 4, 0), | ||
BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_statfs, 3, 0), | ||
BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_dup2, 2, 0), | ||
BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_open, 1, 0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed a commit that inverts the BPF approach on linux so instead of listing banned syscalls I list all the ones that can be used. This has the advantage that we don't have to think of every syscall that should be blocked and covers new functions (e.g.
To get the full list I change to filter to return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked over the Linux sections. 👍
Restrict the ability of the autodetect, autoconfig, categorize and normalize programs to make system calls. Implemented for Linux (Seccomp BPF), macOS (sandbox) and Windows (Job Groups)
Restrict the ability of the autodetect, autoconfig, categorize and normalize programs to make system calls. Implemented for Linux (Seccomp BPF), macOS (sandbox) and Windows (Job Groups)
Add system call restrictions to the ML processes (#98) Restrict the ability of the autodetect, autoconfig, categorize and normalize programs to make system calls. Implemented for Linux (Seccomp BPF), macOS (sandbox) and Windows (Job Groups) * [ML] Add comment explaining changes required if Linux BPF are modified.
The
autodetect
,autoconfig
,categorize
andnormalize
programs all require the ability to create a named pipe (mkfifo/CreateNamedPipe) open a connection to a named pipe and read & write to those pipes. All other system calls which are not required such as fork/exec should be restricted using platform specific APIs.The code here is similar to SystemCallFilter.java.
LibSeccomp
This PR adds a new static library
libseccomp
containing a single classCSystemCallFilter
the library must be linked in the 4 main programs listed above.Due to static linkage remember to
make relink
the unit test and binaries after making a change.macOS
macOS has the most configurable sandboxing functionality the approach here is to blacklist everything by default but allow
read*
andwrite*
for open, mkfifo, read and write. This imposes harder limits than the SystemCallFilter.java which only denies fork and exec.Linux
Seccomp BPF was added to
prctl()
in kernal 3.5, no filters will be installed in older kernals. The dedicatedseccomp()
function was added in kernal 3.17 and provides the same the functionality asprctl()
but with the SECCOMP_FILTER_FLAG_TSYNC option that is not required here if the filters are installed by the main thread before any other threads are spawned. Because of this we use the older more compatibleprctl()
function.SystemCallFilter.java tries to use
seccomp()
and falls back toprctl()
Windows
The Windows implementation uses Job Objects to prevent the process spawning another.
Release note: Secures the machine learning processes by preventing system calls such as fork
and exec. The Linux implementation uses Seccomp BPF (secure computing with
Berkeley Packet Filters) to intercept system calls and is available in kernels
since 3.5. On Windows, Job Objects prevent new processes being created and macOS
uses the sandbox functionality