Skip to content

[ML] Add system call restrictions to the ML processes #98

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
May 28, 2018

Conversation

davidkyle
Copy link
Member

@davidkyle davidkyle commented May 21, 2018

The autodetect, autoconfig, categorize and normalize programs all require the ability to create a named pipe (mkfifo/CreateNamedPipe) open a connection to a named pipe and read & write to those pipes. All other system calls which are not required such as fork/exec should be restricted using platform specific APIs.

The code here is similar to SystemCallFilter.java.

LibSeccomp

This PR adds a new static library libseccomp containing a single class CSystemCallFilter the library must be linked in the 4 main programs listed above.

Due to static linkage remember to make relink the unit test and binaries after making a change.

macOS

macOS has the most configurable sandboxing functionality the approach here is to blacklist everything by default but allow read* and write* for open, mkfifo, read and write. This imposes harder limits than the SystemCallFilter.java which only denies fork and exec.

Linux

Seccomp BPF was added to prctl() in kernal 3.5, no filters will be installed in older kernals. The dedicated seccomp() function was added in kernal 3.17 and provides the same the functionality as prctl() but with the SECCOMP_FILTER_FLAG_TSYNC option that is not required here if the filters are installed by the main thread before any other threads are spawned. Because of this we use the older more compatible prctl() function.

SystemCallFilter.java tries to use seccomp() and falls back to prctl()

Windows

The Windows implementation uses Job Objects to prevent the process spawning another.

Release note: Secures the machine learning processes by preventing system calls such as fork
and exec. The Linux implementation uses Seccomp BPF (secure computing with
Berkeley Packet Filters) to intercept system calls and is available in kernels
since 3.5. On Windows, Job Objects prevent new processes being created and macOS
uses the sandbox functionality

@droberts195
Copy link
Contributor

execve, fork and vfork are disallowed should any other system calls be? fexecve? execveat?

Yes, execveat should definitely be disallowed (if it exists). I guess the complication here is that it only exists on kernel 3.19 and above. (And I think fexecve is the libc function that uses execveat rather than a system call itself?)

I think also socket, socketcall and socketpair should be disallowed, because we wanted to prevent network access. If we stop people getting hold of a socket then hopefully that will avoid the need to list every single other network function.

}

} else {
LOG_DEBUG(<< "Seccomp BPF not available");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could imagine this being info as it might be useful for customers to know

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. Upgrading this would mean customers running on CentOS 6 would be spammed every time an ML process starts up. And normalize can run pretty frequently...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It started as info level but as Dave points out every time normalize starts the message is logged and this can spam the log files

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we could log this message in autodetect main only. It is a useful message

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remember ES itself will have logged this once at startup. Logging in autodetect will still log 100 times if 100 jobs are started.

// In production this needs to match the setting of java.io.tmpdir. We rely
// on the JVM that spawns our controller daemon setting TMPDIR in the
// environment of the spawned process.
const char* tmpDir(::getenv("TMPDIR"));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider using boost for that, it handles some more cases afaik and can be easily replaced with std once we switch to C++17

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean is there a boost version of getenv? I'm using the temp dir set by elasticsearch

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean boost::filesystem::temp_directory_path()

std::string getTempRulesFilename() {
std::string tempDir = getTempDir();
// randomise the sandbox rules filename
char * tempName = ::tempnam(tempDir.c_str(), "ml");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider boost::filesystem::unique_path

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mkstemps is the safest way to do this as it avoids a race condition between creating and opening the file. I pushed a change to do this, unfortunately it uses strlcpy and strlcat and file descriptors.

Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left quite a few comments about low level things but the fundamental structure of this is exactly right in my opinion.

One more thing is that some of the formatting looks a little different to what clang-format would choose, so please run dev-tools/clang-format.sh over the whole codebase.

//!
class CSystemCallFilter {
public:
CSystemCallFilter();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use a static method like installSystemCallFilter() rather than the constructor. Using the constructor implies an RAII idiom where the destructor reverses what the constructor has done. But that's impossible for system call filters.

Similar classes would be CCrashHandler and CProcessPriority and they use a static method rather than the constructor.

//! Installs Sandbox/sys call filters
//!
//! DESCRIPTION:\n
//!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could say the intention is to prevent starting new processes and, where possible, preventing network access.


namespace ml {
namespace seccomp {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything from here down to the end of canUseSeccompBpf() should be in an unnamed namespace to enforce internal linkage.

unsigned int SECCOMP_DATA_ARCH_OFFSET = 0x04;

// Copied from seccomp.h
// seccomp.h cannot be included as it was added in Linux kernal 3.17
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: kernal -> kernel


// The old x32 ABI always has bit 30 set in the sys call numbers.
// The x64 architecture should fail these calls
unsigned int UPPER_NR_LIMIT = 0x3FFFFFFF;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this and the two below should be const std::uint32_t (plus #include <cstdint>). The struct member they're used with is explicitly 32 bits, and whilst unsigned int is 32 bits on all the compilers we're currently using it's best practice to specify the size exactly where the consuming code does.

CPPUNIT_ASSERT(semVersion.tokenise(release, tokens));
// Seccomp is available in kernals since 3.5
if (std::stoi(tokens[0]) < 3 || std::stoi(tokens[1]) < 5) {
LOG_INFO(<< "Cannot test seccomp on linux kernals before 3.5");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: kernals -> kernels

// system call filters are applied
CPPUNIT_ASSERT(systemCall());

// // Install the filter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: double //

TARGET=ml_test$(EXE_EXT)

USE_BOOST=1
USE_BOOST_REGEX_LIBS=1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't need this as lib/core has it and only lib/core is using the Boost regex lib directly. (If you do need it then it's a sign something isn't working as expected.)

USE_BOOST_REGEX_LIBS=1

LIBS:=$(LIB_ML_SECCOMP)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency the linker flags should be added here too, i.e. LDFLAGS:=$(ML_SECCOMP_LDFLAGS)

mk/stdcpptest.mk Outdated
@@ -15,7 +15,7 @@ REQUIRED_LIBS=$(LIB_ML_TEST) $(LIB_ML_CORE) $(LIB_ML_VER) $(CPPUNITLIBS)
LIBS:=$(LOCALLIBS) $(filter-out $(REQUIRED_LIBS), $(LIBS)) $(REQUIRED_LIBS)

CPPFLAGS+=$(INCLUDE_PATH)
LDFLAGS:=$(UTLDFLAGS) $(LDFLAGS) $(LIB_PATH) $(ML_VER_LDFLAGS)
LDFLAGS:=$(UTLDFLAGS) $(LDFLAGS) $(LIB_PATH) $(ML_VER_LDFLAGS) $(ML_SECCOMP_LDFLAGS)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're not linking the seccomp library to every unit test program I think the linker flags should also be added just in the one unit test program that tests it.

@@ -35,6 +35,10 @@ Explicit change point detection and modelling ({pull}92[#92])
Forecasting of Machine Learning job time series is now supported for large jobs by temporarily storing
model state on disk ({pull}89[#89])

Secure the ML processes by preventing system calls such as fork and exec. The Linux implemenation uses
Seccomp BPF to intercept system calls and is available in kernels since 3.5. On Windows Job Objects prevent
new proceses being created and macOs uses the sandbox functionality ({pull}98[#98])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

proceses -> processes

char * tempName = ::tempnam(tempDir.c_str(), "ml");
if (tempName != nullptr) {
return std::string(tempName) + ".sb";
::free(tempName);
Copy link
Contributor

@tveasey tveasey May 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

free after return? But goes away with boost anyway.

Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few more minor comments/suggestions.

@@ -35,6 +35,10 @@ Explicit change point detection and modelling ({pull}92[#92])
Forecasting of Machine Learning job time series is now supported for large jobs by temporarily storing
model state on disk ({pull}89[#89])

Secure the ML processes by preventing system calls such as fork and exec. The Linux implemenation uses
Seccomp BPF to intercept system calls and is available in kernels since 3.5. On Windows Job Objects prevent
new processes being created and macOs uses the sandbox functionality ({pull}98[#98])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think the official name is "macOS" with a captial S at the end

(allow file-write-data)");

// mkstemps will replace the Xs with random characters
static const char FILE_NAME_TEMPLATE[] = {"ml.XXXXXX.sb"};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not necessary to declare variables static within an unnamed namespace - simply being inside the unnamed namespace gives them internal linkage. (I can see a few files where I've unnecessarily added static in this situation in the past, so I'll remove those in a follow-up.)


// Create and open a temporary file with a random name
// templateBuff is updated with the new filename.
// 3 is the size of the suffix ".sb"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nicer if the 3 was a symbolic constant (e.g. FILE_NAME_TEMPLATE_SUFFIX_LEN) defined adjacent to FILE_NAME_TEMPLATE. Then anyone changing FILE_NAME_TEMPLATE would be more likely to see they also needed to change the 3.

// Create and open a temporary file with a random name
// templateBuff is updated with the new filename.
// 3 is the size of the suffix ".sb"
int fd = mkstemps(templateBuff.get(), 3);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could avoid the C-style string functions by doing something like this:

std::string profileFilename = getTempDir() + FILE_NAME_TEMPLATE;
int fd = mkstemps(&profileFilename[0], FILE_NAME_TEMPLATE_SUFFIX_LEN);

Also, the documentation for mkstemps() states that it can fail and set errno, so maybe log the error and return an empty string in that case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that safe? Is &profileFilename[0] null terminated?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is in C++11 and above, but not previous standards.

https://stackoverflow.com/a/11752722 summarizes the situation.


namespace {

struct CheckedHandle {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be SCheckedHandle as our structs begin with S

//!
class CSystemCallFilter {
public:
CSystemCallFilter();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use a static method like installSystemCallFilter() rather than the constructor. Using the constructor implies an RAII idiom where the destructor reverses what the constructor has done. But that's impossible for system call filters.

CCrashHandler and CProcessPriority are conceptually similar and they use a static method rather than the constructor.

Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making all those changes.

LGTM

const char* TEST_WRITE_PIPE_NAME = "\\\\.\\pipe\\testwritepipe";
#else
const char* TEST_READ_PIPE_NAME = "testreadpipe";
const char* TEST_WRITE_PIPE_NAME = "testwritepipe";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To have internal linkage these 4 const char* constants should really be const char* const.

(allow file-write-data)");

// mkstemps will replace the Xs with random characters
const char* FILE_NAME_TEMPLATE = "ml.XXXXXX.sb";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could make this a const std::string, as that's what we usually use for string constants unless there's some horrible circular dependency in linkage or static initialization.

But if you stick with a const char* then it really should be const char* const to result in internal linkage.

BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_exit, 4, 0),
BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_statfs, 3, 0),
BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_dup2, 2, 0),
BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_open, 1, 0),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We call dladdr() and raise() as part of the crash handler. I think it would be worth double checking that these don't require a system call that's not on the list if you haven't done so already.

@davidkyle
Copy link
Member Author

I pushed a commit that inverts the BPF approach on linux so instead of listing banned syscalls I list all the ones that can be used. This has the advantage that we don't have to think of every syscall that should be blocked and covers new functions (e.g. execveat added in kernel 3.19). The downside is that a different distribution makes unusual syscalls then the processes won't function. I do not know how much of a risk that is.

strace is useful for finding the syscalls a program makes for example the output from running the seccomp unit tests:

$strace -c ML_CPP_HOME/lib/seccomp/unittest/ml_test
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 43.50    0.001027        1027         1           wait4
 15.67    0.000370           3       114        80 open
  8.68    0.000205          23         9           brk
  8.30    0.000196           2        86           mmap
  6.01    0.000142           4        34        23 stat
  3.64    0.000086           2        53           mprotect
  2.20    0.000052          26         2         2 connect
  2.12    0.000050           3        18           write
  2.03    0.000048           1        36           close
  1.95    0.000046          46         1           readlink
  1.74    0.000041           1        35           read
  1.57    0.000037           1        34           fstat
  0.59    0.000014           7         2         2 access
  0.55    0.000013          13         1           execve
  0.51    0.000012           6         2           lstat
  0.25    0.000006           3         2           socket
  0.21    0.000005           2         3           clone
  0.17    0.000004           1         3         1 prctl
  0.08    0.000002           0        14           rt_sigaction
  0.08    0.000002           2         1           getcwd
  0.04    0.000001           0         3           rt_sigprocmask
  0.04    0.000001           1         2           getuid
  0.04    0.000001           1         2           futex
  0.00    0.000000           0         1           lseek
  0.00    0.000000           0         6           munmap
  0.00    0.000000           0         1           uname
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------

To get the full list I change to filter to return SECCOMP_RET_KILL so the error would be logged then ran the integration tests adding syscalls to the filter until the tests stopped failing.

Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked over the Linux sections. 👍

@davidkyle davidkyle merged commit c15c7aa into elastic:master May 28, 2018
@davidkyle davidkyle deleted the seccomp-lib branch May 28, 2018 21:07
davidkyle added a commit to davidkyle/ml-cpp that referenced this pull request May 29, 2018
Restrict the ability of the autodetect, autoconfig, categorize and normalize
programs to make system calls. Implemented for Linux (Seccomp BPF),
 macOS (sandbox) and Windows (Job Groups)
hendrikmuhs pushed a commit to hendrikmuhs/ml-cpp that referenced this pull request May 30, 2018
Restrict the ability of the autodetect, autoconfig, categorize and normalize programs to make system calls. Implemented for Linux (Seccomp BPF), macOS (sandbox) and Windows (Job Groups)
davidkyle added a commit that referenced this pull request May 31, 2018
Add system call restrictions to the ML processes  (#98)

Restrict the ability of the autodetect, autoconfig, categorize and normalize
programs to make system calls. Implemented for Linux (Seccomp BPF),
 macOS (sandbox) and Windows (Job Groups)

* [ML] Add comment explaining changes required if Linux BPF are modified.
@sophiec20 sophiec20 added :ml and removed :ml labels Jun 12, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants