Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ASan][libc++] Turn on ASan annotations for short strings #79536

Merged
merged 2 commits into from
May 7, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
60 changes: 44 additions & 16 deletions libcxx/include/string
Original file line number Diff line number Diff line change
Expand Up @@ -662,7 +662,6 @@ _LIBCPP_PUSH_MACROS
#else
# define _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS
#endif
#define _LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED false

_LIBCPP_BEGIN_NAMESPACE_STD

Expand Down Expand Up @@ -736,10 +735,44 @@ public:
//
// This string implementation doesn't contain any references into itself. It only contains a bit that says whether
// it is in small or large string mode, so the entire structure is trivially relocatable if its members are.
#if !defined(_LIBCPP_HAS_NO_ASAN) && defined(_LIBCPP_INSTRUMENTED_WITH_ASAN)
// When compiling with AddressSanitizer (ASan), basic_string cannot be trivially
// relocatable. Because the object's memory might be poisoned when its content
// is kept inside objects memory (short string optimization), instead of in allocated
// external memory. In such cases, the destructor is responsible for unpoisoning
// the memory to avoid triggering false positives.
// Therefore it's crucial to ensure the destructor is called
using __trivially_relocatable = false_type;
Copy link
Contributor

@philnik777 philnik777 May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
using __trivially_relocatable = false_type;
using __trivially_relocatable = void;

IMO ASan shouldn't affect traits of the type. This change definitely needs more discussion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe there needs to be some ASan handling when relocating objects? We're destroying objects after all, which should probably poison memory.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to incrementally improve string annotations.
Without change of __trivially_relocatable and no other changes ASan error happens exactly here:

} else {
__builtin_memcpy(const_cast<__remove_const_t<_Tp>*>(__result), __first, sizeof(_Tp) * (__last - __first));
}

__builtin_memcpy accesses poisoned memory, when objects memory is poisoned.

We can unpoison memory there for every type. Then problematic code area would be:

 } else {
 #ifndef  _LIBCPP_HAS_NO_ASAN
   __asan_unpoison_memory_region(__first, sizeof(_Tp) * (__last - __first));
#endif
   __builtin_memcpy(const_cast<__remove_const_t<_Tp>*>(__result), __first, sizeof(_Tp) * (__last - __first)); 
 } 

Disadvantage of that over changing the value of __trivially_relocatable is the need of unpoisoning memory like that in every function which depends on __libcpp_is_trivially_relocatable. But it shouldn't be a big problem.

What I don't like more is fact that objects after that move won't be poisoned correctly (may result in false negatives).
My idea to fix it is creating a compiler-rt function __asan_move_annotations which may be used here instead of __asan_unpoison_memory_region, making old buffer unpoisoned and the new buffer poisoned in the same way as the old one.

If we try to optimize it as much as possible, creating something similar to __libcpp_is_trivially_relocatable just for ASan is possible. For example __libcpp_may_memory_be_poisoned, and calling __asan_unpoison_memory_region/__asan_move_annotations if and only if that value is true. However, __asan_unpoison_memory_region should be cheaper than __builtin_memcpy, so I think we can accept additional cost here as new ASan trait adds a lot of complexity.

We're destroying objects after all, which should probably poison memory.

Poisoning, if happens, happens in the allocator (during deallocation) and we cannot assume anything about it. For example, allocator cleaning memory before freeing it touches the whole memory before annotations are updated. Therefore __annotate_delete() in all classes is necessary.

But something like __asan_move_annotations is possible and I can create it. Then we can go back to one value of __trivially_relocatable.
To put my actions behind my words, I will open a PR with that change Today or Tomorrow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that change as two or three PRs.

Last thing is trivial. __memcpy_with_asan I hope to be fairly easy, but I am not sure what place is the best to add it. Fist one I just drafted, but didn't test it yet. I should finish everything today, unless we have a discussion about changing direction of that work. Let me know what you think.

#else
using __trivially_relocatable = __conditional_t<
__libcpp_is_trivially_relocatable<allocator_type>::value && __libcpp_is_trivially_relocatable<pointer>::value,
basic_string,
void>;
#endif
#if !defined(_LIBCPP_HAS_NO_ASAN) && defined(_LIBCPP_INSTRUMENTED_WITH_ASAN)
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
pointer __asan_volatile_wrapper(pointer const &__ptr) const {
if (__libcpp_is_constant_evaluated())
return __ptr;

pointer volatile __copy_ptr = __ptr;

return const_cast<pointer &>(__copy_ptr);
}

_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
const_pointer __asan_volatile_wrapper(const_pointer const &__ptr) const {
if (__libcpp_is_constant_evaluated())
return __ptr;

const_pointer volatile __copy_ptr = __ptr;

return const_cast<const_pointer &>(__copy_ptr);
}
#define _LIBCPP_ASAN_VOLATILE_WRAPPER(PTR) __asan_volatile_wrapper(PTR)
#else
#define _LIBCPP_ASAN_VOLATILE_WRAPPER(PTR) PTR
#endif

static_assert((!is_array<value_type>::value), "Character type of basic_string must not be an array");
static_assert((is_standard_layout<value_type>::value), "Character type of basic_string must be standard-layout");
Expand Down Expand Up @@ -1886,16 +1919,16 @@ private:
__r_.first().__l.__data_ = __p;
}
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 pointer __get_long_pointer() _NOEXCEPT {
return __r_.first().__l.__data_;
return _LIBCPP_ASAN_VOLATILE_WRAPPER(__r_.first().__l.__data_);
}
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 const_pointer __get_long_pointer() const _NOEXCEPT {
return __r_.first().__l.__data_;
return _LIBCPP_ASAN_VOLATILE_WRAPPER(__r_.first().__l.__data_);
}
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 pointer __get_short_pointer() _NOEXCEPT {
return pointer_traits<pointer>::pointer_to(__r_.first().__s.__data_[0]);
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS pointer __get_short_pointer() _NOEXCEPT {
return _LIBCPP_ASAN_VOLATILE_WRAPPER(pointer_traits<pointer>::pointer_to(__r_.first().__s.__data_[0]));
}
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 const_pointer __get_short_pointer() const _NOEXCEPT {
return pointer_traits<const_pointer>::pointer_to(__r_.first().__s.__data_[0]);
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS const_pointer __get_short_pointer() const _NOEXCEPT {
return _LIBCPP_ASAN_VOLATILE_WRAPPER(pointer_traits<const_pointer>::pointer_to(__r_.first().__s.__data_[0]));
}
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 pointer __get_pointer() _NOEXCEPT {
return __is_long() ? __get_long_pointer() : __get_short_pointer();
Expand All @@ -1914,38 +1947,33 @@ private:
#endif
}

// ASan: short string is poisoned if and only if this function returns true.
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 bool __asan_short_string_is_annotated() const _NOEXCEPT {
return _LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED && !__libcpp_is_constant_evaluated();
}

_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __annotate_new(size_type __current_size) const _NOEXCEPT {
(void)__current_size;
#if !defined(_LIBCPP_HAS_NO_ASAN) && defined(_LIBCPP_INSTRUMENTED_WITH_ASAN)
if (!__libcpp_is_constant_evaluated() && (__asan_short_string_is_annotated() || __is_long()))
if (!__libcpp_is_constant_evaluated())
__annotate_contiguous_container(data() + capacity() + 1, data() + __current_size + 1);
#endif
}

_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __annotate_delete() const _NOEXCEPT {
#if !defined(_LIBCPP_HAS_NO_ASAN) && defined(_LIBCPP_INSTRUMENTED_WITH_ASAN)
if (!__libcpp_is_constant_evaluated() && (__asan_short_string_is_annotated() || __is_long()))
if (!__libcpp_is_constant_evaluated())
__annotate_contiguous_container(data() + size() + 1, data() + capacity() + 1);
#endif
}

_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __annotate_increase(size_type __n) const _NOEXCEPT {
(void)__n;
#if !defined(_LIBCPP_HAS_NO_ASAN) && defined(_LIBCPP_INSTRUMENTED_WITH_ASAN)
if (!__libcpp_is_constant_evaluated() && (__asan_short_string_is_annotated() || __is_long()))
if (!__libcpp_is_constant_evaluated())
__annotate_contiguous_container(data() + size() + 1, data() + size() + 1 + __n);
#endif
}

_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __annotate_shrink(size_type __old_size) const _NOEXCEPT {
(void)__old_size;
#if !defined(_LIBCPP_HAS_NO_ASAN) && defined(_LIBCPP_INSTRUMENTED_WITH_ASAN)
if (!__libcpp_is_constant_evaluated() && (__asan_short_string_is_annotated() || __is_long()))
if (!__libcpp_is_constant_evaluated())
__annotate_contiguous_container(data() + __old_size + 1, data() + size() + 1);
#endif
}
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,182 @@
//===----------------------------------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

// REQUIRES: asan
// UNSUPPORTED: c++03

#include <cassert>
#include <string>
#include <array>
#include <deque>
#include "test_macros.h"
#include "asan_testing.h"
#include "min_allocator.h"

// This tests exists to check if strings work well with deque, as those
// may be partialy annotated, we cannot simply call
// is_double_ended_contiguous_container_asan_correct, as it assumes that
// object memory inside is not annotated, so we check everything in a more careful way.

template <typename D>
void verify_inside(D const& d) {
for (size_t i = 0; i < d.size(); ++i) {
assert(is_string_asan_correct(d[i]));
}
}

template <typename S, size_t N>
S get_s(char c) {
S s;
for (size_t i = 0; i < N; ++i)
s.push_back(c);

return s;
}

template <class C, class S>
void test_string() {
size_t const N = sizeof(S) < 256 ? (4096 / sizeof(S)) : 16;

{
C d1a(1), d1b(N), d1c(N + 1), d1d(5 * N);
verify_inside(d1a);
verify_inside(d1b);
verify_inside(d1c);
verify_inside(d1d);
}
{
C d2;
for (size_t i = 0; i < 3 * N + 2; ++i) {
d2.push_back(get_s<S, 1>(i % 10 + 'a'));
verify_inside(d2);
d2.push_back(get_s<S, 22>(i % 10 + 'b'));
verify_inside(d2);

d2.pop_front();
verify_inside(d2);
}
}
{
C d3;
for (size_t i = 0; i < 3 * N + 2; ++i) {
d3.push_front(get_s<S, 1>(i % 10 + 'a'));
verify_inside(d3);
d3.push_front(get_s<S, 28>(i % 10 + 'b'));
verify_inside(d3);

d3.pop_back();
verify_inside(d3);
}
}
{
C d4;
for (size_t i = 0; i < 3 * N + 2; ++i) {
// When there is no SSO, all elements inside should not be poisoned,
// so we can verify deque poisoning.
d4.push_front(get_s<S, 33>(i % 10 + 'a'));
verify_inside(d4);
assert(is_double_ended_contiguous_container_asan_correct(d4));
d4.push_back(get_s<S, 28>(i % 10 + 'b'));
verify_inside(d4);
assert(is_double_ended_contiguous_container_asan_correct(d4));
}
}
{
C d5;
for (size_t i = 0; i < 3 * N + 2; ++i) {
// In d4 we never had poisoned memory inside deque.
// Here we start with SSO, so part of the inside of the container,
// will be poisoned.
d5.push_front(S());
verify_inside(d5);
}
for (size_t i = 0; i < d5.size(); ++i) {
// We change the size to have long string.
// Memory owne by deque should not be poisoned by string.
d5[i].resize(100);
verify_inside(d5);
}

assert(is_double_ended_contiguous_container_asan_correct(d5));

d5.erase(d5.begin() + 2);
verify_inside(d5);

d5.erase(d5.end() - 2);
verify_inside(d5);

assert(is_double_ended_contiguous_container_asan_correct(d5));
}
{
C d6a;
assert(is_double_ended_contiguous_container_asan_correct(d6a));

C d6b(N + 2, get_s<S, 100>('a'));
d6b.push_front(get_s<S, 101>('b'));
while (!d6b.empty()) {
d6b.pop_back();
assert(is_double_ended_contiguous_container_asan_correct(d6b));
}

C d6c(N + 2, get_s<S, 102>('c'));
while (!d6c.empty()) {
d6c.pop_back();
assert(is_double_ended_contiguous_container_asan_correct(d6c));
}
}
{
C d7(9 * N + 2);

d7.insert(d7.begin() + 1, S());
verify_inside(d7);

d7.insert(d7.end() - 3, S());
verify_inside(d7);

d7.insert(d7.begin() + 2 * N, get_s<S, 1>('a'));
verify_inside(d7);

d7.insert(d7.end() - 2 * N, get_s<S, 1>('b'));
verify_inside(d7);

d7.insert(d7.begin() + 2 * N, 3 * N, get_s<S, 1>('c'));
verify_inside(d7);

// It may not be short for big element types, but it will be checked correctly:
d7.insert(d7.end() - 2 * N, 3 * N, get_s<S, 2>('d'));
verify_inside(d7);

d7.erase(d7.begin() + 2);
verify_inside(d7);

d7.erase(d7.end() - 2);
verify_inside(d7);
}
}

template <class S>
void test_container() {
test_string<std::deque<S, std::allocator<S>>, S>();
test_string<std::deque<S, min_allocator<S>>, S>();
test_string<std::deque<S, safe_allocator<S>>, S>();
}

int main(int, char**) {
// Those tests support only types based on std::basic_string.
test_container<std::string>();
test_container<std::wstring>();
#if TEST_STD_VER >= 11
test_container<std::u16string>();
test_container<std::u32string>();
#endif
#if TEST_STD_VER >= 20
test_container<std::u8string>();
#endif

return 0;
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
//===----------------------------------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

// REQUIRES: asan
// UNSUPPORTED: c++03

// <string>

// Basic test if ASan annotations work for short strings.

#include <string>
#include <cassert>
#include <cstdlib>

#include "asan_testing.h"
#include "min_allocator.h"
#include "test_iterators.h"
#include "test_macros.h"

extern "C" void __sanitizer_set_death_callback(void (*callback)(void));

void do_exit() { exit(0); }

int main(int, char**) {
{
typedef cpp17_input_iterator<char*> MyInputIter;
// Should not trigger ASan.
std::basic_string<char, std::char_traits<char>, safe_allocator<char>> v;
char i[] = {'a', 'b', 'c', 'd'};

v.insert(v.begin(), MyInputIter(i), MyInputIter(i + 4));
assert(v[0] == 'a');
assert(is_string_asan_correct(v));
}

__sanitizer_set_death_callback(do_exit);
{
using T = char;
using C = std::basic_string<T, std::char_traits<T>, safe_allocator<T>>;
const T t[] = {'a', 'b', 'c', 'd', 'e', 'f', 'g'};
C c(std::begin(t), std::end(t));
assert(is_string_asan_correct(c));
assert(__sanitizer_verify_contiguous_container(c.data(), c.data() + c.size() + 1, c.data() + c.capacity() + 1) !=
0);
volatile T foo = c[c.size() + 1]; // should trigger ASAN. Use volatile to prevent being optimized away.
assert(false); // if we got here, ASAN didn't trigger
((void)foo);
}

return 0;
}
Loading
Loading