Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[compiler-rt][ASan] Add function copying annotations #91702

Merged

Conversation

AdvenamTacet
Copy link
Member

@AdvenamTacet AdvenamTacet commented May 10, 2024

This PR adds a __sanitizer_copy_contiguous_container_annotations function, which copies annotations from one memory area to another. New area is annotated in the same way as the old region at the beginning (within limitations of ASan).

Overlapping case: The function supports overlapping containers, however no assumptions should be made outside of no false positives in new buffer area. (It doesn't modify old container annotations where it's not necessary, false negatives may happen in edge granules of the new container area.) I don't expect this function to be used with overlapping buffers, but it's designed to work with them and not result in incorrect ASan errors (false positives).

If buffers have granularity-aligned distance between them (old_beg % granularity == new_beg % granularity), copying algorithm works faster. If the distance is not granularity-aligned, annotations are copied byte after byte.

void __sanitizer_copy_contiguous_container_annotations(
    const void *old_storage_beg_p, const void *old_storage_end_p,
    const void *new_storage_beg_p, const void *new_storage_end_p) {

This function aims to help with short string annotations and similar container annotations. Right now we change trait types of std::basic_string when compiling with ASan and this function purpose is reverting that change as soon as possible.

#if !defined(_LIBCPP_HAS_NO_ASAN) && defined(_LIBCPP_INSTRUMENTED_WITH_ASAN)
// When compiling with AddressSanitizer (ASan), basic_string cannot be trivially
// relocatable. Because the object's memory might be poisoned when its content
// is kept inside objects memory (short string optimization), instead of in allocated
// external memory. In such cases, the destructor is responsible for unpoisoning
// the memory to avoid triggering false positives.
// Therefore it's crucial to ensure the destructor is called.
using __trivially_relocatable = void;
#else
using __trivially_relocatable = __conditional_t<
__libcpp_is_trivially_relocatable<allocator_type>::value && __libcpp_is_trivially_relocatable<pointer>::value,
basic_string,
void>;
#endif

The goal is to not change __trivially_relocatable when compiling with ASan. If this function is accepted and upstreamed, the next step is creating a function like __memcpy_with_asan moving memory with ASan. And then using this function instead of __builtin__memcpy while moving trivially relocatable objects.

} else {
__builtin_memcpy(const_cast<__remove_const_t<_Tp>*>(__result), __first, sizeof(_Tp) * (__last - __first));
}


I'm thinking if there is a good way to address fact that in a container the new buffer is usually bigger than the previous one. We may add two more arguments to the functions to address it (the beginning and the end of the whole buffer.

Another potential change is removing new_storage_end_p as it's redundant, because we require the same size.

Potential future work is creating a function __asan_unsafe_memmove, which will be basically memmove, but with turned off instrumentation (therefore it will allow copy data from poisoned area).

Copy link

github-actions bot commented May 10, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@AdvenamTacet AdvenamTacet force-pushed the short-string-annotations-cleaning-traits branch 2 times, most recently from cab5bc6 to 8ff3017 Compare May 10, 2024 07:37
@philnik777
Copy link
Contributor

trivial relocation is also interesting for erasing object from a vector, so the builtins should probably be more akin to a memmove than a memcpy.

@AdvenamTacet AdvenamTacet force-pushed the short-string-annotations-cleaning-traits branch from 8ff3017 to 7d3bc1a Compare May 17, 2024 04:56
@vitalybuka
Copy link
Collaborator

LGTM for Asan however HWASAN story is not clear.
Let's make sure we can use this for HWASAN as well.

It's probably fine, but I'd like to think about this more.

@vitalybuka vitalybuka requested a review from fmayer May 20, 2024 16:26
@AdvenamTacet
Copy link
Member Author

AdvenamTacet commented May 22, 2024

I still have to fix failing CI, I can sit to it on Monday and, if necessary, spend next week enough of time to fix it (Edit: testing it actually takes prohibitive amount of time, but I'm committed to make this PR work). I expect the problem to be a small implementation bug. Let me know if you have any high level requests for that code, I'm happy to implement them.
I will also create a separate PR cleaning compiler-rt ASan code, as I noticed a little bit of outdated code while writing this one.

trivial relocation is also interesting for erasing object from a vector, so the builtins should probably be more akin to a memmove than a memcpy.

Should we change __builtin_memcpy to __builtin_memmove? Then I would have to add support for overlapping areas to this PR. (It complicates the implementation a little bit.)

It's probably fine, but I'd like to think about this more.

Sure, let me know what are your conclusions!

@philnik777
Copy link
Contributor

trivial relocation is also interesting for erasing object from a vector, so the builtins should probably be more akin to a memmove than a memcpy.

Should we change __builtin_memcpy to __builtin_memmove? Then I would have to add support for overlapping areas to this PR. (It complicates the implementation a little bit.)

Yes. My plan is to change the __builtin_memcpy to __builtin_memmove in __uninitialized_allocator_relocate to implement more optimizations in vector.

@AdvenamTacet
Copy link
Member Author

Alright, I will update that PR accordingly in the coming week.

Copy link
Collaborator

@vitalybuka vitalybuka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume work in progress.

@AdvenamTacet AdvenamTacet force-pushed the short-string-annotations-cleaning-traits branch 2 times, most recently from 7725910 to 0ab9216 Compare August 26, 2024 09:31
@AdvenamTacet AdvenamTacet force-pushed the short-string-annotations-cleaning-traits branch 2 times, most recently from 82a7c7e to 93ab376 Compare August 28, 2024 09:23
@AdvenamTacet AdvenamTacet marked this pull request as ready for review August 28, 2024 09:58
Advenam Tacet and others added 25 commits October 8, 2024 12:17
This PR adds a `__sanitizer_move_contiguous_container_annotations` function, which moves annotations from one memory area to another.
Old area is unpoisoned at the end. New area is annotated in the same way as the old region at the beginning (within limitations of ASan).

```cpp
void __sanitizer_move_contiguous_container_annotations(
    const void *old_storage_beg_p, const void *old_storage_end_p,
    const void *new_storage_beg_p, const void *new_storage_end_p) {
```

This function aims to help with short string annotations and similar container annotations.
Right now we change trait types of `std::basic_string` when compiling with ASan.

https://github.com/llvm/llvm-project/blob/87f3407856e61a73798af4e41b28bc33b5bf4ce6/libcxx/include/string#L738-L751

The goal is to not change `__trivially_relocatable` when compiling with ASan.
If this function is accpeted and upstreamed, the next step is creating function like `__memcpy_with_asan` moving memory with ASan.
And then using this function instead of `__builtin__memcpy` while moving trivially relocatable objects.

NOTICE: I did not test it yet, so it's probably not compiling, but I don't expect big changes. PR is WIP until I test it.

---

I'm thinking if there is a good way to address fact that in a container the new buffer is usually bigger than the previous one.
We may add two more arguments to the functions to address it (the beginning and the end of the whole buffer.

Another potential change is removing `new_storage_end_p` as it's redundant, because we require the same size.
Implemented correct container annotations copying function (non-overlapping containers only)

Refactored the initial PoC to a fully functional implementation for copying container annotations without moving/copying the memory content itself. This function currently supports cases where the source and destination containers do not overlap. Handling overlapping containers would add complexity.

The function is designed to work irrespective of whether the buffers are granule-aligned or the distance between them is granule-aligned. However, such scenarios may have an impact on performance.

A Test case has been included to verify the correctness of the implementation.

Removed unpoisoning oryginal buffer at the end, users can do it by themselves if they want to.
Added helper functions, removed unnecessary branches, simplified code.
Adds support for overlapping containers.
Adds test case for that.
Update comments, simplify flow, improve readability.
- VPrintf level changed to 3.
- unique_ptr used in the test example.
For safety.
Fix code formatting.
@AdvenamTacet AdvenamTacet force-pushed the short-string-annotations-cleaning-traits branch from 690e3a0 to b31757b Compare October 8, 2024 10:25
@AdvenamTacet AdvenamTacet merged commit c76045d into llvm:main Oct 15, 2024
8 checks passed
bricknerb pushed a commit to bricknerb/llvm-project that referenced this pull request Oct 16, 2024
This PR adds a `__sanitizer_copy_contiguous_container_annotations`
function, which copies annotations from one memory area to another. New
area is annotated in the same way as the old region at the beginning
(within limitations of ASan).

Overlapping case: The function supports overlapping containers, however
no assumptions should be made outside of no false positives in new
buffer area. (It doesn't modify old container annotations where it's not
necessary, false negatives may happen in edge granules of the new
container area.) I don't expect this function to be used with
overlapping buffers, but it's designed to work with them and not result
in incorrect ASan errors (false positives).

If buffers have granularity-aligned distance between them (`old_beg %
granularity == new_beg % granularity`), copying algorithm works faster.
If the distance is not granularity-aligned, annotations are copied byte
after byte.

```cpp
void __sanitizer_copy_contiguous_container_annotations(
    const void *old_storage_beg_p, const void *old_storage_end_p,
    const void *new_storage_beg_p, const void *new_storage_end_p) {
```

This function aims to help with short string annotations and similar
container annotations. Right now we change trait types of
`std::basic_string` when compiling with ASan and this function purpose
is reverting that change as soon as possible.


https://github.com/llvm/llvm-project/blob/87f3407856e61a73798af4e41b28bc33b5bf4ce6/libcxx/include/string#L738-L751

The goal is to not change `__trivially_relocatable` when compiling with
ASan. If this function is accepted and upstreamed, the next step is
creating a function like `__memcpy_with_asan` moving memory with ASan.
And then using this function instead of `__builtin__memcpy` while moving
trivially relocatable objects.


https://github.com/llvm/llvm-project/blob/11a6799740f824282650aa9ec249b55dcf1a8aae/libcxx/include/__memory/uninitialized_algorithms.h#L644-L646

---

I'm thinking if there is a good way to address fact that in a container
the new buffer is usually bigger than the previous one. We may add two
more arguments to the functions to address it (the beginning and the end
of the whole buffer.

Another potential change is removing `new_storage_end_p` as it's
redundant, because we require the same size.

Potential future work is creating a function `__asan_unsafe_memmove`,
which will be basically memmove, but with turned off instrumentation
(therefore it will allow copy data from poisoned area).

---------

Co-authored-by: Vitaly Buka <vitalybuka@google.com>
DanielCChen pushed a commit to DanielCChen/llvm-project that referenced this pull request Oct 16, 2024
This PR adds a `__sanitizer_copy_contiguous_container_annotations`
function, which copies annotations from one memory area to another. New
area is annotated in the same way as the old region at the beginning
(within limitations of ASan).

Overlapping case: The function supports overlapping containers, however
no assumptions should be made outside of no false positives in new
buffer area. (It doesn't modify old container annotations where it's not
necessary, false negatives may happen in edge granules of the new
container area.) I don't expect this function to be used with
overlapping buffers, but it's designed to work with them and not result
in incorrect ASan errors (false positives).

If buffers have granularity-aligned distance between them (`old_beg %
granularity == new_beg % granularity`), copying algorithm works faster.
If the distance is not granularity-aligned, annotations are copied byte
after byte.

```cpp
void __sanitizer_copy_contiguous_container_annotations(
    const void *old_storage_beg_p, const void *old_storage_end_p,
    const void *new_storage_beg_p, const void *new_storage_end_p) {
```

This function aims to help with short string annotations and similar
container annotations. Right now we change trait types of
`std::basic_string` when compiling with ASan and this function purpose
is reverting that change as soon as possible.


https://github.com/llvm/llvm-project/blob/87f3407856e61a73798af4e41b28bc33b5bf4ce6/libcxx/include/string#L738-L751

The goal is to not change `__trivially_relocatable` when compiling with
ASan. If this function is accepted and upstreamed, the next step is
creating a function like `__memcpy_with_asan` moving memory with ASan.
And then using this function instead of `__builtin__memcpy` while moving
trivially relocatable objects.


https://github.com/llvm/llvm-project/blob/11a6799740f824282650aa9ec249b55dcf1a8aae/libcxx/include/__memory/uninitialized_algorithms.h#L644-L646

---

I'm thinking if there is a good way to address fact that in a container
the new buffer is usually bigger than the previous one. We may add two
more arguments to the functions to address it (the beginning and the end
of the whole buffer.

Another potential change is removing `new_storage_end_p` as it's
redundant, because we require the same size.

Potential future work is creating a function `__asan_unsafe_memmove`,
which will be basically memmove, but with turned off instrumentation
(therefore it will allow copy data from poisoned area).

---------

Co-authored-by: Vitaly Buka <vitalybuka@google.com>
bricknerb pushed a commit to bricknerb/llvm-project that referenced this pull request Oct 17, 2024
This PR adds a `__sanitizer_copy_contiguous_container_annotations`
function, which copies annotations from one memory area to another. New
area is annotated in the same way as the old region at the beginning
(within limitations of ASan).

Overlapping case: The function supports overlapping containers, however
no assumptions should be made outside of no false positives in new
buffer area. (It doesn't modify old container annotations where it's not
necessary, false negatives may happen in edge granules of the new
container area.) I don't expect this function to be used with
overlapping buffers, but it's designed to work with them and not result
in incorrect ASan errors (false positives).

If buffers have granularity-aligned distance between them (`old_beg %
granularity == new_beg % granularity`), copying algorithm works faster.
If the distance is not granularity-aligned, annotations are copied byte
after byte.

```cpp
void __sanitizer_copy_contiguous_container_annotations(
    const void *old_storage_beg_p, const void *old_storage_end_p,
    const void *new_storage_beg_p, const void *new_storage_end_p) {
```

This function aims to help with short string annotations and similar
container annotations. Right now we change trait types of
`std::basic_string` when compiling with ASan and this function purpose
is reverting that change as soon as possible.


https://github.com/llvm/llvm-project/blob/87f3407856e61a73798af4e41b28bc33b5bf4ce6/libcxx/include/string#L738-L751

The goal is to not change `__trivially_relocatable` when compiling with
ASan. If this function is accepted and upstreamed, the next step is
creating a function like `__memcpy_with_asan` moving memory with ASan.
And then using this function instead of `__builtin__memcpy` while moving
trivially relocatable objects.


https://github.com/llvm/llvm-project/blob/11a6799740f824282650aa9ec249b55dcf1a8aae/libcxx/include/__memory/uninitialized_algorithms.h#L644-L646

---

I'm thinking if there is a good way to address fact that in a container
the new buffer is usually bigger than the previous one. We may add two
more arguments to the functions to address it (the beginning and the end
of the whole buffer.

Another potential change is removing `new_storage_end_p` as it's
redundant, because we require the same size.

Potential future work is creating a function `__asan_unsafe_memmove`,
which will be basically memmove, but with turned off instrumentation
(therefore it will allow copy data from poisoned area).

---------

Co-authored-by: Vitaly Buka <vitalybuka@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants