Skip to content

I need to do an oob vector load. How? #2

Open
@brson

Description

@brson

As an optimization during a buffer search, I need (very want) to load that buffer into a SIMD vector, even when the buffer doesn't fit into the vector. E.g. I might have a 31-byte buffer that can be efficiently searched with a 32-byte wide AVX2 vector.

From a machine perspective, I don't see this as a problem, as long as the load doesn't extend beyond the current page; from LLVM's perspective this seems like UB.

I'd really like to be able to write this code in Rust and not have to use assembly.

Here's an example of this pattern:

    #[inline(always)]
    unsafe fn do_tail_clever(needle: u8, p: *const u8, len: isize,
                             i: isize, q: __m256i) -> Option<usize> {
        let rem = len - i;
        debug_assert!(rem < 32);

        // Check if the 32-byte load is within the current page
        let page_alignment = 4096;
        let page_mask = !(page_alignment - 1);
        let current_p = p.offset(i) as usize;
        let avx_read_end = current_p + 32;
        let next_page = (current_p & page_mask) + page_alignment;

        if likely(avx_read_end <= next_page) {
            let x = _mm256_loadu_si256(p.offset(i) as *const __m256i);
            let r = _mm256_cmpeq_epi8(x, q);
            let z = _mm256_movemask_epi8(r);
            let garbage_mask = {
                let ones = u32::max_value();
                let mask = ones << rem;
                let mask = !mask;
                mask as i32
            };
            let z = z & garbage_mask;
            if z != 0 {
                return off(i, z);
            }

            return None;
        }

        // Slow path
        do_tail_simple(needle, p, len, i, q)
    }

It loads beyond the array, does vector operations on it, then disregards the oob bytes with a mask.

I'm hopeful that there is some mechanism to tell LLVM to 'forget' what it knows about this pointer, 'fooling' the optimizer into not messing with it.

From the LLVM aliasing rules, there is some language that makes me hopeful:

An integer constant other than zero or a pointer value returned from a function not defined within LLVM may be associated with address ranges allocated through mechanisms other than those provided by LLVM. Such ranges shall not overlap with any ranges of addresses allocated by mechanisms provided by LLVM.

So there is a class of pointers that can operate on arbitrary memory (those that don't come from LLVM). That suggests to me that I could e.g. send my pointer through assembly or some other black-box function to 'clean it', maybe. On the other hand, calling into any function, or even into inline asm imposes extra instructions that more-or-less defeat the optimization (inline asm in LLVM seems to always spill registers). Though that sentence also says "such ranges shall not overlap with any ranges of addresses allocated by mechanisms provided by LLVM"

I'm not sure how much 'wiggle-room' there is. Is a malloc'd array "provided by LLVM"? What are the consequences of disobeying this "shall not"?

Even if there's no in-language solution and it is technically UB, I am hopeful that I can do this thing without LLVM messing with my codegen.

cc @nikomatsakis writing this here per your request.

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-memoryTopic: Related to memory accessesS-pending-designStatus: Resolving this issue requires addressing some open design questions

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions