Non-Cascaded `#[inline]` Propagation

While reading the code, I noticed a break in the `#[inline]` chain on some functions. Because `#[inline]` is not transitive, cross-crate inlining can easily be disrupted if the chain breaks at the wrong depth.

When running tests with this setup:
```rust
use quick_xml::events::Event;
use quick_xml::reader::Reader;

static XML: &str = include_str!(r"XML");

fn main() {
    quick_xml();
}

fn quick_xml() {
    let mut reader = Reader::from_str(XML);
    reader.trim_text(true);

    let mut buf = Vec::with_capacity(1024 * 8);

    loop {
        match reader.read_event_into(&mut buf) {
            Err(e) => panic!("Error at position {}: {:?}", reader.buffer_position(), e),
            Ok(Event::Eof) => break,
            Ok(Event::Start(_)) => {}
            Ok(Event::Text(_)) => {}
            _ => (),
        }
        buf.clear();
    }
}

```

I noticed that although this function has `#[inline]`:
```rust
    #[inline]
    pub fn read_event_into<'b>(&mut self, buf: &'b mut Vec<u8>) -> Result<Event<'b>> {
        self.read_event_impl(buf)
    }

```

The next function it calls does not:
```rust
    fn read_event_impl<'i, B>(&mut self, mut buf: B) -> Result<Event<'i>>
    where
        R: XmlSource<'i, B>,
    {
        read_event_impl!(self, buf, self.reader, read_until_open, read_until_close)
    }

```

To see the effects of the break in `#[inline]` propagation, I ran a few tests. First I added `#[inline]` to the functions so that there is an end-to-end `#[inline]` chain, and then I changed the attributes to `#[inline(always)]` to see what the difference would be.
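
As a rough illustration of what an end-to-end chain looks like, here is a minimal sketch. `ByteReader`, `next_byte`, and `next_byte_impl` are hypothetical stand-ins, not quick-xml's types; the only point is that both the public wrapper and the private helper it delegates to carry the attribute.

```rust
/// Hypothetical stand-in for a reader type; not part of quick-xml.
pub struct ByteReader<'a> {
    input: &'a [u8],
    pos: usize,
}

impl<'a> ByteReader<'a> {
    pub fn new(input: &'a [u8]) -> Self {
        Self { input, pos: 0 }
    }

    /// Public entry point: a thin wrapper, marked `#[inline]`.
    #[inline]
    pub fn next_byte(&mut self) -> Option<u8> {
        self.next_byte_impl()
    }

    /// Private helper that does the actual work. If this attribute were
    /// missing, inlining the wrapper into a downstream crate might only
    /// expose a plain call to this function instead of its body.
    #[inline]
    fn next_byte_impl(&mut self) -> Option<u8> {
        let byte = self.input.get(self.pos).copied()?;
        self.pos += 1;
        Some(byte)
    }
}
```

Swapping both attributes for `#[inline(always)]` would correspond to the second variant I tested. Whether the compiler actually inlines either function is still up to the optimizer, so the attribute is a hint rather than a guarantee.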

I also ran with `lto=true` to see whether `#[inline]` adds any benefit on top of it, or whether LTO simply matches what manual `#[inline]` attributes bring. If it did, then rather than adding the attribute, the docs could just give a heads up that LTO benefits this crate to some degree.

The XML file was a 1.3GB file from the worksheet of this [dataset](https://raw.githubusercontent.com/wiki/jqnatividad/qsv/files/NYC_311_SR_2010-2020-sample-1M.7z).

## Benchmarks

```text
quick_xml: master branch
  Time (mean ± σ):      5.941 s ±  0.018 s    [User: 5.614 s, System: 0.350 s]
  Range (min … max):    5.915 s …  5.976 s    10 runs

quick_xml: master branch + lto
  Time (mean ± σ):      4.623 s ±  0.049 s    [User: 4.231 s, System: 0.395 s]
  Range (min … max):    4.579 s …  4.719 s    10 runs

// I might have missed some functions for this one, but the main ones were covered for sure
quick_xml: inline
  Time (mean ± σ):      5.236 s ±  0.240 s    [User: 4.877 s, System: 0.353 s]
  Range (min … max):    5.123 s …  5.916 s    10 runs
  
quick_xml: inline(always)
  Time (mean ± σ):      4.232 s ±  0.022 s    [User: 3.843 s, System: 0.385 s]
  Range (min … max):    4.201 s …  4.255 s    10 runs

quick_xml: inline(always) + lto
  Time (mean ± σ):      4.148 s ±  0.017 s    [User: 3.744 s, System: 0.393 s]
  Range (min … max):    4.116 s …  4.162 s    10 runs
```
Percentage over `master`:
![image](https://github.com/tafia/quick-xml/assets/12489689/25839367-b695-4fb5-91a7-31d45f42db39)

Seeing that the changes even benefited the `lto` build, I think this is a worthwhile addition.
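
For anyone who wants to recompute the relative numbers from the raw means, here is a small sketch. It assumes "percentage over `master`" means the reduction in mean wall time relative to the `master` mean; the chart may define it differently. `percent_faster` is a hypothetical helper name.

```rust
/// Percentage reduction in mean wall time of `candidate_secs`
/// relative to `baseline_secs` (assumed definition, see above).
pub fn percent_faster(baseline_secs: f64, candidate_secs: f64) -> f64 {
    (baseline_secs - candidate_secs) / baseline_secs * 100.0
}

/// Mean wall times in seconds, copied from the benchmark output above.
pub const MASTER_MEAN: f64 = 5.941;
pub const VARIANT_MEANS: [(&str, f64); 4] = [
    ("master + lto", 4.623),
    ("inline", 5.236),
    ("inline(always)", 4.232),
    ("inline(always) + lto", 4.148),
];

/// Returns each variant's name together with its percentage over master.
pub fn improvements() -> Vec<(&'static str, f64)> {
    VARIANT_MEANS
        .iter()
        .map(|&(name, mean)| (name, percent_faster(MASTER_MEAN, mean)))
        .collect()
}
```

Under that definition the means work out to roughly 22.2% for LTO alone, 11.9% for `#[inline]`, 28.8% for `#[inline(always)]`, and 30.2% for `#[inline(always)]` with LTO.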

## Profile

Current `master`:
![image](https://github.com/tafia/quick-xml/assets/12489689/a6dd0a80-9b9b-4b94-bf53-3d487d1e2dea)

`#[inline]`:
![image](https://github.com/tafia/quick-xml/assets/12489689/9fc05fd7-a509-4822-a21e-951f4fee75d2)

`#[inline(always)]`:
![image](https://github.com/tafia/quick-xml/assets/12489689/68143d6a-a4b3-453c-8aca-0f0899a11a83)

`#[inline(always)]` + `lto`:
![image](https://github.com/tafia/quick-xml/assets/12489689/0cd8369d-0be6-4161-b762-7040ea997244)

One thing I did notice was a `memcpy` call that accounted for 25% of the samples:
![image](https://github.com/tafia/quick-xml/assets/12489689/2cc338d8-fab9-4f5e-a001-3ef4a0216d68)

I'm not sure whether this is part of the program or just an artifact of the profiling.
