Avoid returning errors where possible

We were previously unsure whether it was better to return an error and exit the fuzz target early, or to append "fake" data (like all zeros) when an `Arbitrary` implementation requested more data than the fuzzer gave us. We originally chose to exit early, and this commit is the first step in reversing that decision. We didn't have any data to motivate either choice when we originally made the decision to exit early. What we had was an intuition that we shouldn't repeatedly waste time on test cases that are very similar to each other because they are all padded with the same fake data at the end. If we exited early, our thinking went, we would yield our time back to the fuzzer, letting it more efficiently explore the state space. In practice, it has worked out okay, but not great. What we've been missing out on are chance mutations that cause us to explore new code paths. A random mutation made by the fuzzer that would otherwise take us to a new code path happens to have an invalid UTF-8 code point or not quite enough data, and then we return an error and exit early. If we avoided returning errors, then that random mutation would have led us to new code paths. A corollary is that test case reduction is more efficient as well, since the reduced input bytes are also more likely to succeed in generating a valid instance of the `Arbitrary` type. In summary, exiting early yields time back to the fuzzer, giving it more chances to try new mutations, while avoiding early exits is more forgiving to random mutations, making them more likely to succeed at finding new code paths. This is a trade off, and we can't have both in the limit. My local, not-super-rigorous experiments are telling me that we made the wrong trade off originally, and that being forgiving with mutations to find new code paths easier is the better choice. Next breaking release that we make, I think we can remove the fallibility from the `Arbitrary` trait and all the `Unstructured` methods. The only tricky one is `Unstructured::get_bytes`, but I think we can either loosen its contract so that it doesn't always return a slice of length `size`, or remove the method completely in favor of `Unstructured::fill_buffer`.
rust-fuzz · Aug 21, 2020 · 1d2f653 · 1d2f653
1 parent 10d68cf
commit 1d2f653
Show file tree

Hide file tree

Showing 2 changed files with 28 additions and 10 deletions.
diff --git a/src/lib.rs b/src/lib.rs
@@ -1013,10 +1013,22 @@ where
 impl Arbitrary for String {
     fn arbitrary(u: &mut Unstructured<'_>) -> Result<Self> {
         let size = u.arbitrary_len::<u8>()?;
-        let bytes = u.get_bytes(size)?;
-        str::from_utf8(bytes)
-            .map_err(|_| Error::IncorrectFormat)
-            .map(Into::into)
+        match str::from_utf8(&u.data[..size]) {
+            Ok(s) => {
+                u.data = &u.data[size..];
+                Ok(s.into())
+            }
+            Err(e) => {
+                let i = e.valid_up_to();
+                let (valid, rest) = u.data.split_at(i);
+                let s = unsafe {
+                    debug_assert!(str::from_utf8(valid).is_ok());
+                    str::from_utf8_unchecked(valid)
+                };
+                u.data = rest;
+                Ok(s.into())
+            }
+        }
     }
 
     fn arbitrary_take_rest(u: Unstructured<'_>) -> Result<Self> {
@@ -1300,7 +1312,8 @@ mod test {
         assert_eq!(z, [1, 2]);
         rb.fill_buffer(&mut z).unwrap();
         assert_eq!(z, [3, 4]);
-        assert!(rb.fill_buffer(&mut z).is_err());
+        rb.fill_buffer(&mut z).unwrap();
+        assert_eq!(z, [0, 0]);
     }
 
     #[test]

diff --git a/src/unstructured.rs b/src/unstructured.rs
@@ -68,7 +68,7 @@ use std::{mem, ops};
 /// # }
 /// ```
 pub struct Unstructured<'a> {
-    data: &'a [u8],
+    pub(crate) data: &'a [u8],
 }
 
 impl<'a> Unstructured<'a> {
@@ -222,7 +222,7 @@ impl<'a> Unstructured<'a> {
 
     fn arbitrary_byte_size(&mut self) -> Result<usize> {
         if self.data.len() == 0 {
-            Err(Error::NotEnoughData)
+            Ok(0)
         } else if self.data.len() == 1 {
             self.data = &[];
             Ok(0)
@@ -385,11 +385,16 @@ impl<'a> Unstructured<'a> {
     /// let mut buf = [0; 2];
     /// assert!(u.fill_buffer(&mut buf).is_ok());
     /// assert!(u.fill_buffer(&mut buf).is_ok());
-    /// assert!(u.fill_buffer(&mut buf).is_err());
     /// ```
     pub fn fill_buffer(&mut self, buffer: &mut [u8]) -> Result<()> {
-        let bytes = self.get_bytes(buffer.len())?;
-        buffer.copy_from_slice(bytes);
+        let n = std::cmp::min(buffer.len(), self.data.len());
+        for i in 0..n {
+            buffer[i] = self.data[i];
+        }
+        for i in self.data.len()..buffer.len() {
+            buffer[i] = 0;
+        }
+        self.data = &self.data[n..];
         Ok(())
     }