sqlite: Rewrite custom recovery code.

Our largest SQLite patch,
0001-Virtual-table-supporting-recovery-of-corrupted-datab.patch, implements a
virtual table extension that provides custom recovery logic. The patch is
written in C, and roughly follows SQLite's coding style.

This CL reimplements the recover functionality in //sql/recover_module. The
new implementation is based on the high-level description of the old one, and
does not match its structure. The new implementation passes all the tests that
shipped with the patch.

Bug: 945204
Change-Id: I04df1edfeb48f907303d9525b1c81486bdc62f75
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/1546942
Commit-Queue: Victor Costan <pwnall@chromium.org>
Reviewed-by: Staphany Park <staphany@chromium.org>
Reviewed-by: Chris Mumford <cmumford@google.com>
Cr-Commit-Position: refs/heads/master@{#659641}
qweri0p · May 14, 2019 · 153dd1a
1 parent dfb75af
Showing 23 changed files with 3,101 additions and 18 deletions.
diff --git a/sql/BUILD.gn b/sql/BUILD.gn
@@ -18,6 +18,24 @@ component("sql") {
     "internal_api_token.h",
     "meta_table.cc",
     "meta_table.h",
+    "recover_module/btree.cc",
+    "recover_module/btree.h",
+    "recover_module/cursor.cc",
+    "recover_module/cursor.h",
+    "recover_module/integers.cc",
+    "recover_module/integers.h",
+    "recover_module/module.cc",
+    "recover_module/module.h",
+    "recover_module/pager.cc",
+    "recover_module/pager.h",
+    "recover_module/parsing.cc",
+    "recover_module/parsing.h",
+    "recover_module/payload.cc",
+    "recover_module/payload.h",
+    "recover_module/record.cc",
+    "recover_module/record.h",
+    "recover_module/table.cc",
+    "recover_module/table.h",
     "recovery.cc",
     "recovery.h",
     "sql_features.cc",
@@ -94,7 +112,7 @@ test("sql_unittests") {
   sources = [
     "database_unittest.cc",
     "meta_table_unittest.cc",
-    "recover_module_unittest.cc",
+    "recover_module/module_unittest.cc",
     "recovery_unittest.cc",
     "sql_memory_dump_provider_unittest.cc",
     "sqlite_features_unittest.cc",

diff --git a/sql/recover_module/README.md b/sql/recover_module/README.md
@@ -0,0 +1,94 @@
+# SQLite Data Recovery
+
+This directory implements data recovery heuristics for SQLite databases whose
+files were corrupted on disk. The recovery code walks through the B-tree
+holding a SQLite table and recovers all records that seem healthy. Even if
+recovery succeeds, a recovered table may be missing records, and existing
+records may have corrupted values inside them. Any constraints imposed by Chrome
+or by SQLite may be broken.
+
+
+## Usage
+
+The default approach for handling corruption in SQLite databases is to
+immediately stop using the database, delete it, and start over with a new
+database. The recovery method implemented here is intended for databases used by
+Chrome features that handle high-value user data, such as History and Bookmarks.
+These features carefully handle data inconsistency edge cases, and their
+database schemas are resilient to partial data loss.
+
+The code is plugged into the rest of Chrome via
+[SQLite's virtual table module API](https://sqlite.org/vtab.html). The example
+below covers a typical recovery scenario.
+
+```sql
+-- Feature table schema.
+CREATE TABLE feature(name TEXT PRIMARY KEY, value TEXT NOT NULL);
+
+-- Recover in another database. The corrupted one is unreliable.
+ATTACH DATABASE '/tmp/db.db' as recovery;
+-- Re-create the feature table's schema.
+CREATE TABLE recovery.feature(name TEXT PRIMARY KEY, value TEXT NOT NULL);
+-- Start reading the corrupted data.
+CREATE VIRTUAL TABLE temp.recover_feature USING recover(
+    main.feature, -- The table in the corrupted database.
+    -- Recovery will skip row values that don't have the TEXT type.
+    name TEXT STRICT NOT NULL,
+    -- Recovery will include any row value coercible to TEXT.
+    value TEXT NOT NULL);
+-- Data recovered from corrupted databases may not meet schema constraints, so
+-- recovery insertions must use "OR REPLACE" or "OR IGNORE".
+INSERT OR REPLACE INTO recovery.feature(rowid, name, value)
+SELECT rowid, name, value FROM temp.recover_feature;
+-- Clean up after the recovery operation.
+DROP TABLE temp.recover_feature;
+DETACH DATABASE recovery;
+-- Replace the corrupted database file with the recovered one.
+```
+
+The feature invoking the recovery virtual table must know the schema of the
+database being recovered. A generic high-level recovery layer should first
+recover
+[the `sqlite_master` table](https://www.sqlite.org/fileformat.html#storage_of_the_sql_database_schema),
+which has a well known format, then use its contents to recover the schema of
+any other table. This recovery module already relies on the integrity of the
+`sqlite_master` table.
+
+The column definitions in the virtual table creation statement must follow
+the syntax _column\_name_ _type\_name_ [`STRICT`] [`NOT NULL`]. _type\_name_ is
+one of
+[the SQLite data types](https://www.sqlite.org/datatype3.html#storage_classes_and_datatypes),
+or one of the special types `ANY` or `ROWID`.
+
+The `ANY` type can be used to recover values of all types.
+
+The `ROWID` type must be used for columns that alias
+[rowid](https://www.sqlite.org/lang_createtable.html#rowid). This typically only
+happens when a column is declared as `INTEGER PRIMARY KEY`.
+Designating `ROWID` columns is essential for decoding records correctly.
+
+TODO(pwnall): Look into removing `STRICT`, if it's not used.
+
+
+## Limitations
+
+The current implementation only handles [table
+B-trees](https://www.sqlite.org/fileformat.html#b_tree_pages). It cannot
+recover [WITHOUT ROWID](https://www.sqlite.org/rowidtable.html) tables, which
+are stored in index B-trees.
+
+
+## Code Map
+
+The code is structured as follows.
+
+* integers.{cc,h} decodes the integer formats used by SQLite.
+* btree.{cc,h} decodes the cells in SQLite's B-tree pages.
+* payload.{cc,h} reads record payloads from B-tree pages and overflow pages.
+* record.{cc,h} decodes column values from records.
+* cursor.{cc,h} implements a SQLite virtual table cursor.
+* table.{cc,h} implements one recovery virtual table.
+* parsing.{cc,h} parses the SQL strings passed in via `CREATE VIRTUAL TABLE`
+  and implements the constraints explained above.
+* module.{cc,h} implements the SQLite virtual table interface.
+
+The feature is tested by integration tests that issue SQLite queries.
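
Reviewer note, not part of the CL: the code map mentions that integers.{cc,h} decodes SQLite's integer formats. As an illustration of the varint format documented at https://www.sqlite.org/fileformat.html, here is a minimal sketch. `DecodeVarintSketch` is a hypothetical name, and the code is not the `ParseVarint()` that this CL adds. It only assumes the same return shape that btree.cc relies on: a value plus a pointer past the varint.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper for illustration; not the CL's ParseVarint().
//
// Decodes one SQLite varint from [cursor, end). A varint is 1-9 bytes long.
// Each of the first 8 bytes carries 7 payload bits, and its high bit signals
// that another byte follows. A 9th byte, if present, carries 8 payload bits.
//
// Returns the decoded value and a pointer to the byte after the varint. If
// the buffer ends before the varint does, the returned pointer equals |end|
// and the value holds the bits decoded so far.
std::pair<int64_t, const uint8_t*> DecodeVarintSketch(const uint8_t* cursor,
                                                      const uint8_t* end) {
  assert(cursor <= end);
  uint64_t value = 0;
  for (int i = 0; i < 8 && cursor < end; ++i) {
    const uint8_t byte = *cursor++;
    value = (value << 7) | (byte & 0x7F);
    if ((byte & 0x80) == 0)
      return {static_cast<int64_t>(value), cursor};
  }
  if (cursor < end) {
    // The ninth byte contributes all 8 of its bits.
    value = (value << 8) | *cursor++;
  }
  // The conversion wraps modulo 2^64 on two's complement targets.
  return {static_cast<int64_t>(value), cursor};
}
```

A caller that receives a pointer equal to `end` cannot tell a truncated varint from one that ends exactly at the buffer's end, so treating that case as "nothing follows the varint" is the conservative choice.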
diff --git a/sql/recover_module/btree.cc b/sql/recover_module/btree.cc
@@ -0,0 +1,252 @@
+// Copyright 2019 The Chromium Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style license that can be
+// found in the LICENSE file.
+
+#include "sql/recover_module/btree.h"
+
+#include <algorithm>
+#include <limits>
+#include <type_traits>
+
+#include "base/logging.h"
+#include "sql/recover_module/integers.h"
+#include "sql/recover_module/pager.h"
+#include "third_party/sqlite/sqlite3.h"
+
+namespace sql {
+namespace recover {
+
+namespace {
+
+// The SQLite database format is documented at the following URLs.
+//   https://www.sqlite.org/fileformat.html
+//   https://www.sqlite.org/fileformat2.html
+constexpr uint8_t kInnerTablePageType = 0x05;
+constexpr uint8_t kLeafTablePageType = 0x0D;
+
+// Offset from the page header to the page type byte.
+constexpr int kPageTypePageOffset = 0;
+// Offset from the page header to the 2-byte cell count.
+constexpr int kCellCountPageOffset = 3;
+// Offset from an inner page header to the 4-byte last child page ID.
+constexpr int kLastChildIdInnerPageOffset = 8;
+// Offset from an inner page header to the cell pointer array.
+constexpr int kFirstCellOfsetInnerPageOffset = 12;
+// Offset from a leaf page header to the cell pointer array.
+constexpr int kFirstCellOfsetLeafPageOffset = 8;
+
+}  // namespace
+
+#if !DCHECK_IS_ON()
+// In DCHECKed builds, the decoder contains a sequence checker, which has a
+// non-trivial destructor.
+static_assert(std::is_trivially_destructible<InnerPageDecoder>::value,
+              "Move the destructor to the .cc file if it's non-trivial");
+#endif  // !DCHECK_IS_ON()
+
+InnerPageDecoder::InnerPageDecoder(DatabasePageReader* db_reader)
+    : page_id_(db_reader->page_id()),
+      db_reader_(db_reader),
+      cell_count_(ComputeCellCount(db_reader)),
+      next_read_index_(0) {
+  DCHECK(IsOnValidPage(db_reader));
+  DCHECK(DatabasePageReader::IsValidPageId(page_id_));
+}
+
+int InnerPageDecoder::TryAdvance() {
+  DCHECK_CALLED_ON_VALID_SEQUENCE(sequence_checker_);
+  DCHECK(CanAdvance());
+
+  const int sqlite_status = db_reader_->ReadPage(page_id_);
+  if (sqlite_status != SQLITE_OK) {
+    // TODO(pwnall): UMA the error code.
+
+    next_read_index_ = cell_count_ + 1;  // End the reading process.
+    return DatabasePageReader::kInvalidPageId;
+  }
+
+  const uint8_t* const page_data = db_reader_->page_data();
+  const int read_index = next_read_index_;
+  next_read_index_ += 1;
+  if (read_index == cell_count_)
+    return LoadBigEndianInt32(page_data + kLastChildIdInnerPageOffset);
+
+  const int cell_pointer_offset =
+      kFirstCellOfsetInnerPageOffset + (read_index << 1);
+  DCHECK_LE(cell_pointer_offset + 2, db_reader_->page_size())
+      << "ComputeCellCount() used an incorrect upper bound";
+  const int cell_pointer = LoadBigEndianUint16(page_data + cell_pointer_offset);
+
+  static_assert(std::numeric_limits<uint16_t>::max() + 4 <
+                    std::numeric_limits<int>::max(),
+                "The addition below may overflow");
+  if (cell_pointer + 4 >= db_reader_->page_size()) {
+    // Each cell needs 1 byte for the rowid varint, in addition to the 4 bytes
+    // for the child page number that will be read below. Skip cells that
+    // obviously go over the page end.
+    return DatabasePageReader::kInvalidPageId;
+  }
+  if (cell_pointer < kFirstCellOfsetInnerPageOffset) {
+    // The pointer points into the page's header.
+    return DatabasePageReader::kInvalidPageId;
+  }
+
+  return LoadBigEndianInt32(page_data + cell_pointer);
+}
+
+// static
+bool InnerPageDecoder::IsOnValidPage(DatabasePageReader* db_reader) {
+  static_assert(kPageTypePageOffset < DatabasePageReader::kMinUsablePageSize,
+                "The check below may perform an out-of-bounds memory access");
+  return db_reader->page_data()[kPageTypePageOffset] == kInnerTablePageType;
+}
+
+// static
+int InnerPageDecoder::ComputeCellCount(DatabasePageReader* db_reader) {
+  // The B-tree page header stores the cell count.
+  int header_count =
+      LoadBigEndianUint16(db_reader->page_data() + kCellCountPageOffset);
+  static_assert(
+      kCellCountPageOffset + 2 <= DatabasePageReader::kMinUsablePageSize,
+      "The read above may be out of bounds");
+
+  // However, the data may be corrupted. So, use an upper bound based on the
+  // fact that the cell pointer array should never extend past the end of the
+  // page.
+  //
+  // The page size is always even, because it is either a power of two, for
+  // most pages, or a power of two minus 100, for the first database page. The
+  // cell pointer array starts at offset 12. So, each cell pointer must be
+  // separated from the page buffer's end by an even number of bytes.
+  DCHECK((db_reader->page_size() - kFirstCellOfsetInnerPageOffset) % 2 == 0);
+  int upper_bound =
+      (db_reader->page_size() - kFirstCellOfsetInnerPageOffset) >> 1;
+  static_assert(
+      kFirstCellOfsetInnerPageOffset <= DatabasePageReader::kMinUsablePageSize,
+      "The |upper_bound| computation above may overflow");
+
+  return std::min(header_count, upper_bound);
+}
+
+#if !DCHECK_IS_ON()
+// In DCHECKed builds, the decoder contains a sequence checker, which has a
+// non-trivial destructor.
+static_assert(std::is_trivially_destructible<LeafPageDecoder>::value,
+              "Move the destructor to the .cc file if it's non-trivial");
+#endif  // !DCHECK_IS_ON()
+
+LeafPageDecoder::LeafPageDecoder(DatabasePageReader* db_reader)
+    : page_id_(db_reader->page_id()),
+      db_reader_(db_reader),
+      cell_count_(ComputeCellCount(db_reader)),
+      next_read_index_(0),
+      last_record_size_(0) {
+  DCHECK(IsOnValidPage(db_reader));
+  DCHECK(DatabasePageReader::IsValidPageId(page_id_));
+}
+
+bool LeafPageDecoder::TryAdvance() {
+  DCHECK_CALLED_ON_VALID_SEQUENCE(sequence_checker_);
+  DCHECK(CanAdvance());
+
+#if DCHECK_IS_ON()
+  // DCHECKs use last_record_size_ == 0 to check for incorrect access to the
+  // decoder's state.
+  last_record_size_ = 0;
+#endif  // DCHECK_IS_ON()
+
+  const int sqlite_status = db_reader_->ReadPage(page_id_);
+  if (sqlite_status != SQLITE_OK) {
+    // TODO(pwnall): UMA the error code.
+
+    next_read_index_ = cell_count_;  // End the reading process.
+    return false;
+  }
+
+  const uint8_t* page_data = db_reader_->page_data();
+  const int read_index = next_read_index_;
+  next_read_index_ += 1;
+
+  const int cell_pointer_offset =
+      kFirstCellOfsetLeafPageOffset + (read_index << 1);
+  DCHECK_LE(cell_pointer_offset + 2, db_reader_->page_size())
+      << "ComputeCellCount() used an incorrect upper bound";
+  const int cell_pointer = LoadBigEndianUint16(page_data + cell_pointer_offset);
+
+  static_assert(std::numeric_limits<uint16_t>::max() + 3 <
+                    std::numeric_limits<int>::max(),
+                "The addition below may overflow");
+  if (cell_pointer + 3 >= db_reader_->page_size()) {
+    // Each cell needs at least 1 byte for the payload size varint, 1 byte for
+    // the rowid varint, and 1 byte for the record header size varint. Skip
+    // cells that obviously go over the page end.
+    return false;
+  }
+  if (cell_pointer < kFirstCellOfsetLeafPageOffset) {
+    // The pointer points into the page's header.
+    return false;
+  }
+
+  const uint8_t* const cell_start = page_data + cell_pointer;
+  const uint8_t* const page_end = page_data + db_reader_->page_size();
+  DCHECK_LT(cell_start, page_end) << "Failed to skip over empty cells";
+
+  const uint8_t* rowid_start;
+  std::tie(last_record_size_, rowid_start) = ParseVarint(cell_start, page_end);
+  if (rowid_start == page_end) {
+    // The value size varint extended to the end of the page, so the rowid
+    // varint starts past the page end.
+    return false;
+  }
+  if (last_record_size_ <= 0) {
+    // Each payload needs at least one varint. Skip empty payloads.
+#if DCHECK_IS_ON()
+    // DCHECKs use last_record_size_ == 0 to check for incorrect access to the
+    // decoder's state.
+    last_record_size_ = 0;
+#endif  // DCHECK_IS_ON()
+    return false;
+  }
+
+  const uint8_t* record_start;
+  std::tie(last_record_rowid_, record_start) =
+      ParseVarint(rowid_start, page_end);
+  if (record_start == page_end) {
+    // The rowid varint extended to the end of the page, so the record starts
+    // past the page end. Records need at least 1 byte for their header size
+    // varint, so this suggests corruption.
+    last_record_size_ = 0;
+    return false;
+  }
+
+  last_record_offset_ = record_start - page_data;
+  return true;
+}
+
+// static
+bool LeafPageDecoder::IsOnValidPage(DatabasePageReader* db_reader) {
+  static_assert(kPageTypePageOffset < DatabasePageReader::kMinUsablePageSize,
+                "The check below may perform an out-of-bounds memory access");
+  return db_reader->page_data()[kPageTypePageOffset] == kLeafTablePageType;
+}
+
+// static
+int LeafPageDecoder::ComputeCellCount(DatabasePageReader* db_reader) {
+  // See InnerPageDecoder::ComputeCellCount() for the reasoning behind the code.
+  int header_count =
+      LoadBigEndianUint16(db_reader->page_data() + kCellCountPageOffset);
+  static_assert(
+      kCellCountPageOffset + 2 <= DatabasePageReader::kMinUsablePageSize,
+      "The read above may be out of bounds");
+
+  int upper_bound =
+      (db_reader->page_size() - kFirstCellOfsetLeafPageOffset) >> 1;
+  static_assert(
+      kFirstCellOfsetLeafPageOffset <= DatabasePageReader::kMinUsablePageSize,
+      "The |upper_bound| computation above may overflow");
+
+  return std::min(header_count, upper_bound);
+}
+
+}  // namespace recover
+}  // namespace sql
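
Reviewer note, not part of the CL: both decoders in btree.cc defend against corrupted pages in the same two ways. They clamp the header's cell count to the number of 2-byte cell pointers that fit in the page, and they reject cell pointers that land in the page header or too close to the page end. The sketch below restates that logic as free functions so it can be read in isolation. All names ending in `Sketch` are hypothetical, and the minimum cell size is passed in as a parameter rather than hard-coded as in the CL.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the CL's LoadBigEndianUint16().
inline int LoadBigEndianUint16Sketch(const uint8_t* data) {
  return (data[0] << 8) | data[1];
}

// Returns the cell count stored at offset 3 of a B-tree page header, clamped
// to the number of 2-byte cell pointers that fit between the end of the
// header and the end of the page. |header_size| is 12 for inner table pages
// and 8 for leaf table pages.
int ClampedCellCountSketch(const uint8_t* page, int page_size,
                           int header_size) {
  assert(page_size >= header_size);
  const int header_count = LoadBigEndianUint16Sketch(page + 3);
  const int upper_bound = (page_size - header_size) / 2;
  return std::min(header_count, upper_bound);
}

// Returns true if a cell starting at |cell_pointer| lies past the page header
// and leaves room for at least |min_cell_bytes| bytes before the page end.
bool IsPlausibleCellPointerSketch(int cell_pointer, int page_size,
                                  int header_size, int min_cell_bytes) {
  if (cell_pointer < header_size)
    return false;  // Points into the page header.
  if (cell_pointer + min_cell_bytes > page_size)
    return false;  // The cell would run past the page end.
  return true;
}
```

With `min_cell_bytes` set to 5 (a 4-byte child page number plus a 1-byte rowid varint), the second check is equivalent to the inner page decoder's `cell_pointer + 4 >= page_size` test.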