sqlite: Rewrite custom recovery code.
Our largest SQLite patch,
0001-Virtual-table-supporting-recovery-of-corrupted-datab.patch,
contains the implementation of a virtual table extension that contains
custom recovery logic.  The patch is written in C, and roughly follows
SQLite's coding style.

This CL reimplements the recover functionality in //sql/recover_module.
The new implementation is based on the high-level description of the old
one, and does not match its structure. The new implementation passes all
the tests that shipped with the patch.


Bug: 945204
Change-Id: I04df1edfeb48f907303d9525b1c81486bdc62f75
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/1546942
Commit-Queue: Victor Costan <pwnall@chromium.org>
Reviewed-by: Staphany Park <staphany@chromium.org>
Reviewed-by: Chris Mumford <cmumford@google.com>
Cr-Commit-Position: refs/heads/master@{#659641}
pwnall authored and Commit Bot committed May 14, 2019
1 parent dfb75af commit 153dd1a
Showing 23 changed files with 3,101 additions and 18 deletions.
20 changes: 19 additions & 1 deletion sql/BUILD.gn
@@ -18,6 +18,24 @@ component("sql") {
"internal_api_token.h",
"meta_table.cc",
"meta_table.h",
"recover_module/btree.cc",
"recover_module/btree.h",
"recover_module/cursor.cc",
"recover_module/cursor.h",
"recover_module/integers.cc",
"recover_module/integers.h",
"recover_module/module.cc",
"recover_module/module.h",
"recover_module/pager.cc",
"recover_module/pager.h",
"recover_module/parsing.cc",
"recover_module/parsing.h",
"recover_module/payload.cc",
"recover_module/payload.h",
"recover_module/record.cc",
"recover_module/record.h",
"recover_module/table.cc",
"recover_module/table.h",
"recovery.cc",
"recovery.h",
"sql_features.cc",
@@ -94,7 +112,7 @@ test("sql_unittests") {
sources = [
"database_unittest.cc",
"meta_table_unittest.cc",
"recover_module_unittest.cc",
"recover_module/module_unittest.cc",
"recovery_unittest.cc",
"sql_memory_dump_provider_unittest.cc",
"sqlite_features_unittest.cc",
94 changes: 94 additions & 0 deletions sql/recover_module/README.md
@@ -0,0 +1,94 @@
# SQLite Data Recovery

This directory implements data recovery heuristics for SQLite databases whose
files were corrupted on disk. The recovery code walks through the B-tree
holding a SQLite table and recovers all records that seem healthy. Even if
recovery succeeds, a recovered table may be missing records, and existing
records may have corrupted values inside them. Any constraints imposed by Chrome
or by SQLite may be broken.


## Usage

The default approach for handling corruption in SQLite databases is to
immediately stop using the database, delete it, and start over with a new
database. The recovery method implemented here is intended for databases used by
Chrome features that handle high-value user data, such as History and Bookmarks.
These features carefully handle data inconsistency edge cases, and their
database schemas are resilient to partial data loss.

The code is plugged into the rest of Chrome via
[SQLite's virtual table module API](https://sqlite.org/vtab.html). The example
below covers a typical recovery scenario.

```sql
-- Feature table schema.
CREATE TABLE feature(name TEXT PRIMARY KEY, value TEXT NOT NULL);

-- Recover in another database. The corrupted one is unreliable.
ATTACH DATABASE '/tmp/db.db' as recovery;
-- Re-create the feature table's schema.
CREATE TABLE recovery.feature(name TEXT PRIMARY KEY, value TEXT NOT NULL);
-- Start reading the corrupted data.
CREATE VIRTUAL TABLE temp.recover_feature USING recover(
    main.feature,  -- The table to recover, in the corrupted database.
    -- Recovery will skip row values that don't have the TEXT type.
    name TEXT STRICT NOT NULL,
    -- Recovery will include any row value coercible to TEXT.
    value TEXT NOT NULL);
-- Data recovered from corrupted databases may not meet schema constraints, so
-- recovery insertions must use "OR REPLACE" or "OR IGNORE".
INSERT OR REPLACE INTO recovery.feature(rowid, name, value)
    SELECT rowid, name, value FROM temp.recover_feature;
-- Cleanup after the recovery operation.
DROP TABLE temp.recover_feature;
DETACH DATABASE recovery;
-- Replace the corrupted database file with the recovered one.
```

The feature invoking the recovery virtual table must know the schema of the
database being recovered. A generic high-level recovery layer should first
recover
[the `sqlite_master` table](https://www.sqlite.org/fileformat.html#storage_of_the_sql_database_schema),
which has a well known format, then use its contents to recover the schema of
any other table. This recovery module already relies on the integrity of the
`sqlite_master` table.

The column definitions in the virtual table creation statement must follow
the syntax _column\_name_ _type\_name_ [`STRICT`] [`NOT NULL`]. _type\_name_ is
one of [the SQLite data types](https://www.sqlite.org/datatype3.html#storage_classes_and_datatypes),
or one of the special types `ANY` and `ROWID`.

The `ANY` type can be used to recover values of all types.

The `ROWID` type must be used for columns that alias
[rowid](https://www.sqlite.org/lang_createtable.html#rowid). This typically only
happens when a column is declared as `INTEGER PRIMARY KEY`.
Designating `ROWID` columns is essential for decoding records correctly.

TODO(pwnall): Look into removing `STRICT`, if it's not used.


## Limitations

The current implementation only handles [table
B-trees](https://www.sqlite.org/fileformat.html#b_tree_pages). It cannot
recover [WITHOUT ROWID](https://www.sqlite.org/rowidtable.html) tables, which
are stored in index B-trees.
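
As a rough illustration of this limitation, the sketch below classifies a
B-tree page by the type byte at the start of its page header. The four type
values come from the SQLite file format documentation. The `PageKind` enum and
the `ClassifyPage()` and `IsRecoverablePage()` functions are hypothetical names
used only for this example; they are not part of the module.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration; not part of //sql/recover_module.
enum class PageKind { kInnerTable, kLeafTable, kIndex, kUnknown };

// Classifies a B-tree page by the first byte of its page header.
// The type values are taken from the SQLite file format documentation.
inline PageKind ClassifyPage(uint8_t page_type) {
  switch (page_type) {
    case 0x05:  // Interior table B-tree page.
      return PageKind::kInnerTable;
    case 0x0D:  // Leaf table B-tree page.
      return PageKind::kLeafTable;
    case 0x02:  // Interior index B-tree page.
    case 0x0A:  // Leaf index B-tree page.
      return PageKind::kIndex;
    default:
      return PageKind::kUnknown;
  }
}

// Only table B-tree pages can be walked by the recovery code.
inline bool IsRecoverablePage(uint8_t page_type) {
  const PageKind kind = ClassifyPage(page_type);
  return kind == PageKind::kInnerTable || kind == PageKind::kLeafTable;
}
```

Under this sketch, a page whose type byte is `0x02` or `0x0A` belongs to an
index B-tree, which is where WITHOUT ROWID tables store their rows.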


## Code Map

The code is structured as follows.

* integers.{cc,h} decodes the integer formats used by SQLite.
* btree.{cc,h} decodes the cells in SQLite's B-tree pages.
* payload.{cc,h} reads record payloads from B-tree pages and overflow pages.
* record.{cc,h} decodes column values from records.
* cursor.{cc,h} implements a SQLite virtual table cursor.
* table.{cc,h} implements one recovery virtual table.
* parsing.{cc,h} parses the SQL strings passed in via `CREATE VIRTUAL TABLE`
and implements the constraints explained above.
* module.{cc,h} implements the SQLite virtual table interface.

The feature is tested by integration tests that issue SQLite queries.
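
To give a feel for the lowest layer in the list above, here is a minimal sketch
of decoding SQLite's variable-length integer (varint) format, as described in
the SQLite file format documentation. `DecodeVarint()` is a hypothetical name
chosen for this example. The module's real decoder lives in integers.{cc,h} and
may differ in both signature and behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper for illustration; not the module's actual decoder.
//
// Decodes a SQLite varint starting at |buffer|, never reading at or past
// |buffer_end|. Returns the decoded value and a pointer to the first byte
// after the varint. If the buffer ends in the middle of the varint, the
// returned pointer equals |buffer_end| and the value is built from the bytes
// that were available.
inline std::pair<int64_t, const uint8_t*> DecodeVarint(
    const uint8_t* buffer, const uint8_t* buffer_end) {
  assert(buffer <= buffer_end);

  uint64_t value = 0;
  // The first 8 bytes carry 7 bits each. A set high bit means "more bytes".
  for (int i = 0; i < 8 && buffer < buffer_end; ++i) {
    const uint8_t byte = *buffer++;
    value = (value << 7) | (byte & 0x7F);
    if ((byte & 0x80) == 0)
      return {static_cast<int64_t>(value), buffer};
  }
  // The 9th byte, if present, contributes all 8 of its bits.
  if (buffer < buffer_end)
    value = (value << 8) | *buffer++;
  return {static_cast<int64_t>(value), buffer};
}
```

A caller that needs more data after the varint can treat a returned pointer
equal to `buffer_end` as a sign of corruption, since nothing is left to read.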
252 changes: 252 additions & 0 deletions sql/recover_module/btree.cc
@@ -0,0 +1,252 @@
// Copyright 2019 The Chromium Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.

#include "sql/recover_module/btree.h"

#include <algorithm>
#include <limits>
#include <tuple>
#include <type_traits>

#include "base/logging.h"
#include "sql/recover_module/integers.h"
#include "sql/recover_module/pager.h"
#include "third_party/sqlite/sqlite3.h"

namespace sql {
namespace recover {

namespace {

// The SQLite database format is documented at the following URLs.
// https://www.sqlite.org/fileformat.html
// https://www.sqlite.org/fileformat2.html
constexpr uint8_t kInnerTablePageType = 0x05;
constexpr uint8_t kLeafTablePageType = 0x0D;

// Offset from the page header to the page type byte.
constexpr int kPageTypePageOffset = 0;
// Offset from the page header to the 2-byte cell count.
constexpr int kCellCountPageOffset = 3;
// Offset from an inner page header to the 4-byte last child page ID.
constexpr int kLastChildIdInnerPageOffset = 8;
// Offset from an inner page header to the cell pointer array.
constexpr int kFirstCellOfsetInnerPageOffset = 12;
// Offset from a leaf page header to the cell pointer array.
constexpr int kFirstCellOfsetLeafPageOffset = 8;

} // namespace

#if !DCHECK_IS_ON()
// In DCHECKed builds, the decoder contains a sequence checker, which has a
// non-trivial destructor.
static_assert(std::is_trivially_destructible<InnerPageDecoder>::value,
              "Move the destructor to the .cc file if it's non-trivial");
#endif // !DCHECK_IS_ON()

InnerPageDecoder::InnerPageDecoder(DatabasePageReader* db_reader)
    : page_id_(db_reader->page_id()),
      db_reader_(db_reader),
      cell_count_(ComputeCellCount(db_reader)),
      next_read_index_(0) {
  DCHECK(IsOnValidPage(db_reader));
  DCHECK(DatabasePageReader::IsValidPageId(page_id_));
}

int InnerPageDecoder::TryAdvance() {
  DCHECK_CALLED_ON_VALID_SEQUENCE(sequence_checker_);
  DCHECK(CanAdvance());

  const int sqlite_status = db_reader_->ReadPage(page_id_);
  if (sqlite_status != SQLITE_OK) {
    // TODO(pwnall): UMA the error code.

    next_read_index_ = cell_count_ + 1;  // End the reading process.
    return DatabasePageReader::kInvalidPageId;
  }

  const uint8_t* const page_data = db_reader_->page_data();
  const int read_index = next_read_index_;
  next_read_index_ += 1;
  if (read_index == cell_count_)
    return LoadBigEndianInt32(page_data + kLastChildIdInnerPageOffset);

  const int cell_pointer_offset =
      kFirstCellOfsetInnerPageOffset + (read_index << 1);
  DCHECK_LE(cell_pointer_offset + 2, db_reader_->page_size())
      << "ComputeCellCount() used an incorrect upper bound";
  const int cell_pointer = LoadBigEndianUint16(page_data + cell_pointer_offset);

  static_assert(std::numeric_limits<uint16_t>::max() + 4 <
                    std::numeric_limits<int>::max(),
                "The addition below may overflow");
  if (cell_pointer + 4 >= db_reader_->page_size()) {
    // Each cell needs 1 byte for the rowid varint, in addition to the 4 bytes
    // for the child page number that will be read below. Skip cells that
    // obviously go over the page end.
    return DatabasePageReader::kInvalidPageId;
  }
  if (cell_pointer < kFirstCellOfsetInnerPageOffset) {
    // The pointer points into the page's header.
    return DatabasePageReader::kInvalidPageId;
  }

  return LoadBigEndianInt32(page_data + cell_pointer);
}

// static
bool InnerPageDecoder::IsOnValidPage(DatabasePageReader* db_reader) {
  static_assert(kPageTypePageOffset < DatabasePageReader::kMinUsablePageSize,
                "The check below may perform an out-of-bounds memory access");
  return db_reader->page_data()[kPageTypePageOffset] == kInnerTablePageType;
}

// static
int InnerPageDecoder::ComputeCellCount(DatabasePageReader* db_reader) {
  // The B-tree page header stores the cell count.
  int header_count =
      LoadBigEndianUint16(db_reader->page_data() + kCellCountPageOffset);
  static_assert(
      kCellCountPageOffset + 2 <= DatabasePageReader::kMinUsablePageSize,
      "The read above may be out of bounds");

  // However, the data may be corrupted. So, use an upper bound based on the
  // fact that the cell pointer array should never extend past the end of the
  // page.
  //
  // The page size is always even, because it is either a power of two, for
  // most pages, or a power of two minus 100, for the first database page. The
  // cell pointer array starts at offset 12. So, each cell pointer must be
  // separated from the page buffer's end by an even number of bytes.
  DCHECK((db_reader->page_size() - kFirstCellOfsetInnerPageOffset) % 2 == 0);
  int upper_bound =
      (db_reader->page_size() - kFirstCellOfsetInnerPageOffset) >> 1;
  static_assert(
      kFirstCellOfsetInnerPageOffset <= DatabasePageReader::kMinUsablePageSize,
      "The |upper_bound| computation above may overflow");

  return std::min(header_count, upper_bound);
}

#if !DCHECK_IS_ON()
// In DCHECKed builds, the decoder contains a sequence checker, which has a
// non-trivial destructor.
static_assert(std::is_trivially_destructible<LeafPageDecoder>::value,
              "Move the destructor to the .cc file if it's non-trivial");
#endif // !DCHECK_IS_ON()

LeafPageDecoder::LeafPageDecoder(DatabasePageReader* db_reader)
    : page_id_(db_reader->page_id()),
      db_reader_(db_reader),
      cell_count_(ComputeCellCount(db_reader)),
      next_read_index_(0),
      last_record_size_(0) {
  DCHECK(IsOnValidPage(db_reader));
  DCHECK(DatabasePageReader::IsValidPageId(page_id_));
}

bool LeafPageDecoder::TryAdvance() {
  DCHECK_CALLED_ON_VALID_SEQUENCE(sequence_checker_);
  DCHECK(CanAdvance());

#if DCHECK_IS_ON()
  // DCHECKs use last_record_size_ == 0 to check for incorrect access to the
  // decoder's state.
  last_record_size_ = 0;
#endif  // DCHECK_IS_ON()

  const int sqlite_status = db_reader_->ReadPage(page_id_);
  if (sqlite_status != SQLITE_OK) {
    // TODO(pwnall): UMA the error code.

    next_read_index_ = cell_count_;  // End the reading process.
    return false;
  }

  const uint8_t* page_data = db_reader_->page_data();
  const int read_index = next_read_index_;
  next_read_index_ += 1;

  const int cell_pointer_offset =
      kFirstCellOfsetLeafPageOffset + (read_index << 1);
  DCHECK_LE(cell_pointer_offset + 2, db_reader_->page_size())
      << "ComputeCellCount() used an incorrect upper bound";
  const int cell_pointer = LoadBigEndianUint16(page_data + cell_pointer_offset);

  static_assert(std::numeric_limits<uint16_t>::max() + 3 <
                    std::numeric_limits<int>::max(),
                "The addition below may overflow");
  if (cell_pointer + 3 >= db_reader_->page_size()) {
    // Each cell needs at least 1 byte for the payload size varint, 1 byte for
    // the rowid varint, and 1 byte for the record header size varint. Skip
    // cells that obviously go over the page end.
    return false;
  }
  if (cell_pointer < kFirstCellOfsetLeafPageOffset) {
    // The pointer points into the page's header.
    return false;
  }

  const uint8_t* const cell_start = page_data + cell_pointer;
  const uint8_t* const page_end = page_data + db_reader_->page_size();
  DCHECK_LT(cell_start, page_end) << "Failed to skip over empty cells";

  const uint8_t* rowid_start;
  std::tie(last_record_size_, rowid_start) = ParseVarint(cell_start, page_end);
  if (rowid_start == page_end) {
    // The value size varint extended to the end of the page, so the rowid
    // varint starts past the page end.
    return false;
  }
  if (last_record_size_ <= 0) {
    // Each payload needs at least one varint. Skip empty payloads.
#if DCHECK_IS_ON()
    // DCHECKs use last_record_size_ == 0 to check for incorrect access to the
    // decoder's state.
    last_record_size_ = 0;
#endif  // DCHECK_IS_ON()
    return false;
  }

  const uint8_t* record_start;
  std::tie(last_record_rowid_, record_start) =
      ParseVarint(rowid_start, page_end);
  if (record_start == page_end) {
    // The rowid varint extended to the end of the page, so the record starts
    // past the page end. Records need at least 1 byte for their header size
    // varint, so this suggests corruption.
    last_record_size_ = 0;
    return false;
  }

  last_record_offset_ = record_start - page_data;
  return true;
}

// static
bool LeafPageDecoder::IsOnValidPage(DatabasePageReader* db_reader) {
  static_assert(kPageTypePageOffset < DatabasePageReader::kMinUsablePageSize,
                "The check below may perform an out-of-bounds memory access");
  return db_reader->page_data()[kPageTypePageOffset] == kLeafTablePageType;
}

// static
int LeafPageDecoder::ComputeCellCount(DatabasePageReader* db_reader) {
  // See InnerPageDecoder::ComputeCellCount() for the reasoning behind the code.
  int header_count =
      LoadBigEndianUint16(db_reader->page_data() + kCellCountPageOffset);
  static_assert(
      kCellCountPageOffset + 2 <= DatabasePageReader::kMinUsablePageSize,
      "The read above may be out of bounds");

  int upper_bound =
      (db_reader->page_size() - kFirstCellOfsetLeafPageOffset) >> 1;
  static_assert(
      kFirstCellOfsetLeafPageOffset <= DatabasePageReader::kMinUsablePageSize,
      "The |upper_bound| computation above may overflow");

  return std::min(header_count, upper_bound);
}

} // namespace recover
} // namespace sql