sqlite: Rewrite custom recovery code.
Our largest SQLite patch,
0001-Virtual-table-supporting-recovery-of-corrupted-datab.patch, contains
the implementation of a virtual table extension that contains custom
recovery logic. The patch is written in C, and roughly follows SQLite's
coding style.

This CL reimplements the recover functionality in //sql/recover_module.
The new implementation is based on the high-level description of the old
one, and does not match its structure. The new implementation passes all
the tests that shipped with the patch.

Bug: 945204
Change-Id: I04df1edfeb48f907303d9525b1c81486bdc62f75
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/1546942
Commit-Queue: Victor Costan <pwnall@chromium.org>
Reviewed-by: Staphany Park <staphany@chromium.org>
Reviewed-by: Chris Mumford <cmumford@google.com>
Cr-Commit-Position: refs/heads/master@{#659641}
Showing 23 changed files with 3,101 additions and 18 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,94 @@ | ||
# SQLite Data Recovery | ||
|
||
This directory implements data recovery heuristics for SQLite databases whose | ||
files were corrupted on disk. The recovery code walks through the B-tree | ||
holding a SQLite table and recovers all records that seem healthy. Even if | ||
recovery succeeds, a recovered table may be missing records, and existing | ||
records may have corrupted values inside them. Any constraints imposed by Chrome | ||
or by SQLite may be broken. | ||
|
||
|
||
## Usage | ||
|
||
The default approach for handling corruption in SQLite databases is to | ||
immediately stop using the database, delete it, and start over with a new | ||
database. The recovery method implemented here is intended for databases used by | ||
Chrome features that handle high-value user data, such as History and Bookmarks. | ||
These features carefully handle data inconsistency edge cases, and their | ||
database schemas are resilient to partial data loss. | ||
|
||
The code is plugged into the rest of Chrome via | ||
[SQLite's virtual table module API](https://sqlite.org/vtab.html). The example | ||
below covers a typical recovery scenario. | ||
|
||
```sql | ||
-- Feature table schema. | ||
CREATE TABLE feature(name TEXT PRIMARY KEY, value TEXT NOT NULL); | ||
|
||
-- Recover in another database. The corrupted one is unreliable. | ||
ATTACH DATABASE '/tmp/db.db' as recovery; | ||
-- Re-create the feature table's schema. | ||
CREATE TABLE recovery.feature(name TEXT PRIMARY KEY, value TEXT NOT NULL); | ||
-- Start reading the corrupted data. | ||
CREATE VIRTUAL TABLE temp.recover_feature USING recover( | ||
main.feature, -- The table in the corrupted database. | ||
-- Recovery will skip row values that don't have the TEXT type. | ||
name TEXT STRICT NOT NULL, | ||
-- Recovery will include any row value coercible to TEXT. | ||
value TEXT NOT NULL); | ||
-- Data recovered from corrupted databases may not meet schema constraints, so | ||
-- recovery insertions must use "OR REPLACE" or "OR IGNORE". | ||
INSERT OR REPLACE INTO recovery.feature(rowid, name, value) | ||
SELECT rowid, name, value FROM temp.recover_feature; | ||
-- Cleanup after the recovery operation. | ||
DROP TABLE temp.recover_feature; | ||
DETACH DATABASE recovery; | ||
-- Replace the corrupted database file with the recovered one. | ||
``` | ||
|
||
The feature invoking the recovery virtual table must know the schema of the | ||
database being recovered. A generic high-level recovery layer should first | ||
recover | ||
[the `sqlite_master` table](https://www.sqlite.org/fileformat.html#storage_of_the_sql_database_schema), | ||
which has a well known format, then use its contents to recover the schema of | ||
any other table. This recovery module already relies on the integrity of the | ||
`sqlite_master` table. | ||
|
||
The column definitions in the virtual table creation statement must follow | ||
the syntax _column\_name_ _type\_name_ [`STRICT`] [`NOT NULL`]. _type\_name_ is | ||
one of [the SQLite data types](https://www.sqlite.org/datatype3.html#storage_classes_and_datatypes), | ||
or one of the special types `ANY` and `ROWID`. | ||
|
||
The `ANY` type can be used to recover values of all types. | ||
|
||
The `ROWID` type must be used for columns that alias | ||
[rowid](https://www.sqlite.org/lang_createtable.html#rowid). This typically only | ||
happens when a column is declared as `INTEGER PRIMARY KEY`. | ||
Designating `ROWID` columns is essential for decoding records correctly. | ||
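
The column definition syntax above can be illustrated with a small standalone parser. This is a minimal sketch, not the code in parsing.{cc,h}: the names `ColumnType`, `ColumnSpec`, `ParseTypeName` and `ParseColumnSpec` are hypothetical, and the sketch assumes upper-case keywords, whitespace-separated tokens, and the type names `INTEGER`, `REAL`, `TEXT` and `BLOB` in addition to `ANY` and `ROWID`.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical types for illustration; not taken from parsing.{cc,h}.
enum class ColumnType { kInteger, kReal, kText, kBlob, kAny, kRowId };

struct ColumnSpec {
  std::string name;
  ColumnType type = ColumnType::kAny;
  bool is_strict = false;
  bool is_non_null = false;
};

// Maps a type name to a ColumnType. The accepted spellings are an assumption.
bool ParseTypeName(const std::string& type_name, ColumnType* type) {
  if (type_name == "INTEGER") { *type = ColumnType::kInteger; return true; }
  if (type_name == "REAL") { *type = ColumnType::kReal; return true; }
  if (type_name == "TEXT") { *type = ColumnType::kText; return true; }
  if (type_name == "BLOB") { *type = ColumnType::kBlob; return true; }
  if (type_name == "ANY") { *type = ColumnType::kAny; return true; }
  if (type_name == "ROWID") { *type = ColumnType::kRowId; return true; }
  return false;
}

// Parses "column_name type_name [STRICT] [NOT NULL]".
// Returns false and leaves |spec| untouched if the syntax is not followed.
bool ParseColumnSpec(const std::string& definition, ColumnSpec* spec) {
  std::istringstream stream(definition);
  std::vector<std::string> tokens;
  for (std::string token; stream >> token;)
    tokens.push_back(token);
  if (tokens.size() < 2)
    return false;

  ColumnSpec parsed;
  parsed.name = tokens[0];
  if (!ParseTypeName(tokens[1], &parsed.type))
    return false;

  std::size_t index = 2;
  if (index < tokens.size() && tokens[index] == "STRICT") {
    parsed.is_strict = true;
    ++index;
  }
  if (index + 1 < tokens.size() && tokens[index] == "NOT" &&
      tokens[index + 1] == "NULL") {
    parsed.is_non_null = true;
    index += 2;
  }
  if (index != tokens.size())
    return false;  // Trailing tokens that the syntax does not allow.

  *spec = parsed;
  return true;
}
```

With this sketch, `name TEXT STRICT NOT NULL` yields a strict, non-null `TEXT` column, while `value TEXT NOT` is rejected because `NOT` must be followed by `NULL`.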
|
||
TODO(pwnall): Look into removing `STRICT`, if it's not used. | ||
|
||
|
||
## Limitations | ||
|
||
The current implementation only handles [table | ||
B-trees](https://www.sqlite.org/fileformat.html#b_tree_pages). It cannot | ||
recover [WITHOUT ROWID](https://www.sqlite.org/rowidtable.html) tables, which | ||
are stored in index B-trees. | ||
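
The limitation above comes down to the page type byte in the B-tree page header. The sketch below shows how the four page types in the SQLite file format could be told apart. The values 0x05 and 0x0D match the constants in btree.cc, and 0x02 and 0x0A are the index page types from the file format documentation. The names `BTreePageKind`, `ClassifyPage` and `IsTableBTreePage` are hypothetical and do not appear in the module.

```cpp
#include <cstdint>

// Hypothetical classification helper; not part of //sql/recover_module.
enum class BTreePageKind {
  kInnerIndex,
  kInnerTable,
  kLeafIndex,
  kLeafTable,
  kUnknown,
};

// Maps the page type byte of a B-tree page header to a page kind.
inline BTreePageKind ClassifyPage(uint8_t page_type) {
  switch (page_type) {
    case 0x02:
      return BTreePageKind::kInnerIndex;
    case 0x05:
      return BTreePageKind::kInnerTable;
    case 0x0A:
      return BTreePageKind::kLeafIndex;
    case 0x0D:
      return BTreePageKind::kLeafTable;
    default:
      return BTreePageKind::kUnknown;
  }
}

// True for the page kinds that make up table B-trees, which are the only
// B-trees the recovery code described above can walk.
inline bool IsTableBTreePage(uint8_t page_type) {
  const BTreePageKind kind = ClassifyPage(page_type);
  return kind == BTreePageKind::kInnerTable ||
         kind == BTreePageKind::kLeafTable;
}
```

Pages of a WITHOUT ROWID table carry the index page types, so a check like `IsTableBTreePage()` would reject them.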
|
||
|
||
## Code Map | ||
|
||
The code is structured as follows. | ||
|
||
* integers.{cc,h} decodes the integer formats used by SQLite. | ||
* btree.{cc,h} decodes the cells in SQLite's B-tree pages. | ||
* payload.{cc,h} reads record payloads from B-tree pages and overflow pages. | ||
* record.{cc,h} decodes column values from records. | ||
* cursor.{cc,h} implements a SQLite virtual table cursor. | ||
* table.{cc,h} implements one recovery virtual table. | ||
* parsing.{cc,h} parses the SQL strings passed in via `CREATE VIRTUAL TABLE` | ||
and implements the constraints explained above. | ||
* module.{cc,h} implements the SQLite virtual table interface. | ||
|
||
The feature is tested by integration tests that issue SQLite queries. |
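
As an illustration of the integer formats mentioned in the code map, the sketch below decodes a big-endian 16-bit integer and a SQLite varint (up to 9 bytes; the first 8 bytes contribute 7 bits each, the ninth contributes 8 bits). It is a minimal sketch written from the file format description, not the contents of integers.{cc,h}. The `Sketch` suffixes mark the function names as hypothetical, and the behavior on truncated input (returning the partial value and the buffer end) is an assumption.

```cpp
#include <cstdint>
#include <utility>

// Loads a big-endian 16-bit unsigned integer from |buffer|.
inline int LoadBigEndianUint16Sketch(const uint8_t* buffer) {
  return (static_cast<int>(buffer[0]) << 8) | static_cast<int>(buffer[1]);
}

// Decodes one SQLite varint from [buffer, buffer_end).
//
// Returns the decoded value and a pointer to the byte after the varint. If
// the buffer ends before the varint does, the returned pointer equals
// |buffer_end| and the value is whatever was decoded so far.
inline std::pair<int64_t, const uint8_t*> ParseVarintSketch(
    const uint8_t* buffer,
    const uint8_t* buffer_end) {
  uint64_t value = 0;
  // The first 8 bytes carry 7 payload bits each. A clear high bit marks the
  // last byte of the varint.
  for (int i = 0; i < 8 && buffer < buffer_end; ++i) {
    const uint8_t byte = *buffer;
    ++buffer;
    value = (value << 7) | (byte & 0x7F);
    if ((byte & 0x80) == 0)
      return std::make_pair(static_cast<int64_t>(value), buffer);
  }
  // A ninth byte, if present, carries 8 payload bits.
  if (buffer < buffer_end) {
    value = (value << 8) | *buffer;
    ++buffer;
  }
  return std::make_pair(static_cast<int64_t>(value), buffer);
}
```

Returning the end pointer along with the value lets a caller detect a varint that ran up to the end of the page by comparing the pointer with the page end.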
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,252 @@ | ||
// Copyright 2019 The Chromium Authors. All rights reserved. | ||
// Use of this source code is governed by a BSD-style license that can be | ||
// found in the LICENSE file. | ||
|
||
#include "sql/recover_module/btree.h" | ||
|
||
#include <algorithm> | ||
#include <limits> | ||
#include <tuple> | ||
#include <type_traits> | ||
|
||
#include "base/logging.h" | ||
#include "sql/recover_module/integers.h" | ||
#include "sql/recover_module/pager.h" | ||
#include "third_party/sqlite/sqlite3.h" | ||
|
||
namespace sql { | ||
namespace recover { | ||
|
||
namespace { | ||
|
||
// The SQLite database format is documented at the following URLs. | ||
// https://www.sqlite.org/fileformat.html | ||
// https://www.sqlite.org/fileformat2.html | ||
constexpr uint8_t kInnerTablePageType = 0x05; | ||
constexpr uint8_t kLeafTablePageType = 0x0D; | ||
|
||
// Offset from the page header to the page type byte. | ||
constexpr int kPageTypePageOffset = 0; | ||
// Offset from the page header to the 2-byte cell count. | ||
constexpr int kCellCountPageOffset = 3; | ||
// Offset from an inner page header to the 4-byte last child page ID. | ||
constexpr int kLastChildIdInnerPageOffset = 8; | ||
// Offset from an inner page header to the cell pointer array. | ||
constexpr int kFirstCellOfsetInnerPageOffset = 12; | ||
// Offset from a leaf page header to the cell pointer array. | ||
constexpr int kFirstCellOfsetLeafPageOffset = 8; | ||
|
||
} // namespace | ||
|
||
#if !DCHECK_IS_ON() | ||
// In DCHECKed builds, the decoder contains a sequence checker, which has a | ||
// non-trivial destructor. | ||
static_assert(std::is_trivially_destructible<InnerPageDecoder>::value, | ||
"Move the destructor to the .cc file if it's non-trivial"); | ||
#endif // !DCHECK_IS_ON() | ||
|
||
InnerPageDecoder::InnerPageDecoder(DatabasePageReader* db_reader) | ||
: page_id_(db_reader->page_id()), | ||
db_reader_(db_reader), | ||
cell_count_(ComputeCellCount(db_reader)), | ||
next_read_index_(0) { | ||
DCHECK(IsOnValidPage(db_reader)); | ||
DCHECK(DatabasePageReader::IsValidPageId(page_id_)); | ||
} | ||
|
||
int InnerPageDecoder::TryAdvance() { | ||
DCHECK_CALLED_ON_VALID_SEQUENCE(sequence_checker_); | ||
DCHECK(CanAdvance()); | ||
|
||
const int sqlite_status = db_reader_->ReadPage(page_id_); | ||
if (sqlite_status != SQLITE_OK) { | ||
// TODO(pwnall): UMA the error code. | ||
|
||
next_read_index_ = cell_count_ + 1; // End the reading process. | ||
return DatabasePageReader::kInvalidPageId; | ||
} | ||
|
||
const uint8_t* const page_data = db_reader_->page_data(); | ||
const int read_index = next_read_index_; | ||
next_read_index_ += 1; | ||
if (read_index == cell_count_) | ||
return LoadBigEndianInt32(page_data + kLastChildIdInnerPageOffset); | ||
|
||
const int cell_pointer_offset = | ||
kFirstCellOfsetInnerPageOffset + (read_index << 1); | ||
DCHECK_LE(cell_pointer_offset + 2, db_reader_->page_size()) | ||
<< "ComputeCellCount() used an incorrect upper bound"; | ||
const int cell_pointer = LoadBigEndianUint16(page_data + cell_pointer_offset); | ||
|
||
static_assert(std::numeric_limits<uint16_t>::max() + 4 < | ||
std::numeric_limits<int>::max(), | ||
"The addition below may overflow"); | ||
if (cell_pointer + 4 >= db_reader_->page_size()) { | ||
// Each cell needs 1 byte for the rowid varint, in addition to the 4 bytes | ||
// for the child page number that will be read below. Skip cells that | ||
// obviously go over the page end. | ||
return DatabasePageReader::kInvalidPageId; | ||
} | ||
if (cell_pointer < kFirstCellOfsetInnerPageOffset) { | ||
// The pointer points into the page's header. | ||
return DatabasePageReader::kInvalidPageId; | ||
} | ||
|
||
return LoadBigEndianInt32(page_data + cell_pointer); | ||
} | ||
|
||
// static | ||
bool InnerPageDecoder::IsOnValidPage(DatabasePageReader* db_reader) { | ||
static_assert(kPageTypePageOffset < DatabasePageReader::kMinUsablePageSize, | ||
"The check below may perform an out-of-bounds memory access"); | ||
return db_reader->page_data()[kPageTypePageOffset] == kInnerTablePageType; | ||
} | ||
|
||
// static | ||
int InnerPageDecoder::ComputeCellCount(DatabasePageReader* db_reader) { | ||
// The B-tree page header stores the cell count. | ||
int header_count = | ||
LoadBigEndianUint16(db_reader->page_data() + kCellCountPageOffset); | ||
static_assert( | ||
kCellCountPageOffset + 2 <= DatabasePageReader::kMinUsablePageSize, | ||
"The read above may be out of bounds"); | ||
|
||
// However, the data may be corrupted. So, use an upper bound based on the | ||
// fact that the cell pointer array should never extend past the end of the | ||
// page. | ||
// | ||
// The page size is always even, because it is either a power of two, for | ||
// most pages, or a power of two minus 100, for the first database page. The | ||
// cell pointer array starts at offset 12. So, each cell pointer must be | ||
// separated from the page buffer's end by an even number of bytes. | ||
DCHECK((db_reader->page_size() - kFirstCellOfsetInnerPageOffset) % 2 == 0); | ||
int upper_bound = | ||
(db_reader->page_size() - kFirstCellOfsetInnerPageOffset) >> 1; | ||
static_assert( | ||
kFirstCellOfsetInnerPageOffset <= DatabasePageReader::kMinUsablePageSize, | ||
"The |upper_bound| computation above may overflow"); | ||
|
||
return std::min(header_count, upper_bound); | ||
} | ||
|
||
#if !DCHECK_IS_ON() | ||
// In DCHECKed builds, the decoder contains a sequence checker, which has a | ||
// non-trivial destructor. | ||
static_assert(std::is_trivially_destructible<LeafPageDecoder>::value, | ||
"Move the destructor to the .cc file if it's non-trivial"); | ||
#endif // !DCHECK_IS_ON() | ||
|
||
LeafPageDecoder::LeafPageDecoder(DatabasePageReader* db_reader) | ||
: page_id_(db_reader->page_id()), | ||
db_reader_(db_reader), | ||
cell_count_(ComputeCellCount(db_reader)), | ||
next_read_index_(0), | ||
last_record_size_(0) { | ||
DCHECK(IsOnValidPage(db_reader)); | ||
DCHECK(DatabasePageReader::IsValidPageId(page_id_)); | ||
} | ||
|
||
bool LeafPageDecoder::TryAdvance() { | ||
DCHECK_CALLED_ON_VALID_SEQUENCE(sequence_checker_); | ||
DCHECK(CanAdvance()); | ||
|
||
#if DCHECK_IS_ON() | ||
// DCHECKs use last_record_size == 0 to check for incorrect access to the | ||
// decoder's state. | ||
last_record_size_ = 0; | ||
#endif // DCHECK_IS_ON() | ||
|
||
const int sqlite_status = db_reader_->ReadPage(page_id_); | ||
if (sqlite_status != SQLITE_OK) { | ||
// TODO(pwnall): UMA the error code. | ||
|
||
next_read_index_ = cell_count_; // End the reading process. | ||
return false; | ||
} | ||
|
||
const uint8_t* page_data = db_reader_->page_data(); | ||
const int read_index = next_read_index_; | ||
next_read_index_ += 1; | ||
|
||
const int cell_pointer_offset = | ||
kFirstCellOfsetLeafPageOffset + (read_index << 1); | ||
DCHECK_LE(cell_pointer_offset + 2, db_reader_->page_size()) | ||
<< "ComputeCellCount() used an incorrect upper bound"; | ||
const int cell_pointer = LoadBigEndianUint16(page_data + cell_pointer_offset); | ||
|
||
static_assert(std::numeric_limits<uint16_t>::max() + 3 < | ||
std::numeric_limits<int>::max(), | ||
"The addition below may overflow"); | ||
if (cell_pointer + 3 >= db_reader_->page_size()) { | ||
// Each cell needs at least 1 byte for the record size varint, 1 byte for | ||
// the rowid varint, and 1 byte for the record header size varint. Skip | ||
// cells that obviously go over the page end. | ||
return false; | ||
} | ||
if (cell_pointer < kFirstCellOfsetLeafPageOffset) { | ||
// The pointer points into the page's header. | ||
return false; | ||
} | ||
|
||
const uint8_t* const cell_start = page_data + cell_pointer; | ||
const uint8_t* const page_end = page_data + db_reader_->page_size(); | ||
DCHECK_LT(cell_start, page_end) << "Failed to skip over empty cells"; | ||
|
||
const uint8_t* rowid_start; | ||
std::tie(last_record_size_, rowid_start) = ParseVarint(cell_start, page_end); | ||
if (rowid_start == page_end) { | ||
// The value size varint extended to the end of the page, so the rowid | ||
// varint starts past the page end. | ||
return false; | ||
} | ||
if (last_record_size_ <= 0) { | ||
// Each payload needs at least one varint. Skip empty payloads. | ||
#if DCHECK_IS_ON() | ||
// DCHECKs use last_record_size == 0 to check for incorrect access to the | ||
// decoder's state. | ||
last_record_size_ = 0; | ||
#endif // DCHECK_IS_ON() | ||
return false; | ||
} | ||
|
||
const uint8_t* record_start; | ||
std::tie(last_record_rowid_, record_start) = | ||
ParseVarint(rowid_start, page_end); | ||
if (record_start == page_end) { | ||
// The rowid varint extended to the end of the page, so the record starts | ||
// past the page end. Records need at least 1 byte for their header size | ||
// varint, so this suggests corruption. | ||
last_record_size_ = 0; | ||
return false; | ||
} | ||
|
||
last_record_offset_ = record_start - page_data; | ||
return true; | ||
} | ||
|
||
// static | ||
bool LeafPageDecoder::IsOnValidPage(DatabasePageReader* db_reader) { | ||
static_assert(kPageTypePageOffset < DatabasePageReader::kMinUsablePageSize, | ||
"The check below may perform an out-of-bounds memory access"); | ||
return db_reader->page_data()[kPageTypePageOffset] == kLeafTablePageType; | ||
} | ||
|
||
// static | ||
int LeafPageDecoder::ComputeCellCount(DatabasePageReader* db_reader) { | ||
// See InnerPageDecoder::ComputeCellCount() for the reasoning behind the code. | ||
int header_count = | ||
LoadBigEndianUint16(db_reader->page_data() + kCellCountPageOffset); | ||
static_assert( | ||
kCellCountPageOffset + 2 <= DatabasePageReader::kMinUsablePageSize, | ||
"The read above may be out of bounds"); | ||
|
||
int upper_bound = | ||
(db_reader->page_size() - kFirstCellOfsetLeafPageOffset) >> 1; | ||
static_assert( | ||
kFirstCellOfsetLeafPageOffset <= DatabasePageReader::kMinUsablePageSize, | ||
"The |upper_bound| computation above may overflow"); | ||
|
||
return std::min(header_count, upper_bound); | ||
} | ||
|
||
} // namespace recover | ||
} // namespace sql |
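
The two defensive checks used by the decoders above can be isolated into a small standalone sketch: clamping the cell count stored in the page header to what the cell pointer array can hold, and rejecting leaf cell pointers that land in the page header or leave fewer than 3 bytes after the pointer. The names `ClampCellCount`, `IsPlausibleLeafCellPointer` and `kLeafHeaderSizeSketch` are hypothetical; the arithmetic mirrors the checks in the file, under the assumption that the page header starts at offset 0 of the page buffer.

```cpp
#include <algorithm>
#include <cstdint>

// Size of a leaf table page header, matching the value 8 used for the leaf
// cell pointer array offset in the file above. The name is hypothetical.
constexpr int kLeafHeaderSizeSketch = 8;

// Bounds a possibly corrupted header cell count by the number of 2-byte cell
// pointers that fit between the page header and the end of the page.
inline int ClampCellCount(int header_count, int page_size, int header_size) {
  const int upper_bound = (page_size - header_size) / 2;
  return std::min(header_count, upper_bound);
}

// A leaf cell needs at least 3 bytes: the record size varint, the rowid
// varint, and the record header size varint. Pointers into the page header
// are also rejected.
inline bool IsPlausibleLeafCellPointer(int cell_pointer, int page_size) {
  return cell_pointer >= kLeafHeaderSizeSketch &&
         cell_pointer + 3 < page_size;
}
```

For a 512-byte leaf page, a corrupted header count of 0xFFFF is clamped to (512 - 8) / 2 = 252 cells, so the cell pointer array can never be read past the end of the page buffer.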