Closed
Description
Bug report
Checklist
- I am confident this is a bug in CPython, not a bug in a third-party project
- I have searched the CPython issue tracker, and am confident this bug has not been reported before
CPython versions tested on:
3.11
Operating systems tested on:
Linux
Output from running 'python -VV' on the command line:
Python 3.11.5 (main, Aug 25 2023, 13:19:53) [GCC 9.4.0]
A clear and concise description of the bug:
Apologies if I'm misunderstanding; please advise if I should post elsewhere. Shouldn't iterdump() detect VARCHAR columns that contain binary data and output X'' strings instead of raising an error? This is what sqlite3 .dump does.
import sqlite3

# db_path and dump_path are placeholders for the database file and the dump output file.
with sqlite3.connect(db_path) as conn:
    with open(dump_path, 'w') as dump:
        for line in conn.iterdump():
            pass
The above will throw an error:
  File "foo.py", line 79, in dump_sqlite_db
    for line in conn.iterdump():
  File "/usr/lib/python3.11/sqlite3/dump.py", line 63, in _iterdump
    for row in query_res:
sqlite3.OperationalError: Could not decode to UTF-8 column ''INSERT INTO "sync_entities_metadata" VALUES('||quote("storage_key")||','||quote("metadata")||')'' with text 'INSERT INTO "sync_entities_metadata" VALUES(1,'v10����
I tried setting conn.text_factory = bytes as a workaround, but now I get a different error:
  File "foo.py", line 79, in dump_sqlite_db
    for line in conn.iterdump():
  File "/usr/lib/python3.11/sqlite3/dump.py", line 43, in _iterdump
    elif table_name.startswith('sqlite_'):
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: startswith first arg must be bytes or a tuple of bytes, not str
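For anyone trying to reproduce this without my database, here is a minimal sketch of the underlying decoding behaviour using a plain SELECT rather than iterdump(). The table name `demo`, the helper `fetch_metadata`, and the sample bytes are made up for illustration. It assumes that CAST(? AS TEXT) stores the bytes as a TEXT value without validating them as UTF-8, which is how I understand SQLite to behave. It compares the default text_factory, text_factory = bytes, and a lenient decoder:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (metadata TEXT)")
# Hypothetical sample data: a TEXT value whose bytes are not valid UTF-8.
conn.execute("INSERT INTO demo VALUES (CAST(? AS TEXT))", (b"v10\xff\xfe",))


def fetch_metadata(text_factory):
    """Read the stored value back using the given text_factory."""
    conn.text_factory = text_factory
    return conn.execute("SELECT metadata FROM demo").fetchone()[0]


# Default factory (str): decoding the TEXT value as UTF-8 fails.
try:
    fetch_metadata(str)
    strict_error = None
except sqlite3.OperationalError as exc:
    strict_error = exc

# bytes factory: the raw bytes come back untouched.
raw = fetch_metadata(bytes)

# Lenient factory: invalid bytes are replaced with U+FFFD, so this is lossy.
lossy = fetch_metadata(lambda b: b.decode("utf-8", errors="replace"))

print(type(strict_error).__name__, raw, repr(lossy))
```

Because text_factory = bytes applies to every TEXT value read through the connection, the table names that dump.py reads internally also come back as bytes, which appears to be why the second traceback fails on startswith('sqlite_').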
Linked PRs
- gh-108590: Fix sqlite3.iterdump for invalid unicode in text columns. #108657
- [3.12] gh-108590: Fix sqlite3.iterdump for invalid Unicode in TEXT columns (GH-108657) #108673
- [3.11] gh-108590: Fix sqlite3.iterdump for invalid Unicode in TEXT columns (GH-108657) #108674
- gh-108590: Fix sqlite3 iterdump() for table columns containing invalid Unicode sequences #108683
- gh-108590: Revert gh-108657 (commit 400a1cebc) #108686
- [3.11] gh-108590: Revert gh-108657 (commit 400a1cebc) (#108686) #108694
- gh-108590: Fix sqlite3.iterdump for invalid unicode in text columns and reproducability. #108695
- gh-108590: Improve sqlite3 docs on encoding issues and how to handle those #108699
- [3.12] gh-108590: Improve sqlite3 docs on encoding issues and how to handle those (GH-108699) #111324
- [3.11] gh-108590: Improve sqlite3 docs on encoding issues and how to handle those (GH-108699) #111325