Description
- 环境
$ uname -a
Linux azure 6.5.0-1018-azure #19~22.04.2-Ubuntu SMP Thu Mar 21 16:45:46 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
$ mysqld --version
/usr/sbin/mysqld Ver 8.0.36-0ubuntu0.22.04.1 for Linux on x86_64 ((Ubuntu))
$ /usr/lib/postgresql/16/bin/postgres --version
postgres (PostgreSQL) 16.2 (Ubuntu 16.2-1.pgdg22.04+1)
pgloader
@3.6.10
mysql_fdw
@2.9.1
库名假定为tbm
- 使用某11年历史的commonlisp程序 https://github.com/dimitri/pgloader 可以在多个数据源(其中有着mysql和pgsql)之间导入导出
1.1 由于 dimitri/pgloader#1211 ,从postgre社区apt source安装的pgloader 3.6.10-2.pgdg22.04+1
无法连接mysqld
$ apt show -a pgloader
Package: pgloader
Version: 3.6.10-2.pgdg22.04+1
Priority: optional
Section: database
Maintainer: Dimitri Fontaine <dim@tapoueh.org>
Installed-Size: 33.9 MB
Depends: freetds-dev, libc6 (>= 2.34), zlib1g (>= 1:1.1.4), libsqlite3-0, libssl3
Homepage: https://github.com/dimitri/pgloader
Download-Size: 29.3 MB
APT-Sources: https://apt.postgresql.org/pub/repos/apt jammy-pgdg/main amd64 Packages
Description: extract, transform and load data into PostgreSQL
pgloader imports data from different kind of sources and COPY it into
PostgreSQL.
.
The command language is described in the manual page and allows one to
describe where to find the data source, its format, and to describe data
processing and transformation.
.
Supported source formats include CSV, fixed width flat files, dBase3 files
(DBF), and SQLite and MySQL databases. In most of those formats, pgloader
is able to auto-discover the schema and create the tables and the indexes
in PostgreSQL. In the MySQL case it's possible to edit CASTing rules from
the pgloader command directly.
Package: pgloader
Version: 3.6.3-1ubuntu1
Priority: extra
Section: universe/misc
Origin: Ubuntu
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Original-Maintainer: Dimitri Fontaine <dim@tapoueh.org>
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Installed-Size: 33.8 MB
Depends: freetds-dev, libc6 (>= 2.34), zlib1g (>= 1:1.1.4), libssl3
Homepage: https://github.com/dimitri/pgloader
Download-Size: 29.5 MB
APT-Sources: http://azure.archive.ubuntu.com/ubuntu jammy/universe amd64 Packages
Description: extract, transform and load data into PostgreSQL
pgloader imports data from different kind of sources and COPY it into
PostgreSQL.
.
The command language is described in the manual page and allows one to
describe where to find the data source, its format, and to describe data
processing and transformation.
.
Supported source formats include CSV, fixed width flat files, dBase3 files
(DBF), and SQLite and MySQL databases. In most of those formats, pgloader
is able to auto-discover the schema and create the tables and the indexes
in PostgreSQL. In the MySQL case it's possible to edit CASTing rules from
the pgloader command directly.
1.1.1. 建议从源码自行编译3.6.10
sudo apt install -y pgloader sbcl && # 通过安装apt中的pgloader来安装其运行时所需依赖和sbcl commonlisp编译器
git clone https://github.com/dimitri/pgloader &&
cd pgloader &&
git checkout v3.6.10 &&
make DYNSIZE=1024 && # 内存限制
build/bin/pgloader --version && # pgloader version "3.6.af8c3c1" compiled with SBCL 2.1.11.debian
sudo apt purge -y sbcl
1.1.2. 清理:
sudo apt purge -y pgloader &&
sudo apt autoremove --purge
1.2. 由于 dimitri/pgloader#782 ,pgloader只能通过仍在使用mysql 8.0.34
中deprecated的mysql_native_password
用户登录mysqld
1.2.1. 建议ad-hoc地临时创建一个最小权限(只GRANT
库/表级权限)用户用于pgloader而非修改现有用户的default auth plugin
- 令文件
schema.load
有
LOAD DATABASE
FROM mysql://username:password@localhost/tbm
INTO pgsql://username:password@localhost/tbm
WITH quote identifiers, schema only
INCLUDING ONLY TABLE NAMES MATCHING ~/^tbm(_|i_(?!imageInReply)|c_f(78579|2265748|17019292|25459979|27546680))/
;
2.1. 可以使用INCLUDING ONLY TABLE NAMES MATCHING ~/regexp/, 'exact'
来filter要导入的表名
2.2. 令文件data.load
有
LOAD DATABASE
FROM mysql://username:password@localhost/tbm
INTO pgsql://username:password@localhost/tbm
SET MySQL PARAMETERS -- https://github.com/dimitri/pgloader/issues/999
net_read_timeout = '300',
net_write_timeout = '300'
WITH quote identifiers, data only, on error stop, workers = 2, concurrency = 1, batch rows = 1000, batch size = 8MB, prefetch rows = 10000 -- 酌情tunning https://pgloader.readthedocs.io/en/latest/command.html#batch-behaviour-options
INCLUDING ONLY TABLE NAMES MATCHING ~/^tbm(_|i_(?!imageInReply)|c_f(78579|2265748|17019292|25459979|27546680))/
;
2.3. 分开导入表结构
和表数据
是因为pgloader默认行为是按照指定和mysql://
源下默认的CASTING
创建目的表结构
(除了索引),再COPY
导入表数据
,最后并行{表索引数量}
个CREATE INDEX
并且不等待其完成而是并行地开始下一个表的过程: https://pgloader.readthedocs.io/en/latest/batches.html
By default, as many CREATE INDEX threads as the maximum number of indexes per table are found in your source schema. It is possible to set the max parallel create index WITH option to another number in case there’s just too many of them to create.
由于未知原因在3.6.af8c3c1
中WITH max parallel create index 1
并没有实际作用仍然会执行上述流程所以不得不分开在WITH schema only
的.load
中先完成CREATE INDEX
dimitri/pgloader#1576 以减少io争用
2.3.1. 出于类似的原因也建议先禁用autovacuum
ALTER SYSTEM SET autovacuum = off;
SELECT pg_reload_conf();
SELECT * FROM pg_settings WHERE name = 'autovacuum';
清理:待pgloader完成后再手动VACUMM FULL
https://dba.stackexchange.com/questions/130496/is-it-worth-it-to-run-vacuum-on-a-table-that-only-receives-inserts/130514#130514
ALTER SYSTEM RESET autovacuum;
SELECT pg_reload_conf();
VACUMM FULL;
2.4. WITH quote identifiers
是为了方便后续校验两端表数据
是否一致时可以假定列名相同,您可以之后再重命名为postgre社区习惯的snake_case
以减少满屏幕""
(相当于mysql的``
)
2.5. 视奸正在导入什么表 https://stackoverflow.com/questions/35319597/how-to-stop-kill-a-query-in-postgresql
SELECT * FROM pg_stat_activity WHERE application_name = 'pgloader';
SELECT table_schema, table_name, pg_relation_size('"'||table_schema||'"."'||table_name||'"')::float8 / 1024 / 1024 "MiB"
FROM information_schema.tables
WHERE table_schema = 'tbm'
ORDER BY 3 DESC;
- 设有表
tbmc_f97650_thread
结构
mysql端:
CREATE TABLE `tbmc_f97650_thread` (
`id` int unsigned NOT NULL AUTO_INCREMENT,
`tid` bigint unsigned NOT NULL,
`threadType` bigint unsigned NOT NULL,
`stickyType` tinytext CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci,
`topicType` tinytext CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci,
`isGood` tinyint DEFAULT NULL,
`title` text CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL,
`authorUid` bigint NOT NULL,
`postedAt` int unsigned NOT NULL,
`latestReplyPostedAt` int unsigned NOT NULL,
`latestReplierUid` bigint DEFAULT NULL,
`replyCount` int unsigned DEFAULT NULL,
`viewCount` int unsigned DEFAULT NULL,
`shareCount` int unsigned DEFAULT NULL,
`agreeCount` int DEFAULT NULL,
`disagreeCount` int DEFAULT NULL,
`zan` blob,
`geolocation` blob,
`authorPhoneType` text CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci,
`createdAt` int unsigned NOT NULL,
`updatedAt` int unsigned DEFAULT NULL,
`lastSeenAt` int unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `tid` (`tid`) USING BTREE,
KEY `authorUid` (`authorUid`),
KEY `postTime` (`postedAt` DESC)
) ENGINE=InnoDB CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
由pgloader在pgsql端按照2.3.
中mysql://
源下默认的CASTING
所创建:
create table tbm_dev.tbmc_f97650_thread (
id bigint primary key not null default nextval('tbmc_f97650_thread_id_seq'::regclass),
tid bigint not null,
"threadType" bigint not null,
"stickyType" text,
"topicType" text,
"isGood" smallint,
title text not null,
"authorUid" bigint not null,
"postedAt" bigint not null,
"latestReplyPostedAt" bigint not null,
"latestReplierUid" bigint,
"replyCount" bigint,
"viewCount" bigint,
"shareCount" bigint,
"agreeCount" integer,
"disagreeCount" integer,
zan bytea,
geolocation bytea,
"authorPhoneType" text,
"createdAt" bigint not null,
"updatedAt" bigint,
"lastSeenAt" bigint
);
create index "idx_23173_authorUid" on tbmc_f97650_thread using btree ("authorUid");
create index "idx_23173_postTime" on tbmc_f97650_thread using btree ("postedAt");
create unique index idx_23173_tid on tbmc_f97650_thread using btree (tid);
3.1. 可以通过将表中每列值
concat起来hash后再aggreate地对每行hash
再hash来得出整个表的hash
mermaid.live
flowchart TD
fields[tid\nthreadType\nstickyType\ntopicType\nisGood\ntitle\nauthorUid\npostedAt\nlatestReplyPostedAt\nlatestReplierUid\nreplyCount\nviewCount\nshareCount\nagreeCount\ndisagreeCount\nzan\ngeolocation\nauthorPhoneType\ncreatedAt\nupdatedAt\nlastSeenAt]
fields --CONCAT_WS(' ', ...)--> row[row\nsingle long csv line string but using space and without any escape]
row --sha2-256--> row_hash
row_hash --GROUP_CONCAT(... ORDER BY tid SEPARATOR '\n')--> rows_hash[rows_hash\nnew line delimited rows]
rows_hash --sha2-256--> table_hash
mysql
SET SESSION group_concat_max_len = 18446744073709551615; -- 默认只有1024超过会带WARNING截断 https://dba.stackexchange.com/questions/197746/why-set-group-concat-max-len-below-the-maximum
SELECT SHA2(GROUP_CONCAT(SHA2(CONCAT_WS(' ',
tid, threadType, stickyType, topicType, isGood, title, authorUid, postedAt, latestReplyPostedAt, latestReplierUid, replyCount, viewCount, shareCount, agreeCount, disagreeCount, LOWER(HEX(zan)), LOWER(HEX(geolocation)), authorPhoneType, createdAt, updatedAt, lastSeenAt
), 512) ORDER BY tid SEPARATOR '\n'), 512) hash FROM tbmc_f97650_thread;
blob类型字段必须套LOWER(HEX())
因为mysql默认的blob2text
输出类似0xDEADBEAF
形式而pgsql是deadbeaf
pgsql
SELECT encode(sha512(convert_to(array_to_string(array_agg(encode(sha512(convert_to(concat_ws(' ',
tid, "threadType", "stickyType", "topicType", "isGood", title, "authorUid", "postedAt", "latestReplyPostedAt", "latestReplierUid", "replyCount", "viewCount", "shareCount", "agreeCount", "disagreeCount", encode(zan, 'hex'), encode(geolocation, 'hex'), "authorPhoneType", "createdAt", "updatedAt", "lastSeenAt"
), 'utf8')), 'hex') ORDER BY tid), E'\n'), 'utf8')), 'hex') hash FROM tbmc_f97650_thread;
3.2. 然而在大表上两者最终都会耗尽内存而失败,并且需要
3.3. 理论上可以通过赋予纯SQL(无需臭名昭著的命令式存储过程
或外部程序辅助)图灵完备性 https://wiki.postgresql.org/wiki/Mandelbrot_set 导致其踏入了同一条Turing Tarpit
之河流的rCTE
来LIMIT 10000
地每次hash的行并对每10k行aggreate之hash
再aggreate地hash从而减少每行hash所aggreate的字符串长度
mermaid.live
flowchart TD
fields[tid\nthreadType\nstickyType\ntopicType\nisGood\ntitle\nauthorUid\npostedAt\nlatestReplyPostedAt\nlatestReplierUid\nreplyCount\nviewCount\nshareCount\nagreeCount\ndisagreeCount\nzan\ngeolocation\nauthorPhoneType\ncreatedAt\nupdatedAt\nlastSeenAt]
fields --LIMIT 10000\nCONCAT_WS(' ', ...)--> row
fields --LIMIT 10000\nCONCAT_WS(' ', ...)--> row
fields --LIMIT 10000\nCONCAT_WS(' ', ...)--> row
fields --LIMIT 10000\nCONCAT_WS(' ', ...)--> row
fields --LIMIT rest\nCONCAT_WS(' ', ...)--> row
row[row\nsingle long csv line string but using space and without any escape]
row --LIMIT 10000\nsha2-256--> row_hash
row --LIMIT 10000\nsha2-256--> row_hash
row --LIMIT 10000\nsha2-256--> row_hash
row --LIMIT 10000\nsha2-256--> row_hash
row --LIMIT rest\nsha2-256--> row_hash
row_hash --GROUP_CONCAT(... ORDER BY tid SEPARATOR '\n')--> rows_hash[rows_hash\nnew line delimited rows]
rows_hash --sha2-256--> table_hash
- https://stackoverflow.com/questions/59715303/is-there-any-way-join-tables-of-mysql-and-postgresql/59721228#59721228 中
某so纯路人
指出可以使用postgre社区牛逼哄哄
的某知名企业EnterpriseDB
所开发的某pgsql扩展 https://github.com/EnterpriseDB/mysql_fdw 通过foreign data wrapper来在pgsql中将外部数据当做迫真物化视图
查询
4.1. 安装:
git clone https://github.com/EnterpriseDB/mysql_fdw &&
git checkout REL-2_9_1 &&
sudo apt install -y libmysqlclient-dev postgresql-server-dev-16 &&
export PATH=/usr/local/pgsql/bin/:$PATH &&
make USE_PGXS=1 &&
sudo make USE_PGXS=1 install &&
make USE_PGXS=1 installcheck &&
sudo apt purge -y postgresql-server-dev-16
清理:
sudo apt purge -y libmysqlclient-dev &&
sudo make USE_PGXS=1 uninstall &&
rm -rf mysql_fdw
4.2. 安装该pgsql扩展
ALTER USER tbm WITH SUPERUSER;
CREATE EXTENSION IF NOT EXISTS mysql_fdw;
ALTER USER tbm WITH NOSUPERUSER;
CREATE SERVER IF NOT EXISTS mysql
FOREIGN DATA WRAPPER mysql_fdw
OPTIONS (host '127.0.0.1', port '3306');
CREATE USER MAPPING IF NOT EXISTS FOR CURRENT_USER -- 当前登录用户 也可以用PUBLIC使得用户名/密码对所有pgsql用户可见
SERVER mysql
OPTIONS (username 'username', password 'password');
清理:
DROP USER MAPPING IF EXISTS FOR CURRENT_USER SERVER mysql; -- FOR CURRENT_USER同上
DROP SCHEMA IF EXISTS mysql CASCADE;
DROP SERVER IF EXISTS mysql CASCADE;
DROP EXTENSION IF EXISTS mysql_fdw CASCADE;
DROP DATABASE contrib_regression; -- make USE_PGXS=1 installcheck所产生
4.3. 在pgsql中创建fdw表
CREATE SCHEMA IF NOT EXISTS mysql;
CREATE FOREIGN TABLE IF NOT EXISTS mysql."tbmc_f97650_thread"()
INHERITS (tbm."tbmc_f97650_thread")
SERVER mysql
OPTIONS (dbname 'tbm', table_name 'tbmc_f97650_thread');
使用table inheritance
是为了避免在2.
已经pgloader schema.load
后再写一遍fdw表结构
,但这会导致
4.3.1. 被继承表tbmc_f97650_thread
无法再直接UPDATE
/DELETE
返回[HV005] ERROR: system attribute "tableoid" can't be fetched from remote relation
而必须加ONLY
EnterpriseDB/mysql_fdw#300 但仍然可以INSERT
4.3.1.1. 这个bug可能会在未来版本的mysql_fdw
中修复但在这里并不重要,因为这种迫真假锁全表
正好避免了在校验期间UPDATE
/DELETE
(但防不了INSERT
)(默认事务隔离级别是READ COMMITTED
而非mysql默认的REPEATABLE READ
#32 (comment) 当然您也可以改)了pgsql表导致假阴性
4.3.2. 由于table inheritance
的本质:
SELECT * FROM tbm."tbmc_f97650_thread"
也会出现
SELECT * FROM mysql."tbmc_f97650_thread"
中的所有结果:也就是被继承表并集了所有继承表(在这里通过查询mysql),相当于
SELECT * FROM ONLY tbm."tbmc_f97650_thread" UNION ALL SELECT * FROM mysql."tbmc_f97650_thread"
- 设有表
tbmi_ocr_box_Latn
结构:
mysql端:
CREATE TABLE `tbmi_ocr_box_Latn` (
`imageId` int unsigned NOT NULL,
`frameIndex` int unsigned NOT NULL,
`centerPointX` smallint unsigned NOT NULL,
`centerPointY` smallint unsigned NOT NULL,
`width` smallint unsigned NOT NULL,
`height` smallint unsigned NOT NULL,
`rotationDegrees` smallint unsigned NOT NULL,
`recognizer` enum('TesseractHorizontal','TesseractVertical','PaddleOCRv4','PaddleOCRv3') NOT NULL,
`confidence` tinyint unsigned NOT NULL,
`text` text NOT NULL,
PRIMARY KEY (`imageId`,`frameIndex`,`centerPointX`,`centerPointY`,`width`,`height`,`rotationDegrees`,`recognizer`)
) ENGINE=InnoDB CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
由pgloader在pgsql端按照2.3.
中mysql://
源下默认的CASTING
所创建:
create table tbm."tbmi_ocr_box_Latn" (
"imageId" bigint not null,
"frameIndex" bigint not null,
"centerPointX" integer not null,
"centerPointY" integer not null,
width integer not null,
height integer not null,
"rotationDegrees" integer not null,
recognizer "tbmi_ocr_box_Latn_recognizer" not null,
confidence smallint not null,
text text not null,
primary key ("imageId", "frameIndex", "centerPointX", "centerPointY", width, height, "rotationDegrees", recognizer)
);
5.1. 根据某so纯路人
symmdifftable有4种写法:
-- symmetric difference FULL OUTER JOIN USING (...columns)
SELECT COUNT(*) FROM mysql."tbmi_ocr_box_Latn" a FULL OUTER JOIN ONLY tbm."tbmi_ocr_box_Latn" b
USING ("imageId", "frameIndex", "centerPointX", "centerPointY", "width", "height", "rotationDegrees", "recognizer", "confidence", "text")
WHERE (a."imageId" IS NULL AND a."frameIndex" IS NULL AND a."centerPointX" IS NULL AND a."centerPointY" IS NULL AND a."width" IS NULL AND a."height" IS NULL AND a."rotationDegrees" IS NULL AND a."recognizer" IS NULL AND a."confidence" IS NULL AND a."text" IS NULL)
OR (b."imageId" IS NULL AND b."frameIndex" IS NULL AND b."centerPointX" IS NULL AND b."centerPointY" IS NULL AND b."width" IS NULL AND b."height" IS NULL AND b."rotationDegrees" IS NULL AND b."recognizer" IS NULL AND b."confidence" IS NULL AND b."text" IS NULL);
1 row retrieved starting from 1 in 5 s 745 ms (execution: 5 s 698 ms, fetching: 47 ms)
5.1.1.1. 一键生成
SELECT
'SELECT COUNT(*) FROM mysql."' ||
table_name ||
'" a FULL OUTER JOIN ONLY tbm."' ||
table_name ||
E'" b \n USING ("' ||
string_agg(column_name, '", "' ORDER BY ordinal_position) ||
E'")\nWHERE (a."' ||
string_agg(column_name, '" IS NULL AND a."' ORDER BY ordinal_position) ||
E'" IS NULL)\n OR (b."' ||
string_agg(column_name, '" IS NULL AND b."' ORDER BY ordinal_position) ||
'" IS NULL);'
FROM information_schema.columns
WHERE table_schema = 'tbm' AND table_name = 'tbmi_ocr_box_Latn'
GROUP BY table_name;
5.1.1.2. 也可以使用NATURAL FULL OUTER JOIN ONLY
https://www.postgresql.org/docs/current/queries-table-expressions.html 省略USING (...)
Finally, NATURAL is a shorthand form of USING: it forms a USING list consisting of all column names that appear in both input tables. As with USING, these columns appear only once in the output table. If there are no common column names, NATURAL JOIN behaves like JOIN ... ON TRUE, producing a cross-product join.
5.1.2. 但5.1.1.
和5.1.1.2.
并不能用于有列类型允许NULL
之存在的表如7.
中的tbm_bilibiliVote
表有着
[...]
`authorExpGrade` int DEFAULT NULL,
`voteBy` text CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci,
`voteFor` text CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci,
`replyContent` json DEFAULT NULL,
[...]
因为根据sql特色之 https://en.wikipedia.org/wiki/Three-valued_logic NULL = NULL
是UNKNOWN
而UNKNOWN
转bool
是false
于是不会被ON
/WHERE
等clause视作相同 https://stackoverflow.com/questions/14366004/sql-server-join-missing-null-values
5.1.2.1. 因而需要将5.1.1.
中的USING (...columns)
展开成对所有列=
operator以及额外的OR (IS NULL AND IS NULL)
SELECT COUNT(*) FROM mysql."tbm_bilibiliVote" a FULL OUTER JOIN ONLY tbm."tbm_bilibiliVote" b ON
(a."pid" = b."pid") AND
(a."authorUid" = b."authorUid") AND
(a."authorExpGrade" = b."authorExpGrade" OR (a."authorExpGrade" IS NULL AND b."authorExpGrade" IS NULL)) AND
(a."isValid" = b."isValid") AND
(a."voteBy" = b."voteBy" OR (a."voteBy" IS NULL AND b."voteBy" IS NULL)) AND
(a."voteFor" = b."voteFor" OR (a."voteFor" IS NULL AND b."voteFor" IS NULL)) AND
(a."replyContent" = b."replyContent") AND
(a."postTime" = b."postTime")
WHERE (a."pid" IS NULL AND a."authorUid" IS NULL AND a."authorExpGrade" IS NULL AND a."isValid" IS NULL AND a."voteBy" IS NULL AND a."voteFor" IS NULL AND a."replyContent" IS NULL AND a."postTime" IS NULL)
OR (b."pid" IS NULL AND b."authorUid" IS NULL AND b."authorExpGrade" IS NULL AND b."isValid" IS NULL AND b."voteBy" IS NULL AND b."voteFor" IS NULL AND b."replyContent" IS NULL AND b."postTime" IS NULL);
5.1.2.1.1. 不对NOT NULL
的列做额外判断是为了避免7.1.
中的
FULL JOIN is only supported with merge-joinable or hash-joinable join conditions
5.1.2.2. 一键生成
SELECT
'SELECT COUNT(*) FROM mysql."' ||
table_name ||
'" a FULL OUTER JOIN ONLY tbm."' ||
table_name ||
E'" b ON\n' ||
string_agg(
' (a."' ||
column_name ||
'" = b."' ||
column_name ||
CASE WHEN is_nullable::bool = true THEN
'" OR (a."' ||
column_name ||
'" IS NULL AND b."' ||
column_name ||
'" IS NULL))'
ELSE '")'
END,
E' AND\n' ORDER BY ordinal_position
) ||
E'\nWHERE (a."' ||
string_agg(column_name, '" IS NULL AND a."' ORDER BY ordinal_position) ||
E'" IS NULL)\n OR (b."' ||
string_agg(column_name, '" IS NULL AND b."' ORDER BY ordinal_position) ||
'" IS NULL);'
FROM information_schema.columns
WHERE table_schema = 'tbm' AND table_name = 'tbm_bilibiliVote'
GROUP BY table_name;
-- symmetric difference FULL OUTER JOIN row(table)
SELECT COUNT(*) FROM mysql."tbmi_ocr_box_Latn" a FULL OUTER JOIN ONLY tbm."tbmi_ocr_box_Latn" b ON row(a) = row(b) WHERE a IS NULL OR b IS NULL;
1 row retrieved starting from 1 in 6 s 464 ms (execution: 6 s 345 ms, fetching: 119 ms)
5.1.3.1. 其中row(t)
是 https://www.postgresql.org/docs/current/sql-expressions.html#SQL-SYNTAX-ROW-CONSTRUCTORS
5.1.3.2. 这也避免了5.1.2.
5.1.4. https://stackoverflow.com/questions/6337871/how-can-i-speed-up-a-diff-between-tables
-- non symmetric difference EXCEPT mysql
SELECT COUNT(*) FROM (SELECT * FROM ONLY tbm."tbmi_ocr_box_Latn" a EXCEPT SELECT * FROM mysql."tbmi_ocr_box_Latn" b) t;
1 row retrieved starting from 1 in 5 s 137 ms (execution: 4 s 903 ms, fetching: 234 ms)
由于EXCEPT
不是symmdiff之FULL OUTER JOIN
而是SELECT a.* FROM a LEFT OUTER JOIN b
所以只能执行两遍2x耗时正如同没有FULL OUTER JOIN
可用的mysql人 https://stackoverflow.com/questions/4796872/how-can-i-do-a-full-outer-join-in-mysql
-- non symmetric difference EXCEPT tbm
SELECT COUNT(*) FROM (SELECT * FROM mysql."tbmi_ocr_box_Latn" a EXCEPT SELECT * FROM ONLY tbm."tbmi_ocr_box_Latn" b) t;
1 row retrieved starting from 1 in 5 s 793 ms (execution: 5 s 527 ms, fetching: 266 ms)
5.1.4.1. 由于未知bug
-- symmetric difference EXCEPT
SELECT COUNT(*) FROM (
SELECT * FROM ONLY tbm."tbmi_ocr_box_Latn" a EXCEPT SELECT * FROM mysql."tbmi_ocr_box_Latn" b
UNION ALL
SELECT * FROM mysql."tbmi_ocr_box_Latn" a EXCEPT SELECT * FROM ONLY tbm."tbmi_ocr_box_Latn" b
) t;
UNION ALL
后始终为0行即便单独查询有行
1 row retrieved starting from 1 in 8 s 678 ms (execution: 8 s 595 ms, fetching: 83 ms)
5.1.4.2. 这也避免了5.1.2.
5.1.5. 典型的误用NOT EXISTS
-- non symmetric difference WHERE NOT EXISTS mysql
SELECT COUNT(*) FROM (SELECT * FROM ONLY tbm."tbmi_ocr_box_Latn" a WHERE NOT EXISTS (SELECT * FROM mysql."tbmi_ocr_box_Latn" b)) t;
1 row retrieved starting from 1 in 764 ms (execution: 250 ms, fetching: 514 ms)
因为SELECT * FROM mysql."tbmi_ocr_box_Latn"
在此显然返回非0行
(因而只需要读一行mysql."tbmi_ocr_box_Latn"
几百ms后就会立即返回)从而使得该predicate在非空表上恒真
-- non symmetric difference WHERE NOT EXISTS tbm
SELECT COUNT(*) FROM (SELECT * FROM mysql."tbmi_ocr_box_Latn" a WHERE NOT EXISTS (SELECT * FROM ONLY tbm."tbmi_ocr_box_Latn" b)) t;
1 row retrieved starting from 1 in 516 ms (execution: 125 ms, fetching: 391 ms)
5.1.5.1. 由于未知bug
-- symmetric difference WHERE NOT EXISTS
SELECT COUNT(*) FROM (
SELECT * FROM ONLY tbm."tbmi_ocr_box_Latn" a WHERE NOT EXISTS (SELECT * FROM mysql."tbmi_ocr_box_Latn" b)
UNION ALL
SELECT * FROM mysql."tbmi_ocr_box_Latn" a WHERE NOT EXISTS (SELECT * FROM ONLY tbm."tbmi_ocr_box_Latn" b)
) t;
UNION ALL
后始终为0行
即便单独查询有非0行
1 row retrieved starting from 1 in 776 ms (execution: 203 ms, fetching: 573 ms)
5.1.6. 除5.1.5.
外的3种执行期间均需pgsql消耗1x表大小
的内存(总比3.2.
好)和3x表大小
的临时表
存储
5.1.6.1. 可以修改临时表空间
在fs上的默认路径/var/lib/postgresql/16/main/base/pgsql_tmp
https://dba.stackexchange.com/questions/170661/in-postgres-how-do-i-adjust-the-pgsql-tmp-setting/170665#170665
tmp={目的路径} &&
mkdir -p $tmp /PG_16_202307071 &&
chown -R postgres: $tmp &&
chmod -R 700 $tmp
create tablespace temp_tablespace owner postgres location '{目的路径}';
grant create on tablespace temp_tablespace to public;
alter system set temp_tablespaces = temp_tablespace;
select pg_reload_conf();
create temp table a(b text);
5.2. 总行数对比
SELECT a, b, b - a diff FROM (SELECT (SELECT COUNT(*) FROM mysql."tbmi_ocr_box_Latn") a, (SELECT COUNT(*) FROM ONLY tbm."tbmi_ocr_box_Latn") b) t;
a | b | diff
--------+--------+--------
472655 | 472655 | 0
(1 row)
- 令文件
compare.sql
有在mysql中codegen出基于5.1.3.
、4.3.
和5.2.
校验用的一键生成
pgsql
SELECT CONCAT(
'CREATE FOREIGN TABLE IF NOT EXISTS mysql."',
TABLE_NAME,
'"() INHERITS (tbm."',
TABLE_NAME,
'") SERVER mysql OPTIONS (dbname \'tbm\', table_name \'',
TABLE_NAME,
'\');\n',
'SELECT a, b, b - a diff FROM (SELECT (SELECT COUNT(*) FROM mysql."',
TABLE_NAME,
'") a, (SELECT COUNT(*) FROM ONLY tbm."',
TABLE_NAME,
'") b) t;\n',
'SELECT COUNT(*) FROM mysql."',
TABLE_NAME,
'" a FULL OUTER JOIN ONLY tbm."',
TABLE_NAME,
'" b ON row(a) = row(b) WHERE a IS NULL OR b IS NULL;'
)
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'tbm'
AND TABLE_NAME REGEXP '^tbm(_|i_(?!imageInReply)|c_f(78579|2265748|17019292|25459979|27546680))'
ORDER BY TABLE_NAME;
REGEXP部分同2.1.
6.1. 一键bash
cd pgloader && # 1.1.1.中git clone的pgloader目录路径
(
/usr/bin/time -v build/bin/pgloader schema.load &&
/usr/bin/time -v build/bin/pgloader data.load &&
PGPASSWORD=password /usr/bin/time -v psql -aUusername tbm < compare.sql
) 2>&1 | tee -a stdout+err
- 设有表
tbm_bilibiliVote
结构 https://github.com/n0099/bilibiliVote
mysql端:
CREATE TABLE `tbm_bilibiliVote` (
`pid` bigint NOT NULL,
`authorUid` bigint NOT NULL,
`authorExpGrade` int DEFAULT NULL,
`isValid` tinyint(1) NOT NULL,
`voteBy` text CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci,
`voteFor` text CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci,
`replyContent` json DEFAULT NULL,
`postTime` datetime NOT NULL,
PRIMARY KEY (`pid`),
KEY `authorUid` (`authorUid`),
KEY `isVaild` (`isValid`) USING BTREE,
KEY `voteFor` (`voteFor`(100),`isValid`) USING BTREE,
KEY `postTime` (`postTime`)
) ENGINE=InnoDB CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
由pgloader在pgsql端按照2.3.
中mysql://
源下默认的CASTING
所创建:
create table tbm."tbm_bilibiliVote" (
pid bigint primary key not null,
"authorUid" bigint not null,
"authorExpGrade" integer,
"isValid" boolean not null,
"voteBy" text,
"voteFor" text,
"replyContent" json,
"postTime" timestamp with time zone not null
);
create index "idx_163841_authorUid" on "tbm_bilibiliVote" using btree ("authorUid");
create index "idx_163841_isVaild" on "tbm_bilibiliVote" using btree ("isValid");
create index "idx_163841_postTime" on "tbm_bilibiliVote" using btree ("postTime");
create index "idx_163841_voteFor" on "tbm_bilibiliVote" using btree ("voteFor", "isValid");
7.1. 使用5.1.3.
中的symmetric difference FULL OUTER JOIN row(table)
SELECT COUNT(*) FROM mysql."tbm_bilibiliVote" a FULL OUTER JOIN ONLY tbm."tbm_bilibiliVote" b ON row(a) = row(b) WHERE a IS NULL OR b IS NULL;
ERROR: FULL JOIN is only supported with merge-joinable or hash-joinable join conditions
7.1.1. 使用5.1.5.
中的non symmetric difference EXCEPT
SELECT COUNT(*) FROM (SELECT * FROM ONLY tbm."tbm_bilibiliVote" a EXCEPT SELECT * FROM mysql."tbm_bilibiliVote" b) t;
SELECT COUNT(*) FROM (SELECT * FROM mysql."tbm_bilibiliVote" a EXCEPT SELECT * FROM ONLY tbm."tbm_bilibiliVote" b) t;
ERROR: could not identify an equality operator for type json
7.1.2. 执行5.1.2.1.
SELECT * FROM mysql."tbm_bilibiliVote" a FULL OUTER JOIN ONLY tbm."tbm_bilibiliVote" b ON
a."authorUid" = b."authorUid" AND
a."isValid" = b."isValid" AND
a."pid" = b."pid" AND
a."postTime" = b."postTime" AND
a."replyContent" = b."replyContent" AND
a."voteBy" = b."voteBy" AND
a."voteFor" = b."voteFor" AND
a."authorExpGrade" = b."authorExpGrade"
WHERE (a."pid" IS NULL AND a."authorUid" IS NULL AND a."authorExpGrade" IS NULL AND a."isValid" IS NULL AND a."voteBy" IS NULL AND a."voteFor" IS NULL AND a."replyContent" IS NULL AND a."postTime" IS NULL)
OR (b."pid" IS NULL AND b."authorUid" IS NULL AND b."authorExpGrade" IS NULL AND b."isValid" IS NULL AND b."voteBy" IS NULL AND b."voteFor" IS NULL AND b."replyContent" IS NULL AND b."postTime" IS NULL);
会有 https://stackoverflow.com/questions/32843213/operator-does-not-exist-json-json
ERROR: operator does not exist: json = json
Hint: No operator matches the given name and argument types. You might need to add explicit type casts.
7.1.3. 7.
中可见列replyContent
是json
类型
- a."replyContent" = b."replyContent" AND
+ a."replyContent"::jsonb = b."replyContent"::jsonb AND
count |
---|
29510 |
这恰好等于5.2.
SELECT COUNT(*) FROM mysql."tbm_bilibiliVote" UNION ALL SELECT COUNT(*) FROM ONLY tbm."tbm_bilibiliVote";
count |
---|
14755 |
14755 |
之和
7.1.4. 但即便是json::text
- a."replyContent" = b."replyContent" AND
+ a."replyContent"::text = b."replyContent"::text AND
也仍然相同
7.1.5. 可以在mysql端将该json
列转text
ALTER TABLE tbm_bilibiliVote MODIFY replyContent TEXT NOT NULL;
后再重新按照2.2.
仅导入该表(INCLUDING ONLY TABLE NAMES MATCHING 'tbm_bilibiliVote'
)
7.2. 但即便两端的列replyContent
重新导入后均为text
类型也仍然有所有行
不同
实际上是由于列postTime
mysql无时区类型datetime
由2.3.
中的pgloadermysql://
默认的CASTING
type datetime to timestamptz
转为pgsql有时区类型timestamptz
时由pgloader根据系统时区
UTC+8
而非pgsql时区
UTC+0
转换为UTC
SELECT a.pid, a."postTime", b."postTime" FROM mysql."tbm_bilibiliVote" a JOIN ONLY tbm."tbm_bilibiliVote" b ON a.pid = b.pid ORDER BY pid;
pid | postTime | postTime |
---|---|---|
124498114574 | 2019-03-10 12:38:17.000000 +00:00 | 2019-03-10 04:38:17.000000 +00:00 |
124498200265 | 2019-03-10 12:44:12.000000 +00:00 | 2019-03-10 04:44:12.000000 +00:00 |
... |
7.2.1. 由于timestamptz
实际上并没有额外存储时区UTC offset
https://stackoverflow.com/questions/5876218/difference-between-timestamps-with-without-time-zone-in-postgresql
因而其并不能解决 https://z.n0099.net/#narrow/near/86397 https://old.reddit.com/r/PostgreSQL/comments/xpygbh/when_would_i_ever_use_timestamp_over_timestamptz/ https://news.ycombinator.com/item?id=20212671 中争论的带时区引用未来指定datetime
问题
https://www.w3.org/International/wiki/WorkingWithTimeZones#Past_and_Future_Events
The SQL data types 'date', 'time', and 'timestamp' are field based time values which are intended to be zone offset independent: they are actually, technically, floating time values! The data type 'timestamp with time zone' is the zone offset-dependent equivalent of 'timestamp' in SQL.
https://www.w3.org/International/wiki/WorkingWithTimeZones#Floating_Time
- Floating Time
Some observed time values are not related to a specific moment in incremental time. Instead, they need to be combined with local information to determine a range of acceptable incremental time values. We refer to these sorts of time values as "floating times". Floating times are not attached and should never be attached to a particular time zone.
Examples of floating time events include a user’s birth date, an employee’s hire or termination date, or a list of company holidays.
For example, suppose that January 1st is considered a holiday in your application. The day "January 1" has a floating-time status as a "holiday". That "day" can begin as early as midnight GMT-14:00 and end as late as midnight of January 2 GMT+12:00, depending on local time. This covers an incremental time range of fifty hours.
再叠加mysql_fdw不论当前pgsql时区
https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-TIMEZONES 是什么都不会在将从mysql读到的datetime
转换为timestamptz
时根据时区重新计算时间而是只修改offset(也就是使用跟mysql类型datetime
语义相同的pgsql类型timestamp
的语义)
SET SESSION TIME ZONE 'Asia/Shanghai';
SELECT a.pid, a."postTime", b."postTime" FROM mysql."tbm_bilibiliVote" a JOIN ONLY tbm."tbm_bilibiliVote" b ON a.pid = b.pid ORDER BY pid;
pid | postTime | postTime |
---|---|---|
124498114574 | 2019-03-10 12:38:17.000000 +08:00 | 2019-03-10 12:38:17.000000 +08:00 |
124498200265 | 2019-03-10 12:44:12.000000 +08:00 | 2019-03-10 12:44:12.000000 +08:00 |
因而只需要修改pgsql时区
SET SESSION TIME ZONE 'Asia/Shanghai';
SELECT COUNT(*) FROM ONLY mysql."tbm_bilibiliVote" a FULL OUTER JOIN ONLY tbm."tbm_bilibiliVote" b ON row(a) = row(b) WHERE a IS NULL OR b IS NULL;
便校验一致
7.2.2. 如果执行了7.1.5.
需再在pgsql端将该列类型改回json
或jsonb
DROP FOREIGN TABLE mysql."tbm_bilibiliVote"; -- 避免`ALTER TABLE`由于`4.3.2.`而返回假阳性之`ERROR: "tbm_bilibiliVote" is not a table`
ALTER TABLE tbm."tbm_bilibiliVote" ALTER "replyContent" TYPE jsonb USING "replyContent"::jsonb;
-- 修改`6.`中的`TABLE_NAME REGEXP '...'`predicate为`TABLE_NAME = 'tbm_bilibiliVote'`并eval其输出以重新执行校验
VACUUM FULL tbm."tbm_bilibiliVote";
- 设有表
tbmi_metadata_gif
结构:
CREATE TABLE `tbmi_metadata_gif` (
`imageId` int unsigned NOT NULL,
`repeatCount` smallint unsigned NOT NULL,
`colorTableMode` enum('Global','Local') NOT NULL,
`globalColorTableLength` int NOT NULL,
`comments` json DEFAULT NULL,
PRIMARY KEY (`imageId`)
) ENGINE=InnoDB CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
由pgloader在pgsql端按照2.3.
中mysql://
源下默认的CASTING
所创建:
create table tbm.tbmi_metadata_gif (
"imageId" bigint primary key not null,
"repeatCount" integer not null,
"colorTableMode" "tbmi_metadata_gif_colorTableMode" not null,
"globalColorTableLength" integer not null,
comments json
);
8.1. 按照7.1.3.
SELECT COUNT(*) FROM mysql."tbmi_metadata_gif" a FULL OUTER JOIN ONLY tbm."tbmi_metadata_gif" b ON
(a."imageId" = b."imageId") AND
(a."repeatCount" = b."repeatCount") AND
(a."colorTableMode" = b."colorTableMode") AND
(a."globalColorTableLength" = b."globalColorTableLength") AND
(a."comments"::jsonb = b."comments"::jsonb OR (a."comments" IS NULL AND b."comments" IS NULL))
WHERE (a."imageId" IS NULL AND a."repeatCount" IS NULL AND a."colorTableMode" IS NULL AND a."globalColorTableLength" IS NULL AND a."comments" IS NULL)
OR (b."imageId" IS NULL AND b."repeatCount" IS NULL AND b."colorTableMode" IS NULL AND b."globalColorTableLength" IS NULL AND b."comments" IS NULL);
有
ERROR: unsupported Unicode escape sequence
Detail: \u0000 cannot be converted to text.
Where: JSON data, line 1: ... ", "Created by TechSmith\u0000...
8.1.1. 但按照7.1.4.
就无差异
- (a."comments"::jsonb = b."comments"::jsonb OR (a."comments" IS NULL AND b."comments" IS NULL))
+ (a."comments"::text = b."comments"::text OR (a."comments" IS NULL AND b."comments" IS NULL))
8.1.2. 这是由于jsonb
类型不同于本质text
的json
,会将json解析后存储为pgsql内置类型之集,再由于text
类型受c人最爱的
https://en.wikipedia.org/wiki/Null-terminated_string 影响不支持存储256个字节
中的唯一一个0x00
导致jsonb
也无法存储\u0000
转义后的0x00
进其内部的text
turbot/steampipe-postgres-fdw#118
https://www.postgresql.org/docs/current/datatype-json.html
The jsonb type also rejects \u0000 (because that cannot be represented in PostgreSQL's text type)
Table 8.23. JSON Primitive Types and Corresponding PostgreSQL Types
JSON primitive type PostgreSQL type Notes string text \u0000 is disallowed, as are Unicode escapes representing characters not available in the database > encoding number numeric NaN and infinity values are disallowed boolean boolean Only lowercase true and false spellings are accepted null (none) SQL NULL is a different concept
8.2. 该表该列中共有如下\u0000
https://codepoints.net/U+0000
SELECT COUNT(*) count, comments::text c
FROM ONLY tbm."tbmi_metadata_gif"
WHERE comments::text LIKE '%\u0000%'
GROUP BY c
ORDER BY count DESC;
count | c |
---|---|
31 | [" ", "Created by TechSmith\u0000"] |
2 | ["tvc\u0000"] |
1 | ["Author:\t\tGod\_job\_dave\u0000\u0000\u0000???\u0001\u0000\u0000\u0000\u0000P??\u0001\u0000"] |
1 | ["Optimized by Ulead SmartSaver!\u0000"] |
1 | ["CREATOR: gd-jpeg v1.0 \(using IJG JPEG v62\), quality = 75\n\u0000"] |
8.2.1. 使用如下3层如同6.
中codegen迫真元编程
自我复制的quine病毒
可在mysql端检查所有含有0x00
的text
类型及其各个长度变种 https://stackoverflow.com/questions/13932750/tinytext-text-mediumtext-and-longtext-maximum-storage-sizes 列并生成ALTER
列类型为BLOB
(pgloader在pgsql端按照2.3.
中mysql://
源下默认的CASTING
会转换为可存储所有256个字节
的bytea
)和SELECT
查阅的一键生成
mysql:
SET SESSION group_concat_max_len = 18446744073709551615; -- 同`3.1.`
SELECT CONCAT(
'SELECT t.*, CONCAT(
\'ALTER TABLE tbm.`\',
t,
\'` MODIFY `\',
c,
\'` \',
REPLACE(COLUMN_TYPE, \'text\', \'blob\'),
CASE WHEN COLUMNS.IS_NULLABLE = \'true\' THEN \' NULL;\'
ELSE \' NOT NULL;\'
END
) `alter`, CONCAT(
\'SELECT * FROM tbm.`\',
t,
\'` WHERE CAST(`\',
c,
\'` AS BINARY) LIKE CONCAT(\'\'%\'\', 0x00, \'\'%\'\');\'
) `select`
FROM (\n',
GROUP_CONCAT(
' SELECT COUNT(*) count, \'',
TABLE_NAME,
'\' t, \'',
COLUMN_NAME,
'\' c FROM `',
TABLE_NAME,
'` WHERE CAST(`',
COLUMN_NAME,
'` AS BINARY) LIKE CONCAT(\'%\', 0x00, \'%\')'
SEPARATOR ' UNION ALL\n'
),
'\n) t
JOIN information_schema.COLUMNS
ON count != 0 AND TABLE_SCHEMA = \'tbm\'
AND TABLE_NAME = t AND COLUMN_NAME = c;'
)
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'tbm' AND COLUMN_TYPE LIKE '%text%'
AND TABLE_NAME REGEXP '^tbm(_|i_(?!imageInReply)|c_f(78579|2265748|17019292|25459979|27546680))'
ORDER BY TABLE_NAME;
REGEXP部分同2.1.
count | t | c | alter | select |
---|---|---|---|---|
4321 | tbmi_metadata_embedded_exif | userComment | ALTER TABLE tbm.`tbmi_metadata_embedded_exif` MODIFY `userComment` BLOB NOT NULL; |
SELECT * FROM tbm.`tbmi_metadata_embedded_exif` WHERE CAST(`userComment` AS BINARY) LIKE CONCAT('%', 0x00, '%'); |
2612 | tbmi_metadata_embedded_exif | xpAuthor | ALTER TABLE tbm.`tbmi_metadata_embedded_exif` MODIFY `xpAuthor` BLOB NOT NULL; |
SELECT * FROM tbm.`tbmi_metadata_embedded_exif` WHERE CAST(`xpAuthor` AS BINARY) LIKE CONCAT('%', 0x00, '%'); |
0.00s user 0.02s system 0% cpu 42:49.75 total
8.3. 值得注意的是2.2.
导入 dimitri/pgloader#1573 不会有截断0x00
后字节的WARNING
且6.
校验 EnterpriseDB/mysql_fdw#299 过程中也无法发现以0x00
结尾text
中的最后一个0x00
字节消失
- 设有表
tbmi_hash
结构:
CREATE TABLE `tbmi_hash` (
`imageId` int unsigned NOT NULL,
`frameIndex` int unsigned NOT NULL,
`pHash` binary(8) NOT NULL,
`averageHash` bigint unsigned NOT NULL,
`blockMeanHash` binary(32) NOT NULL,
`marrHildrethHash` binary(72) NOT NULL,
`thumbHash` tinyblob NOT NULL,
PRIMARY KEY (`imageId`,`frameIndex`)
) ENGINE=InnoDB CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
由pgloader在pgsql端按照2.3.
中mysql://
源下默认的CASTING
所创建:
create table tbm.tbmi_hash (
"imageId" bigint not null,
"frameIndex" bigint not null,
"pHash" bytea not null,
"averageHash" bigint not null,
"blockMeanHash" bytea not null,
"marrHildrethHash" bytea not null,
"thumbHash" bytea not null,
primary key ("imageId", "frameIndex")
);
9.1. 由于pgsql没有mysql特有(isosql
不要求)带unsigned
修饰的的int类型 https://stackoverflow.com/questions/20810134/why-unsigned-integer-is-not-available-in-postgresql/59802732#59802732 正如同m$ft精神mvp
最爱的中小实体企业c#工控上位机人上壬
https://z.n0099.net/#narrow/near/94726 除非使用扩展 https://github.com/petere/pguint 因而
- `averageHash` bigint unsigned NOT NULL,
+ "averageHash" bigint not null,
导致该列上限值从
9.2. 实际上所谓的opencv_imghash
lumina37/aiotieba#63 (comment) 之averageHash
可运算的正整数
语义之类型存储而应直接视作斑点二进制
https://en.wikipedia.org/wiki/Binary_large_object
ALTER TABLE tbmi_hash ADD COLUMN averageHash_2 BINARY(8) NOT NULL AFTER averageHash;
UPDATE tbmi_hash SET averageHash_2 = UNHEX(LPAD(HEX(averageHash), 16, '00')); -- https://stackoverflow.com/questions/58611436/how-do-you-convert-an-integer-into-its-condensed-binary-equivalent-e-g-binary3/58657797#comment138021380_58657797
SELECT COUNT(*) FROM tbmi_hash WHERE averageHash_2 != UNHEX(LPAD(HEX(averageHash), 16, '00'));
-- expecting 0 rows returned, then:
-- ALTER TABLE tbmi_hash DROP COLUMN averageHash, RENAME COLUMN averageHash_2 TO averageHash;
9.3. 一键生成
找出所有存在值 bigint unsigned
列的mysql
SET SESSION group_concat_max_len = 18446744073709551615; -- 同`3.1.`
SELECT CONCAT(
'SELECT * FROM (\n',
GROUP_CONCAT(
' SELECT COUNT(*) c, \'',
TABLE_NAME,
'\' t FROM `',
TABLE_NAME,
'` WHERE `',
COLUMN_NAME,
'` > 9223372036854775807'
SEPARATOR ' UNION ALL\n'
),
'\n) t WHERE c != 0;'
)
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'tbm' AND COLUMN_TYPE = 'bigint unsigned'
AND TABLE_NAME REGEXP '^tbm(_|i_(?!imageInReply)|c_f(78579|2265748|17019292|25459979|27546680))'
ORDER BY TABLE_NAME;
REGEXP部分同2.1.
9.4.
DROP TABLE tbm."tbmi_hash" CASCADE;
类比7.1.5.
重新导入该表
CREATE FOREIGN TABLE IF NOT EXISTS mysql."tbmi_hash"()
INHERITS (tbm."tbmi_hash")
SERVER mysql
OPTIONS (dbname 'tbm', table_name 'tbmi_hash');