uri is a extension to add uri data type for postgresql, it allows to insert data in uri format and provide all functions to extract uri parts, validate and compare uris. Some functions are provided to check if an uri exists, check size and content-type of remote uri (using libcurl) and local uri.
It has the following operators =
, <>
, <
, <=
, >
, >=
, @>
and <@
.
uri columns can be indexed using btree and hash indexes.
Only hierarchical URI are supported by this extension. http
, file
, ssh
,
telnet
and ftp
scheme are hierarchical URI; mailto
, urn
, tag
and about
aren't.
The uri extension needs libcurl-dev to be installed:
apt-get install libcurl4-openssl-dev
or
yum install libcurl-openssl-devel
You need to install uriparser. Use
apt-get install liburiparser1 liburiparser-dev
or
yum install uriparser uriparser-devel
Extraction of mime type from a local file is done with the use of libmagic. libmagic
is found with all Linux or BSD like distributions and comes with the file
command, you will be able to install the
development binary package using
apt-get install libmagic-dev
or
yum install file-devel
Use the following command to build and install the extension,
pg_config
must be in found from your PATH environment variable.
make
sudo make install
To test the extension run:
make installcheck
To use the extension in your database execute:
CREATE EXTENSION uri;
The extension provide some useful functions to access all parts of a URI per RFC 3986.
uri_get_scheme(uri)
returns protocol part of uri as texturi_get_auth(uri)
returns user part of uri as texturi_get_host(uri)
returns host part of uri as texturi_get_port(uri)
returns port part of uri as texturi_get_portnum(uri)
returns port part uri as integeruri_get_path(uri)
returns path part of uri as texturi_get_query(uri)
returns query part of uri as texturi_get_fragment(uri)
returns fragment part of uri as text
The following are two URI examples and their component parts:
foo://example.com:8042/over/there?name=ferret#nose
\_/ \______________/\_________/ \_________/ \__/
| | | | |
scheme authority path query fragment
Other functions:
uri_get_str(uri)
returns uri as texturi_is_absolute(uri)
returns true if uri is absoluteuri_is_absolute_path(uri)
returns true if uri path is absoluteuri_localpath_exists(uri)
returns true if uri exists as a regular local path (not symlink).uri_remotepath_exists(uri)
returns true if uri exists as a remote url.uri_path_exists(uri)
returns true if uri exists as a local regular path (not symlink) or remote url (local/remote is autodetected).uri_localpath_content_type(uri)
returns the content_type of a local file using libmagic.uri_remotepath_content_type(uri)
returns the content_type of a remote url.uri_path_content_type(uri)
returns the content_type of the url (local/remote is autodetected).uri_localpath_size(uri)
returns the size of a local regular file (not symlink).uri_remotepath_size(uri)
returns the size of a remote url.uri_path_size(uri)
returns the size of a local path (not symlink) or remote url (local/remote is autodetected).uri_escape(text)
returns the encoded URL of the given string using curl_easy_escape().uri_unescape(text)
returns the decoded URL of the given string using curl_easy_unescape().uri_rebase_url(uri, uri)
returns the url resulting from a uri path rebased on an other uri. A base always end with a / any extra path after last / is removed from the base.uri_get_relative_path(uri, uri)
returns the path of a URI relative to its base.uri_remotepath_header(uri, text)
returns the HTTP header of the remote file as 'text' or 'json' following the format specified as second argument.
In all functions URIs are normalized as they are parsed. Normalisation is performed according to section 6.2.2 of
RFC3986, and includes adjusting the case of any scheme, hostname and percent-encoded characters so as to be
consistent, as well as removing redundant components from the path (for example, a path of /a/b/c/../d/../../e
will
be normalised to /a/e
).
This also mean that this extension stores normalized URI and not the original string. For example:
test_uri=# INSERT INTO t1 (url) VALUES ('file:///etc/postgresql/9.3/main/../../9.6/main/postgresql.conf');
INSERT 0 1
test_uri=# SELECT * FROM t1 WHERE url = 'file:///etc/postgresql/9.3/main/../../9.6/main/postgresql.conf';
id | url
----+-------------------------------------------------
1 | file:///etc/postgresql/9.6/main/postgresql.conf
test_uri=# SELECT * FROM t1 WHERE url = 'file:///etc/postgresql/9.6/main/postgresql.conf';
id | url
----+-------------------------------------------------
1 | file:///etc/postgresql/9.6/main/postgresql.conf
If you want to retrieve the original value file:///etc/postgresql/9.3/main/../../9.6/main/postgresql.conf
this will not be possible anymore.
Function uri_localpath_content_type()
, uri_localpath_exists()
, uri_localpath_size()
, uri_remotepath_content_type()
,
uri_remotepath_exists()
and uri_remotepath_size()
and so on meta function uri_path_content_type()
, uri_path_exists()
and uri_path_size()
return NULL on uri access failure. The error returned by libcurl or the operating
system you have to use your own script to test the URL. In the future this may change if users ask for a buildin
function to report URL access error, at now I can't find any reason to do that.
Function uri_remotepath_exists()
returns true when HTTP code returned by libcurl < 400, false when HTTP code 404 is returned
and NULL otherwise (HTTP code >= 400).
Functions uri_remotepath_content_type()
returns the content-type found by libcurl, NULL otherwise. A WARNING with the HTTP
error code is outputed on failure.
Function uri_localpath_content_type()
returns the mime type of the file using libmagic, or the operating error message in
case of failure.
=
check that 2 uri data are equals<>
check that 2 uri data are different<
check that an uri is literally lower than an other<=
check that an uri is literally lower or equal than an other>
check that an uri is literally greater than an other>=
check that an uri is literally greater or equal than an other@>
check that left uri contains right uri<@
check that left uri is contained in right uri
CREATE EXTENSION uri;
CREATE TABLE example (id integer, url uri);
INSERT INTO example VALUES (1, 'http://pgcluu.darold.net/index.html#top');
INSERT INTO example VALUES (2, 'file:///opt/git/uri/README');
INSERT INTO example VALUES (3, 'http://mydomain.com/pub/../public/../index 1.html');
INSERT INTO example VALUES (4, 'http://username:passwd@mydomain.com/pub/index.php?view=menu&detail=1');
INSERT INTO example VALUES (5, 'http://192.168.1.1/');
SELECT * FROM example WHERE url='http://192.168.1.1/';
SELECT uri_get_scheme(url), uri_get_host(url1) FROM example;
SELECT uri_get_auth(url), uri_get_port(url1) FROM example;
SELECT uri_get_path(url), uri_get_query(url1) FROM example;
SELECT uri_get_fragment(url) FROM example;
SELECT * FROM example WHERE uri_get_path(url)='/index.html';
SELECT * FROM example WHERE uri_get_path(url) ~ '.*index.php';
SELECT * FROM example WHERE url @> '192.168.1.1';
SELECT * FROM example WHERE '192.168.1.1' <@ url;
The following example URIs illustrate several URI schemes and variations in their common syntax components:
INSERT INTO example VALUES (1, 'ftp://ftp.is.co.za/rfc/rfc1808.txt');
INSERT INTO example VALUES (2, 'http://www.ietf.org/rfc/rfc2396.txt');
INSERT INTO example VALUES (3, 'ldap://[2001:db8::7]/c=GB?objectClass?one');
INSERT INTO example VALUES (4, 'telnet://192.0.2.16:80/');
INSERT INTO example VALUES (5, 'file:///etc/postgresql/9.4/main/postgresql.conf.dist');
It is possible to create btree or hash indexes on uri data type.
CREATE INDEX test2_index_uri ON example USING btree(url uri_btree_ops);
CREATE INDEX test1_index_uri ON example USING hash(url uri_hash_ops);
The operator classes uri_btree_ops
and uri_hash_ops
can be omitted as they
are used as default operator class for the uri data type. With these operator
classes, all URIs are normalized before they are compared. So if you search records
where uri file:///etc/postgresql/9.3/main/../../9.6/main/postgresql.conf
is found
if will search for file:///etc/postgresql/9.6/main/postgresql.conf
into the index
or the table if the index is not used.
Gilles Darold gilles@darold.net
This extension is free software distributed under the PostgreSQL Licence.
- Copyright (c) 2015-2022, Gilles Darold
Some parts of the code are original code copied from liburi licensed under the terms of the Apache License, Version 2.0. The code have been embeded to the extension to avoid dependencies requiring compilation from sources. liburi can be found here: https://github.com/bbcarchdev/liburi
- Copyright (c) 2012 Mo McRoberts
- Copyright (c) 2014-2017 BBC