Skip to content

CurtTilmes/raku-dbpg

Repository files navigation

DB::Pg – PostgreSQL access for Perl 6

This is a reimplementation of Perl 6 bindings for PostgreSQL's libpq.

A very nice overview article is available in a blog post by Luca Ferrari: A glance at Raku connectivity towards PostgreSQL.

Basic usage

my $pg = DB::Pg.new;  # You can pass in connection information if you want.

Execute a query, and get a single value:

say $pg.query('select 42').value;
# 42

Insert some values using placeholders:

$pg.query('insert into foo (x,y) values ($1,$2)', 1, 'this');

Note, placeholders use the $1, $2, ... syntax instead of ? See PREPARE for more information.

Execute a query returning a row as an array or hash;

say $pg.query('select * from foo where x = $1', 42).array;
say $pg.query('select * from foo where x = $1', 42).hash;

Execute a query returning a bunch of rows as arrays or hashes:

.say for $pg.query('select * from foo').arrays;
.say for $pg.query('select * from foo').hashes;

If you have no placeholders/arguments and aren't retrieving information, you can use execute. It does not PREPARE the query.

$pg.execute('insert into foo (x,y) values (1,2)');

Connection Information

The most basic connection is just to pass in no options:

my $pg = DB::Pg.new;

This is similar to just running psql on the command line. It will take advantage of any of the standard postgres environment variables.

You can also construct a standard libpq connect string. and pass it in.

For example:

my $pg = DB::Pg.new(conninfo => 'host=myhost port=5432 dbname=mydb');

You could also pull those from environment variables with something like this:

my $conninfo = join " ",
        ('dbname=' ~ (%*ENV<DB_NAME> || die("missing DB_NAME in environment"))),
        ("host=$_" with %*ENV<DB_HOST>),
        ("user=$_" with %*ENV<DB_USER>),
        ("password=$_" with %*ENV<DB_PASSWORD>);
my $db = DB::Pg.new(:$conninfo);

But you might as well use the standard PG* environment variables.

You can also just put all your connection information in a Connection Service File and just use a conninfo string like "service=foo".

This is the same as just setting environment variable PGSERVICE=foo.

Connection Caching

Database connection handles are created on demand, and cached for reuse in a connection pool. Similarly, statement handles are prepared, cached and reused.

When the first query is called, a new database connection will be created. After the results are read from the connection, the connection will be returned and cached in the pool. When a later query is performed, that cached connection will be reused.

If multiple simultaneous queries occur, perhaps in different threads, each will get a new connection so they won't interfere with one another.

For example, you can perform database actions while iterating through results from a query:

for $pg.query('select * from foo').hashes -> %h
{
    $pg.query('update bar set ... where x = $1...$2...', %h<x>, %h<y>);
}

You can even do arbitrary queries in multiple threads without worrying about connections:

say await do for ^10 {
    start $pg.query('select $1::int, pg_sleep($1::float/10)', $_).value
}

Connection caching is a nice convenience, but it does require some care from the consumer. If you call query with an imperative statement (insert, update, delete) the connection will automatically be returned to the pool for re-use. For a query that returns results, such as select, in order to reliably return the connection to the pool for the next user, you must do one of two things:

  1. Read all the results. Once the last returned row is read, the database connection handle will automatically get returned for reuse.

  2. Explicitly call .finish on the results object to prematurely return it.

Results

Calling query with a select or something that returns data, a DB::Pg::Results object will be returned.

The query results can be consumed from that object with the following methods:

  • .value - a single scalar result
  • .array - a single array of results from one row
  • .hash - a single hash of results from one row
  • .arrays - a sequence of arrays of results from all rows
  • .hashes - a sequence of hashes of results from all rows

You can also query for some information about the results on the object directly:

  • .rows - Total number of rows returned
  • .columns - List of column names returned
  • .types - List of Perl types of columns returned

For example:

my $results = $pg.query('select * from foo');

say $results.rows;
say $results.columns;
say $results.types;

.say for $results.hashes;

If the query isn't a select or otherwise doesn't return data, such as an INSERT, UPDATE, or DELETE, it will return the number of rows affected.

Database

Though you can just call .query() on the main DB::Pg object, sometimes you want to explicitly manage the database connection. Use the .db method to get a DB::Pg::Database object, and call .finish explicitly on it to return it to the pool when you are finished with it.

The database object also has .query() and .execute() methods, they just don't automatically .finish to return the handle to the pool. You must explicitly do that after use.

These are equivalent:

.say for $pg.query('select * from foo').arrays;
my $db = $pg.db;
.say for $db.query('select * from foo').arrays;
$db.finish;

The database object also has some extra methods for separately preparing and executing a query:

my $db = $pg.db;
my $sth = $db.prepare('insert into foo (x,y) values ($1,$2)');
$sth.execute(1, 'this');
$sth.execute(2, 'that');
$db.finish;

.prepare() returns a DB::Pg::Statement object.

It can be more efficient to perform many actions in this way and avoid the overhead of returning the connection to the pool only to immediately get it back again.

Transactions

The database object can also manage transactions with the .begin, .commit and .rollback methods.

my $db = $pg.db;
my $sth = $db.prepare('insert into foo (x,y) values ($1,$2)');
$db.begin;
$sth.execute(1, 'this');
$sth.execute(2, 'that');
$db.commit;
$db.finish;

The begin/commit ensure that the statements between them happen atomically, either all or none.

Transactions can also dramatically improve performance for some actions, such as performing thousands of inserts/deletes/updates since the indices for the affected table can be updated in bulk once for the entire transaction.

If you .finish the database prior to a .commit, an uncommitted transaction will automatically be rolled back.

As a convenience, .commit also returns the database object, so you can just $db.commit.finish.

Cursors

When a query is performed, all the results from that query are immediately returned from the server to the client. For exceptionally large queries, this can be problematic, both waiting the time for the whole query to execute, and the memory for all the results. Cursors provide a better way.

for $pg.cursor('select * from foo where x = $1', 27) -> @row
{
    say @row;
}

The cursor method will fetch N rows at a time from the server (can be controlled with the :fetch parameter, defaults to 1,000). The :hash parameter can be used to retrieve hashes for the rows instead of arrays.

for $pg.cursor('select * from foo', fetch => 500, :hash) -> %r
{
    say %r;
}

Bulk Copy In

PostgreSQL has a COPY facility for bulk copy in and out of the database.

This is accessed with the DB::Pg::Database methods .copy-data and .copy-end. Pass blocks of data in with .copy-data, and call .copy-end when complete.

my $db = $pg.db;
$db.query('copy foo from stdin (format csv)'); # Any valid COPY command
$db.copy-data("1,2\n4,5\n6,12\n")
$db.copy-end;
$db.finish;

As a convenience, these methods return the database object, so they can easily be chained (though you will probably loop the copy-data call.)

$pg.db.execute('copy foo from stdin').copy-data("1 2\n12 34234\n").copy-end.finish;

Bulk Copy Out

Bulk copy out can performed too, a COPY command will return a sequence from an iterator which will return each line:

for $pg.query('copy foo to stdout (format csv)') -> $line
{
    print $line;
}

Listen/Notify

PostgreSQL also supports an asynchronous LISTEN command that you can use to receive notifications from the database. The .listen() method returns a supply that can be used within a react block. You can listen to multiple channels, and all listens will share the same database connection.

react {
    whenever $pg.listen('foo') -> $msg
    {
        say $msg;
    }
    whenever $pg.listen('bar') -> $msg
    {
        say $msg;
    }
}

Use .unlisten to stop listening to a specific channel. When the last listened supply is unlistened, the react block will exit.

$pg.unlisten('foo')

PostgreSQL notifications can be sent with the .notify method:

$pg.notify('foo', 'a message');

For now, listen() requires the epoll module to be installed, and will die if it isn't installed.

Type Conversions

The DB::Pg::Converter object is used to convert between PostgreSQL types and Perl types. It has two maps, one from oid types to PostgreSQL type names, and one from the type names to Perl types.

For example, the oid 23 maps to the PostgreSQL type int which maps to the Perl type Int.

DB::Pg::Converter has a multiple dispatch method convert() that is used to convert types.

Extra roles can be mixed in to the default converter to enable it to convert to and from other types.

The converter() method on the main DB::Pg object will return the current converter, then does can be used to add a role with extra conversion methods.

Here is a short example that causes the PostgreSQL 'json' and 'jsonb' types to be converted automatically.

use DB::Pg;
use JSON::Fast;

my $pg = DB::Pg.new;

my class JSON {}  # Just a fake type, since JSON uses native Perl arrays/hashes

$pg.converter does role JSONConverter
{
    submethod BUILD { self.add-type(json => JSON, jsonb => JSON) }
    multi method convert(JSON:U, Str:D $value) { from-json($value) }
    multi method convert(Mu:D $value, JSON:U) { to-json($value) }
}

There are three parts to this conversion. First the BUILD adds the type mappings, then there are two methods, the first converts from a string (Str:D) to a JSON:U type. The second will be used when a parameter requires a JSON object. If the object already has a Str method that results in a suitable string for PostgreSQL (often the case), the second method can be omitted. (Or if you are only reading a type from the database, and never passing it to the server.)

Several Converters are bundled with this module, and by default they are added to the Converter automatically:

  • DateTime (date, timestamp, timestamptz -> Date, DateTime)
  • JSON (json, jsonb)
  • UUID (uuid -> UUID via LibUUID)
  • Geometric (point, line, lseg, box, path, polygon, circle)

The Geometric types are available in DB::Pg::GeometricTypes.

The DateTime converter relies on the PostgreSQL datestyle configuration. It must be set to a style compatible with Raku to be able to convert. In particular, some legacy styles still use 2 digit year representations that are impossible to unambiguosly map to a specific date. It is recommended that you set datestyle to 'iso'. This can either be done for the whole server (in postgresql.conf, or 'alter database set datestyle iso'), or per client (with 'set datestyle to iso' or by setting environment variable PGDATESTYLE to iso).

If you don't want any of those converters, just pass in an empty converters array, or with just the ones you want:

my $pg = DB::Pg.new(converters => <DateTime JSON>)

If you want a different type of conversion than those canned types, just exclude the default one and install your own as above.

Note: I'm looking for better ways to arrange this -- comments (file an issue) welcome!

Arrays

Most types of arrays are handled by default. When selecting, they will be converted to Perl Array objects. Likewise, to pass arrays to the server, just pass a Perl Array object.

Exceptions

All database errors, including broken SQL queries, are thrown as exceptions.

Exceptions for a query result may have additional fields as reported by PostgreSQL with the mapping below (perl exception field ➡ PostgreSQL field name):

  • message ➡ PG_DIAG_MESSAGE_PRIMARY
  • message-detail ➡ PG_DIAG_MESSAGE_DETAIL
  • message-hint ➡ PG_DIAG_MESSAGE_HINT
  • context ➡ PG_DIAG_CONTEXT
  • type ➡ PG_DIAG_SEVERITY_NONLOCALIZED
  • type-localized ➡ PG_DIAG_SEVERITY
  • state ➡ PG_DIAG_SQLSTATE
  • statement-position ➡ PG_DIAG_STATEMENT_POSITION
  • internal-position ➡ PG_DIAG_INTERNAL_POSITION
  • internal-query ➡ PG_DIAG_INTERNAL_QUERY
  • schema ➡ PG_DIAG_SCHEMA_NAME
  • table ➡ PG_DIAG_TABLE_NAME
  • column ➡ PG_DIAG_COLUMN_NAME
  • datatype ➡ PG_DIAG_DATATYPE_NAME
  • constraint ➡ PG_DIAG_CONSTRAINT_NAME
  • source-file ➡ PG_DIAG_SOURCE_FILE
  • source-line ➡ PG_DIAG_SOURCE_LINE
  • source-function ➡ PG_DIAG_SOURCE_FUNCTION

Please see the PostgreSQL documentation for a detailed description of what each field contains.

NOTE

For now, I've got the async pub/stuff using epoll, which is Linux specific, so this is tied to Linux. Patches welcome!

Acknowledgements

Inspiration taken from the existing Perl6 DBIish module as well as the Perl 5 Mojo::Pg from the Mojolicious project.

License

See NASA Open Source Agreement for more details.

Copyright

Copyright © 2017 United States Government as represented by the Administrator of the National Aeronautics and Space Administration. No copyright is claimed in the United States under Title 17, U.S.Code. All Other Rights Reserved.