Proposal to address confusion between PostgreSQL concepts "schema" and "search_path" #918
Description
After several days of frustrating debugging ( #916 ), I would like to propose a change to the terminology that Laravel employs in relation to PostgreSQL.
In essence, my frustration and confusion stem directly from the fact that the existing code seems to conflate PostgreSQL's schema
and search_path
concepts. As tempting as it may be to write-off the conflation as a "mundane detail", it feels worthwhile to correct.
Until I discovered that the schema
connection parameter could in fact accept an array of schemas (and not merely a single name in the form of a string), via laravel/framework#15535 , I was absolutely baffled as to why the framework would ever set the search_path
to a single schema
. I discovered this "undocumented feature" only after rooting-around in the source (and found the above PR only much later); I have never seen it mentioned nor documented elsewhere. And I went to the source first, not after struggling for several days to reach this point. I just couldn't find anything in relation to the search_path
(because I discovered it to be the problem, on the PostgreSQL side), and now I know why. I wish I had found this Issue much sooner (which inspired the above PR), as it would have saved me a lot of time and angst:
A schema
refers to a single namespace, whereas the search_path
is comprised of one or more schemas. Simply put, the PostgreSQL manual describes it as, "a list of schemas to look in". While hugely useful, the search_path
is equally complicated, which is why I feel so strongly that this dinstinction needs to be made, and the source corrected accordingly. Even if the source is not corrected, the connection configuration parameter name, schema
, should be changed to searchpath
. To be clear, I am proposing only that the variable and method names be changed to reflect their actual purpose, with no functional changes (beyond renaming schema
to searchpath
in the connection configuration array; BC could be preserved if desired).
Somewhat surprisingly, PostgreSQL's search_path
is barely documented and even less well understood, and the conflation at issue here is therefore largely understandable.
Many such articles exist, but the following article details just how complex the relationship between schema
and search_path
is. Many of the complexities don't emerge until more sophisticated use-cases are undertaken, such as those involving functions, stored procedures, views, and large-scale data manipulation, especially in the context of multiple schemas.
http://www.postgresonline.com/journal/archives/279-Schema-and-search_path-surprises.html
In fact, it doesn't even make sense to define a schema on a connection, because a connection has no need for, nor any concept of, a schema; a connection needs a database, but not a schema. It is the search_path
that concerns PostgreSQL connections. It is unfortunate that PostgreSQL itself provides the SET SCHEMA ...
capability, which is defined as follows:
SET SCHEMA 'value' is an alias for SET search_path TO value. Only one schema can be specified using this syntax.
This confuses the distinction between schema
and search_path
even further.
Despite the fact that what I was trying to achieve all along could be accomplished simply by using an array for the schema
value, there are advantages to be gained from implementing separate connection configuration properties for schema
and search_path
. The schema
can then be restricted to its intended purpose, which is to "address" (prefix) objects in schemas that are not within the search_path
, or to disambiguate between objects of the same name within schemas that are within the search_path
). At the same time, search_path
can be used for its intended purpose, which is wildly complex, and beyond the scope of what can be discussed reasonably in this venue. The point is simply to keep them separate.
"Getting this right" feels instrumental to a flexible and predictable PostgreSQL configuration, especially where PostgreSQL extension usage is concerned, because extensions are enabled on a per-schema basis in PostgreSQL.
I could be wrong (and often am), but the only benefit I see in defining a schema
on a connection is to be used as a prefix when addressing a table in an Eloquent model or similar. In other words, prior to this "revelation", I was using the connection's schema
only in order to switch between MySQL and PostgreSQL without having to change anything in the application's models.
Something like this:
trait EnhancesModels
{
public function getDriverName($connectionName = null)
{
return DB::connection($connectionName)->getPdo()->getAttribute(\PDO::ATTR_DRIVER_NAME);
}
public function setFullyQualifiedTableNames()
{
$driver = $this->getDriverName();
$table = $this->tablePrefix . $this->table;
if ($driver === 'pgsql') {
$this->table = $this->getConnection()->getConfig('schema') . '.' . $table;
}
elseif ($driver === 'mysql') {
$this->table = $this->getConnection()->getConfig('database') . '.' . $table;
}
}
}
Maybe there is a better way to implement this, in which case I see no need for the schema
connection property at all. My point is only that protected $table = 'myschema.users';
, which would be necessary otherwise, won't fly if PostgreSQL is swapped-out for MySQL.
I am curious how other people feel about this, particularly those using PostgreSQL.
Thank you for your time and consideration.