Add ST_Read_Meta, don't throw in GDAL filesystem stat #227

Merged
merged 5 commits into from
Jan 10, 2024
49 changes: 47 additions & 2 deletions docs/docs.md
@@ -283,8 +283,6 @@ select st_centroid('POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))'::geometry);
-- POINT(0.5 0.5)
```

End of example

## ST_Collect

_Collects geometries into a collection geometry_
@@ -1720,6 +1718,53 @@ The following formats are currently recognized by their file extension:
| GeoPackage | .gpkg |
| FlatGeoBuf | .fgb |

## ST_Read_Meta

_Read metadata from a variety of geospatial file formats_

- ST_Read_Meta(file __VARCHAR__)
- ST_Read_Meta(files __VARCHAR[]__)

### Description

The `ST_Read_Meta` table function accompanies the [ST_Read](#st_read) table function, but scans a file's metadata instead of reading its contents.

Because the data model of the underlying GDAL library is quite flexible, most of the interesting metadata is in the returned `layers` column, a nested structure of DuckDB `STRUCT` and `LIST` types. Each layer entry carries its `name`, `feature_count`, `geometry_fields`, and `fields`.

### Examples
Find the coordinate reference system authority name and code for the first layer's first geometry column in the file:
```sql
SELECT
    layers[1].geometry_fields[1].crs.auth_name as name,
    layers[1].geometry_fields[1].crs.auth_code as code
FROM st_read_meta('../../tmp/data/amsterdam_roads.fgb');
```

```
┌─────────┬─────────┐
│  name   │  code   │
│ varchar │ varchar │
├─────────┼─────────┤
│ EPSG    │ 3857    │
└─────────┴─────────┘
```

Identify the format driver and the number of layers for a set of files:
```sql
SELECT file_name, driver_short_name, len(layers) FROM st_read_meta('../../tmp/data/*');
```

```
┌───────────────────────────────────────────┬───────────────────┬─────────────┐
│                 file_name                 │ driver_short_name │ len(layers) │
│                  varchar                  │      varchar      │    int64    │
├───────────────────────────────────────────┼───────────────────┼─────────────┤
│ ../../tmp/data/amsterdam_roads_50.geojson │ GeoJSON           │           1 │
│ ../../tmp/data/amsterdam_roads.fgb        │ FlatGeobuf        │           1 │
│ ../../tmp/data/germany.osm.pbf            │ OSM               │           5 │
└───────────────────────────────────────────┴───────────────────┴─────────────┘
```

## ST_ReadOSM

_Reads compressed OpenStreetMap data_
67 changes: 67 additions & 0 deletions docs/src/functions/table/st_read_meta.md
@@ -0,0 +1,67 @@
---
{
"type": "table_function",
"title": "ST_Read_Meta",
"id": "st_read_meta",
"signatures": [
{
"parameters": [
{
"name": "file",
"type": "VARCHAR"
}
]
},
{
"parameters": [
{
"name": "files",
"type": "VARCHAR[]"
}
]
}
],
"summary": "Read metadata from a variety of geospatial file formats",
"tags": []
}
---

### Description

The `ST_Read_Meta` table function accompanies the [ST_Read](#st_read) table function, but scans a file's metadata instead of reading its contents.

Because the data model of the underlying GDAL library is quite flexible, most of the interesting metadata is in the returned `layers` column, a nested structure of DuckDB `STRUCT` and `LIST` types. Each layer entry carries its `name`, `feature_count`, `geometry_fields`, and `fields`.

### Examples
Find the coordinate reference system authority name and code for the first layer's first geometry column in the file:
```sql
SELECT
    layers[1].geometry_fields[1].crs.auth_name as name,
    layers[1].geometry_fields[1].crs.auth_code as code
FROM st_read_meta('../../tmp/data/amsterdam_roads.fgb');
```

```
┌─────────┬─────────┐
│  name   │  code   │
│ varchar │ varchar │
├─────────┼─────────┤
│ EPSG    │ 3857    │
└─────────┴─────────┘
```

Identify the format driver and the number of layers for a set of files:
```sql
SELECT file_name, driver_short_name, len(layers) FROM st_read_meta('../../tmp/data/*');
```

```
┌───────────────────────────────────────────┬───────────────────┬─────────────┐
│                 file_name                 │ driver_short_name │ len(layers) │
│                  varchar                  │      varchar      │    int64    │
├───────────────────────────────────────────┼───────────────────┼─────────────┤
│ ../../tmp/data/amsterdam_roads_50.geojson │ GeoJSON           │           1 │
│ ../../tmp/data/amsterdam_roads.fgb        │ FlatGeobuf        │           1 │
│ ../../tmp/data/germany.osm.pbf            │ OSM               │           5 │
└───────────────────────────────────────────┴───────────────────┴─────────────┘
```
4 changes: 4 additions & 0 deletions spatial/include/spatial/gdal/functions.hpp
@@ -56,6 +56,10 @@ struct GdalCopyFunction {
static void Register(DatabaseInstance &db);
};

struct GdalMetadataFunction {
static void Register(DatabaseInstance &db);
};

} // namespace gdal

} // namespace spatial
3 changes: 2 additions & 1 deletion spatial/src/spatial/gdal/file_handler.cpp
@@ -189,11 +189,12 @@ class DuckDBFileSystemHandler : public VSIFilesystemHandler {
pstatbuf->st_mode = S_IFCHR;
break;
default:
// HTTPFS returns invalid type for everything basically.
if(FileSystem::IsRemoteFile(file_name)) {
pstatbuf->st_mode = S_IFREG;
}
else {
- throw IOException("Unknown file type");
+ return -1;
}
}

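The hunk above swaps a thrown exception for an error return in the stat handler's fallback branch. As a rough, standalone illustration of that pattern (not the extension's actual code), the sketch below reports an unknown file type through the return value, so a C-style caller that expects `0` or `-1` can handle the failure itself. `FileKind`, `Mode`, `StatBuf`, and `StatSketch` are hypothetical names introduced only for this example.

```cpp
#include <cassert>

// Hypothetical stand-ins for the file types and stat buffer in the hunk above.
enum class FileKind { Regular, Directory, Invalid };
enum Mode : unsigned { MODE_NONE = 0, MODE_REGULAR = 1, MODE_DIRECTORY = 2 };

struct StatBuf {
	unsigned mode = MODE_NONE;
};

// Returns 0 and fills `out` on success.
// Returns -1 without throwing when the type is unknown and the file is local.
int StatSketch(FileKind kind, bool is_remote, StatBuf &out) noexcept {
	switch (kind) {
	case FileKind::Regular:
		out.mode = MODE_REGULAR;
		return 0;
	case FileKind::Directory:
		out.mode = MODE_DIRECTORY;
		return 0;
	default:
		// Remote files with an unknown type are treated as regular files.
		if (is_remote) {
			out.mode = MODE_REGULAR;
			return 0;
		}
		// Signal failure through the return code instead of an exception.
		return -1;
	}
}
```

With this shape, the caller decides what a failed stat means (for example, skipping the file), rather than having an exception cross the library boundary.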
1 change: 1 addition & 0 deletions spatial/src/spatial/gdal/functions/CMakeLists.txt
@@ -2,6 +2,7 @@ set(EXTENSION_SOURCES
${EXTENSION_SOURCES}
${CMAKE_CURRENT_SOURCE_DIR}/st_drivers.cpp
${CMAKE_CURRENT_SOURCE_DIR}/st_read.cpp
${CMAKE_CURRENT_SOURCE_DIR}/st_read_meta.cpp
${CMAKE_CURRENT_SOURCE_DIR}/st_write.cpp
PARENT_SCOPE
)
217 changes: 217 additions & 0 deletions spatial/src/spatial/gdal/functions/st_read_meta.cpp
@@ -0,0 +1,217 @@
#include "duckdb/parser/parsed_data/create_table_function_info.hpp"
#include "duckdb/parser/expression/constant_expression.hpp"
#include "duckdb/parser/expression/function_expression.hpp"
#include "duckdb/parser/tableref/table_function_ref.hpp"
#include "duckdb/common/multi_file_reader.hpp"

#include "spatial/common.hpp"
#include "spatial/gdal/functions.hpp"
#include "spatial/gdal/file_handler.hpp"

#include "ogrsf_frmts.h"
#include <cstring>

namespace spatial {

namespace gdal {

//------------------------------------------------------------------------------
// Bind
//------------------------------------------------------------------------------

struct GDALMetadataBindData : public TableFunctionData {
vector<string> file_names;
};

static LogicalType GEOMETRY_FIELD_TYPE = LogicalType::STRUCT({
{"name", LogicalType::VARCHAR},
{"type", LogicalType::VARCHAR},
{"nullable", LogicalType::BOOLEAN},
{"crs", LogicalType::STRUCT({
{"name", LogicalType::VARCHAR},
{"auth_name", LogicalType::VARCHAR},
{"auth_code", LogicalType::VARCHAR},
{"wkt", LogicalType::VARCHAR},
{"proj4", LogicalType::VARCHAR},
{"projjson", LogicalType::VARCHAR},
})},
});

static LogicalType STANDARD_FIELD_TYPE = LogicalType::STRUCT({
{"name", LogicalType::VARCHAR},
{"type", LogicalType::VARCHAR},
{"subtype", LogicalType::VARCHAR},
{"nullable", LogicalType::BOOLEAN},
{"unique", LogicalType::BOOLEAN},
{"width", LogicalType::BIGINT},
{"precision", LogicalType::BIGINT},
});

static LogicalType LAYER_TYPE = LogicalType::STRUCT({
{"name", LogicalType::VARCHAR},
{"feature_count", LogicalType::BIGINT},
{"geometry_fields", LogicalType::LIST(GEOMETRY_FIELD_TYPE)},
{"fields", LogicalType::LIST(STANDARD_FIELD_TYPE)},
});

static unique_ptr<FunctionData> Bind(ClientContext &context, TableFunctionBindInput &input,
vector<LogicalType> &return_types, vector<string> &names) {

auto file_name = input.inputs[0].GetValue<string>();
auto result = make_uniq<GDALMetadataBindData>();

result->file_names = MultiFileReader::GetFileList(context, input.inputs[0], "gdal metadata", FileGlobOptions::ALLOW_EMPTY);

names.push_back("file_name");
return_types.push_back(LogicalType::VARCHAR);

names.push_back("driver_short_name");
return_types.push_back(LogicalType::VARCHAR);

names.push_back("driver_long_name");
return_types.push_back(LogicalType::VARCHAR);

names.push_back("layers");
return_types.push_back(LogicalType::LIST(LAYER_TYPE));

// TODO: Add metadata, domains, relationships
/*
names.push_back("metadata");
return_types.push_back(LogicalType::VARCHAR);

names.push_back("domains");
return_types.push_back(LogicalType::VARCHAR);

names.push_back("relationships");
return_types.push_back(LogicalType::VARCHAR);
*/

return std::move(result);
}

//------------------------------------------------------------------------------
// Init
//------------------------------------------------------------------------------
struct GDALMetadataState : public GlobalTableFunctionState {
idx_t current_file_idx = 0;
};

static unique_ptr<GlobalTableFunctionState> Init(ClientContext &context, TableFunctionInitInput &input) {
auto result = make_uniq<GDALMetadataState>();
return std::move(result);
}

//------------------------------------------------------------------------------
// Scan
//------------------------------------------------------------------------------

static Value GetLayerData(GDALDatasetUniquePtr &dataset) {
vector<Value> layer_values;

for(const auto &layer : dataset->GetLayers()) {
child_list_t<Value> layer_value_fields;

layer_value_fields.emplace_back("name", Value(layer->GetName()));
layer_value_fields.emplace_back("feature_count", Value(static_cast<int64_t>(layer->GetFeatureCount())));

vector<Value> geometry_fields;
for(const auto &field : layer->GetLayerDefn()->GetGeomFields()) {
child_list_t<Value> geometry_field_value_fields;
auto field_name = field->GetNameRef();
if(std::strlen(field_name) == 0) {
field_name = "geom";
}
geometry_field_value_fields.emplace_back("name", Value(field_name));
geometry_field_value_fields.emplace_back("type", Value(OGRGeometryTypeToName(field->GetType())));
geometry_field_value_fields.emplace_back("nullable", Value(static_cast<bool>(field->IsNullable())));

auto crs = field->GetSpatialRef();
if(crs != nullptr) {
child_list_t<Value> crs_value_fields;
crs_value_fields.emplace_back("name", Value(crs->GetName()));
crs_value_fields.emplace_back("auth_name", Value(crs->GetAuthorityName(nullptr)));
crs_value_fields.emplace_back("auth_code", Value(crs->GetAuthorityCode(nullptr)));

char* wkt_ptr = nullptr;
crs->exportToWkt(&wkt_ptr);
crs_value_fields.emplace_back("wkt", wkt_ptr ? Value(wkt_ptr) : Value());
CPLFree(wkt_ptr);

char* proj4_ptr = nullptr;
crs->exportToProj4(&proj4_ptr);
crs_value_fields.emplace_back("proj4", proj4_ptr ? Value(proj4_ptr) : Value());
CPLFree(proj4_ptr);

char* projjson_ptr = nullptr;
crs->exportToPROJJSON(&projjson_ptr, nullptr);
crs_value_fields.emplace_back("projjson", projjson_ptr ? Value(projjson_ptr) : Value());
CPLFree(projjson_ptr);

geometry_field_value_fields.emplace_back("crs", Value::STRUCT(crs_value_fields));
}

geometry_fields.push_back(Value::STRUCT(geometry_field_value_fields));
}
layer_value_fields.emplace_back("geometry_fields", Value::LIST(GEOMETRY_FIELD_TYPE, std::move(geometry_fields)));

vector<Value> standard_fields;
for(const auto &field : layer->GetLayerDefn()->GetFields()) {
child_list_t<Value> standard_field_value_fields;
standard_field_value_fields.emplace_back("name", Value(field->GetNameRef()));
standard_field_value_fields.emplace_back("type", Value(OGR_GetFieldTypeName(field->GetType())));
standard_field_value_fields.emplace_back("subtype", Value(OGR_GetFieldSubTypeName(field->GetSubType())));
standard_field_value_fields.emplace_back("nullable", Value(field->IsNullable()));
standard_field_value_fields.emplace_back("unique", Value(field->IsUnique()));
standard_field_value_fields.emplace_back("width", Value(field->GetWidth()));
standard_field_value_fields.emplace_back("precision", Value(field->GetPrecision()));
standard_fields.push_back(Value::STRUCT(standard_field_value_fields));
}
layer_value_fields.emplace_back("fields", Value::LIST(STANDARD_FIELD_TYPE, std::move(standard_fields)));

layer_values.push_back(Value::STRUCT(layer_value_fields));
}

return Value::LIST(LAYER_TYPE, std::move(layer_values));
}

static void Scan(ClientContext &context, TableFunctionInput &input, DataChunk &output) {
auto &bind_data = input.bind_data->Cast<GDALMetadataBindData>();
auto &state = input.global_state->Cast<GDALMetadataState>();

auto out_size = MinValue<idx_t>(STANDARD_VECTOR_SIZE, bind_data.file_names.size() - state.current_file_idx);

for(idx_t out_idx = 0; out_idx < out_size; out_idx++, state.current_file_idx++) {
auto file_name = bind_data.file_names[state.current_file_idx];
auto prefixed_file_name = GDALClientContextState::GetOrCreate(context).GetPrefix() + file_name;

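GDALDatasetUniquePtr dataset;
try {
dataset = GDALDatasetUniquePtr(
GDALDataset::Open(prefixed_file_name.c_str(), GDAL_OF_VECTOR | GDAL_OF_VERBOSE_ERROR));
} catch (...) {
// Leave dataset null; the check below handles it
}
if(!dataset) {
// Just skip anything we can't open (Open can also fail by returning nullptr).
// out_idx is unsigned, so the decrement may wrap; the loop's out_idx++ restores the same output slot.
out_idx--;
out_size--;
continue;
}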

output.data[0].SetValue(out_idx, file_name);
output.data[1].SetValue(out_idx, dataset->GetDriver()->GetDescription());
output.data[2].SetValue(out_idx, dataset->GetDriver()->GetMetadataItem(GDAL_DMD_LONGNAME));
output.data[3].SetValue(out_idx, GetLayerData(dataset));
}

output.SetCardinality(out_size);
}

//------------------------------------------------------------------------------
// Register
//------------------------------------------------------------------------------
void GdalMetadataFunction::Register(DatabaseInstance &db) {
TableFunction func("st_read_meta", {LogicalType::VARCHAR}, Scan, Bind, Init);
ExtensionUtil::RegisterFunction(db, MultiFileReader::CreateFunctionSet(func));
}

} // namespace gdal

} // namespace spatial
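The `Scan` function above emits one row per file and skips files that cannot be opened. A simplified, standalone sketch of that bookkeeping follows. It swaps the decrement-and-continue approach for a separate read cursor and an output vector, which is a different technique from the one in the diff. `ScanChunkSketch` and the `can_open` callback are hypothetical and stand in for the GDAL open call.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: emit up to `max_rows` file names starting at `next_file`,
// skipping every file that `can_open` rejects. `next_file` is advanced past all
// files that were examined, so the next call resumes where this one stopped.
std::vector<std::string> ScanChunkSketch(const std::vector<std::string> &files, std::size_t &next_file,
                                         std::size_t max_rows,
                                         const std::function<bool(const std::string &)> &can_open) {
	std::vector<std::string> rows;
	while (next_file < files.size() && rows.size() < max_rows) {
		const std::string &name = files[next_file++];
		if (!can_open(name)) {
			continue; // skipped files consume input but no output row
		}
		rows.push_back(name);
	}
	return rows;
}
```

Because the read position and the number of emitted rows are tracked separately, a run of unreadable files does not end a chunk early while readable files remain.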
1 change: 1 addition & 0 deletions spatial/src/spatial/gdal/module.cpp
@@ -64,6 +64,7 @@ void GdalModule::Register(DatabaseInstance &db) {
GdalTableFunction::Register(db);
GdalDriversTableFunction::Register(db);
GdalCopyFunction::Register(db);
GdalMetadataFunction::Register(db);
}

} // namespace gdal