Bug: Error loading Delta table locally #1157

Closed
@brsamb9

Environment

Delta-rs version: 0.7.0

Binding:

Environment: Local machine

  • OS: macOS
  • Other: M1 chip

Bug

What happened:
Calling DeltaOps(table).load().await? returned the error Error: InvalidTableLocation(...). The script below gives a more detailed example.

Under the hood, the cause seems to be that the delta table path is stripped of its file:// prefix before being passed to url::Url::parse. Without the prefix, the parse returns Err(RelativeUrlWithoutBase).
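
The scheme-stripping problem described above can be sketched without delta-rs or the url crate. The following is a minimal, standard-library-only illustration. The names has_scheme and ensure_file_scheme are hypothetical helpers (neither exists in delta-rs), and the scheme check only approximates what a real URL parser does.

```rust
/// Returns true if `location` begins with something shaped like a URL scheme
/// (for example `file:` or `s3:`). The shape follows RFC 3986: one ASCII
/// letter, then letters, digits, '+', '-' or '.', terminated by ':'.
/// Note: a Windows path such as `C:\data` also passes this check.
fn has_scheme(location: &str) -> bool {
    match location.split_once(':') {
        Some((scheme, _rest)) => {
            let mut chars = scheme.chars();
            // The first character must be a letter; an empty prefix fails here.
            matches!(chars.next(), Some(c) if c.is_ascii_alphabetic())
                && chars.all(|c| c.is_ascii_alphanumeric() || matches!(c, '+' | '-' | '.'))
        }
        // No ':' at all, so there cannot be a scheme.
        None => false,
    }
}

/// Hypothetical helper (an assumption for illustration, not delta-rs code):
/// re-attaches `file://` to an absolute Unix-style path that has no scheme.
/// Anything else is returned unchanged.
fn ensure_file_scheme(location: &str) -> String {
    if !has_scheme(location) && location.starts_with('/') {
        format!("file://{location}")
    } else {
        location.to_string()
    }
}
```

For instance, has_scheme("file:///tmp/delta-test/") is true, but after .replace("file://", "") the remaining "/tmp/delta-test/" has no scheme, which is the kind of input a URL parser treats as a relative URL. Passing that stripped path to ensure_file_scheme gives back "file:///tmp/delta-test/".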

What you expected to happen:

I expected the delta table to load successfully, since it passed the check assert!(table.object_store().is_delta_table_location().await?); and every other command (such as getting the metadata) seems to work, AFAIK.

I cloned the repo and removed the substitution (i.e. .replace("file://", "")), after which the error no longer occurred. However, this change breaks a couple of tests.

How to reproduce it:

Script adapted from one of the examples:
use arrow::{
  array::{Int32Array, StringArray},
  datatypes::{DataType, Field, Schema as ArrowSchema},
  record_batch::RecordBatch,
};
use deltalake::{
  action::Protocol, arrow, operations::collect_sendable_stream, DeltaTableBuilder,
  DeltaTableMetaData, Schema,
};
use deltalake::{DeltaOps, DeltaTable, SchemaDataType, SchemaField};
use futures::executor;
use std::{collections::HashMap, sync::Arc};

fn get_table_columns() -> Vec<SchemaField> {
  vec![
      SchemaField::new(
          String::from("int"),
          SchemaDataType::primitive(String::from("integer")),
          false,
          Default::default(),
      ),
      SchemaField::new(
          String::from("string"),
          SchemaDataType::primitive(String::from("string")),
          true,
          Default::default(),
      ),
  ]
}

async fn init_delta_table(table_path: &str) -> DeltaTable {
  let metadata = DeltaTableMetaData::new(
      None,
      None,
      None,
      Schema::new(get_table_columns()),
      vec![],
      HashMap::new(),
  );

  let mut table = DeltaTableBuilder::from_uri(table_path).build().unwrap();
  table
      .create(metadata, Protocol::default(), None, None)
      .await
      .unwrap();
  table
}

#[tokio::main(flavor = "current_thread")]
async fn main() -> Result<(), deltalake::DeltaTableError> {
  let table_path = "file:///Users/brs/rust_projs/delta-lake-playground/data/delta-test/";
  let table = deltalake::open_table(table_path)
      .await
      .unwrap_or_else(|_| executor::block_on(init_delta_table(&table_path)));

  // let table = DeltaOps::new_in_memory()
  //     .create()
  //     .with_columns(get_table_columns())
  //     .await?;

  // let batch = get_table_batches();
  // let table = DeltaOps(table).write(vec![batch.clone()]).await?;

  // let (table, _check) = DeltaOps(table).filesystem_check().await?;
  assert!(table.object_store().is_delta_table_location().await?);

  let test = url::Url::parse(&table.object_store().root_uri());
  println!("{test:?}");
  println!("{}", table.object_store().root_uri());
  let test = url::Url::parse(&table_path);
  println!("{test:?}");
  // let test = url::Url::parse(&table_path);

  let (_table, stream) = DeltaOps(table).load().await?;
  let data: Vec<RecordBatch> = collect_sendable_stream(stream).await?;

  println!("{:?}", data);

  Ok(())
}

More details:
