BUG-002: Spark Date Type Cannot Convert to DateTime #3

@improveTheWorld

Summary

When reading CSV files with date columns, Spark returns a native Date type that cannot be converted to .NET DateTime. The ObjectMaterializer fails because Spark's Date type doesn't implement IConvertible.

Error Message

Cannot convert value '2024-01-01' (type: Date) to DateTime

Affected Components

  • DataFlow.Framework.ObjectMaterializer (OSS)
  • DataFlow.Spark v1.2.0
  • Not affected: Snowflake (uses strings for dates)

Root Cause

Location: MemberMaterializationPlan.cs:463 in DataFlow.Framework.ObjectMaterializer

When Spark reads CSV files, date columns are inferred as Spark's native Date type. This Java-based type is wrapped by Microsoft.Spark but doesn't implement .NET's IConvertible interface, causing Convert.ChangeType() to fail.

// MemberMaterializationPlan.cs - simplified
var value = row[columnIndex];  // Returns Spark Date object
var converted = Convert.ChangeType(value, typeof(DateTime));  // FAILS!
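
The failure mode can be reproduced without Spark. The sketch below is only an illustration: StandInSparkDate and ConvertDemo are hypothetical names (not Microsoft.Spark or DataFlow types). The stand-in exposes year, month and day but does not implement IConvertible, so Convert.ChangeType() rejects it with an InvalidCastException, which is presumably what the materializer surfaces as the error above.

```csharp
using System;

// Hypothetical stand-in for a wrapped Spark date value.
// It deliberately does NOT implement IConvertible.
public sealed class StandInSparkDate
{
    public StandInSparkDate(int year, int month, int day)
    {
        Year = year;
        Month = month;
        Day = day;
    }

    public int Year { get; }
    public int Month { get; }
    public int Day { get; }

    public override string ToString() => $"{Year:D4}-{Month:D2}-{Day:D2}";
}

public static class ConvertDemo
{
    // Returns true when Convert.ChangeType refuses to produce a DateTime.
    public static bool FailsToConvert(object value)
    {
        try
        {
            Convert.ChangeType(value, typeof(DateTime));
            return false;
        }
        catch (InvalidCastException)
        {
            // Thrown when the value does not implement IConvertible.
            return true;
        }
    }
}
```

Calling ConvertDemo.FailsToConvert(new StandInSparkDate(2024, 1, 15)) returns true, while a plain DateTime value passes through unchanged.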

Reproduction Steps

Step 1: Create CSV with date column

id,order_date,amount
1,2024-01-15,500.00
2,2024-01-16,750.00

Step 2: Define model with DateTime property

public class Order
{
    public int Id { get; set; }
    public DateTime OrderDate { get; set; }  // DateTime property
    public double Amount { get; set; }
}

Step 3: Read and materialize

var context = Spark.Connect();
var orders = context.Read.Csv<Order>("path/to/orders.csv");

// This FAILS:
var results = orders.ToList();  // Throws: Cannot convert Date to DateTime

Failing Tests

Project            Test
(None currently)   Tests avoid date columns as workaround

Current Workarounds

Workaround 1: Use Parquet format

// Parquet preserves .NET types correctly
var orders = context.Read.Parquet<Order>("path/to/orders.parquet");
var results = orders.ToList();  // Works!

Workaround 2: Store dates as strings

public class Order
{
    public int Id { get; set; }
    public string OrderDate { get; set; }  // String instead of DateTime
    public double Amount { get; set; }
}

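// Parse after materialization, independent of the machine's culture
// (requires: using System.Globalization;)
var results = orders.ToList();
var parsedDates = results
    .Select(o => DateTime.Parse(o.OrderDate, CultureInfo.InvariantCulture))
    .ToList();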

Workaround 3: Avoid CSV date columns

Remove date columns from test data and models entirely.

Proposed Fix

Add special handling for Spark Date type in the materializer:

// In MemberMaterializationPlan.cs
if (value is Microsoft.Spark.Sql.Types.Date sparkDate)
{
    // Extract year, month, day and construct DateTime
    return new DateTime(sparkDate.Year, sparkDate.Month, sparkDate.Day);
}
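
If the OSS materializer should not take a compile-time reference on Microsoft.Spark (an assumption; the issue does not say either way), the same special case could be sketched with reflection. DateConversionSketch.ToDateTime is a hypothetical helper, not existing DataFlow code; it relies only on the value exposing int Year, Month and Day properties, the same members the snippet above reads.

```csharp
using System;
using System.Globalization;

public static class DateConversionSketch
{
    // Hypothetical helper: converts a raw column value to DateTime.
    public static DateTime ToDateTime(object value)
    {
        if (value is null) throw new ArgumentNullException(nameof(value));
        if (value is DateTime dateTime) return dateTime;

        // Strings and other IConvertible values keep using the normal path.
        if (value is IConvertible)
        {
            return (DateTime)Convert.ChangeType(
                value, typeof(DateTime), CultureInfo.InvariantCulture);
        }

        // Fallback for date-like wrappers: read int Year/Month/Day by reflection.
        var type = value.GetType();
        var year = type.GetProperty("Year");
        var month = type.GetProperty("Month");
        var day = type.GetProperty("Day");
        if (year?.PropertyType == typeof(int) &&
            month?.PropertyType == typeof(int) &&
            day?.PropertyType == typeof(int))
        {
            return new DateTime(
                (int)year.GetValue(value),
                (int)month.GetValue(value),
                (int)day.GetValue(value));
        }

        throw new InvalidCastException(
            $"Cannot convert value '{value}' (type: {type.Name}) to DateTime");
    }
}
```

For example, DateConversionSketch.ToDateTime(new { Year = 2024, Month = 1, Day = 15 }) yields January 15, 2024, and a string such as "2024-01-16" still goes through Convert.ChangeType(). Reflection on every row is slow, so a real implementation would likely cache the property lookups per type.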

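Alternatively, cast the date column to string on the Spark side with Spark's cast() function before the rows are pulled into .NET, so the materializer only sees strings (the same situation as Snowflake, which is not affected).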
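
A minimal sketch of the cast-based option, written against the plain Microsoft.Spark API rather than the DataFlow.Spark wrapper. Whether context.Read.Csv<Order>() exposes the underlying DataFrame for this is not stated in this issue, so the class and method names here are hypothetical, and the reader options (header, inferSchema) are assumptions about how the CSV is loaded. The column name order_date comes from the reproduction CSV. Running it requires a working .NET for Apache Spark setup.

```csharp
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

public static class CastDateColumnSketch
{
    // Reads the CSV and casts order_date to string on the Spark side,
    // so only string values reach the .NET materializer.
    public static DataFrame ReadOrdersWithStringDates(SparkSession spark, string path)
    {
        DataFrame raw = spark.Read()
            .Option("header", "true")       // assumption: first row holds column names
            .Option("inferSchema", "true")  // assumption: types are inferred from the data
            .Csv(path);

        // Replace the inferred date column with its string representation.
        return raw.WithColumn("order_date", Col("order_date").Cast("string"));
    }
}
```

The trade-off is that every query pays for an extra cast and the date has to be parsed again in .NET, whereas the materializer-side fix keeps the Spark schema untouched.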

Impact

  • Severity: HIGH (for CSV users)
  • Frequency: Medium (Parquet users unaffected)
  • User Impact: Forces Parquet or string-based date handling

Labels

bug, spark, materialization, csv, datetime
