Open
BUG-002: Spark Date Type Cannot Convert to DateTime
Summary
When reading CSV files with date columns, Spark returns a native Date type that cannot be converted to .NET DateTime. The ObjectMaterializer fails because Spark's Date type doesn't implement IConvertible.
Error Message
Cannot convert value '2024-01-01' (type: Date) to DateTime
Affected Components
- DataFlow.Framework.ObjectMaterializer (OSS)
- DataFlow.Spark v1.2.0
- Not affected: Snowflake (uses strings for dates)
Root Cause
Location: MemberMaterializationPlan.cs:463 in DataFlow.Framework.ObjectMaterializer
When Spark reads CSV files, date columns are inferred as Spark's native Date type. This Java-based type is wrapped by Microsoft.Spark but doesn't implement .NET's IConvertible interface, causing Convert.ChangeType() to fail.
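The failure mode can be reproduced without a Spark cluster. The sketch below is only an illustration: `FakeSparkDate` and `ChangeTypeDemo` are hypothetical names, not part of DataFlow or Microsoft.Spark. It shows that `Convert.ChangeType` throws `InvalidCastException` for any source object that does not implement `IConvertible`.

```csharp
using System;

// Hypothetical stand-in for a wrapped Spark date.
// It deliberately does NOT implement IConvertible.
public sealed class FakeSparkDate
{
    public FakeSparkDate(int year, int month, int day)
    {
        Year = year;
        Month = month;
        Day = day;
    }

    public int Year { get; }
    public int Month { get; }
    public int Day { get; }
}

public static class ChangeTypeDemo
{
    // Returns true when Convert.ChangeType rejects the value
    // with an InvalidCastException.
    public static bool FailsToConvert(object value, Type target)
    {
        try
        {
            Convert.ChangeType(value, target);
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }
}
```

Calling `ChangeTypeDemo.FailsToConvert(new FakeSparkDate(2024, 1, 15), typeof(DateTime))` returns `true`, while an `IConvertible` source such as an `int` converted to `double` goes through without error.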
// MemberMaterializationPlan.cs - simplified
var value = row[columnIndex]; // Returns Spark Date object
var converted = Convert.ChangeType(value, typeof(DateTime)); // FAILS!
Reproduction Steps
Step 1: Create CSV with date column
id,order_date,amount
1,2024-01-15,500.00
2,2024-01-16,750.00
Step 2: Define model with DateTime property
public class Order
{
    public int Id { get; set; }
    public DateTime OrderDate { get; set; } // DateTime property
    public double Amount { get; set; }
}
Step 3: Read and materialize
var context = Spark.Connect();
var orders = context.Read.Csv<Order>("path/to/orders.csv");
// This FAILS:
var results = orders.ToList(); // Throws: Cannot convert Date to DateTime
Failing Tests
| Project | Test |
|---|---|
| (None currently) | Tests avoid date columns as workaround |
Current Workarounds
Workaround 1: Use Parquet format
// Parquet preserves .NET types correctly
var orders = context.Read.Parquet<Order>("path/to/orders.parquet");
var results = orders.ToList(); // Works!
Workaround 2: Store dates as strings
public class Order
{
    public int Id { get; set; }
    public string OrderDate { get; set; } // String instead of DateTime
    public double Amount { get; set; }
}
// Parse after materialization
var results = orders.ToList();
var parsedDates = results.Select(o => DateTime.Parse(o.OrderDate));
Workaround 3: Avoid CSV date columns
Remove date columns from test data and models entirely.
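For Workaround 2, note that `DateTime.Parse` uses the current culture, so the result can vary between machines. A minimal sketch of a stricter parse follows, assuming the dates are always written as `yyyy-MM-dd` as in the sample CSV. `OrderDateParser` is a hypothetical helper name, not an existing DataFlow type.

```csharp
using System;
using System.Globalization;

public static class OrderDateParser
{
    // Assumption: dates use the yyyy-MM-dd layout shown in the sample CSV.
    // ParseExact with the invariant culture rejects any other layout
    // instead of guessing based on the machine's regional settings.
    public static DateTime ParseIsoDate(string text) =>
        DateTime.ParseExact(
            text,
            "yyyy-MM-dd",
            CultureInfo.InvariantCulture,
            DateTimeStyles.None);
}
```

With this helper, the projection in Workaround 2 could become `results.Select(o => OrderDateParser.ParseIsoDate(o.OrderDate))`. Input in another layout throws `FormatException` rather than being silently misread.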
Proposed Fix
Add special handling for Spark Date type in the materializer:
// In MemberMaterializationPlan.cs
if (value is Microsoft.Spark.Sql.Types.Date sparkDate)
{
    // Extract year, month, day and construct DateTime
    return new DateTime(sparkDate.Year, sparkDate.Month, sparkDate.Day);
}
Or use Spark's cast() function to convert to string before pulling to .NET.
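One way the special-case handling might be structured so that the materializer core does not reference Spark types directly is sketched below. This is an assumption about a possible design, not the existing DataFlow code: `ValueConverter`, `RegisterDateAdapter`, and `ConvertTo` are hypothetical names. Adapters for non-`IConvertible` date types are looked up first, and `Convert.ChangeType` remains the fallback.

```csharp
using System;
using System.Collections.Generic;

public static class ValueConverter
{
    // Hypothetical registry: source type -> function producing a DateTime.
    private static readonly Dictionary<Type, Func<object, DateTime>> DateAdapters =
        new Dictionary<Type, Func<object, DateTime>>();

    // Registers an adapter for a source type that lacks IConvertible.
    public static void RegisterDateAdapter<T>(Func<T, DateTime> adapter)
    {
        DateAdapters[typeof(T)] = value => adapter((T)value);
    }

    public static object ConvertTo(object value, Type target)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));

        // Already the right type: nothing to do.
        if (target.IsInstanceOfType(value)) return value;

        // Special case: a registered adapter handles this source type.
        if (target == typeof(DateTime) &&
            DateAdapters.TryGetValue(value.GetType(), out var adapter))
        {
            return adapter(value);
        }

        // Fallback: the existing behaviour, which requires IConvertible.
        return Convert.ChangeType(value, target);
    }
}
```

Under this design, the Spark integration would register an adapter once at startup, for example one that builds a `DateTime` from the `Year`, `Month`, and `Day` members used in the snippet above. The registry is not thread-safe as written, so registration should finish before materialization starts.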
Impact
- Severity: HIGH (for CSV users)
- Frequency: Medium (Parquet users unaffected)
- User Impact: Forces Parquet or string-based date handling
Labels
bug, spark, materialization, csv, datetime