[Repo Assist] Fix #1439: InferRows counts CSV rows, not text lines (multiline quoted fields)#1625

Merged
dsyme merged 6 commits into main from repo-assist/fix-issue-1439-inferrows-multiline-csv-9645f782b479e0ac on Feb 23, 2026


@github-actions (Contributor)

🤖 This is an automated draft PR from Repo Assist, an AI assistant.

Summary

Fixes #1439: CsvProvider with InferRows = N failed when the sample file contained quoted fields spanning multiple text lines (e.g. "multi-\nline",2). The type provider raised "Expected 2 columns, got 1" because the raw sample text was truncated in the middle of a row.

Root Cause

In Helpers.fs, parseTextAtDesignTime used reader.ReadLine() to count rows:

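```fsharp
| Some max ->
    // Each ReadLine() call consumes one *text* line, so a quoted field that
    // spans several lines uses up several units of the row limit.
    let sb = StringBuilder()
    let mutable max = max
    while max > 0 do
        let line = reader.ReadLine()
        if isNull line then max <- 0
        else
            line |> sb.AppendLine |> ignore
            max <- max - 1
    sb.ToString()
```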

This counts text lines, not CSV data rows. A single CSV row containing a quoted multiline field can occupy several text lines, but it should count as just one row against the InferRows limit. When the line limit was reached inside such a field, the truncated string ended with an unterminated quoted field, and CsvFile.Parse then reported a parse error.
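To make the line/row mismatch concrete, here is a minimal standalone F# script sketch. truncateByLines mirrors the old loop quoted above. countTextLines and splitCsvRecords are hypothetical helpers written only for this illustration; splitCsvRecords is a simplified quote-aware splitter, not FSharp.Data's CSV parser, and it assumes double-quote quoting with newlines allowed inside quoted fields.

```fsharp
open System.IO
open System.Text

// Mirrors the old design-time loop: keep at most maxLines *text* lines.
let truncateByLines (maxLines: int) (text: string) =
    use reader = new StringReader(text)
    let sb = StringBuilder()
    let mutable remaining = maxLines
    while remaining > 0 do
        let line = reader.ReadLine()
        if isNull line then remaining <- 0
        else
            sb.AppendLine(line) |> ignore
            remaining <- remaining - 1
    sb.ToString()

// Hypothetical helper: number of physical text lines, as ReadLine() sees them.
let countTextLines (text: string) =
    use reader = new StringReader(text)
    Seq.initInfinite (fun _ -> reader.ReadLine())
    |> Seq.takeWhile (fun line -> not (isNull line))
    |> Seq.length

// Hypothetical, simplified splitter: a newline inside double quotes belongs to
// the current field; only a newline outside quotes ends a record.
// A doubled quote ("") toggles the state twice, so record boundaries stay correct.
let splitCsvRecords (text: string) : string list =
    let records = ResizeArray<string>()
    let current = StringBuilder()
    let mutable inQuotes = false
    let mutable i = 0
    while i < text.Length do
        let c = text.[i]
        if c = '"' then
            inQuotes <- not inQuotes
            current.Append(c) |> ignore
        elif (c = '\n' || c = '\r') && not inQuotes then
            // Treat "\r\n" as a single record terminator.
            if c = '\r' && i + 1 < text.Length && text.[i + 1] = '\n' then
                i <- i + 1
            records.Add(current.ToString())
            current.Clear() |> ignore
        else
            current.Append(c) |> ignore
        i <- i + 1
    if current.Length > 0 then records.Add(current.ToString())
    List.ofSeq records

let sample = "A,B\n\"multi-\nline\",2\nx,3\n"

printfn "text lines:  %d" (countTextLines sample)                   // 4
printfn "CSV records: %d" (List.length (splitCsvRecords sample))    // 3 (header + 2 rows)

// Keeping only 2 text lines cuts the quoted field in half: one unmatched quote.
let truncated = truncateByLines 2 sample
let quoteCount = truncated |> Seq.filter (fun c -> c = '"') |> Seq.length
printfn "quotes in truncated text: %d" quoteCount                   // 1
```

An odd quote count in the truncated text is exactly the "incomplete quoted field" situation that made parsing fail.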

Fix

Pass None as maxNumberOfRows to generateType for CsvProvider, so the raw text is no longer pre-truncated by line count. Row limiting during inference is handled by InferColumnTypes, which already calls Seq.truncate inferRows; this is the authoritative row limit.

This matches the behaviour of XmlProvider and JsonProvider, which also pass None here.

Performance note: the full sample file is now read into a string at design time. For typical sample files (the intended use: representative excerpts, not production datasets) this cost is negligible. CSV row parsing via StringReader remains lazy, so only inferRows rows are actually parsed.
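The fix relies on row limiting being applied to parsed records on a lazy sequence, rather than to raw text lines. The F# sketch below models that idea under stated assumptions: tryReadRecord is a hypothetical quote-aware record reader (not FSharp.Data's parser), lazyRecords and parsedCount are illustrative names, and the header row is omitted for simplicity. It shows that with the full text available, Seq.truncate limits how many records are parsed, and a multiline record counts once.

```fsharp
open System.IO
open System.Text

// Hypothetical reader: returns the next CSV record, or None at end of input.
// Newlines inside double quotes stay part of the record.
let tryReadRecord (reader: TextReader) : string option =
    if reader.Peek() = -1 then None
    else
        let sb = StringBuilder()
        let mutable inQuotes = false
        let mutable finished = false
        while not finished do
            let code = reader.Read()
            if code = -1 then finished <- true
            else
                let ch = char code
                if ch = '"' then
                    inQuotes <- not inQuotes
                    sb.Append(ch) |> ignore
                elif ch = '\n' && not inQuotes then finished <- true
                elif ch = '\r' && not inQuotes then
                    // Swallow the '\n' of a "\r\n" terminator.
                    if reader.Peek() = int '\n' then reader.Read() |> ignore
                    finished <- true
                else sb.Append(ch) |> ignore
        Some(sb.ToString())

// Illustrative counter of how many records have actually been parsed.
let parsedCount = ref 0

// Lazily parse records from the full, untruncated sample text.
// Single-pass: the underlying reader is consumed as the sequence is enumerated.
let lazyRecords (text: string) : seq<string> =
    let reader = new StringReader(text)
    Seq.unfold
        (fun () ->
            match tryReadRecord reader with
            | Some record ->
                parsedCount.Value <- parsedCount.Value + 1
                Some(record, ())
            | None -> None)
        ()

let sample = "\"multi-\nline\",2\nx,3\ny,4\n"
let inferRows = 2

// The limit applies to records, after quote-aware parsing.
let firstRows = lazyRecords sample |> Seq.truncate inferRows |> Seq.toList
printfn "rows kept: %d" firstRows.Length          // 2
printfn "records parsed: %d" parsedCount.Value    // 2
```

Because Seq.unfold only runs the reader on demand, memory grows with the size of the sample text, but parsing work grows with inferRows; the third record ("y,4") is never parsed.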

Trade-offs

  • Reads the entire sample file text at design time rather than truncating early. For most sample files this is a non-issue.
  • Does not change runtime behaviour at all; only affects design-time type inference.

Test Status

✅ New test added: InferRows counts CSV rows not text lines for multiline quoted fields
✅ All 250 existing tests pass (dotnet test tests/FSharp.Data.Tests/FSharp.Data.Tests.fsproj -c Release)

Generated by Repo Assist

To install this workflow, run gh aw add githubnext/agentics/workflows/repo-assist.md@ee50a3b7d1d3eb4a8c409ac9409fd61c9a66b0f5. View source at https://github.com/githubnext/agentics/tree/ee50a3b7d1d3eb4a8c409ac9409fd61c9a66b0f5/workflows/repo-assist.md.

Repo Assist and others added 5 commits February 22, 2026 17:56
…0, FsCheck 2.16.6

- FAKE packages: 6.1.3 → 6.1.4 (patch)
- NUnit: 3.13.1 → 3.13.3 (patch)
- FsUnit: 4.0.4 → 4.2.0 (minor)
- FsCheck: 2.15.1 → 2.16.6 (minor)

Build: passes (0 errors)
Tests: all offline tests pass; network tests skip due to sandbox

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… quoted fields

The maxNumberOfRows text-truncation in parseTextAtDesignTime counted text
lines using reader.ReadLine(), which broke CSV files where a single data row
spans multiple text lines due to quoted fields (e.g. "multi-\nline",2).

Fix: pass None as maxNumberOfRows so the raw text is never pre-truncated.
Row-count limiting is already handled correctly by InferColumnTypes via
Seq.truncate inferRows - this has always been the authoritative row limit.
The performance cost is reading the full sample file as a string; this is
the same cost as all other providers (XmlProvider, JsonProvider) which also
pass None here.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions (Contributor, Author)

✅ Pull request created: #1625

@dsyme dsyme marked this pull request as ready for review February 22, 2026 22:07
@dsyme dsyme closed this Feb 22, 2026
@dsyme dsyme reopened this Feb 22, 2026
Base automatically changed from repo-assist/deps-update-2026-02-22-22f87c2bfdc6ec83 to main February 22, 2026 22:10
@dsyme dsyme merged commit 3ba62bb into main Feb 23, 2026
2 checks passed
@dsyme dsyme deleted the repo-assist/fix-issue-1439-inferrows-multiline-csv-9645f782b479e0ac branch February 23, 2026 00:14
github-actions bot added a commit that referenced this pull request Feb 23, 2026
Add entries for:
- #1613: CSS pseudo-class NotSupportedException fix (#1383)
- #1617: ConvertDateTimeOffset xs:dateTime fallback fix (#1437)
- #1618: Microsoft.Build security bump
- #1619: XmlProvider EmbeddedResource GetSchema fix (#1310)
- #1621: StrictBooleans parameter for CsvProvider
- #1625: CsvProvider.InferRows multiline quoted field fix (#1439)
- #1626: XSD group reference cycle guard (#1419)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>


Development

Linked issue: CsvProvider: Incorrect truncation of sample file (by InferRows)
