A lightweight Go utility to split large CSV files into N parts. Supports custom delimiters, optional headers in each output, and manual split size. Useful for handling big datasets without loading the entire file into memory.

sarff/splitcsv

CSV Splitter

A simple command-line tool to split large CSV files into multiple smaller parts with customizable options.

Features

  • Split CSV files into any number of parts
  • Automatically generates output filenames with _part1, _part2, etc. suffixes
  • Preserves original file extension and directory
  • Optional header inclusion in all output files
  • Support for custom column separators
  • Even distribution of rows across parts
  • Streams rows one at a time, so input size is not limited by available memory
  • Smart newline handling: Automatically replaces line breaks inside quoted CSV fields with spaces; line breaks outside quotes, which terminate rows, are left unchanged

Installation

go build -o splitcsv main.go

Usage

./splitcsv -in <input-file> [options]

Required Flags

  • -in - Input CSV file path (required)

Optional Flags

  • -parts - Number of parts to split into (default: 2)
  • -header - Include header row in all output files (default: true)
  • -comma - Column separator character (default: ",")
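
The flags above map naturally onto Go's standard flag package. The following is a minimal sketch of how they could be declared, not the tool's actual source; the options struct and the parseOptions function are hypothetical names introduced for illustration.

```go
package main

import (
	"flag"
	"fmt"
)

// options mirrors the documented flags. The struct itself is hypothetical.
type options struct {
	in     string
	parts  int
	header bool
	comma  string
}

// parseOptions parses command-line arguments using the documented
// flag names and defaults, and rejects a missing -in flag.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("splitcsv", flag.ContinueOnError)
	fs.StringVar(&o.in, "in", "", "input CSV file path (required)")
	fs.IntVar(&o.parts, "parts", 2, "number of parts to split into")
	fs.BoolVar(&o.header, "header", true, "include header row in all output files")
	fs.StringVar(&o.comma, "comma", ",", "column separator character")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	if o.in == "" {
		return o, fmt.Errorf("missing required flag -in")
	}
	return o, nil
}

func main() {
	// Fixed arguments keep the example reproducible.
	o, err := parseOptions([]string{"-in", "data.csv", "-parts", "3", "-header=false"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", o)
}
```

With Go's flag package, boolean flags must be disabled using the -header=false form. Writing -header false would leave the flag enabled and treat false as a positional argument.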

Examples

Basic Usage

Split a CSV file into 2 parts (default):

./splitcsv -in data.csv

Output: data_part1.csv, data_part2.csv

Split into Multiple Parts

Split into 5 parts:

./splitcsv -in sales_data.csv -parts 5

Output: sales_data_part1.csv, sales_data_part2.csv, ..., sales_data_part5.csv

Without Headers

Split without including headers in output files:

./splitcsv -in data.csv -parts 3 -header=false

Custom Separator

Split a semicolon-separated file:

./splitcsv -in european_data.csv -parts 4 -comma ";"

Complex Example

Split a large tab-separated file into 10 parts without headers:

./splitcsv -in huge_dataset.tsv -parts 10 -comma "\t" -header=false

How It Works

  1. Row Counting: First pass counts total data rows (excluding header)
  2. Distribution: Calculates how many rows each part receives
  3. File Generation: Creates output files with _partN suffix
  4. Data Writing: Distributes rows evenly, with any extra rows going to the first parts

Row Distribution Logic

For a file with 100 data rows split into 3 parts:

  • Part 1: 34 rows
  • Part 2: 33 rows
  • Part 3: 33 rows

When the row count is not evenly divisible by the number of parts, the leftover rows go one each to the first parts, so part sizes never differ by more than one row.
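
The arithmetic behind this distribution can be sketched with integer division and remainder. The function name partSizes is hypothetical and the code is an illustration of the rule described here, not the tool's source.

```go
package main

import "fmt"

// partSizes returns the number of data rows for each part.
// Every part gets totalRows/parts rows, and the first
// totalRows%parts parts get one extra row each.
func partSizes(totalRows, parts int) []int {
	base := totalRows / parts
	extra := totalRows % parts
	sizes := make([]int, parts)
	for i := range sizes {
		sizes[i] = base
		if i < extra {
			sizes[i]++
		}
	}
	return sizes
}

func main() {
	fmt.Println(partSizes(100, 3)) // [34 33 33]
}
```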

Smart Newline Handling

The tool automatically handles CSV fields that contain line breaks:

  • Quoted fields: Line breaks (\n and \r\n) inside quoted fields are automatically replaced with spaces
  • Outside quotes: Line breaks outside quoted fields, which mark the end of a row, are preserved as-is
  • Escaped quotes: Properly handles escaped quotes ("") within quoted fields

Example:

name,description,price
"Product A","This is a long
description with line breaks",100
Product B,Simple description,200

After processing:

name,description,price
"Product A","This is a long description with line breaks",100
Product B,Simple description,200

This keeps every record on a single line, so the output files stay compatible with line-oriented tools and with CSV parsers that do not support multi-line fields.
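
One way to implement this behavior is a small state machine that tracks whether the current position is inside a quoted field. The sketch below assumes the double quote is the only quoting character; flattenQuotedNewlines is a hypothetical name, and the tool's own implementation may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// flattenQuotedNewlines replaces \n and \r\n with a single space when
// they occur inside a quoted field, and copies everything else as-is.
// An escaped quote ("") toggles the state twice, so it has no net effect.
func flattenQuotedNewlines(input string) string {
	var b strings.Builder
	inQuotes := false
	for i := 0; i < len(input); i++ {
		c := input[i]
		switch {
		case c == '"':
			inQuotes = !inQuotes
			b.WriteByte(c)
		case inQuotes && c == '\r' && i+1 < len(input) && input[i+1] == '\n':
			b.WriteByte(' ')
			i++ // skip the '\n' of the \r\n pair
		case inQuotes && c == '\n':
			b.WriteByte(' ')
		default:
			b.WriteByte(c)
		}
	}
	return b.String()
}

func main() {
	in := "name,description,price\n" +
		"\"Product A\",\"This is a long\ndescription with line breaks\",100\n" +
		"Product B,Simple description,200\n"
	fmt.Print(flattenQuotedNewlines(in))
}
```

Scanning byte by byte is safe for UTF-8 input because the quote and line-break characters are ASCII and never occur inside a multi-byte sequence.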

File Naming Convention

Output files follow this pattern:

{original_name}_part{N}{original_extension}

Examples:

  • data.csv → data_part1.csv, data_part2.csv
  • sales_2024.csv → sales_2024_part1.csv, sales_2024_part2.csv
  • export.tsv → export_part1.tsv, export_part2.tsv
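
The naming pattern can be reproduced with the standard path/filepath package, as in the following sketch. The helper name partName is hypothetical; the code only illustrates the documented pattern.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// partName builds the output path for part n by inserting _part{n}
// between the original name and its extension. The directory of the
// input path is kept because only the extension is stripped.
func partName(inputPath string, n int) string {
	ext := filepath.Ext(inputPath)
	base := strings.TrimSuffix(inputPath, ext)
	return fmt.Sprintf("%s_part%d%s", base, n, ext)
}

func main() {
	fmt.Println(partName("data.csv", 1))           // data_part1.csv
	fmt.Println(partName("reports/export.tsv", 2)) // reports/export_part2.tsv
}
```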

Error Handling

The tool will exit with an error message if:

  • Input file doesn't exist or can't be read
  • Input file has no data rows
  • Number of parts is less than 1
  • Separator is not a single character
  • Output files can't be created
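
Two of these checks, the part count and the separator length, can be expressed in a few lines. This is a sketch under the assumption that the separator arrives as a string from the command line; parseSeparator and validateParts are hypothetical names and the error messages are illustrative.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// parseSeparator accepts a string containing exactly one character
// (one Unicode code point) and returns it as a rune.
func parseSeparator(s string) (rune, error) {
	if utf8.RuneCountInString(s) != 1 {
		return 0, fmt.Errorf("separator must be a single character, got %q", s)
	}
	r, _ := utf8.DecodeRuneInString(s)
	return r, nil
}

// validateParts rejects a part count below 1.
func validateParts(parts int) error {
	if parts < 1 {
		return fmt.Errorf("number of parts must be at least 1, got %d", parts)
	}
	return nil
}

func main() {
	for _, s := range []string{",", ";", "||"} {
		if _, err := parseSeparator(s); err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q is a valid separator\n", s)
	}
	if err := validateParts(0); err != nil {
		fmt.Println("error:", err)
	}
}
```

Note that this sketch would reject a literal backslash followed by t, which is what most shells pass for "\t". How the tool itself interprets that escape is not covered here.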

Performance

  • Memory efficient: processes files row by row
  • Two-pass reading: first for counting, second for splitting
  • Supports files of any size (limited only by available disk space)

Requirements

  • Go 1.16 or later
  • Read permission for input file
  • Write permission for output directory

License

This project is open source and available under the MIT License.
