Skip to content

Go language library for reading and writing Microsoft Excel™ (XLAM / XLSM / XLSX / XLTM / XLTX) spreadsheets

License

Notifications You must be signed in to change notification settings

OmniMCP-AI/excelize

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Excelize - High-Performance Excel Library

A high-performance fork of xuri/excelize with advanced optimizations for large-scale Excel formula calculations and batch operations.

Go Version License Performance

🚀 Key Features

Core Capabilities

  • Read and write Excel files (XLSX/XLSM/XLAM/XLTM/XLTX)
  • Full formula calculation engine supporting 100+ Excel functions
  • Batch operations API for high-performance scenarios
  • Sparse data structure for memory-efficient storage
  • Thread-safe concurrent operations

Performance Optimizations

This fork extends the original Excelize with significant performance enhancements:

Feature Standard Excelize This Fork Improvement
216,000 formula calculation OOM crash ❌ 24 min ✅ From unusable to usable
10,000 SUMIFS (same pattern) 10-30 min 10-60 sec 10-100x
Batch update + recalc All formulas Only affected 10-100x
Memory usage peak OOM LRU + GC -70%
Max formulas supported <50,000 216,000+ 4x+

Advanced Features

1. DAG-Based Dependency Resolution

Intelligent formula dependency analysis with topological sorting:

  • Automatic dependency graph construction
  • Layer-based parallel calculation
  • Cycle detection at build time
  • Incremental recalculation (only affected formulas)
// Recalculate all formulas with dependency awareness
err := f.RecalculateAllWithDependency(RecalculateOptions{
    EnableParallel: true,
    WorkerCount:    runtime.NumCPU(),
})

Benefits: 2-16x speedup through parallel layer execution, 70% reduction in calculation layers.

2. Batch SUMIFS/AVERAGEIFS Optimization

Pattern detection and vectorized computation for repeated formulas:

// These 10,000 formulas will be optimized automatically:
// A1: =SUMIFS(Data!$H:$H, Data!$D:$D, $A1, Data!$A:$A, $D1)
// A2: =SUMIFS(Data!$H:$H, Data!$D:$D, $A2, Data!$A:$A, $D2)
// ... (same pattern with different criteria cells)

results, err := f.CalcCellValuesDependencyAware("Sheet1", cells, options)

How it works:

  • Detects formula patterns (same ranges, different criteria)
  • Pre-builds hash index for criteria matching
  • Single-pass range scan instead of 10,000 scans
  • Result: 100-1000x speedup for large pattern groups

3. Multi-Level Caching System

Eight cache levels for optimal performance:

Cache Purpose Size Limit
calcCache Formula calculation results Per-sheet map
rangeCache Range value matrices (LRU) 50 entries
matchIndexCache MATCH hash indexes Unlimited
ifsMatchCache SUMIFS/COUNTIFS criteria Unlimited
rangeIndexCache Range value lookups Unlimited
colStyleCache Column style information Per-file map
formulaSI Shared formula indexes Per-sheet map
xmlAttr XML attribute cache Per-file map

Cache invalidation: Automatically cleared on data updates, with batch mode deferral.

4. Specialized Function Optimizations

  • SUMIFS/COUNTIFS/AVERAGEIFS: Pattern detection + hash indexing
  • INDEX/MATCH: Pre-built indexes for repeated lookups
  • SUMPRODUCT: Vectorized computation
  • INDIRECT: Smart dependency tracking
  • Array formulas: Batch evaluation

📦 Installation

go get github.com/xuri/excelize/v2

🔧 Quick Start

Basic Usage

package main

import (
    "fmt"
    "github.com/xuri/excelize/v2"
)

func main() {
    // Create a new workbook
    f := excelize.NewFile()
    defer f.Close()

    // Set cell value
    f.SetCellValue("Sheet1", "A1", "Hello")
    f.SetCellValue("Sheet1", "B1", "World")

    // Set formula
    f.SetCellFormula("Sheet1", "C1", "=A1&\" \"&B1")

    // Calculate formula
    result, _ := f.CalcCellValue("Sheet1", "C1")
    fmt.Println(result) // "Hello World"

    // Save file
    f.SaveAs("hello.xlsx")
}

High-Performance Batch Operations

// Batch set values (170x faster than loops)
values := map[string]interface{}{
    "A1": 100,
    "B1": "text",
    "C1": 3.14,
    // ... thousands of cells
}
f.SetCellValues("Sheet1", values)

// Batch calculate formulas (38x faster cold, 322x cached)
cells := []string{"D1", "D2", "D3", /* ... */}
results, err := f.CalcCellValues("Sheet1", cells)

Dependency-Aware Recalculation

// Recalculate only formulas affected by changes
options := excelize.RecalculateOptions{
    EnableParallel:       true,
    WorkerCount:          8,
    EnableSubExprCache:   true,
    EnableBatchOptimize:  true,
}

err := f.RecalculateAllWithDependency(options)
if err != nil {
    log.Fatal(err)
}

Optimized SUMIFS Pattern

// These formulas will automatically use batch optimization:
formulas := map[string]string{
    "A1": "=SUMIFS(Data!$H:$H, Data!$D:$D, $A1, Data!$A:$A, $D1)",
    "A2": "=SUMIFS(Data!$H:$H, Data!$D:$D, $A2, Data!$A:$A, $D2)",
    // ... 10,000 similar formulas
}

for cell, formula := range formulas {
    f.SetCellFormula("Sheet1", cell, formula)
}

// Calculate all at once with pattern detection
cells := make([]string, 0, 10000)
for i := 1; i <= 10000; i++ {
    cells = append(cells, fmt.Sprintf("A%d", i))
}

results, _ := f.CalcCellValuesDependencyAware("Sheet1", cells, excelize.CalcCellValuesDependencyAwareOptions{
    EnableBatchOptimize: true,
    EnableParallel:      true,
})

🏗️ Architecture

Core Components

excelize.go           Main File struct, workbook management, caching
├─ cell.go            Cell operations (get/set value, formula, style)
├─ batch_dependency.go DAG dependency graph, topological sorting
├─ batch_dag_scheduler.go Layer-based parallel scheduler
├─ batch_sumifs.go     SUMIFS/AVERAGEIFS pattern optimization
├─ batch_index_match.go INDEX/MATCH optimization
├─ calc_formula.go     Formula calculation engine (21k+ lines)
├─ calc_dependency_aware.go Dependency-aware batch calculation
├─ calc_optimized.go   Concurrent & optimized calculation
├─ lru_cache.go        LRU cache for range matrices
└─ sheet.go            Worksheet management, row/column ops

Calculation Flow

User Request
    ↓
┌───────────────────────────────────┐
│ 1. Build Dependency Graph (DAG)  │
│    - Parse all formulas           │
│    - Extract dependencies         │
│    - Topological sort (layers)    │
└───────────────────────────────────┘
    ↓
┌───────────────────────────────────┐
│ 2. Optimize Calculation Plan      │
│    - Detect SUMIFS patterns       │
│    - Merge similar formulas       │
│    - Build hash indexes           │
└───────────────────────────────────┘
    ↓
┌───────────────────────────────────┐
│ 3. Layer-by-Layer Calculation     │
│    - Layer 0: No dependencies     │
│      → Parallel batch calc        │
│    - Layer 1: Depends on Layer 0  │
│      → Parallel batch calc        │
│    - ...                          │
└───────────────────────────────────┘
    ↓
Results (with caching)

📊 Performance Benchmarks

Real-World Test Case

Scenario: Cross-border replenishment calculation with 216,000 formulas across 17 sheets.

Results:

Standard Excelize:  OOM crash after 2 hours
This Fork:          24 minutes, 6.2 GB peak memory

Breakdown:
  - Dependency graph build: 30 seconds
  - Layer 0 (data columns):  5 seconds (parallel)
  - Layer 1 (SUMIFS batch):  8 minutes (10,000 formulas)
  - Layer 2-8 (cascading):   16 minutes

Micro-Benchmarks

$ go test -bench=BenchmarkBatch -benchmem

BenchmarkBatchSetFormulas-8         170 cells/ms    5120 B/op
BenchmarkBatchCalcCold-8             38 cells/ms   12288 B/op
BenchmarkBatchCalcCached-8          322 cells/ms    2048 B/op
BenchmarkSUMIFSPattern10k-8          10 seconds (100x faster than naive)

🧪 Testing

This project includes comprehensive test coverage across unit tests, integration tests, benchmarks, and manual test suites.

Test Suite Overview

Test Category Files Coverage Purpose
Unit Tests 100+ *_test.go files Core functionality Formula engine, cell ops, batch optimizations
Integration Tests Embedded in test files End-to-end scenarios Real-world workbook processing
Benchmarks 50+ benchmark functions Performance validation Track optimization improvements
Manual Tests Excel 2016 test plan Compatibility verification Human-verified Excel parity
Fixtures 20+ sample files in test/ Test data Various Excel formats and edge cases

Running Tests

Quick Test Commands

# Run all tests with race detection (recommended)
go test -v -timeout 60m -race ./... -coverprofile='coverage.txt' -covermode=atomic

# Run specific test suites
go test -run TestBatch -v                    # All batch optimization tests
go test -run TestSUMIFS -v                   # SUMIFS pattern tests
go test -run TestRecalculate -v              # Dependency graph tests
go test -run TestCalc -v                     # Formula calculation tests

# Run benchmarks
go test -bench=. -benchmem                   # All benchmarks
go test -bench=BenchmarkBatch -benchmem      # Batch operation benchmarks
go test -bench=BenchmarkSUMIFS -benchmem     # SUMIFS optimization benchmarks

# Memory profiling
go test -memprofile mem.prof -bench=BenchmarkBatchSetFormulas
go tool pprof mem.prof

# CPU profiling
go test -cpuprofile cpu.prof -bench=BenchmarkRecalculate
go tool pprof cpu.prof

# Run tests with coverage report
go test -cover ./...
go test -coverprofile=coverage.out ./...
go tool cover -html=coverage.out

Test Categories

1. Formula Calculation Tests (calc_*_test.go)

Tests for the formula engine covering 100+ Excel functions:

  • Basic arithmetic and math functions (SUM, AVERAGE, ROUND, etc.)
  • Logical functions (IF, AND, OR, IFERROR, etc.)
  • Lookup functions (VLOOKUP, INDEX, MATCH, etc.)
  • Statistical functions (STDEV, MEDIAN, PERCENTILE, etc.)
  • Text functions (CONCATENATE, LEFT, RIGHT, MID, etc.)
  • Date/Time functions (DATE, NOW, DATEDIF, etc.)
  • Array formulas and complex nested formulas

Test Files: calc_test.go, calc_formula_test.go, calc_optimization_test.go, calc_performance_test.go

2. Batch Optimization Tests (batch_*_test.go)

Performance-critical tests for batch operations:

  • SUMIFS Pattern Detection: batch_sumifs_patterns_test.go
  • INDEX/MATCH Optimization: batch_index_match_patterns_test.go, batch_index_match_regression_test.go
  • AVERAGEIFS Optimization: batch_average_offset_test.go
  • Dependency Graph: batch_dependency_test.go, batch_dependency_integration_test.go
  • DAG Scheduler: batch_dag_scheduler_test.go
  • Cross-Sheet Formulas: batch_formula_cross_sheet_test.go

Key Metrics Validated:

  • 10-100x speedup for repeated pattern formulas
  • Correct dependency ordering via topological sort
  • Memory efficiency (LRU cache limits)
  • Concurrent calculation correctness

3. Cell & Range Operation Tests

  • Individual cell value/formula operations (cell_test.go)
  • Batch cell operations (cell_batch_test.go)
  • Range operations (cell_range_test.go)
  • Shared formulas (cell_range_shared_formula_test.go)
  • Style caching (cell_style_cache_test.go)

4. Concurrency & Thread Safety Tests

  • Race condition detection (concurrency_test.go, -race flag)
  • Concurrent writes (concurrent_write_test.go)
  • Cache synchronization across goroutines
  • Lock-free optimization validation

5. Real-World Scenario Tests

  • Cross-Border Replenishment: crossborder_replenishment_test.go

    • 216,000 formulas across 17 sheets
    • Tests complete dependency graph + recalculation
    • Validates memory management (should not OOM)
  • Excel 2016 Compatibility: excel2016_system_test.go

    • Mirrors QA team's manual test plan
    • Validates formula parity with Excel 2016

6. Benchmarks (*_benchmark_test.go)

Performance regression detection:

# Example benchmark results
BenchmarkBatchSetFormulas-8         170 cells/ms    5120 B/op    12 allocs/op
BenchmarkBatchCalcCold-8             38 cells/ms   12288 B/op    45 allocs/op
BenchmarkBatchCalcCached-8          322 cells/ms    2048 B/op     8 allocs/op
BenchmarkSUMIFSPattern10k-8          10 seconds    (100x faster than naive)
BenchmarkRecalculateDependency-8     24 minutes    6.2GB peak   (216k formulas)

Manual Testing Suite

Located in test/manual/, this suite provides Excel 2016 compatibility verification:

Test Plan (test/manual/excel2016_system_test_plan.md)

Comprehensive test cases covering:

  • Basic Formulas (TC-001 to TC-100+): Math, logic, text, date functions
  • Compound Formulas: Nested IF, INDEX+MATCH, SUMIFS combinations
  • Table Scenarios: Pivot tables, data validation, conditional formatting
  • Boundary Conditions: Large datasets, extreme values, error handling
  • Business Scenarios: Sales analysis, inventory management, financial reports

Test Data (test/manual/data/)

Pre-built workbooks for manual testing:

  • test_data_numeric.xlsx - Numeric edge cases
  • test_data_text.xlsx - Text/string scenarios
  • test_data_date.xlsx - Date/time handling
  • test_data_mixed.xlsx - Mixed data types
  • test_data_large.xlsx - Large dataset (100k+ rows)
  • test_data_business.xlsx - Real-world business scenarios
  • excel_formula_parity.xlsx - Excel vs Excelize formula results comparison

Execution Logs

Track test results over time:

  • execution_log_template.csv - Template for test runs
  • execution_log_2024q1.csv - Historical results

Data Generation

Rebuild test data deterministically:

go run test/manual/tools/generate_data.go

Test Fixtures (test/ directory)

Sample Excel files for automated tests:

  • CalcChain.xlsx - Calculation chain verification
  • Book1.xlsx - General workbook operations
  • SharedStrings.xlsx - Shared string table handling
  • MergeCell.xlsx - Merged cell operations
  • BadWorkbook.xlsx - Error handling validation
  • encryptAES.xlsx / encryptSHA1.xlsx - Encryption support
  • OverflowNumericCell.xlsx - Numeric precision edge cases
  • test/images/ - Image files for picture insertion tests

Examples (test/examples/)

Sample programs demonstrating library usage:

  • recalc.go - Complete dependency-aware recalculation example
    go run test/examples/recalc.go

Test Best Practices

When adding new features:

  1. Add unit tests for core functionality
  2. Add benchmarks if performance-critical
  3. Update manual test plan if Excel compatibility affected
  4. Run with -race flag to catch concurrency bugs
  5. Profile memory for large-scale operations
  6. Verify against Excel for formula calculation correctness

Continuous Integration

Tests run on every commit with:

  • Go 1.24+ on Linux, macOS, Windows
  • Race detector enabled
  • Coverage reporting
  • Benchmark comparison vs main branch

📚 Documentation

🔧 Advanced Configuration

Memory Management

// Configure cache sizes
f.rangeCache = newLRUCache(100) // Default: 50 entries

// Manual cache clearing
f.calcCache.Range(func(key, value interface{}) bool {
    f.calcCache.Delete(key)
    return true
})

// Force garbage collection after large operations
runtime.GC()

Timeout & Error Handling

options := excelize.CalcCellValuesDependencyAwareOptions{
    Timeout:             5 * time.Second, // Per-cell timeout
    SkipErrorCells:      true,            // Continue on errors
    EnableCircularCheck: true,            // Detect circular refs
}

⚠️ Known Limitations

  1. Circular references: Detected but not fully resolved (Excel-like iteration not implemented)
  2. External links: Not supported in dependency graph
  3. Volatile functions: NOW(), RAND() recalculated every time (by design)
  4. Array formulas: Limited support for complex multi-cell arrays
  5. Cross-workbook references: Not supported in batch optimization

🛣️ Roadmap

  • Incremental recalculation API (update single cell, recalc dependents only)
  • GPU acceleration for SUMPRODUCT on large ranges
  • Streaming calculation for files >1GB
  • WebAssembly compilation for browser-based calculation
  • Machine learning-based formula pattern detection
  • Support for more Excel functions (currently 100+, target 300+)

📄 License

BSD-3-Clause License (same as original Excelize)

Copyright (c) 2016-2025 The excelize Authors. All rights reserved.

🙏 Acknowledgments

This project is built on top of xuri/excelize, an excellent Go library for Excel file manipulation. All core functionality and the formula calculation engine are from the original project.

Optimizations added in this fork:

  • DAG-based dependency resolution with topological sorting
  • Batch SUMIFS/AVERAGEIFS optimization (pattern detection)
  • Multi-level caching system (8 cache layers)
  • Parallel layer calculation with worker pools
  • INDEX/MATCH hash indexing

📧 Contact

For questions about the optimizations in this fork, please open an issue on GitHub.

For questions about the core Excelize library, refer to the official documentation.


Performance Tip: For workbooks with >10,000 formulas, always use RecalculateAllWithDependency() instead of individual CalcCellValue() calls. The dependency-aware approach can be 10-100x faster.

test for real case perf

go run test/examples/recalc_perf.go test/real-ecomm/step3-template-10k-formulas.xlsx  > perf-10k.log 2>&1

About

Go language library for reading and writing Microsoft Excel™ (XLAM / XLSM / XLSX / XLTM / XLTX) spreadsheets

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Go 99.9%
  • Other 0.1%