hanzi-arithmetic is a Python library for doing math directly with Chinese numerals. It parses Hanzi strings like “六” or “七十”, performs arithmetic, and returns results in proper Chinese form (e.g. “六” + “七十” → “七十六”)

MCDong/hanzi-arithmetic

hanzi-arithmetic

A comprehensive Python library for parsing, formatting, and performing arithmetic operations with Chinese numerals (汉字数字). Supports traditional and simplified Chinese, regional variants, financial formats, and full mathematical operations.

✨ Features

  • Full Arithmetic Support: Python's arithmetic operators (+, -, *, /, //, %, **) work with Chinese numerals
  • Multi-Regional Support: Mainland China (CN), Taiwan (TW), Hong Kong (HK) variants
  • Multiple Writing Systems: Simplified, Traditional, and Financial (uppercase) formats
  • Decimal & Negative Numbers: Complete support for floating-point and negative values
  • Intelligent Parsing: Handles complex patterns, zero-bridging, and regional differences
  • Round-Trip Consistency: Parse → Format → Parse yields identical results
  • Large Number Support: Up to 10^44 (載) with proper unit handling
  • Profile System: Configurable formatting and parsing behavior

🚀 Quick Start

Installation

pip install hanzi-arithmetic

Basic Usage

from hanzi_arithmetic import chinese

# Create Chinese numbers
num1 = chinese("三千五百")      # 3500
num2 = chinese("六百五十")      # 650

# Perform arithmetic
result = num1 + num2
print(result)  # 四千一百五十 (4150)

# Mix with regular numbers
result = num1 * 2 + 100
print(result)  # 七千一百 (7100)

# Access the numeric value
print(int(result))    # 7100
print(float(result))  # 7100.0

Regional Variants

from hanzi_arithmetic import chinese, CN_EVERYDAY, TW_HK_EVERYDAY, FINANCIAL_CN

# Mainland China (Simplified)
cn_num = chinese(12000, CN_EVERYDAY)
print(cn_num)  # 一万二千

# Taiwan/Hong Kong (Traditional)
tw_num = chinese(12000, TW_HK_EVERYDAY)
print(tw_num)  # 一萬二千

# Financial/Banking Format
financial = chinese(12000, FINANCIAL_CN)
print(financial)  # 壹万贰仟

📚 Comprehensive Examples

Complex Arithmetic Operations

from hanzi_arithmetic import chinese

# Chained operations
result = chinese("一万") + chinese("五千") - 3000 + chinese("八百")
print(result)  # 一万二千八百

# Mixed type operations
price = chinese("九万八千")
discount = price * 0.15  # 15% discount
final_price = price - discount
print(final_price)  # 八万三千三百

# Division and remainders
total = chinese("十万")
parts = total / 7
remainder = total % 7
print(f"Each part: {parts}, Remainder: {remainder}")

Decimal Numbers

from hanzi_arithmetic import chinese

# Decimal parsing and formatting
decimal_num = chinese("三点一四一五九")  # 3.14159
print(float(decimal_num))  # 3.14159

# Decimal arithmetic
result = chinese("十点五") + 2.3
print(result)  # 十二点八
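
As a rough illustration of what decimal parsing involves, the sketch below splits the text on 点, parses the integer part with units, and reads the fractional part digit by digit. It is a self-contained toy, not the library's code: `parse_small_int` and `parse_hanzi_decimal` are hypothetical names, only simplified digits below 10,000 are handled, and `Decimal` is used here (an assumption, not something the library is known to do) to avoid binary floating-point surprises.

```python
from decimal import Decimal

# Illustrative only: simplified digits, integer part below 10,000.
DIGITS = {"零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
UNITS = {"十": 10, "百": 100, "千": 1000}


def parse_small_int(text: str) -> int:
    """Parse an integer part such as 十 or 三百零五 (hypothetical helper)."""
    total, digit = 0, 0
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in UNITS:
            total += (digit or 1) * UNITS[ch]  # a bare 十 counts as 10
            digit = 0
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return total + digit


def parse_hanzi_decimal(text: str) -> Decimal:
    """Split on 点; the fractional part is a plain run of digits."""
    int_text, _, frac_text = text.partition("点")
    if any(ch not in DIGITS for ch in frac_text):
        raise ValueError(f"invalid fractional part in {text!r}")
    frac_digits = "".join(str(DIGITS[ch]) for ch in frac_text)
    return Decimal(f"{parse_small_int(int_text)}.{frac_digits or '0'}")


print(parse_hanzi_decimal("三点一四一五九"))
print(parse_hanzi_decimal("十点五") + Decimal("2.3"))
```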

Large Numbers and Units

from hanzi_arithmetic import chinese, TW_HK_EVERYDAY

# Large number handling
trillion = chinese("一万亿")  # 1 trillion (Mainland)
tw_trillion = chinese("一兆", TW_HK_EVERYDAY)  # 1 trillion (Taiwan)

print(int(trillion))     # 1000000000000
print(int(tw_trillion))  # 1000000000000

# Very large numbers
huge_num = chinese("九千九百万亿")
print(f"Value: {int(huge_num):,}")  # 9,900,000,000,000,000
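
To show how forms like 一万亿 and 九千九百万亿 can arise, here is a minimal formatting sketch that recursively splits a value at 亿 and 万 and formats each group of up to four digits. It is an assumption-laden toy rather than the library's formatter: the function names are hypothetical, and it covers only simplified Mainland-style output for 0 <= n < 10**16.

```python
# Illustrative only: Mainland-style, simplified characters, 0 <= n < 10**16.
DIGITS = "零一二三四五六七八九"
SMALL_UNITS = ["", "十", "百", "千"]  # indexed by power of ten within a group


def format_section(n: int) -> str:
    """Format 1 <= n <= 9999, writing one 零 for skipped inner places."""
    out = ""
    zero_pending = False
    for power in (3, 2, 1, 0):
        d = n // 10**power % 10
        if d == 0:
            zero_pending = bool(out)  # only bridge zeros after a written digit
        else:
            if zero_pending:
                out += "零"
                zero_pending = False
            out += DIGITS[d] + SMALL_UNITS[power]
    return out


def _format_positive(n: int) -> str:
    for unit_value, unit_char in ((10**8, "亿"), (10**4, "万")):
        if n >= unit_value:
            high, low = divmod(n, unit_value)
            text = _format_positive(high) + unit_char
            if low:
                if low < unit_value // 10:  # the place right after the unit is empty
                    text += "零"
                text += _format_positive(low)
            return text
    return format_section(n)


def format_hanzi_int(n: int) -> str:
    if not 0 <= n < 10**16:
        raise ValueError("this sketch covers 0 <= n < 10**16 only")
    if n == 0:
        return "零"
    text = _format_positive(n)
    # Everyday style drops the leading 一 in 一十 (十万 rather than 一十万).
    return text[1:] if text.startswith("一十") else text


print(format_hanzi_int(10**12))
print(format_hanzi_int(9900 * 10**12))
```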

Financial and Banking Applications

from hanzi_arithmetic import chinese, FINANCIAL_CN, FINANCIAL_TW_HK

# Financial formats (anti-fraud uppercase)
amount = chinese(1234567, FINANCIAL_CN)
print(amount)  # 壹佰贰拾叁万肆仟伍佰陆拾柒

# Banking calculations
principal = chinese("十万", FINANCIAL_CN)  # 100,000
interest_rate = 0.045  # 4.5%
interest = principal * interest_rate
total = principal + interest
print(f"Principal: {principal}")  # 壹拾万
print(f"Interest: {chinese(int(interest), FINANCIAL_CN)}")  # 肆仟伍佰
print(f"Total: {chinese(int(total), FINANCIAL_CN)}")  # 壹拾万肆仟伍佰
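
The relationship between everyday and financial numerals in the outputs above is largely a character-for-character substitution, plus an explicit leading 壹 before 拾. The sketch below demonstrates that idea for Mainland financial forms; `to_financial_cn` is a hypothetical helper written for illustration, not a function the library is known to expose.

```python
# Illustrative only: maps everyday simplified numerals to financial forms.
FINANCIAL_MAP = str.maketrans({
    "一": "壹", "二": "贰", "两": "贰", "三": "叁", "四": "肆", "五": "伍",
    "六": "陆", "七": "柒", "八": "捌", "九": "玖",
    "十": "拾", "百": "佰", "千": "仟",
})


def to_financial_cn(text: str) -> str:
    """Convert e.g. 一万二千 to 壹万贰仟 (hypothetical helper)."""
    converted = text.translate(FINANCIAL_MAP)
    # Financial style spells out the leading one: 十万 becomes 壹拾万.
    return "壹" + converted if converted.startswith("拾") else converted


print(to_financial_cn("一万二千"))
print(to_financial_cn("十万"))
```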

Text Processing and NLP Applications

from hanzi_arithmetic import chinese

# Document processing
text_amounts = ["三万五千", "十二万八千", "五十万"]
total_amount = sum(chinese(amount).value for amount in text_amounts)
print(f"Total: {chinese(total_amount)}")  # 六十六万三千 (663,000)

# Multi-format parsing (handles different input styles)
inputs = ["两万", "兩萬", "二万", "貳萬"]  # Different ways to write 20,000
values = [chinese(inp).value for inp in inputs]
print(f"All equal: {all(v == 20000 for v in values)}")  # True
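
One plausible way to make the four spellings above compare equal is to normalize variant characters to a single canonical form before parsing. The sketch below does this with a small translation table; the table and the name `normalize_variants` are assumptions made for illustration and say nothing about how the library handles variants internally.

```python
# Illustrative only: fold a few variant characters into one canonical spelling.
VARIANTS = str.maketrans({
    "两": "二", "兩": "二", "贰": "二", "貳": "二",  # forms of "two"
    "萬": "万", "億": "亿",                          # traditional unit characters
})


def normalize_variants(text: str) -> str:
    """Return text with known variants replaced (hypothetical helper)."""
    return text.translate(VARIANTS)


inputs = ["两万", "兩萬", "二万", "貳萬"]
print({normalize_variants(s) for s in inputs})
```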

🌏 Regional and Format Support

Profile System

The library uses profiles to handle different regional and formatting preferences:

from hanzi_arithmetic import (
    chinese,
    CN_EVERYDAY,      # Mainland China, everyday use
    TW_HK_EVERYDAY,   # Taiwan/Hong Kong, everyday use
    FINANCIAL_CN,     # Mainland China, financial format
    FINANCIAL_TW_HK   # Taiwan/Hong Kong, financial format
)

number = 1234567

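# The profiles are instances of ChineseNumberProfile, so label them by name
# rather than by class (the class name would be identical for all four).
profiles = {
    "CN_EVERYDAY": CN_EVERYDAY,
    "TW_HK_EVERYDAY": TW_HK_EVERYDAY,
    "FINANCIAL_CN": FINANCIAL_CN,
    "FINANCIAL_TW_HK": FINANCIAL_TW_HK,
}

for name, profile in profiles.items():
    formatted = chinese(number, profile)
    print(f"{name}: {formatted}")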

Key Differences

| Feature | CN Everyday | TW/HK Everyday | CN Financial | TW/HK Financial |
|---|---|---|---|---|
| Large Units | 万亿 (wan yi) | 兆 (zhao) | 万亿 | |
| Script | Simplified | Traditional | Simplified | Traditional |
| Digits | 一二三... | 一二三... | 壹贰叁... | 壹貳參... |
| Zero | | | | |

🔧 Advanced Features

Custom Profiles

from hanzi_arithmetic import chinese, ChineseNumberProfile, Script, Locale

# Create custom profile
custom_profile = ChineseNumberProfile(
    script=Script.TRADITIONAL,
    locale=Locale.TW,
    use_liang_output=True,  # Use 兩 instead of 二
    accept_archaic=True     # Accept archaic forms like 廿 (20)
)

num = chinese(2000, custom_profile)
print(num)  # Uses custom formatting rules

Exception Handling

from hanzi_arithmetic import chinese
from hanzi_arithmetic.exceptions import ParseError, FormatError

try:
    invalid_num = chinese("不是数字")  # Invalid text
except ParseError as e:
    print(f"Parse error: {e}")
    print(f"Problem text: {e.text}")

Zero Bridging and Complex Patterns

from hanzi_arithmetic import chinese

# Complex numbers with zero bridging
complex_nums = [
    "一万零三",      # 10,003
    "十万零五十",    # 100,050
    "一千万零七",    # 10,000,007
    "三千零一万",    # 30,010,000
]

for num_text in complex_nums:
    num = chinese(num_text)
    print(f"{num_text} = {int(num):,}")
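
For readers curious how zero bridging can be parsed, the following is a minimal single-pass sketch. It keeps a pending digit, a running group of up to four digits, and a total that is closed off at 万 and 亿. This is a simplified illustration under stated assumptions (simplified characters only, non-negative integers, lenient about malformed input), not the library's parser, and `parse_hanzi_int` is a hypothetical name.

```python
# Illustrative only: simplified characters, non-negative integers.
DIGITS = {"零": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
WAN, YI = 10**4, 10**8


def parse_hanzi_int(text: str) -> int:
    """Parse text such as 一千万零七 into an int."""
    total = 0    # value already closed off by 万 or 亿
    section = 0  # value of the current group of up to four digits
    digit = 0    # last digit seen, still waiting for its unit
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]  # 零 just resets the pending digit
        elif ch in SMALL_UNITS:
            section += (digit or 1) * SMALL_UNITS[ch]  # a bare 十 counts as 10
            digit = 0
        elif ch == "万":
            total += (section + digit) * WAN
            section = digit = 0
        elif ch == "亿":
            # 亿 scales everything read so far, so 一万亿 becomes 10**12.
            total = (total + section + digit) * YI
            section = digit = 0
        else:
            raise ValueError(f"unexpected character {ch!r} in {text!r}")
    return total + section + digit


for sample in ["一万零三", "十万零五十", "一千万零七", "三千零一万"]:
    print(f"{sample} = {parse_hanzi_int(sample):,}")
```

Because 零 carries no value of its own, the bridging cases need no special handling here; the unit structure alone determines where each digit lands.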

🎯 Use Cases

Financial Applications

  • Banking systems and financial software
  • Invoice and receipt processing
  • Anti-fraud number verification (financial formats)
  • Accounting and bookkeeping systems

Natural Language Processing

  • Chinese text analysis and extraction
  • Document processing and data mining
  • Multilingual number normalization
  • Language learning applications

Localization & Internationalization

  • Cross-regional number format handling
  • Traditional/Simplified Chinese conversion
  • Cultural adaptation for different markets
  • Government and legal document processing

Research & Academia

  • Historical document analysis
  • Linguistic research on number systems
  • Cultural studies and anthropology
  • Educational tools and curricula

📖 API Reference

Core Classes

ChineseNumber

The main class representing a Chinese number with full arithmetic support.

# Signature summary in stub style; method bodies are omitted.
class ChineseNumber:
    def __init__(self, value: Union[str, int, float, "ChineseNumber"],
                 profile: Optional[ChineseNumberProfile] = None) -> None: ...

    @property
    def value(self) -> Union[int, float]: ...       # Numeric value
    @property
    def chinese(self) -> str: ...                   # Chinese text representation
    @property
    def profile(self) -> ChineseNumberProfile: ...  # Formatting profile

chinese() Factory Function

Convenient way to create ChineseNumber instances.

def chinese(value: Union[str, int, float, ChineseNumber],
            profile: Optional[ChineseNumberProfile] = None) -> ChineseNumber: ...
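
The arithmetic behavior described for ChineseNumber relies on Python's operator-overloading protocol. The sketch below shows the general pattern with a hypothetical `WrappedNumber` class: forward and reflected methods unwrap operands, apply the operation, and wrap the result again. It is a generic illustration of the technique, covering only three operators, and does not reproduce the library's class.

```python
import operator
from typing import Callable, Union

Number = Union[int, float]


class WrappedNumber:
    """Hypothetical numeric wrapper illustrating operator overloading."""

    def __init__(self, value: Union[Number, "WrappedNumber"]) -> None:
        self.value = value.value if isinstance(value, WrappedNumber) else value

    def _binary(self, other, op: Callable, reflected: bool = False):
        if isinstance(other, WrappedNumber):
            other = other.value
        elif not isinstance(other, (int, float)):
            return NotImplemented  # let Python try the other operand
        result = op(other, self.value) if reflected else op(self.value, other)
        return WrappedNumber(result)

    def __add__(self, other):
        return self._binary(other, operator.add)

    def __radd__(self, other):
        return self._binary(other, operator.add, reflected=True)

    def __sub__(self, other):
        return self._binary(other, operator.sub)

    def __rsub__(self, other):
        return self._binary(other, operator.sub, reflected=True)

    def __mul__(self, other):
        return self._binary(other, operator.mul)

    def __rmul__(self, other):
        return self._binary(other, operator.mul, reflected=True)

    def __int__(self) -> int:
        return int(self.value)

    def __float__(self) -> float:
        return float(self.value)

    def __repr__(self) -> str:
        return f"WrappedNumber({self.value!r})"


a, b = WrappedNumber(3500), WrappedNumber(650)
print(a + b)
print(a * 2 + 100)
```

The reflected methods (`__radd__` and friends) are what allow a plain number on the left-hand side, which is also why the built-in `sum()` works on a list of such objects.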

Profiles

  • CN_EVERYDAY: Mainland China, everyday usage (default)
  • TW_HK_EVERYDAY: Taiwan/Hong Kong, everyday usage
  • FINANCIAL_CN: Mainland China, financial/banking format
  • FINANCIAL_TW_HK: Taiwan/Hong Kong, financial format

Exceptions

  • ChineseNumberError: Base exception class
  • ParseError: Raised when text cannot be parsed
  • FormatError: Raised when number cannot be formatted
  • ValidationError: Raised for grammar rule violations
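
A hierarchy like the one listed above could be laid out as follows. This is a standalone re-creation for illustration, assuming only what this README states (a shared base class, and a `text` attribute on ParseError as used in the exception-handling example); the constructor signature and the `parse_digit` helper are hypothetical.

```python
# Illustrative re-creation; the library's real definitions may differ.
class ChineseNumberError(Exception):
    """Base class, so callers can catch every library error at once."""


class ParseError(ChineseNumberError):
    def __init__(self, message: str, text: str) -> None:
        super().__init__(message)
        self.text = text  # the offending input, as read via e.text


class FormatError(ChineseNumberError):
    """Raised when a number cannot be formatted."""


class ValidationError(ChineseNumberError):
    """Raised for grammar rule violations."""


def parse_digit(ch: str) -> int:
    """Hypothetical helper: parse a single simplified digit."""
    digits = "零一二三四五六七八九"
    if len(ch) != 1 or ch not in digits:
        raise ParseError(f"not a Chinese digit: {ch!r}", text=ch)
    return digits.index(ch)


try:
    parse_digit("不")
except ChineseNumberError as e:  # catching the base class also catches ParseError
    print(type(e).__name__, e.text)
```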

🧪 Testing

The library includes comprehensive test suites covering:

  • Core integer parsing and formatting
  • Decimal and negative number handling
  • Arithmetic operator overloading
  • Regional variant processing
  • Zero-bridging and complex patterns
  • Round-trip consistency (parse → format → parse)
  • Financial format validation
  • Error handling and edge cases

Run tests with:

pytest tests/

🛠 Development

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

Requirements

  • Python 3.8+
  • No external dependencies for core functionality
  • pytest for testing

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🏷 Keywords

Chinese numerals, Hanzi numbers, Chinese digits, Traditional Chinese, Simplified Chinese, Taiwan, Hong Kong, Mainland China, financial formatting, banking numbers, arithmetic operations, number parsing, text processing, NLP, natural language processing, multilingual, localization, internationalization, 汉字数字, 中文数字, 繁体中文, 简体中文, 金融格式, 数字处理


Made with ❤️ for the Chinese language processing community.
