Description
Unicode support for CSV in Python 2 is known to be tricky (see the documentation of the csv module, and #14). This package was originally written in Python 3 (back then panflute only supported Python 3, before I ported it to be compatible with Python 2 as well), and it uses the same trick applied to panflute to support Python 2. The original thought was that partial Python 2 support (without unicode in CSV) is better than no support at all. But unavoidably, people do use this with Python 2 and unicode.
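To make the Python 2 limitation concrete, here is a minimal sketch of the encode/decode workaround in the spirit of the csv module documentation. It is not necessarily the fix proposed in #14, and `read_csv_text` is a hypothetical helper name, not something in pantable.

```python
import csv
import io
import sys

PY2 = sys.version_info[0] == 2


def read_csv_text(text):
    """Parse unicode CSV text into rows of unicode strings on Python 2 and 3."""
    if PY2:
        # Python 2's csv module works on byte strings, so encode each line
        # to UTF-8 before parsing and decode every cell afterwards.
        lines = (line.encode("utf-8") for line in text.splitlines(True))
        return [[cell.decode("utf-8") for cell in row]
                for row in csv.reader(lines)]
    # Python 3's csv module handles text directly.
    return list(csv.reader(io.StringIO(text, newline="")))


rows = read_csv_text(u"name,city\nZo\u00eb,Z\u00fcrich\n")
```

Even in this small sketch there are two separate code paths, which is the kind of divergence discussed in the points that follow.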
#14 proposed a fix that could solve the unicode problem. However, for various reasons, an alternative CSV parser is being considered:
- It seems that the csv module behaves slightly differently in Python 2 and 3. The last thing I want is for Python 2 and 3 users to see different behavior, which would make this package less maintainable (frankly, I don't want to deal with differences between Python 2 and 3...).
- As in Added column-filter functionality #16 and Filtering Subcells of CSV #17, people would like to extend pantable to filter subcells from the CSV input. This feature might make the efficiency of the CSV parser more critical. Without filtering, we can reasonably assume the table is small (for LaTeX, it is constrained by the page; for other formats like HTML, it is at least not too big to be rendered efficiently by a browser). But with filtering, the source CSV can be arbitrarily large while only a small subset of table cells is kept.
- If we use a CSV parser that is not from the standard library, we need to either deal with the extra dependency or use a conditional import, which leaves the choice (and the installation) to the end users.
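For illustration, a conditional import might look roughly like the sketch below. It assumes Python 3 and uses pandas only as an example of an optional parser; `parse_rows` is a hypothetical name, not part of pantable.

```python
import csv
import io

try:
    import pandas as pd  # optional dependency, chosen here only as an example
except ImportError:
    pd = None


def parse_rows(text):
    """Return CSV rows as lists of strings using whichever backend is available."""
    if pd is not None:
        # Keep every cell as a string so the result resembles csv.reader output.
        frame = pd.read_csv(io.StringIO(text), header=None, dtype=str,
                            keep_default_na=False)
        return frame.values.tolist()
    return list(csv.reader(io.StringIO(text, newline="")))


rows = parse_rows("a,b\n1,2\n")
```

The two branches agree on simple input like this, but nothing guarantees they agree on edge cases (quoting, ragged rows, empty cells), which is the maintenance cost of this approach.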
Criteria, based on the above 2 reasons, and other concerns:
- uniform Python 2/3 behavior
- unicode support
- high efficiency
- try to avoid conditional imports so that I don't need to deal with different behaviors from different CSV parsers (the Python 2 and 3 csv modules, plus the conditionally imported CSV module)
- try to make the dependency small and easy to install
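As a rough sketch of why efficiency matters once filtering exists, the example below streams a CSV source and keeps only a requested row range and set of columns. It uses the standard library on Python 3; `filter_cells` and its parameters are hypothetical and do not reflect how #16 or #17 specify filters.

```python
import csv
import io
from itertools import islice


def filter_cells(stream, row_range, columns):
    """Yield only the requested columns from rows in [start, stop)."""
    start, stop = row_range
    reader = csv.reader(stream)
    # islice still parses the skipped rows, but never stores them.
    for row in islice(reader, start, stop):
        yield [row[i] for i in columns]


source = io.StringIO("id,name,score\n1,ann,90\n2,bob,85\n3,cy,70\n", newline="")
subset = list(filter_cells(source, (1, 3), [1, 2]))
```

Memory stays proportional to the kept subset, but every row before `stop` still has to be parsed, so the parser's raw speed becomes the bottleneck for a large source.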
Potential choices are:
- unicodecsv
- fastcsv
- numpy csv parser
- pandas csv parser
I like the pandas CSV parser since it is well known to be very fast, and I would like some of pandas' capability to generate plots from tables. But it needs to be compiled, and alternative CPU architectures might or might not be supported (at least not with pre-built binaries).