Migrate to csv crate rewrite.
This commit resists the urge to refactor/rewrite xsv and ports it over
to the new CSV API. It made a lot of things cleaner and even improved
the performance of core commands like `count`, `sample`, `search`,
`select` and `slice`.

This also removes the last remaining (dubious) uses of `unsafe` within
xsv.

Benchmarks before/after:

    benchmark               before                      after
    count                   0.26s   175.05  MB/sec      0.11s   413.76  MB/sec
    flatten                 4.53s   10.04   MB/sec      4.54s   10.02   MB/sec
    flatten_condensed       4.72s   9.64    MB/sec      4.45s   10.22   MB/sec
    frequency               1.91s   23.82   MB/sec      1.82s   25.00   MB/sec
    index                   0.28s   162.54  MB/sec      0.12s   379.28  MB/sec
    sample_10               0.43s   105.84  MB/sec      0.18s   252.85  MB/sec
    sample_1000             0.44s   103.44  MB/sec      0.18s   252.85  MB/sec
    sample_100000           0.50s   91.02   MB/sec      0.29s   156.94  MB/sec
    search                  0.59s   77.14   MB/sec      0.27s   168.56  MB/sec
    select                  0.41s   111.00  MB/sec      0.14s   325.09  MB/sec
    sort                    2.59s   17.57   MB/sec      2.18s   20.87   MB/sec
    slice_one_middle        0.22s   206.88  MB/sec      0.08s   568.92  MB/sec
    slice_one_middle_index  0.01s   4551.36 MB/sec      0.01s   4551.36 MB/sec
    stats                   1.26s   36.12   MB/sec      1.09s   41.75   MB/sec
    stats_index             0.19s   239.54  MB/sec      0.15s   303.42  MB/sec
    stats_everything        2.13s   21.36   MB/sec      1.94s   23.46   MB/sec
    stats_everything_index  1.00s   45.51   MB/sec      0.93s   48.93   MB/sec
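
The MB/sec figures in the table are consistent with dividing a fixed input size by the elapsed time. The sketch below is not part of xsv; it assumes an input of roughly 45.51 MB, a value inferred from the table itself (0.11 s × 413.76 MB/sec), and uses the hypothetical helpers `throughput_mb_per_sec` and `speedup` to reproduce the throughput and the before/after ratio for a few rows.

```rust
/// Throughput in MB/sec for `size_mb` megabytes processed in `secs` seconds.
fn throughput_mb_per_sec(size_mb: f64, secs: f64) -> f64 {
    size_mb / secs
}

/// Ratio of the old elapsed time to the new one (values above 1.0 mean faster).
fn speedup(before_secs: f64, after_secs: f64) -> f64 {
    before_secs / after_secs
}

fn main() {
    // Assumption: the input size is inferred from the table, not stated in the commit.
    let size_mb = 0.11 * 413.76;

    // (benchmark, seconds before, seconds after), copied from the table above.
    let rows = [
        ("count", 0.26, 0.11),
        ("select", 0.41, 0.14),
        ("sort", 2.59, 2.18),
    ];

    for &(name, before, after) in rows.iter() {
        println!(
            "{:<8} {:>7.2} MB/sec -> {:>7.2} MB/sec ({:.2}x)",
            name,
            throughput_mb_per_sec(size_mb, before),
            throughput_mb_per_sec(size_mb, after),
            speedup(before, after),
        );
    }
}
```

Because the reported times are rounded to two decimals, recomputed throughputs may differ from the table in the last digit.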
BurntSushi committed May 23, 2017
1 parent bc5f456 commit 0f58a98
Showing 32 changed files with 919 additions and 703 deletions.
21 changes: 7 additions & 14 deletions .travis.yml
@@ -1,17 +1,3 @@
-#language: rust
-#rust:
-# - 1.9.0
-# - stable
-# - beta
-# - nightly
-#script:
-# - cargo build --verbose
-# - cargo doc
-# - cargo test --verbose
-# - if [ "$TRAVIS_RUST_VERSION" = "nightly" ]; then
-# cargo bench --verbose;
-# fi

language: rust
cache: cargo

@@ -33,6 +19,13 @@ matrix:
- os: linux
rust: stable
env: TARGET=x86_64-unknown-linux-musl
+# Minimum Rust supported channel.
+- os: linux
+rust: 1.15.0
+env: TARGET=x86_64-unknown-linux-gnu
+- os: linux
+rust: 1.15.0
+env: TARGET=x86_64-unknown-linux-musl

before_install:
- export PATH="$PATH:$HOME/.cargo/bin"
39 changes: 19 additions & 20 deletions BENCHMARKS.md
@@ -6,27 +6,27 @@ These benchmarks were run with
which is a random 1,000,000 row subset of the world city population dataset
from the [Data Science Toolkit](https://github.com/petewarden/dstkdata).

-These benchmarks were run on an Intel i3930K (6 CPUs, 12 threads) with 32GB of
-memory.
+These benchmarks were run on an Intel i7-6900K (8 CPUs, 16 threads) with 64GB
+of memory.

```
-count                   0.28 seconds  162.54 MB/sec
-flatten                 5.31 seconds  8.57 MB/sec
-flatten_condensed       5.39 seconds  8.44 MB/sec
-frequency               2.54 seconds  17.91 MB/sec
-index                   0.27 seconds  168.56 MB/sec
-sample_10               0.47 seconds  96.83 MB/sec
-sample_1000             0.49 seconds  92.88 MB/sec
-sample_100000           0.62 seconds  73.40 MB/sec
-search                  0.71 seconds  64.10 MB/sec
-select                  0.47 seconds  96.83 MB/sec
-sort                    3.36 seconds  13.54 MB/sec
-slice_one_middle        0.22 seconds  206.88 MB/sec
-slice_one_middle_index  0.01 seconds  4551.36 MB/sec
-stats                   1.37 seconds  33.22 MB/sec
-stats_index             0.23 seconds  197.88 MB/sec
-stats_everything        3.90 seconds  11.67 MB/sec
-stats_everything_index  2.58 seconds  17.64 MB/sec
+count                   0.11 seconds  413.76 MB/sec
+flatten                 4.54 seconds  10.02 MB/sec
+flatten_condensed       4.45 seconds  10.22 MB/sec
+frequency               1.82 seconds  25.00 MB/sec
+index                   0.12 seconds  379.28 MB/sec
+sample_10               0.18 seconds  252.85 MB/sec
+sample_1000             0.18 seconds  252.85 MB/sec
+sample_100000           0.29 seconds  156.94 MB/sec
+search                  0.27 seconds  168.56 MB/sec
+select                  0.14 seconds  325.09 MB/sec
+sort                    2.18 seconds  20.87 MB/sec
+slice_one_middle        0.08 seconds  568.92 MB/sec
+slice_one_middle_index  0.01 seconds  4551.36 MB/sec
+stats                   1.09 seconds  41.75 MB/sec
+stats_index             0.15 seconds  303.42 MB/sec
+stats_everything        1.94 seconds  23.46 MB/sec
+stats_everything_index  0.93 seconds  48.93 MB/sec
```

### Details
@@ -39,4 +39,3 @@ The `count` command can be viewed as a sort of baseline of the fastest possible
command that parses every record in CSV data.
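
To illustrate why counting is a natural baseline, here is a minimal, standard-library-only sketch of a record counter. It is not how xsv or the csv crate implements `count`; the function name `count_records` is hypothetical, and it assumes double-quoted fields with `""` as the escaped quote. Even this simplest task has to inspect every byte, because a newline inside a quoted field does not end a record.

```rust
/// Counts CSV records in `data`, treating newlines inside double-quoted
/// fields as part of the field. Any header row is counted as a record,
/// and lines with no content are skipped.
fn count_records(data: &str) -> usize {
    let mut in_quotes = false;
    let mut record_has_data = false;
    let mut records = 0;

    for byte in data.bytes() {
        match byte {
            // An escaped quote ("") toggles twice, so the state is unchanged.
            b'"' => {
                in_quotes = !in_quotes;
                record_has_data = true;
            }
            // An unquoted newline terminates the current record.
            b'\n' if !in_quotes => {
                if record_has_data {
                    records += 1;
                }
                record_has_data = false;
            }
            // Ignore the carriage return of a CRLF terminator.
            b'\r' if !in_quotes => {}
            _ => record_has_data = true,
        }
    }

    // Count a final record that is not newline-terminated.
    if record_has_data {
        records += 1;
    }
    records
}

fn main() {
    let data = "city,pop\nParis,\"2,100,000\"\n\"New\nYork\",8000000\n";
    println!("{} records", count_records(data));
}
```

Every other command does at least this much work per record before adding its own processing.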

The benchmarks that end with `_index` are run with indexing enabled.
