GitHub - EelcoHoogendoorn/Numpy_arraysetops_EP: Numpy group_by and set-operations

Numpy indexed operations

This package contains functionality for indexed operations on numpy ndarrays, providing efficient vectorized functionality such as grouping and set operations.

Rich and efficient grouping functionality:
- splitting of values by key-group
- reductions of values by key-group

Generalization of existing array set operations to nd-arrays, such as:
- unique
- union
- difference
- exclusive (xor)
- contains / in (in1d)

Some new functions:
- indices: numpy equivalent of list.index
- count: numpy equivalent of collections.Counter
- mode: find the most frequently occurring items in a set
- multiplicity: number of occurrences of each key in a sequence
- count_table: like R's table or pandas crosstab, or an ndim version of np.bincount
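
To make the meaning of count, multiplicity and mode concrete, here is a minimal sketch in plain NumPy for one-dimensional keys. It does not use this package and is not its implementation; the variable names are illustrative only, and the library functions additionally handle nd-array and composite keys.

```python
import numpy as np
from collections import Counter

keys = np.array([3, 1, 3, 2, 1, 3])

# count: the unique keys together with how often each one occurs
unique_keys, counts = np.unique(keys, return_counts=True)

# multiplicity: for every element, the number of occurrences of its key
_, inverse = np.unique(keys, return_inverse=True)
multiplicity = counts[inverse]

# mode: the most frequently occurring key
mode = unique_keys[np.argmax(counts)]

# cross-check the counts against collections.Counter
as_dict = dict(zip(unique_keys.tolist(), counts.tolist()))
assert as_dict == dict(Counter(keys.tolist()))
```

Note that count returns one number per unique key, whereas multiplicity returns one number per input element.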

A few brief examples to give an impression of the package:

import numpy as np
from numpy_indexed import contains, exclusive, group_by, indices, union

# three sets of graph edges (pairs of ints)
edges = np.random.randint(0, 9, (3, 100, 2))
# find graph edges exclusive to one of the three sets
ex = exclusive(*edges)
print(ex)
# which edges are exclusive to the first set?
print(contains(edges[0], ex))
# where are the exclusive edges relative to the totality of them?
print(indices(union(*edges), ex))
# group and reduce values by identical keys
values = np.random.rand(100, 20)
print(group_by(edges[0]).median(values))
# and so on...

Installation

> conda install numpy-indexed -c conda-forge

or

> pip install numpy-indexed

See: https://pypi.python.org/pypi/numpy-indexed

Design decisions:

This package builds upon a generalization of the design pattern found in numpy.unique: by argsorting an ndarray, many subsequent operations can be implemented efficiently and in a vectorized manner.
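
The argsort pattern described above can be sketched in plain NumPy as follows. This is an illustration of the general technique under simple assumptions (one-dimensional, non-empty keys), not the package's actual code; all names are chosen for this example.

```python
import numpy as np

keys = np.array([2, 0, 2, 1, 0, 2])
values = np.array([10.0, 1.0, 20.0, 5.0, 2.0, 30.0])

# 1. A stable argsort brings equal keys next to each other.
sorter = np.argsort(keys, kind="stable")
sorted_keys = keys[sorter]

# 2. A group starts wherever a sorted key differs from its predecessor.
flag = np.concatenate(([True], sorted_keys[1:] != sorted_keys[:-1]))
start = np.flatnonzero(flag)

# 3. Unique keys, group sizes and per-group sums follow without a Python loop.
unique_keys = sorted_keys[start]
counts = np.diff(np.append(start, len(keys)))
group_sums = np.add.reduceat(values[sorter], start)
```

The sort is the only O(n log n) step; everything derived from `sorter` and `start` afterwards is a linear, vectorized pass, which is why the sorting work is worth reusing across operations.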

The sorting and related low-level operations are encapsulated in a hierarchy of Index classes, which allows efficient lookup of many properties for a variety of key types. The public API of this package is a thin wrapper around these Index objects.
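
As a rough illustration of this layering, the sketch below defines a hypothetical `SketchIndex` class and a thin `sketch_multiplicity` wrapper around it. Both names are invented for this example and are not part of the package; the real Index classes support more key types and properties.

```python
import numpy as np


class SketchIndex:
    """Hypothetical, simplified index over non-empty 1-D sortable keys."""

    def __init__(self, keys):
        self.keys = np.asarray(keys)
        # Sort once; every property below reuses this work.
        self.sorter = np.argsort(self.keys, kind="stable")
        self.sorted_keys = self.keys[self.sorter]
        flag = np.concatenate(
            ([True], self.sorted_keys[1:] != self.sorted_keys[:-1])
        )
        # Position in sorted order where each group of equal keys begins.
        self.start = np.flatnonzero(flag)

    @property
    def unique(self):
        """The distinct keys, in sorted order."""
        return self.sorted_keys[self.start]

    @property
    def count(self):
        """Number of occurrences of each unique key."""
        return np.diff(np.append(self.start, self.keys.size))

    @property
    def inverse(self):
        """Group number of every original element: unique[inverse] == keys."""
        group_of_sorted = np.repeat(np.arange(self.start.size), self.count)
        inverse = np.empty(self.keys.size, dtype=np.intp)
        inverse[self.sorter] = group_of_sorted
        return inverse


def sketch_multiplicity(keys):
    """Thin wrapper: all real work is delegated to the index object."""
    index = SketchIndex(keys)
    return index.count[index.inverse]
```

With this split, adding another public function mostly means combining properties the index already exposes, for example `sketch_multiplicity(["b", "a", "b", "c"])` looks up each element's group and returns that group's size.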

The two complex key types currently supported, beyond standard sequences of sortable primitive types, are ndarray keys (i.e., finding unique rows/columns of an array) and composite keys (zipped sequences). For the exact rules for casting sequences of key objects to index objects, see as_index().
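
The two key types can be illustrated with plain NumPy. The sketch below is only meant to show what "ndarray keys" and "composite keys" mean; it uses np.unique and np.lexsort directly rather than this package, and the example data and variable names are made up.

```python
import numpy as np

# ndarray keys: each row of a 2-D array is one key.
rows = np.array([[0, 1], [2, 3], [0, 1], [2, 2]])
unique_rows, row_counts = np.unique(rows, axis=0, return_counts=True)

# Composite keys: two parallel sequences, zipped element-wise into one key.
names = np.array(["a", "b", "a", "b", "a"])
years = np.array([2020, 2020, 2021, 2020, 2020])

# np.lexsort treats its last key as the primary one, so names sorts first.
sorter = np.lexsort((years, names))
s_names, s_years = names[sorter], years[sorter]

# A new group starts wherever either component of the key changes.
flag = np.concatenate(
    ([True], (s_names[1:] != s_names[:-1]) | (s_years[1:] != s_years[:-1]))
)
start = np.flatnonzero(flag)
unique_pairs = list(zip(s_names[start].tolist(), s_years[start].tolist()))
pair_counts = np.diff(np.append(start, len(names)))
```

In both cases the idea is the same as for primitive keys: establish a sort order over whole keys, then detect group boundaries in the sorted sequence.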

Todo and open questions:

There may be further generalizations that could be built on top of these abstractions. merge/join functionality perhaps?
