FeaTensor: Parallel Feature Extraction Algorithms for Sparse Tensors

Documentation 📑

This repository provides different implementations for extracting the same features from tensors. The input tensor format is COO (.tns file format).

For more information on this tool, please refer to the manuscript on this link.

Compilation & Running

To compile and run our main code, use following commands, replace placeholders with corresponding values:

TO COMPILE:

make

USAGE:

featen [options]                                                                                                                                               
-i input : input tensor path                                                                                                                                  
-o out : feature info output file                                                                                                                            
-m method : feature extraction method. Options:{map, sort, group, hybrid}  [DEFAULT : map]                                                                                                            
-c csv_out : feture out file to be in csv {1} or json {0} format  [DEFAULT : 1]                                                                                        
-d only3d : features to be in only 3 modes {1} or M modes {0} (only valid for MAP) [DEFAULT : 0]

SIMPLEST USAGE:

./featen -i [input-tensor-path] -o [output-file-path]

EXAMPLE:

./featen -i ../data_tensors/sample_small_3D.tns -o ../features/sample_small_3D_feat.txt

Extraction Methods

All feature extraction methods find the features of the tensor exactly. Users can use whichever method they like. These methods provide different performances depending on the given tensor.

To use one, write the choice number of the method as a parameter (the placeholder : [algorithm choice]).

Method	Description
MAP	Uses hash table implemented with `std::unordered_map` to calculate # of nnz per fiber & slice to do the extraction.
SORT	Sorts all the indices along all modes to make the calculation.
GROUP	Groups the slices and fibers according to their indices, similar to CSF construction.
HYBRID	Uses sorting-based and grouping-based algorithms for different modes, depending on mode properties.

For 3-mode tensors, all the methods extract all features along all modes.

For M-mode tensors with $M \geq 4$, the option named only-3-mode ( only3d = 1 ) extracts the features along only the modes with the three largest sizes. This option is available for all methods in FeaTensor. If this option is not used ( only3d = 0 ), then all the features along all modes are extracted, which is only available for the MAP method for $M \geq 4$.

Feature Set

The program writes the tensor features into a file.

It first prints the general features, whose meanings are:

name  order  sizes[] nnz sparsity sliceCnt fiberCnt nnzSliceCnt nnzFiberCnt slice_sparsity fiber_sparsity

The description of general features is shown in the table below.

Feature Name	Description
name	Name of the tensor
order	Order ( mode count) of the tensor
sizes[]	Sizes of the tensor, which is an array of size order
nnz	Number of nonzeros
sparsity	Nonzero density of the tensor
sliceCnt	# of all possible slices (even if it has no nnz)
fiberCnt	# of all possible fibers (even if it has no nnz)
nnzSliceCnt	# of nonzero slices (which has at least 1 nnz)
nnzFiberCnt	# of nonzero fibers (which has at least 1 nnz)
slice_sparsity	Density of nonzero slices
fiber_sparsity	Density of nonzero fibers

The program then prints the kind- and mode-dependent features for each mode and kind (nonzeros_per_slice, nonzeros_per_fibers, fibers_per_slice).

For features of nonzeros_per_slice and fibers_per_slice kinds, the output is in the form:

id1 id2 all_cnt nz_cnt nz_density max min dev sum avg imbal stDev cv avg_onlynz imbal_onlynz stDev_onlynz cv_onlynz

Here id1 and id2 represent the ignored dimensions of mode-(id1, id2) slice

For features of nonzeros_per_fiber kind, the output is in the form:

id all_cnt nz_cnt nz_density max min dev sum avg imbal stDev cv avg_onlynz imbal_onlynz stDev_onlynz cv_onlynz

Here id represents the ignored dimension of mode-(id) fiber

The description of kind- and mode-dependent features is shown in the table below.

Feature Name	Description
all_cnt	all count including empty
nz_cnt	nonzero count
nz_density	nonzero sparsity (nz_cnt / all_cnt)
max	maximum value
min	minimum value
dev	deviation (max – min)
sum	summation of values
avg	average value (sum / all_cnt)
imbal	imbalance : (max-avg)/max
stDev	standard deviation
cv	coefficient of variation (stDev/avg)
avg_onlynz	average by excluding empty (sum / nz_cnt)
imbal_onlynz	imbal by excluding empty (max- avg_onlynz)/max
stDev_onlynz	stDev by excluding empty
cv_onlynz	cv by excluding empty (stDev_onlynz / avg_onlynz )

Name		Name	Last commit message	Last commit date
Latest commit History 45 Commits
sample_tensors		sample_tensors
scripts		scripts
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
featen		featen
main.cpp		main.cpp
reduction.cpp		reduction.cpp
reduction.h		reduction.h
run_tf_csv-sort.sh		run_tf_csv-sort.sh
run_tf_csv.sh		run_tf_csv.sh
run_tf_json.sh		run_tf_json.sh
tenfeat		tenfeat
tensor.cpp		tensor.cpp
tensor.h		tensor.h
util.cpp		util.cpp
util.h		util.h

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

FeaTensor: Parallel Feature Extraction Algorithms for Sparse Tensors

Documentation 📑

Compilation & Running

Extraction Methods

Feature Set

About

Releases

Packages

Contributors 2

Languages

License

sparcityeu/FeaTensor

Folders and files

Latest commit

History

Repository files navigation

FeaTensor: Parallel Feature Extraction Algorithms for Sparse Tensors

Documentation 📑

Compilation & Running

Extraction Methods

Feature Set

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages