KONECT Extraction
=================

This collection of network extraction tools is part of the KONECT
project: 

http://konect.uni-koblenz.de

This is code for generating the network datasets used in KONECT. 

In each subdirectory of extr/ you can find code to extract one group of
datasets, usually from one source, and generate a TSV file from it.
If you are looking for the code for a specific network, you can find
the name of the subdirectory in the field "extr" of the meta.* file,
which comes with the TSV files you can download from our network pages.
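
As a rough sketch, the lookup could be scripted as follows.  This assumes
that the meta.* file stores one field per line in the form "key: value";
that layout and the function name extr_dir are assumptions for
illustration, not something this README specifies.

```shell
#!/bin/sh
# Print the value of the "extr" field of a meta.* file.
# Assumption: fields are stored one per line as "key: value".
extr_dir() {
    # Strip the "extr:" prefix and any following whitespace;
    # only the first matching line is printed.
    sed -n 's/^extr:[[:space:]]*//p' "$1" | head -n 1
}
```

With a downloaded meta file (the name meta.NAME is a placeholder),
`extr_dir meta.NAME` would then print the subdirectory to look for under
extr/.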

Please note that the extraction code is not publicly available for every
network.  This is only the code we can make available.  Also, many
datasets were extracted so long ago that we don't have the code
anymore, and some datasets were contributed by people who
didn't give us their code.  (If you contribute a dataset to KONECT, the
best option is to give us a self-contained directory that uses Stu to
generate the dataset in the correct format.)

Keep in mind that most code in here WAS EXECUTED JUST ONCE.  The code is
of corresponding quality.  We provide the code for the purpose of full
disclosure -- not in order to deliver a full-featured extraction
library. 

Some directories here also contain code for analysing the datasets,
which has been used in a few papers.  


Usage
=====

To build the datasets, execute "make" or "stu" inside each subdirectory
of extr/, depending on whether a 'Makefile' or a 'main.stu' is present.
The code downloads the datasets from their online sources and converts
them to the KONECT format.

Note that many directories have additional requirements.  Read the
'Makefile' or 'main.stu' (or rarely, 'README') for more information. 
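
The choice between the two build tools can be sketched as a small shell
helper.  The function names build_tool and build_all are hypothetical,
and preferring 'main.stu' when both files exist is an assumption (the
README does not say which one wins in that case).

```shell
#!/bin/sh
# Print the build tool to use for one extraction directory.
# Assumption: 'main.stu' takes precedence when both files are present.
# Returns non-zero when the directory has neither file.
build_tool() {
    dir=$1
    if [ -f "$dir/main.stu" ]; then
        echo stu
    elif [ -f "$dir/Makefile" ]; then
        echo make
    else
        return 1
    fi
}

# Run the matching tool in every subdirectory of the given root
# (default: extr).  Failures are reported and the loop continues.
build_all() {
    root=${1:-extr}
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        if tool=$(build_tool "$dir"); then
            echo "== $dir ($tool)"
            ( cd "$dir" && "$tool" ) || echo "FAILED: $dir" >&2
        else
            echo "skipping $dir: no Makefile or main.stu" >&2
        fi
    done
}
```

Calling `build_all extr` from the top of the repository would attempt
every directory in turn.  Given the state of the code, building a single
directory by hand is likely the more practical route.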


Contribute
==========

If you want to contribute your dataset to KONECT, you are welcome.  In
order of preference, we accept the following:

(1) A self-contained directory containing Stu code that downloads the
    dataset from a long-term stable location on the Web and transforms
    it to KONECT format.  Preferred programming languages are the shell
    and Perl 5.  Other programming languages are accepted if they are easy
    to execute (i.e., as scripts).  In particular, we won't be unhappy
    about Python, Ruby, Perl 6, Bash, etc.  If your program needs to be
    compiled (e.g., C, C++, Java), then please provide a self-contained
    directory that executes the compilation and the extraction.  We
    don't accept precompiled files (e.g., *.jar, ELF). 
(2) A long-term stable URL of a dataset, in KONECT format.
(3) A long-term stable URL of a dataset, in text format.
(4) Giving us the dataset in KONECT format.
(5) Giving us the dataset in a text format.

If your dataset is in a binary format, please convert it to text first.  

The simpler the better:  we prefer a one-edge-per-line text format to
XML; we prefer XML to a binary format such as XLS. 
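
As an illustration of the one-edge-per-line idea, the sketch below turns a
comma-separated edge list into tab-separated lines.  It assumes that the
first two CSV fields are the node identifiers and that blank lines and
lines starting with '#' can be dropped; the function name csv_to_edges is
hypothetical, and the output is only a plain edge list, not necessarily
the complete KONECT format.

```shell
#!/bin/sh
# Read a comma-separated edge list on stdin and write one
# tab-separated edge per line on stdout.
# Assumptions: fields 1 and 2 are the two endpoints of an edge;
# blank lines and lines starting with '#' carry no data.
csv_to_edges() {
    awk -F, '
        /^[[:space:]]*$/ { next }   # skip blank lines
        /^#/             { next }   # skip comment lines
        { printf "%s\t%s\n", $1, $2 }
    '
}
```

For example, `csv_to_edges < edges.csv > edges.txt` (both file names are
placeholders) keeps only the two endpoint columns of each line.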


Requirements
============

You may need to install the following additional software packages,
depending on the dataset: 

    - Stu
    - Perl 
    - Other tools, depending on the dataset


Extraction Tools
================

The directories sh/ and c/ contain general-purpose functions for
transforming data into the KONECT format.  They are used by various
directories under extr/, and you may also use them to extract your own
datasets.  


License
=======

The extraction code and library are distributed under the terms of the
GNU General Public License version 3, which you can find in the
file 'COPYING' in this directory. 


Stu
===

We're converting the directories to Stu, a replacement for Make, also
written at the University of Koblenz-Landau.  Stu can be installed from

https://github.com/kunegis/stu/

Stu targets:

default target:  generate all data.

@deploy:  copy over to konect/.  This copies the relevant files into
konect/dat/, but usually doesn't create the symlinks in uni/.


Code Quality
============

Most code in the directories is broken, because it was executed
just once, and that was a long time ago in most cases.  Many things have
changed in the meantime.  If you want to execute any of this code, you'll
most likely need to hack it.  

In particular:

- The URL from which KONECT gets the data is not available anymore. 
- Scripts have been moved to sh/, but uses of them have not been updated.
- Extraction directories are now all under extr/, but the code has not
  been updated accordingly.
- Some code is really old and was written long before I learned proper
  practices for writing shell scripts, etc.

Also, some directories use Stu, while older ones use Make.

If you need help, please politely ask Jérôme Kunegis <kunegis@gmail.com>
and he may update the code to work.  (Except for websites that have
disappeared; Jérôme can't do anything about those ;)