INC-SC-DBSCAN: Incremental Schema Discovery for RDF Data at Scale

INC-SC-DBSCAN is an incremental schema discovery approach for massive RDF data. It is based on a scalable, incremental density-based clustering algorithm that propagates changes in the dataset to the clusters, which correspond to the classes of the schema.

INC-SC-DBSCAN is implemented in Scala using the Apache Spark framework.
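To illustrate the incremental idea, here is a minimal, single-machine Scala sketch of density-based clustering with incremental insertions. It is not the INC-SC-DBSCAN implementation: it uses no Spark, it assumes each entity is represented by the set of its properties, and it assumes the Jaccard index as the similarity measure compared against epsilon. The insertion rule follows the usual incremental DBSCAN reasoning: a new entity can only change the core status of its own neighbours, so only the clusters reachable from those neighbours need to be grown or merged. The names `IncrementalDbscanSketch`, `Clusterer`, `insert` and `clusters` are hypothetical.

```scala
import scala.collection.mutable

// Hypothetical, in-memory sketch: NOT the INC-SC-DBSCAN code. It assumes each
// entity is its set of properties and that similarity is the Jaccard index.
object IncrementalDbscanSketch {
  type Entity = Set[String]

  // Jaccard similarity |intersection| / |union|, a value in [0, 1] (assumed measure).
  def jaccard(a: Entity, b: Entity): Double =
    if (a.isEmpty && b.isEmpty) 1.0
    else (a intersect b).size.toDouble / (a union b).size

  final class Clusterer(eps: Double, minPts: Int) {
    private val points = mutable.ArrayBuffer.empty[Entity]
    private val label = mutable.ArrayBuffer.empty[Int] // cluster id, -1 = noise
    private var nextId = 0

    // Indices of all stored points with similarity >= eps to point i (i included).
    private def neighbours(i: Int): IndexedSeq[Int] =
      points.indices.filter(j => jaccard(points(i), points(j)) >= eps)

    private def isCore(i: Int): Boolean = neighbours(i).size >= minPts

    // Adds one entity and updates only the part of the clustering it can affect.
    def insert(e: Entity): Unit = {
      points += e
      label += -1
      val p = points.size - 1
      // Only points in N(p) can become core, so every cluster that grows or
      // merges is reachable from the core points of N(p).
      val seeds = neighbours(p).filter(isCore)
      if (seeds.nonEmpty) {
        val id = nextId
        nextId += 1
        // Breadth-first expansion through density-reachable points; every
        // cluster touched gets the same fresh id, which merges them.
        val queue = mutable.Queue(seeds: _*)
        val seen = mutable.Set(seeds: _*)
        while (queue.nonEmpty) {
          val q = queue.dequeue()
          label(q) = id
          if (isCore(q))
            for (r <- neighbours(q) if !seen(r)) { seen += r; queue.enqueue(r) }
        }
      }
      // With no core point in N(p), p stays noise and nothing else changes.
    }

    // Current clusters keyed by (arbitrary) cluster id; noise is omitted.
    def clusters: Map[Int, Seq[Entity]] =
      points.indices
        .filter(i => label(i) >= 0)
        .groupBy(i => label(i))
        .map { case (cid, idx) => cid -> idx.map(i => points(i)) }
  }
}
```

For example, with eps = 0.5 and minPts = 2, the entities {name, birthDate}, {name, birthDate, birthPlace}, {title, author} and {title, author, isbn} form two clusters; inserting {name, birthDate, title, author}, which has similarity 0.5 to one member of each, merges them into one. Every neighbourhood query here scans all stored entities, a cost that a distributed, partitioned implementation would need to avoid.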

Building the project

Maven is used to build the project. The Maven wrapper makes it possible to build the project without a local Maven installation. Due to constraints imposed by the Scala compiler, JDK 8 is required.

On Linux

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
./mvnw package -DskipTests

On Windows

set JAVA_HOME=C:\path\to\jdk8
mvnw.cmd package -DskipTests

Running the algorithm

The main class is david.sc_dbscan.Main (source file david/sc_dbscan/Main.scala).

spark-submit --class david.sc_dbscan.Main \
             target/sc_dbscan-1.0-jar-with-dependencies.jar \
             --eps X.X --coef B --cap C --mpts M --oldData F dataset

Where:
  --eps     : the similarity threshold epsilon (between 0 and 1)
  --coef    : a boolean that defines whether patterns or entities are clustered
  --cap     : the maximum capacity of a computing node (in number of entities)
  --mpts    : the density threshold minPts
  --oldData : the path to the clustering result of a previous run
  dataset   : the path to the dataset
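The following Scala sketch shows one way to turn the flags above into a validated configuration value, for instance when writing a wrapper or tests around the job. It is an illustration under assumptions, not the argument parser in `david.sc_dbscan.Main`: the case class `IncConfig`, its field names, and the checks (eps in [0, 1], exactly one positional dataset path) are hypothetical. It needs Scala 2.12 or later, where `Either` is right-biased.

```scala
import scala.util.Try

// Hypothetical model of the documented flags. This is NOT the parser used by
// david.sc_dbscan.Main; field names and validation rules are assumptions.
final case class IncConfig(
    eps: Double,     // --eps: similarity threshold, expected in [0, 1]
    coef: Boolean,   // --coef: clusters patterns or entities
    cap: Int,        // --cap: max number of entities per computing node
    mpts: Int,       // --mpts: density threshold minPts
    oldData: String, // --oldData: previous clustering result
    dataset: String  // positional argument: path to the dataset
)

object IncConfig {
  def parse(args: Array[String]): Either[String, IncConfig] = {
    // Split the arguments into "--flag value" pairs and positional tokens.
    @annotation.tailrec
    def split(rest: List[String], flags: Map[String, String], pos: List[String]): (Map[String, String], List[String]) =
      rest match {
        case flag :: value :: tail if flag.startsWith("--") =>
          split(tail, flags + (flag.drop(2) -> value), pos)
        case token :: tail => split(tail, flags, pos :+ token)
        case Nil           => (flags, pos)
      }
    val (flags, positional) = split(args.toList, Map.empty, Nil)

    // Look up a required flag, then convert it, reporting failures as Left.
    def str(k: String): Either[String, String] = flags.get(k).toRight(s"missing --$k")
    def parsed[A](k: String, conv: String => A): Either[String, A] =
      str(k).flatMap(s => Try(conv(s)).toOption.toRight(s"invalid value for --$k: $s"))

    for {
      eps     <- parsed("eps", _.toDouble)
      _       <- Either.cond(eps >= 0 && eps <= 1, (), s"--eps must be in [0, 1], got $eps")
      coef    <- parsed("coef", _.toBoolean)
      cap     <- parsed("cap", _.toInt)
      mpts    <- parsed("mpts", _.toInt)
      oldData <- str("oldData")
      dataset <- positional match {
                   case List(p) => Right(p)
                   case other   => Left(s"expected one dataset path, got ${other.size}")
                 }
    } yield IncConfig(eps, coef, cap, mpts, oldData, dataset)
  }
}
```

Returning an `Either` keeps every validation failure explicit (a missing flag, a non-numeric value, an out-of-range epsilon) instead of throwing part-way through argument handling.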

For example:

spark-submit --class david.sc_dbscan.Main \
             target/sc_dbscan-1.0-jar-with-dependencies.jar \
             --eps 0.8 --coef false --cap 2000 --mpts 3 --oldData DataSets/T1000L10N10000_clusters DataSets/T100L10N10000

BOUHAMOUM/incremental_sc_dbscan