
Fix typo in the DSA1 implementation #67

Open
@shawnmjones

Description


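After reworking Hypercane to use `.halg`-formatted files as part of the IIPC 2021 Grant work, the DSA1 algorithm implementation is now wrong. It runs the time-slice step twice instead of running the DBSCAN step. The second `hc cluster` call, which is commented and logged as DBSCAN, invokes `time-slice` again: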

```shell
# prevent extra work if we already have it from previous runs
if [ ! -e ${TIME_SLICE_FILE} ]; then
    echo "clustering mementos from remainder by time"
    hc cluster time-slice -i mementos -a ${ONLY_ENGLISH_FILE} -o ${TIME_SLICE_FILE} -l ${TIME_SLICE_LOG}
fi
# apply DBSCAN to cluster by Simhash distance
DBSCAN_FILE=${WORKING_DIRECTORY}/dsa1-dbscan.tsv
DBSCAN_LOG=${WORKING_DIRECTORY}/dsa1-cluster-dbscan.log
# prevent extra work if we already have it from previous runs
if [ ! -e ${DBSCAN_FILE} ]; then
    echo "clustering mementos from remainder by Simhash"
    hc cluster time-slice -i mementos -a ${TIME_SLICE_FILE} -o ${DBSCAN_FILE} -l ${DBSCAN_LOG}
fi
```

It needs to follow AlNoamany's Algorithm again, as it did during my dissertation work.
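A minimal sketch of what the corrected steps could look like, assuming Hypercane provides an `hc cluster dbscan` subcommand that accepts the same `-i`, `-a`, `-o`, and `-l` arguments as `hc cluster time-slice` (check `hc cluster dbscan --help` before relying on this). The helper `run_cluster_step` and the wrapper `cluster_dsa1` are hypothetical names used only for illustration; they are not part of Hypercane. The essential change is that the second step calls `dbscan` and reads the time-slice output.

```shell
#!/usr/bin/env bash
# Sketch of the corrected DSA1 clustering steps: time slice, then DBSCAN.
# ASSUMPTION: `hc cluster dbscan` takes the same -i/-a/-o/-l arguments as
# `hc cluster time-slice`; confirm with `hc cluster dbscan --help`.

# Hypothetical helper: run `hc cluster <subcommand>` unless its output exists.
# Arguments: <message> <subcommand> <input file> <output file> <log file>
run_cluster_step() {
    local message="$1" subcommand="$2" input="$3" output="$4" log="$5"
    # prevent extra work if we already have it from previous runs
    if [ ! -e "${output}" ]; then
        echo "${message}"
        hc cluster "${subcommand}" -i mementos -a "${input}" -o "${output}" -l "${log}"
    fi
}

# Hypothetical wrapper: cluster by time, then apply DBSCAN to the time slices.
# Expects WORKING_DIRECTORY, ONLY_ENGLISH_FILE, TIME_SLICE_FILE and
# TIME_SLICE_LOG to be set, as in the original script.
cluster_dsa1() {
    run_cluster_step "clustering mementos from remainder by time" \
        time-slice "${ONLY_ENGLISH_FILE}" "${TIME_SLICE_FILE}" "${TIME_SLICE_LOG}"

    # apply DBSCAN to cluster by Simhash distance, reading the time-slice output
    DBSCAN_FILE=${WORKING_DIRECTORY}/dsa1-dbscan.tsv
    DBSCAN_LOG=${WORKING_DIRECTORY}/dsa1-cluster-dbscan.log
    run_cluster_step "clustering mementos from remainder by Simhash" \
        dbscan "${TIME_SLICE_FILE}" "${DBSCAN_FILE}" "${DBSCAN_LOG}"
}
```

Calling `cluster_dsa1` after setting `WORKING_DIRECTORY`, `ONLY_ENGLISH_FILE`, `TIME_SLICE_FILE`, and `TIME_SLICE_LOG` runs both steps in order, skipping any step whose output file already exists, just as the original `if [ ! -e ... ]` guards do.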

Labels: bug (Something isn't working)
