code cleaning and fix typos in variable names #3

Merged: 78 commits, Jun 28, 2017
The diff below shows the changes from 1 commit.

Commits (78)
8cb9da0
Merge remote-tracking branch 'csirobigdata/master'
lynnlangit May 19, 2017
e7a5ca9
updated local copy
lynnlangit May 19, 2017
b24ee23
fix typo in variable name
lynnlangit May 19, 2017
3aee2ee
run code formatting tool
lynnlangit May 19, 2017
a7be1a3
fixed spelling errors
lynnlangit May 19, 2017
92d8e51
update .gitignore for intellij
lynnlangit May 19, 2017
805d272
ignore
lynnlangit May 19, 2017
45039db
ignore
lynnlangit May 19, 2017
b26c38c
ignore
lynnlangit May 19, 2017
38bcb5c
ignore
lynnlangit May 19, 2017
9dba715
ignore
lynnlangit May 19, 2017
c42ef97
ignore
lynnlangit May 19, 2017
fc21e5a
ignore
lynnlangit May 19, 2017
d3823a7
ignore
lynnlangit May 19, 2017
a5413d7
clean code per warnings convert nulls to underscores
lynnlangit May 19, 2017
45ac81d
ignore
lynnlangit May 19, 2017
e42b2a0
i
lynnlangit May 19, 2017
b4eb5d6
i
lynnlangit May 19, 2017
d52aa96
i
lynnlangit May 19, 2017
63b20eb
i
lynnlangit May 19, 2017
b866236
f
lynnlangit May 19, 2017
9206ab8
f
lynnlangit May 19, 2017
6876692
f
lynnlangit May 19, 2017
fee741e
f
lynnlangit May 19, 2017
7c59017
f
lynnlangit May 19, 2017
598788f
f
lynnlangit May 19, 2017
684b58a
f
lynnlangit May 19, 2017
407ba6a
f
lynnlangit May 19, 2017
777ab1c
f
lynnlangit May 19, 2017
bce2e2a
f
lynnlangit May 19, 2017
367fcb5
f
lynnlangit May 19, 2017
874cc66
f
lynnlangit May 19, 2017
56f4293
f
lynnlangit May 19, 2017
38422b6
f
lynnlangit May 19, 2017
698867b
f
lynnlangit May 19, 2017
b3e5250
f
lynnlangit May 19, 2017
f66680d
f
lynnlangit May 19, 2017
715a778
f
lynnlangit May 19, 2017
f987f69
f
lynnlangit May 19, 2017
0c4d031
f
lynnlangit May 19, 2017
99eed1d
f
lynnlangit May 19, 2017
0104064
f
lynnlangit May 19, 2017
535af47
f
lynnlangit May 19, 2017
0ff3bed
f
lynnlangit May 19, 2017
b424db1
f
lynnlangit May 19, 2017
2dc1809
f
lynnlangit May 19, 2017
6171f67
f
lynnlangit May 19, 2017
4d54d6c
f
lynnlangit May 19, 2017
55795ae
f
lynnlangit May 19, 2017
1c9d64f
f
lynnlangit May 19, 2017
342a273
f
lynnlangit May 19, 2017
4ace82c
f
lynnlangit May 19, 2017
0767c32
remove untracked
lynnlangit May 19, 2017
683888e
reformatter
lynnlangit May 19, 2017
04bdbde
remove .iml files
lynnlangit May 19, 2017
54208b8
removed untracked
lynnlangit May 19, 2017
2e5a64f
added method Scaladoc info in widekmeans.scala
lynnlangit May 19, 2017
481d16d
Merge remote-tracking branch 'origin/master'
lynnlangit May 19, 2017
bae0709
removed unused imports
lynnlangit May 19, 2017
543a097
refactor CochranAmeritageTest
lynnlangit May 20, 2017
bc2a3c3
remove comments and refactor method names
lynnlangit May 20, 2017
2afb31f
fixed spelling errors in variable names
lynnlangit May 20, 2017
173a744
removing comments
lynnlangit May 22, 2017
b32afda
remove unused imports project-wide
lynnlangit May 22, 2017
53a7362
fixed spelling of length variable
lynnlangit May 22, 2017
706ff2f
remove unused imports
lynnlangit May 23, 2017
2e8353b
fix more spelling errors
lynnlangit May 23, 2017
9ec0e38
fix spelling and add first new unit test
lynnlangit May 23, 2017
72b0428
remove more unused imports
lynnlangit May 23, 2017
eb11c6f
Worked on readability for the Wide K-Means section
plyte May 26, 2017
73cf2d2
Added more comments
plyte May 26, 2017
93c3a84
temp fix to broken build on k-means
lynnlangit May 28, 2017
bfec349
refactoring ml files for human readability
lynnlangit May 29, 2017
1e30159
fixed variable name
lynnlangit May 31, 2017
c2b0580
Reorganization and Scala Docs added
plyte Jun 1, 2017
4c3a9cf
Continued Additions to Scala docs
plyte Jun 1, 2017
71cddd3
update to fix import issues
lynnlangit Jun 12, 2017
a45fafd
fixed typos from last refactoring in DecisionTrees
lynnlangit Jun 15, 2017
Commit 2afb31fa8b90f36ebfb81862b6178bf058f3d8cc: "fixed spelling errors in variable names"
lynnlangit committed May 20, 2017
@@ -6,7 +6,7 @@ import org.junit.Test;
 import org.apache.spark.mllib.linalg.Vectors
 import au.csiro.pbdava.ssparkle.common.utils.Logging

-abstract class ClassificattionSplitterTest extends Logging {
+abstract class ClassificationSplitterTest extends Logging {

   def splitter(labels: Array[Int], nLabels: Int = 2): ClassificationSplitter

@@ -24,13 +24,13 @@ abstract class ClassificattionSplitterTest extends Logging {


   @Test
-  def testConstansLabelSplit() {
+  def testConstantsLabelSplit() {
     val splitInfo = splitter(Array(1, 1, 1, 1)).findSplit(Vectors.dense(0.0, 1.0, 2.0, 3.0).toArray, Range(0, 4).toArray)
     assertEquals(SplitInfo(0, 0.0, 0.0, 0.0), splitInfo)
   }

   @Test
-  def testConstantsValuesSplist() {
+  def testConstantsValuesSplit() {
     val splitInfo = splitter(Array(0, 1, 0, 1)).findSplit(Vectors.dense(1.0, 1.0, 1.0, 1.0).toArray, Range(0, 4).toArray)
     assertNull(splitInfo)
   }
@@ -60,16 +60,16 @@ abstract class ClassificattionSplitterTest extends Logging {

 }

-class JClassificationSplitterTest extends ClassificattionSplitterTest {
+class JClassificationSplitterTest extends ClassificationSplitterTest {
   def splitter(labels: Array[Int], nLabels: Int = 2) = new JClassificationSplitter(labels, nLabels, 4)
 }


-class JClassificationSplitterUnboundedTest extends ClassificattionSplitterTest {
+class JClassificationSplitterUnboundedTest extends ClassificationSplitterTest {
   def splitter(labels: Array[Int], nLabels: Int = 2) = new JClassificationSplitter(labels, nLabels)
 }

-class JConfusionClassificationSplitterTest extends ClassificattionSplitterTest {
+class JConfusionClassificationSplitterTest extends ClassificationSplitterTest {
   def splitter(labels: Array[Int], nLabels: Int = 2) = new JConfusionClassificationSplitter(labels, nLabels, 4)
 }
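A note on the two renamed tests: they cover degenerate inputs. When every label is identical there is nothing to separate, so the best available split carries zero impurity reduction (hence SplitInfo(0, 0.0, 0.0, 0.0)); when every value is identical no threshold can split the data at all (hence assertNull). Below is a minimal, self-contained sketch of that logic with a hypothetical Gini-based split finder; the names and result type are illustrative and are not variant-spark's ClassificationSplitter API.

object SplitSketch {
  // Hypothetical result type: index of the best threshold value and its impurity gain.
  final case class Split(thresholdIndex: Int, gain: Double)

  def gini(labels: Seq[Int]): Double = {
    val n = labels.size.toDouble
    1.0 - labels.groupBy(identity).values.map(g => math.pow(g.size / n, 2)).sum
  }

  // Try every distinct value as a threshold; return None when no split is possible.
  def findSplit(values: Array[Double], labels: Array[Int]): Option[Split] = {
    val parent = gini(labels)
    val candidates = values.distinct.sorted.dropRight(1) // constant values leave no candidates
    if (candidates.isEmpty) None
    else {
      val scored = candidates.map { t =>
        val (l, r) = values.zip(labels).partition(_._1 <= t)
        val gain = parent -
          (l.size * gini(l.map(_._2)) + r.size * gini(r.map(_._2))) / values.length
        (t, gain)
      }
      val (bestT, bestGain) = scored.maxBy(_._2)
      Some(Split(values.indexWhere(_ == bestT), bestGain))
    }
  }

  def main(args: Array[String]): Unit = {
    // Constant labels: a split exists but its gain is 0.0 (cf. SplitInfo(0, 0.0, 0.0, 0.0)).
    println(findSplit(Array(0.0, 1.0, 2.0, 3.0), Array(1, 1, 1, 1)))
    // Constant values: no usable threshold, mirroring the assertNull case.
    println(findSplit(Array(1.0, 1.0, 1.0, 1.0), Array(0, 1, 0, 1)))
  }
}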
@@ -35,7 +35,7 @@ class PairWiseDistanceTest extends SparkTest {


   @Test
-  def testCorrectlCalculatesPairWiseDistance2D() {
+  def testCorrectlyCalculatesPairWiseDistance2D() {
     val input = sc.parallelize(List(Array[Byte](0, 1), Array[Byte](0, 2), Array[Byte](1, 1)))
     val result = PairwiseDistance().compute(input)
     assertEquals(1, result.length)
@@ -44,7 +44,7 @@ class PairWiseDistanceTest extends SparkTest {


   @Test
-  def testCorrectlCalculatesPairWiseDistance3d() {
+  def testCorrectlyCalculatesPairWiseDistance3d() {
     val input = sc.parallelize(List(Array[Byte](0, 1, 1), Array[Byte](0, 2, 0), Array[Byte](0, 1, 0), Array[Byte](0, 2, 1)), 2)
     val result = PairwiseDistance().compute(input)
     assertEquals(3, result.length)
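The two assertions above also document the data layout: each RDD element is one variable and each byte within it is one sample, so the result holds one distance per unordered pair of samples (1 pair for 2 samples, 3 pairs for 3 samples). A self-contained sketch of that idea follows, accumulating per-variable absolute differences into a flattened lower triangle; the Manhattan metric and the helper are assumptions for illustration, not the actual PairwiseDistance implementation.

object PairwiseDistanceSketch {
  // rows: one Array[Byte] per variable; each position within a row is a sample.
  // Returns the lower triangle of the sample-by-sample Manhattan distance matrix.
  def pairwiseManhattan(rows: Seq[Array[Byte]]): Array[Long] = {
    val nSamples = rows.head.length
    val pairs = for { i <- 0 until nSamples; j <- 0 until i } yield (i, j)
    pairs.map { case (i, j) =>
      rows.map(r => math.abs(r(i) - r(j)).toLong).sum
    }.toArray
  }

  def main(args: Array[String]): Unit = {
    // 2 samples -> 1 pair, matching assertEquals(1, result.length)
    println(pairwiseManhattan(List(Array[Byte](0, 1), Array[Byte](0, 2), Array[Byte](1, 1))).length)
    // 3 samples -> 3 pairs, matching assertEquals(3, result.length)
    println(pairwiseManhattan(List(Array[Byte](0, 1, 1), Array[Byte](0, 2, 0), Array[Byte](0, 1, 0), Array[Byte](0, 2, 1))).length)
  }
}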
@@ -27,8 +27,8 @@ class TreeDataCollector(treeStream: Stream[PredictiveModelWithImportance[Vector]
   }

   override def batchPredict(indexedData: RDD[(Vector, Long)], models: Seq[PredictiveModelWithImportance[Vector]], indexes: Seq[Array[Int]]): Seq[Array[Int]] = {
-    //TODO I should be prjecting with indexes here
-    //but it doed not matter in this case
+    //TODO I should be projecting with indexes here
+    //but it does not matter in this case
     models.zip(indexes).map { case (model, indexes) => model.predictIndexed(indexedData) }
   }

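The TODO above notes that every model receives the full indexedData even though a per-model index array is available (and the pattern match shadows indexes). A hedged sketch of what "projecting with indexes" could mean, filtering the indexed rows down to the ones a given model should score; the helper name and the simplified element type are illustrative only, not how variant-spark ultimately handles it.

import org.apache.spark.rdd.RDD

object ProjectionSketch {
  // Keep only the rows whose long index appears in `keep` before handing the data to a model.
  // `predict` stands in for model.predictIndexed; RDD[(Double, Long)] stands in for the real element type.
  def projectThenPredict(indexedData: RDD[(Double, Long)],
                         keep: Array[Int],
                         predict: RDD[(Double, Long)] => Array[Int]): Array[Int] = {
    val keepSet = keep.map(_.toLong).toSet
    predict(indexedData.filter { case (_, idx) => keepSet.contains(idx) })
  }
}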
@@ -46,7 +46,7 @@ class WideDecisionTreeIntegratedTest extends SparkTest {
     assertArrayEquals(expected, prediction)


-    // check variable imporcances
+    // check variable importances
     val expectedImportances = CsvParser.parse(CsvFile("src/test/data/CNAE-9_R_importance.csv")).withRowIndex(0).withColIndex(0)
       .firstCol(s"maxdepth_${maxDepth}").mapValues(CsvParser.parseDouble).values.toSeq.toArray

@@ -13,7 +13,7 @@ class WideDecisionTreeModelTest extends SparkTest {

   @Test
   def testCorrectlyPredictsComplexTree() {
-    // lets build a tree with two variables and 5 nodes
+    // let's build a tree with 2 variables and 5 nodes
     val decisionTreeModel = new WideDecisionTreeModel(
       SplitNode(majorityLabel = 0, size = 10, nodeImpurity = 0.0, splitVariableIndex = 1L, splitPoint = 1.0, impurityReduction = 0.0,
         left = LeafNode(1, 0, 0.0),
@@ -33,7 +33,7 @@

   @Test
   def testCorrectlyIdentifiedVariabelImportanceForComplexTree() {
-    // lets build a tree with two variables and 5 nodes
+    // let's build a tree with 2 variables and 5 nodes
     val decisionTreeModel = new WideDecisionTreeModel(
       SplitNode(majorityLabel = 0, size = 10, nodeImpurity = 1.0, splitVariableIndex = 1L, splitPoint = 1.0, impurityReduction = 0.0,
         left = SplitNode(majorityLabel = 0, size = 4, nodeImpurity = 0.4, splitVariableIndex = 2L, splitPoint = 0.0, impurityReduction = 0.0,
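The nested SplitNode/LeafNode constructors show the tree shape the test builds but not how it is walked. As a rough sketch of the usual traversal (simplified node classes defined here for illustration, not variant-spark's SplitNode/LeafNode, and the convention that values at or below splitPoint go left is an assumption): compare the sample's value for splitVariableIndex against splitPoint, descend until a leaf, and return its majorityLabel.

object TreeTraversalSketch {
  sealed trait Node
  final case class Leaf(majorityLabel: Int) extends Node
  final case class Split(splitVariableIndex: Int, splitPoint: Double, left: Node, right: Node) extends Node

  // data(v) is the sample's value for variable v; values <= splitPoint go left.
  def predict(node: Node, data: Int => Double): Int = node match {
    case Leaf(label) => label
    case Split(v, point, left, right) =>
      if (data(v) <= point) predict(left, data) else predict(right, data)
  }

  def main(args: Array[String]): Unit = {
    // Two variables (1 and 2) and 5 nodes, loosely mirroring the test's tree shape.
    val tree = Split(1, 1.0,
      left = Leaf(1),
      right = Split(2, 0.0, left = Leaf(0), right = Leaf(1)))
    println(predict(tree, Map(1 -> 0.0, 2 -> 1.0))) // goes left at the root -> 1
    println(predict(tree, Map(1 -> 2.0, 2 -> 0.0))) // right, then left -> 0
  }
}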
@@ -34,16 +34,16 @@ class WideRadomForrestModelTest extends SparkTest {

   @Test
   def whenOnePredictorPassesThePrediction() {
-    val assumedPreditions = Array(1, 2)
-    val model = new WideRandomForestModel(List(TestPredictorWithImportance(assumedPreditions, null).toMember), nLabels)
+    val assumedPredictions = Array(1, 2)
+    val model = new WideRandomForestModel(List(TestPredictorWithImportance(assumedPredictions, null).toMember), nLabels)
     val prediction = model.predict(testData)
-    assertArrayEquals(assumedPreditions, prediction)
+    assertArrayEquals(assumedPredictions, prediction)
   }

   @Test
-  def whenManyPreditorsThenPredictsByVoting() {
-    val assumedPreditions = List(Array(1, 0), Array(1, 2), Array(1, 0))
-    val model = new WideRandomForestModel(assumedPreditions.map(TestPredictorWithImportance(_, null).toMember).toList, nLabels)
+  def whenManyPredictorsThenPredictsByVoting() {
+    val assumedPredictions = List(Array(1, 0), Array(1, 2), Array(1, 0))
+    val model = new WideRandomForestModel(assumedPredictions.map(TestPredictorWithImportance(_, null).toMember).toList, nLabels)
     val prediction = model.predict(testData)
     assertArrayEquals(Array(1, 0), prediction)
   }
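whenManyPredictorsThenPredictsByVoting pins down the ensemble rule: with per-tree predictions (1, 0), (1, 2) and (1, 0) the forest returns (1, 0), i.e. the most frequent label per sample. A small self-contained sketch of that voting step (a generic majority vote, not WideRandomForestModel's internals; tie-breaking is left unspecified):

object VotingSketch {
  // treePredictions(t)(s) = label predicted by tree t for sample s.
  def majorityVote(treePredictions: Seq[Array[Int]]): Array[Int] = {
    val nSamples = treePredictions.head.length
    Array.tabulate(nSamples) { s =>
      treePredictions
        .map(_(s))
        .groupBy(identity)
        .maxBy { case (_, votes) => votes.size }
        ._1
    }
  }

  def main(args: Array[String]): Unit = {
    val votes = List(Array(1, 0), Array(1, 2), Array(1, 0))
    println(majorityVote(votes).mkString(", ")) // 1, 0 -- matching assertArrayEquals(Array(1, 0), prediction)
  }
}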
@@ -20,14 +20,14 @@ class TestFeatureGenerator(val samples: Seq[Feature])(implicit sc: SparkContext)
 }


-class NoisyEfectLabelGeneratorTest extends SparkTest {
+class NoisyEffectLabelGeneratorTest extends SparkTest {

   @Test
-  def testResponseGenertion_var_0_2_prec_0_75() {
-    val freatureGenerator = OrdinalFeatureGenerator(3, 1000, 500)
-    val labelGenerator = new NoisyEfectLabelGenerator(freatureGenerator)(1, Map("v_0" -> 1.0, "v_1" -> 0.5, "v_2" -> 0.25), fractionVarianceExplained = 0.2, classThresholdPrecentile = 0.75)
-    val classes = labelGenerator.getLabels(freatureGenerator.sampleNames)
-    // we wouild expect 75% of samples in class 0
+  def testResponseGeneration_var_0_2_prec_0_75() {
+    val featureGenerator = OrdinalFeatureGenerator(3, 1000, 500)
+    val labelGenerator = new NoisyEfectLabelGenerator(featureGenerator)(1, Map("v_0" -> 1.0, "v_1" -> 0.5, "v_2" -> 0.25), fractionVarianceExplained = 0.2, classThresholdPrecentile = 0.75)
+    val classes = labelGenerator.getLabels(featureGenerator.sampleNames)
+    // we would expect 75% of samples in class 0
     assertEquals(0.75, classes.count(_ == 0).toDouble / classes.size, 0.01)

     val baseVariance = meanAndVariance(labelGenerator.baseContinuousResponse).variance
@@ -38,11 +38,11 @@ class NoisyEfectLabelGeneratorTest extends SparkTest {

   @Test
   def testMultiplicativeResponseGeneration_var_0_2_prec_0_75() {
-    val freatureGenerator = OrdinalFeatureGenerator(3, 1000, 500)
-    val labelGenerator = new NoisyEfectLabelGenerator(freatureGenerator)(1, Map("v_0" -> 1.0, "v_1" -> 0.5, "v_2" -> 0.25),
+    val featureGenerator = OrdinalFeatureGenerator(3, 1000, 500)
+    val labelGenerator = new NoisyEfectLabelGenerator(featureGenerator)(1, Map("v_0" -> 1.0, "v_1" -> 0.5, "v_2" -> 0.25),
       fractionVarianceExplained = 0.2, classThresholdPrecentile = 0.75, multiplicative = true)
-    val classes = labelGenerator.getLabels(freatureGenerator.sampleNames)
-    // we wouild expect 75% of samples in class 0
+    val classes = labelGenerator.getLabels(featureGenerator.sampleNames)
+    // we would expect 75% of samples in class 0
     assertEquals(0.75, classes.count(_ == 0).toDouble / classes.size, 0.01)

     val baseVariance = meanAndVariance(labelGenerator.baseContinuousResponse).variance
@@ -52,40 +52,40 @@

   @Test
   def testAdditiveEffectCorrectness() {
-    val freatureGenerator = new TestFeatureGenerator(List(
+    val featureGenerator = new TestFeatureGenerator(List(
       Feature("v_0", Array[Byte](0, 1, 2, 0)),
       Feature("v_1", Array[Byte](0, 1, 2, 1)),
       Feature("v_2", Array[Byte](0, 1, 2, 2)),
       Feature("v_3", Array[Byte](2, 2, 2, 2))
     ))
-    val labelGenerator = new NoisyEfectLabelGenerator(freatureGenerator)(1, Map("v_0" -> 0.1, "v_1" -> 0.5, "v_2" -> 2.0),
+    val labelGenerator = new NoisyEfectLabelGenerator(featureGenerator)(1, Map("v_0" -> 0.1, "v_1" -> 0.5, "v_2" -> 2.0),
       fractionVarianceExplained = 0.2, classThresholdPrecentile = 0.75, multiplicative = false)

-    val classes = labelGenerator.getLabels(freatureGenerator.sampleNames)
+    val classes = labelGenerator.getLabels(featureGenerator.sampleNames)
     assertEquals(DenseVector[Double](-2.6, 0, 2.6, 1.9), labelGenerator.baseContinuousResponse)
   }

   @Test
-  def testMultiptlicativeEffectCorrectness() {
-    val freatureGenerator = new TestFeatureGenerator(List(
+  def testMultiplicativeEffectCorrectness() {
+    val featureGenerator = new TestFeatureGenerator(List(
       Feature("v_0", Array[Byte](0, 1, 2, 0)),
       Feature("v_1", Array[Byte](0, 1, 2, 1)),
       Feature("v_2", Array[Byte](0, 1, 2, 2)),
       Feature("v_3", Array[Byte](2, 2, 2, 2))
     ))
-    val labelGenerator = new NoisyEfectLabelGenerator(freatureGenerator)(1, Map("v_0" -> 0.1, "v_1" -> 0.5, "v_2" -> 2.0),
+    val labelGenerator = new NoisyEfectLabelGenerator(featureGenerator)(1, Map("v_0" -> 0.1, "v_1" -> 0.5, "v_2" -> 2.0),
       fractionVarianceExplained = 0.2, classThresholdPrecentile = 0.75, multiplicative = true)

-    val classes = labelGenerator.getLabels(freatureGenerator.sampleNames)
+    val classes = labelGenerator.getLabels(featureGenerator.sampleNames)
     assertEquals(DenseVector[Double](-0.1, 1.0, 0.1, -0.2), labelGenerator.baseContinuousResponse)
   }


   @Test
-  def testResponseGenertion_var_0_5_prec_0_50() {
-    val freatureGenerator = OrdinalFeatureGenerator(3, 1000, 500)
-    val labelGenerator = new NoisyEfectLabelGenerator(freatureGenerator)(1, Map("v_0" -> 1.0, "v_1" -> 0.5, "v_2" -> 0.25), fractionVarianceExplained = 0.5, classThresholdPrecentile = 0.5)
-    val classes = labelGenerator.getLabels(freatureGenerator.sampleNames)
+  def testResponseGeneration_var_0_5_prec_0_50() {
+    val featureGenerator = OrdinalFeatureGenerator(3, 1000, 500)
+    val labelGenerator = new NoisyEfectLabelGenerator(featureGenerator)(1, Map("v_0" -> 1.0, "v_1" -> 0.5, "v_2" -> 0.25), fractionVarianceExplained = 0.5, classThresholdPrecentile = 0.5)
+    val classes = labelGenerator.getLabels(featureGenerator.sampleNames)
     // we wouild expect 75% of samples in class 0
     assertEquals(0.5, classes.count(_ == 0).toDouble / classes.size, 0.01)

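The expected baseContinuousResponse vectors make the additive rule easy to check by hand: coding each genotype as (g - 1), the base response of a sample is the sum of effect * (g - 1) over the variables that appear in the effects map (v_3 has no effect and contributes nothing), which gives exactly (-2.6, 0, 2.6, 1.9) for the additive test. The class labels then appear to come from thresholding a noisy version of this response at classThresholdPrecentile, which is why 0.75 yields roughly 75% of samples in class 0 and 0.5 roughly 50%. A short sketch of that arithmetic follows; the coding and thresholding are inferences from the expected values, not a copy of NoisyEfectLabelGenerator, and the multiplicative variant is not covered.

object LabelGeneratorSketch {
  // Additive base response: sum of effect * (genotype - 1) over the variables with effects.
  def additiveResponse(features: Map[String, Array[Byte]], effects: Map[String, Double]): Array[Double] = {
    val nSamples = features.values.head.length
    Array.tabulate(nSamples) { s =>
      effects.map { case (name, effect) => effect * (features(name)(s) - 1) }.sum
    }
  }

  // Class 0 for responses at or below the percentile threshold, class 1 above it.
  def labels(response: Array[Double], percentile: Double): Array[Int] = {
    val threshold = response.sorted.apply(((response.length - 1) * percentile).toInt)
    response.map(r => if (r <= threshold) 0 else 1)
  }

  def main(args: Array[String]): Unit = {
    val features = Map(
      "v_0" -> Array[Byte](0, 1, 2, 0),
      "v_1" -> Array[Byte](0, 1, 2, 1),
      "v_2" -> Array[Byte](0, 1, 2, 2))
    val effects = Map("v_0" -> 0.1, "v_1" -> 0.5, "v_2" -> 2.0)
    // Prints -2.6, 0.0, 2.6, 1.9, matching the additive test's expected DenseVector.
    println(additiveResponse(features, effects).mkString(", "))
  }
}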
src/test/scala/au/csiro/variantspark/utils/SampleTest.scala (4 changes: 2 additions & 2 deletions)
@@ -33,7 +33,7 @@ class SampleTest {
   }

   @Test
-  def testFreactionSampleWithoutReplacement() {
+  def testFractionSampleWithoutReplacement() {
     val nSize = 100
     val fraction = 0.5
     val sample = Sample.fraction(nSize, fraction, false)
@@ -47,7 +47,7 @@


   @Test
-  def testFreactionSampleWithReplacement() {
+  def testFractionSampleWithReplacement() {
     val nSize = 100
     val fraction = 0.5
     val sample = Sample.fraction(nSize, fraction, true)
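The two tests differ only in the replacement flag passed to Sample.fraction. A small sketch of the distinction (generic sampling code, not the actual Sample implementation): drawing 50% of 100 indices without replacement yields 50 distinct indices, while drawing with replacement picks 50 indices that may repeat, so the distinct count is at most 50.

import scala.util.Random

object SampleSketch {
  // Pick n * fraction indices from 0 until n, with or without replacement.
  def fraction(n: Int, fraction: Double, withReplacement: Boolean, rng: Random = new Random(13)): Array[Int] = {
    val k = math.round(n * fraction).toInt
    if (withReplacement) Array.fill(k)(rng.nextInt(n))
    else rng.shuffle((0 until n).toList).take(k).toArray
  }

  def main(args: Array[String]): Unit = {
    val without = fraction(100, 0.5, withReplacement = false)
    val withRep = fraction(100, 0.5, withReplacement = true)
    println(s"without replacement: ${without.length} drawn, ${without.distinct.length} distinct") // 50 drawn, 50 distinct
    println(s"with replacement:    ${withRep.length} drawn, ${withRep.distinct.length} distinct") // 50 drawn, at most 50 distinct
  }
}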