
Feature/scala code/ch02 biman #8

Merged
merged 43 commits into from
Jan 7, 2022
Commits
75e693e
added kmers for FASTA and FASTQ formats
mahmoudparsian Dec 27, 2021
6ec2ea2
DNA Based count in scala
deepakmca05 Dec 27, 2021
e1ceb10
Indentation fix
deepakmca05 Dec 27, 2021
863b527
improved documentation
mahmoudparsian Dec 27, 2021
e8c6d28
ch02
deepakmca05 Dec 29, 2021
606ad70
Feature/scala code/ch01 (#5)
deepakmca05 Dec 29, 2021
ccfcd22
updated README.md
mahmoudparsian Dec 29, 2021
f4082db
updated README.md
mahmoudparsian Dec 29, 2021
077af38
ch02-changes
deepakmca05 Dec 30, 2021
01df972
ch02-changes
deepakmca05 Dec 30, 2021
c3bbf35
Feature/scala code/ch01 missing class gradle (#7)
deepakmca05 Dec 30, 2021
4c96c7c
added bonus chapter correlation
mahmoudparsian Dec 30, 2021
db3647b
added bonus chapter correlation
mahmoudparsian Dec 30, 2021
1e18e86
updated docs
mahmoudparsian Dec 30, 2021
fa2c1ab
updated docs
mahmoudparsian Dec 30, 2021
7669234
updated docs
mahmoudparsian Dec 30, 2021
a70edfa
updated docs
mahmoudparsian Dec 30, 2021
8207fcf
updated docs
mahmoudparsian Dec 30, 2021
a717c9a
updated docs
mahmoudparsian Dec 30, 2021
a348f11
updated docs
mahmoudparsian Dec 30, 2021
3da5bd5
updated docs
mahmoudparsian Dec 30, 2021
5358172
updated docs
mahmoudparsian Dec 30, 2021
b343f4d
updated docs
mahmoudparsian Dec 30, 2021
fa9eb2a
improved documentation
mahmoudparsian Dec 31, 2021
b36a229
improved documentation
mahmoudparsian Dec 31, 2021
bf2a4ba
improved documentation
mahmoudparsian Dec 31, 2021
4e3b63e
improved documentation
mahmoudparsian Dec 31, 2021
f24f095
improved documentation
mahmoudparsian Dec 31, 2021
dc1e22e
improved documentation
mahmoudparsian Dec 31, 2021
a02276a
improved documentation
mahmoudparsian Dec 31, 2021
4755a19
improved documentation
mahmoudparsian Dec 31, 2021
9d12125
improved documentation
mahmoudparsian Dec 31, 2021
ecc2cb5
improved documentation
mahmoudparsian Dec 31, 2021
efcf612
improved documentation
mahmoudparsian Dec 31, 2021
067596d
improved documentation
mahmoudparsian Dec 31, 2021
cb4048c
improved documentation
mahmoudparsian Dec 31, 2021
c8ef9b9
improved documentation
mahmoudparsian Jan 1, 2022
f6747e9
DNABaseCountFastq
bimanmandal Jan 1, 2022
9f9353c
resolved merge conflict
bimanmandal Jan 1, 2022
d7a1116
added the code changes for chapter 2
bimanmandal Jan 7, 2022
2b59199
added the run_spark_applications_scripts
bimanmandal Jan 7, 2022
6cd297f
added the conditions for 1GB data
bimanmandal Jan 7, 2022
37b2eaf
added the readme file
bimanmandal Jan 7, 2022
Indentation fix
deepakmca05 committed Dec 27, 2021
commit e1ceb103f6538696787cd5069db398f364021a34
@@ -18,30 +18,30 @@ import scala.sys.exit
 */
 object DNABaseCountVER1 {

-def processFASTARecord(fastaRecord:String) :Map[String,Int] = {
-var keyValueList = Map[String,Int]()
-if(fastaRecord.startsWith(">"))
+def processFASTARecord(fastaRecord: String): Map[String, Int] = {
+var keyValueList = Map[String, Int]()
+if (fastaRecord.startsWith(">"))
 keyValueList += ("z" -> 1)
 else {
 var chars = fastaRecord.toLowerCase
-for(c <- chars)
+for (c <- chars)
 keyValueList += c.toString -> 1
 }
-return keyValueList
+return keyValueList
 }

 def main(args: Array[String]) = {
-if(args.length !=2) {
-println("Usage:" + DNABaseCountVER1 + " <input-path> " )
+if (args.length != 2) {
+println("Usage:" + DNABaseCountVER1 + " <input-path> ")
 exit(-1)
 }
 //create an instance of SparkSession object
 val spark = SparkSession.builder().appName("DNABaseCountVER1").master("local[*]").getOrCreate()
 println("spark initialised")
 val inputPath = args(1)
-println("inputPath :"+ inputPath)
+println("inputPath :" + inputPath)
 val recordsRDD = spark.sparkContext.textFile(inputPath)
-println("recordsRDD.count() : "+ recordsRDD.count())
+println("recordsRDD.count() : " + recordsRDD.count())
 val recordsAsList = recordsRDD.collect()
 print("recordsAsList : ", recordsAsList)
 // if you do not have enough RAM, then do the following
@@ -51,7 +51,7 @@ object DNABaseCountVER1 {
 val pairsRDD = recordsRDD.flatMap(processFASTARecord)
 pairsRDD.collect.foreach(println)

-val frequenciesRDD = pairsRDD.reduceByKey((x,y)=> (x+y))
+val frequenciesRDD = pairsRDD.reduceByKey((x, y) => (x + y))
 println("frequenciesRDD : debug")
 val frequenciesAsList = frequenciesRDD.collect()
 println("frequenciesAsList : " + frequenciesAsList.foreach(println))
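
A note on the logic rather than the whitespace: `processFASTARecord` in the diff above accumulates into a `Map`, and adding an existing key to a Scala `Map` replaces its value, so a base that repeats within one record appears to be emitted once with the value 1. The sketch below is not part of this commit. It shows one possible way to count every occurrence per record in plain Scala, without Spark. `BaseCountSketch` and `countBases` are hypothetical names chosen for illustration; the `"z"` key for header lines mirrors the diff.

```scala
object BaseCountSketch {
  // Hypothetical helper, not from the commit: counts each base in one FASTA line.
  def countBases(fastaRecord: String): Map[String, Int] =
    if (fastaRecord.startsWith(">"))
      Map("z" -> 1) // header line: a single marker entry, as in the diff
    else
      fastaRecord.toLowerCase
        .groupBy(identity) // groups equal characters: Map[Char, String]
        .map { case (base, hits) => base.toString -> hits.length }

  def main(args: Array[String]): Unit = {
    println(countBases(">seq1"))
    println(countBases("ACGTA"))
  }
}
```

Because the result is still a `Map[String, Int]`, a function of this shape could be passed to `flatMap` in place of `processFASTARecord`, with `reduceByKey` then summing per-record counts instead of per-record flags.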