CS410Assignments · dkrovi2 · Oct 20, 2021 · Nov 9, 2021 · Nov 10, 2021 · Nov 10, 2021
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,4 @@
+.idea/
+out
+build
+target
diff --git a/README.md b/README.md
@@ -1,3 +1,53 @@
-# CourseProject
+# Build Experience Profile from Resumes
 
-Please fork this repository and paste the github link of your fork on Microsoft CMT. Detailed instructions are on Coursera under Week 1: Course Project Overview/Week 9 Activities.
+
+1. **What are the names and NetIDs of all your team members? Who is the captain? The captain will have more administrative duties than team members.**
+
+    * alokk3@illinois.edu
+    * dkrovi2@illinois.edu
+    * jsaxena3@illinois.edu
+    * rathi9@illinois.edu
+
+2. **What is your free topic? Please give a detailed description. What is the task? Why is it important or interesting? What is your planned approach? What tools, systems or datasets are involved? What is the expected outcome? How are you going to evaluate your work?**
+
+  In this project, we use text extraction and retrieval for the following functions:
+
+      * Parse resumes in doc and pdf format
+      * Parse job descriptions in doc and pdf format
+      * Build an analysis engine to extract experience details of a candidate on various tools and technologies
+      * Rank the available set of resumes based on the skill set specified in the job description
+
+  The keyword-based search used by many job sites is often inaccurate because it does not correlate a skill with the candidate's experience in it.
+
+  For example, for the skill ‘Spark’, instead of just searching for the keyword ‘Spark’ in the resume, we want to know (for scoring purposes):
+      - how many years the candidate has worked with Spark, and
+      - whether the candidate has used Spark at multiple organizations.
+
+  We then create a score for each profile/resume based on the skill set mentioned in the query and rank them in order of score (highest to lowest). 
+
+3. **Which programming language do you plan to use?**
+
+    We will use standard text retrieval tools and programming APIs (MeTA, Python, NumPy, etc.) with a customized algorithm to score each resume.
+
+4. **Please justify that the workload of your topic is at least 20 \* N hours, N being the total number of students in your team. You may list the main tasks to be completed, and the estimated time cost for each task.**
+
+    The following are the steps and key milestones for this project: 
+
+    | Task                                                          |  Time needed |             ETA |
+    |:--------------------------------------------------------------|-------------:|----------------:|
+    | Gather representative data set for training and evaluation    |      8 hours |          Nov  8 |
+    | Parsing engine to parse resumes and job descriptions          |     20 hours |          Nov 15 |
+    | Progress report                                               |      2 hours |          Nov 15 |
+    | Analysis engine to analyze resumes                            |     30 hours |          Nov 22 |
+    | Scoring engine to match resumes to provided job description   |     30 hours |          Nov 29 |
+    | Basic UI to search for resumes matching a job description     |     24 hours |          Dec  5 |
+    | Software documentation                                        |      8 hours |          Dec  9 |
+    | **Total**                                                     |**122 hours** |                 |
+
+
+# Contributors
+
+ * alokk3@illinois.edu
+ * dkrovi2@illinois.edu
+ * jsaxena3@illinois.edu
+ * rathi9@illinois.edu
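
Review note: the scoring idea in item 2 of the README (years of experience per skill, plus whether the skill was used at more than one organization) could be sketched roughly as below. This is only an illustration under stated assumptions, not the team's algorithm: the `Stint` record, the `score_resume` and `rank_resumes` helpers, and the `org_bonus` weight are all hypothetical names introduced here, and the sketch assumes the parsing and analysis engines have already extracted one record per skill per organization.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Stint:
    """Hypothetical record: time spent using one skill at one organization."""
    skill: str
    organization: str
    years: float


def score_resume(stints: List[Stint], required_skills: List[str],
                 org_bonus: float = 0.5) -> float:
    """Score a resume against the required skills.

    Assumed formula (not from the README): for each required skill found,
    add the total years of experience plus org_bonus for every
    organization beyond the first where the skill was used.
    """
    score = 0.0
    for skill in required_skills:
        matching = [s for s in stints if s.skill.lower() == skill.lower()]
        if not matching:
            continue  # a missing skill contributes nothing
        total_years = sum(s.years for s in matching)
        organizations = {s.organization for s in matching}
        score += total_years + org_bonus * (len(organizations) - 1)
    return score


def rank_resumes(resumes: Dict[str, List[Stint]],
                 required_skills: List[str]) -> List[str]:
    """Return resume ids ordered from highest to lowest score."""
    return sorted(
        resumes,
        key=lambda resume_id: score_resume(resumes[resume_id], required_skills),
        reverse=True,
    )
```

With this weighting, a candidate with three years of Spark spread over two organizations would rank above one with three years at a single organization. Whether that preference is desirable, and how large the bonus should be, is a design choice the evaluation would need to settle.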
diff --git a/code/parsing-engine/.gitattributes b/code/parsing-engine/.gitattributes
@@ -0,0 +1,6 @@
+#
+# https://help.github.com/articles/dealing-with-line-endings/
+#
+# These are explicitly windows files and should use crlf
+*.bat           text eol=crlf
+
diff --git a/code/parsing-engine/.gitignore b/code/parsing-engine/.gitignore
@@ -0,0 +1,5 @@
+# Ignore Gradle project-specific cache directory
+.gradle
+
+# Ignore Gradle build output directory
+build
diff --git a/code/parsing-engine/build.gradle b/code/parsing-engine/build.gradle
@@ -0,0 +1,66 @@
+/*
+ * This file was generated by the Gradle 'init' task.
+ *
+ * This generated file contains a sample Java application project to get you started.
+ * For more details take a look at the 'Building Java & JVM projects' chapter in the Gradle
+ * User Manual available at https://docs.gradle.org/6.8.2/userguide/building_java_projects.html
+ */
+
+plugins {
+    // Apply the application plugin to add support for building a CLI application in Java.
+    id 'application'
+    id 'com.github.johnrengelman.shadow' version '6.1.0'
+}
+
+repositories {
+    mavenCentral()
+}
+
+dependencies {
+
+    implementation 'org.apache.commons:commons-lang3:3.12.0'
+    implementation 'commons-lang:commons-lang:2.6'
+    implementation 'commons-io:commons-io:2.11.0'
+
+    // PDF Parsing
+    implementation 'org.apache.pdfbox:pdfbox:2.0.24'
+
+    // Indexing and search (Lucene); JSON (json-simple)
+    implementation group: 'org.apache.lucene', name: 'lucene-core', version: '8.1.0'
+    implementation group: 'org.apache.lucene', name: 'lucene-queryparser', version: '8.1.0'
+    implementation group: 'org.apache.lucene', name: 'lucene-analyzers-common', version: '8.1.0'
+    implementation group: 'com.googlecode.json-simple', name: 'json-simple', version: '1.1.1'
+
+
+    // NLP (Stanford CoreNLP), JSON binding (Jackson), Office document parsing (Apache POI)
+    implementation 'edu.stanford.nlp:stanford-corenlp:4.3.1'
+    implementation 'edu.stanford.nlp:stanford-corenlp:4.3.1:models'
+    implementation 'com.fasterxml.jackson.core:jackson-databind:2.9.2'
+    implementation 'org.apache.poi:poi-ooxml:4.1.2'
+
+    // Use JUnit Jupiter Engine for testing.
+    testImplementation 'org.junit.jupiter:junit-jupiter-api:5.6.2'
+    testRuntimeOnly 'org.junit.jupiter:junit-jupiter-engine:5.6.2'
+
+    // Lombok.
+    compileOnly 'org.projectlombok:lombok:1.18.22'
+    annotationProcessor 'org.projectlombok:lombok:1.18.22'
+    testCompileOnly 'org.projectlombok:lombok:1.18.22'
+    testAnnotationProcessor 'org.projectlombok:lombok:1.18.22'
+
+    // Logback
+    implementation 'ch.qos.logback:logback-classic:1.2.7'
+}
+
+application {
+    // Define the main class for the application.
+    mainClass = 'edu.illinois.phantom.Main'
+}
+
+tasks.named('test') {
+    // Use junit platform for unit tests.
+    useJUnitPlatform()
+}
+
+mainClassName = 'edu.illinois.phantom.Main' // legacy property; likely still needed because the Shadow 6.x plugin reads it
+build.dependsOn shadowJar