QA Team Contributions:
During the startup process we analyzed the Spark UI and found that writes to the database were occurring before any data had been received into the DAG. We added a timeout to the Spark initialization and removed the unnecessary writes.
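One way to keep startup from writing before data arrives is to guard the sink against empty micro-batches; a minimal sketch, where the JDBC URL, table name, and trigger interval are placeholders and `transformed` stands in for the streaming DataFrame:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.streaming.Trigger;

transformed.writeStream()
    .trigger(Trigger.ProcessingTime("10 seconds"))                      // placeholder interval
    .foreachBatch((Dataset<Row> batch, Long batchId) -> {
        // Skip the write entirely while no data has reached the DAG yet,
        // so startup no longer produces empty writes to the database.
        if (batch.isEmpty()) {
            return;
        }
        batch.write()
            .format("jdbc")
            .option("url", "jdbc:postgresql://db-host:5432/dashboard")  // placeholder URL
            .option("dbtable", "metrics")                               // placeholder table
            .mode("append")
            .save();
    })
    .start();
```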
There were many small shuffles caused by the ORDER BY clauses in our SQL statements. We determined that sorting the data added no value, since the dashboard was already sorting on its end. Removing the sorts improved the processing time of the data sets.
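Illustrative only; the view and column names below are placeholders, assuming a SparkSession named `spark` with an `events` view registered:

```java
// Before: the global ORDER BY adds an extra exchange (a range-partitioning
// shuffle) on top of the aggregation for every batch.
Dataset<Row> sorted = spark.sql(
    "SELECT region, COUNT(*) AS total FROM events GROUP BY region ORDER BY total DESC");

// After: drop the sort and let the dashboard order the results on its end.
Dataset<Row> unsorted = spark.sql(
    "SELECT region, COUNT(*) AS total FROM events GROUP BY region");
```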
We initially wrote our transformations in SparkSQL syntax because there was no data stream yet. Once the transformations were producing the right output, we needed to transition them over to Structured Streaming.
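A rough sketch of that transition: the Kafka topic is read as a streaming DataFrame and registered as a view, so the SQL written during the batch phase can run against it unchanged. The broker address, topic name, and query are placeholders:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder().appName("qa-stream").getOrCreate();

// Read the Kafka topic as a streaming DataFrame instead of a static table.
Dataset<Row> raw = spark.readStream()
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")   // placeholder broker
    .option("subscribe", "events")                          // placeholder topic
    .load()
    .selectExpr("CAST(value AS STRING) AS value");

// Register the stream as a view so the batch-phase SQL runs unchanged.
raw.createOrReplaceTempView("events");
Dataset<Row> transformed = spark.sql("SELECT value FROM events"); // placeholder query
```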
The QA team set up a Kafka producer with some test data to provide a stream while the Devops team was still building the production producer. This shortened the feedback loop for iterating on results and allowed the Spark team to keep development moving forward.
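A stand-in producer along these lines is enough to exercise the pipeline; the broker address, topic name, and test file are placeholders:

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TestDataProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");               // placeholder broker
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // Replay canned records so the Spark job has a live stream to consume
        // while the real producer is still being built.
        List<String> lines = Files.readAllLines(Paths.get("test-data.csv")); // placeholder file
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (String line : lines) {
                producer.send(new ProducerRecord<>("events", line));    // placeholder topic
            }
        }
    }
}
```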
Data being fed into the database was being dropped immediately after it was written because of the read/write loop. We created a window of state for the Dashboard team to hit with their database views.
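One way such a window of state can be expressed in Structured Streaming is with a watermark and a time window, sketched below; the event_time column, the durations, and the `parsed` DataFrame are assumptions, not the project's actual values:

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.window;

// Keep a bounded window of recent state rather than dropping rows as soon
// as they are written, so the dashboard's views always have data to read.
Dataset<Row> windowed = parsed
    .withWatermark("event_time", "10 minutes")                          // placeholder retention
    .groupBy(window(col("event_time"), "5 minutes"), col("region"))     // placeholder window
    .count();
```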
Some of the data was corrupted in translation from csv to json, which broke our transformations with an index-out-of-bounds error. We added a check verifying that the value arrays had a size greater than 1, and discarded records that failed the check instead of adding them to the table. The queries then streamed without error, just as in our local environments.
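The guard amounts to splitting the value and filtering on the array size before any index is touched; a sketch, with the delimiter and column names as placeholders and `raw` standing in for the streaming DataFrame:

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.size;
import static org.apache.spark.sql.functions.split;

// Split the raw value into fields, then drop any record whose array is too
// short; corrupt rows are discarded instead of raising an
// index-out-of-bounds error inside the transformations.
Dataset<Row> fields = raw.withColumn("parts", split(col("value"), ","));
Dataset<Row> valid  = fields.filter(size(col("parts")).gt(1));
```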
The dashboard team could not parse the data with D3 because the csv lacked unique IDs. We added a column to the csv to act as the ID in the database, then refactored our queries to persist the new column. A programmatic solution might be to append an id to each record as it is streamed in.
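A sketch of that programmatic alternative, generating an id as each record arrives; the column name is an assumption, and uuid() is Spark SQL's built-in generator:

```java
import static org.apache.spark.sql.functions.expr;

// Attach a generated id to every incoming record so the dashboard can key
// its D3 joins on a unique value without editing the source csv.
Dataset<Row> withId = valid.withColumn("id", expr("uuid()"));
```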
The dashboard team was getting a Cross-Origin Request (CORS) error with Spring Boot. The QA team shared its knowledge of the @CrossOrigin annotation to resolve the error, which allowed the dashboard to make requests to the server.
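The fix on the Spring Boot side looks roughly like this; the controller, route, and allowed origin are placeholders:

```java
import org.springframework.web.bind.annotation.CrossOrigin;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@CrossOrigin(origins = "http://localhost:3000") // placeholder dashboard origin
public class MetricsController {

    // With @CrossOrigin on the controller, the browser's preflight request
    // succeeds and the dashboard can call this endpoint from another origin.
    @GetMapping("/metrics")
    public String metrics() {
        return "[]"; // placeholder payload
    }
}
```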
The Devops team was having trouble parsing the data into the Kafka stream. One of the QA team members wrapped the csv data in json and implemented checks to handle quotes, commas, slashes, and spaces.
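A minimal sketch of that kind of wrapping, escaping the characters that were breaking the parse; the generated field names and the naive comma split are simplifications:

```java
public class CsvToJson {

    // Escape backslashes and quotes, and trim stray whitespace, so embedded
    // punctuation in the csv fields does not produce invalid json.
    static String escape(String field) {
        return field.replace("\\", "\\\\").replace("\"", "\\\"").trim();
    }

    // Wrap one csv line as a json object with generated field names.
    static String toJson(String csvLine) {
        String[] parts = csvLine.split(",", -1); // keep empty trailing fields
        StringBuilder json = new StringBuilder("{");
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) json.append(",");
            json.append("\"field").append(i).append("\":\"")
                .append(escape(parts[i])).append("\"");
        }
        return json.append("}").toString();
    }
}
```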
The Devops team was initially unable to broadcast the Kafka stream to the remote Kafka consumer. The QA team had already run into this error with the Dockerized Kafka setup we built earlier. We applied the advertised listeners changes to the server.properties file, and the producer worked as expected.
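The relevant server.properties change has this shape; the hostname is a placeholder for the broker's externally reachable address:

```properties
# Bind on all interfaces inside the container, but advertise the address
# that remote clients can actually reach.
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://kafka.example.com:9092
```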
Other contributions:
- Assisted teams with debugging.
- Cast the vision for initial architecture.
- Educated the dashboard team on Spring and D3.
- Wrote unit tests for other modules.
- Refactored and reviewed committed code.
Revature Big Data Cohort @ 020413