About Me


I studied Computer Science and swam distance at Boston College. Currently, I make art and tools for artists, run/bike/swim, read, and teach in the LA County Fire Dept. Junior Lifeguard program.


By: Thomas Suarez, Elliot Mebane and Ryan Reede


Armature's AnimationController allows the system to break the fbx file into steppable frames. We simply play back the current animation frame on the Mira Prism tracking pad. Future updates to the project include finding a better tracking solution (these animations need to be extremely precise) and allowing the user to adjust the size and rig scale of the digital model to perfectly match the static physical one.


After the success of the McMullen Museum of Art Virtual Reality experience, Boston College Ireland commissioned me to build a similar experience for visitors to Dublin for the 2016 ACC kickoff football game between Georgia Tech and Boston College. To push myself to make the experience more impressive than the McMullen's, I decided to shoot and stitch 3D, 360-degree panoramas instead of the monoscopic flat panoramas I had done previously.


From the start, I knew that working in 3D would be quite the challenge. Due to the parallax points associated with stitching spheres, the geometry of rotating stereo viewpoints and numerous other issues, getting this entire project in working order would be no small task. Meanwhile (even as recently as 2015), 360 stereo for live video/photo capture was a largely unsolved problem, and those who had solved it, such as Facebook, weren't about to release their IP. Aside from capture, bringing the two channels of images into the graphics engine would raise additional concerns. I found some solid research on the web and began to prioritize. As each piece came together, the project as a whole seemed more feasible.

The Camera Rig



I started with the camera rig. Without the assets for the content, this project was obviously headed nowhere fast. As mentioned, 3D/360 camera rigs such as Vuze didn't exist when I was building this project, so I had to create my own solution. I picked up a pair of Sony a6000 mirrorless cameras with matching fisheye lenses to serve as the two stereo viewpoints.


Once I had the two cameras, I needed a way to join them by their hot shoe mounts so that their lenses would sit next to each other. No consumer products for this sort of thing existed, so I decided to 3D-print a part to match the specifications I needed. I measured the cameras down to the millimeter and learned how to build models in Autodesk 123D Design. Iterations of the plastic mount had to account for the structural integrity of the part (the camera/lens combo is heavy) as well as IPD angles and distances. Shown are different iterations of the part (I used the serrations on the part's edges to get accurate measurements between tests). After countless broken parts and images that wouldn't stitch even remotely correctly, I finally found a solid working model, which I used to shoot all of the panoramas.

Shooting and Stitching


Shooting with the cameras was a relatively painless part of the process. Because the cameras are positioned upright during shooting, I had to take six shots around to get coverage for a full panorama. The lenses were wide enough that the nadir and zenith were picked up well without having to rotate the rig up or down at all. I made sure to use the shoot/delay function in camera so that, for longer exposures, there would be no vibration to damage image quality.


Once all of the images were shot in the ten different locations around BC's campus, I brought them into PTGui for stitching and batch processing. Using the viewpoint correction tool in PTGui, I was able to account decently well for the fact that each individual shot was slightly off-center, which would otherwise lead to inherent stitching errors. Where the viewpoint correction didn't find matching features, I brought the entire .psd file PTGui creates into Photoshop and manually skewed and shifted images to stitch them properly. The last step in PTGui was to convert the image spheres into a cubic (six square images) format for quick and easy processing in realtime graphics engines. When all was said and done, I had six squares per eye for each of the panoramas I shot. Keeping files organized and separate from their L/R side equivalents was vital, as I now had 120 images to keep track of. All that was left to do was bring them into a graphics engine to bootstrap the panoramas along with the Oculus SDK for interactive stereo viewing.

Stereo Cubemaps for VR



Implementing a 2D cubemap as I did for the Irish VR project is a cinch in Unity. You simply create a new material as a cubemap, drag each of the six squares into place and let the skybox take in this material. Doing this for 3D was significantly more difficult, however, as the Unity SDK for the Oculus Rift at the time made it hard to access different components for each eye in the camera prefab. I was able to produce a working version in Unity using some clever tricks such as culling masks, but the image quality was noticeably worse than the Oculus' resolution and the program would drop frames from time to time. I began to look into alternatives and abandoned Unity for Unreal Engine upon finding this project on GitHub by opamp7. The Stereo Cubemap Importer would handle importing images and projecting them properly for each independent eye. I took a quick course on Lynda to understand the basics of Unreal, then dove in using the Stereo Cubemap Importer.


To get my cubic-formatted images into Unreal, I had to convert them into a format known as the Carmack Cubic Strip: a 12:1 ratio rectangle in which the twelve squares, from left to right, are organized as follows (left eye first): back horizontally flipped, front horizontally flipped, up rotated CCW 90º then horizontally flipped, down rotated 90º then horizontally flipped, left horizontally flipped, right horizontally flipped. The process then repeats for the right eye, leaving us with a Carmack Cubic Strip for a scene.
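The face ordering described above can be sketched in a few lines of Python. This is a minimal illustration, not the original script: faces are small nested lists standing in for real image data, the function names are hypothetical, and because the text does not say which way the "down" face is rotated, clockwise is assumed and exposed as a parameter.

```python
# Sketch of assembling a 12:1 stereo strip from two sets of six cube faces.
# Faces are square nested lists (rows of pixels); real images would be loaded
# with an imaging library instead.

def flip_h(face):
    """Mirror a square face left-to-right."""
    return [row[::-1] for row in face]

def rotate_ccw(face):
    """Rotate a square face 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*face)][::-1]

def rotate_cw(face):
    """Rotate a square face 90 degrees clockwise."""
    return [list(row) for row in zip(*face[::-1])]

def eye_faces(faces, down_rotation=rotate_cw):
    """Return one eye's six faces in strip order, transformed as described.

    `faces` maps "back", "front", "up", "down", "left", "right" to squares.
    The rotation direction for "down" is an assumption (clockwise by default).
    """
    return [
        flip_h(faces["back"]),
        flip_h(faces["front"]),
        flip_h(rotate_ccw(faces["up"])),
        flip_h(down_rotation(faces["down"])),
        flip_h(faces["left"]),
        flip_h(faces["right"]),
    ]

def build_strip(left_eye, right_eye, down_rotation=rotate_cw):
    """Concatenate twelve squares side by side: left eye first, then right."""
    squares = eye_faces(left_eye, down_rotation) + eye_faces(right_eye, down_rotation)
    size = len(squares[0])
    return [[px for square in squares for px in square[y]] for y in range(size)]
```

If the importer expects the opposite rotation for the "down" face, passing `down_rotation=rotate_ccw` switches it without touching the rest of the layout.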


Clearly, just the process of making these strips is a tedious one, so I wrote a Python script to build the strips for me. I made sure that all of my files were properly named and in the right directories, then ran the script on them to build each strip, saving days of work in Photoshop.


The Stereo Cubemap Importer handled a lot of the heavy lifting, but I did need to make some large adjustments to the project to get it to work with the CV1 of the Oculus. Once I did, all that remained was assigning the Xbox controller's buttons to next/prev to make cycling through the panoramas easy. This only required a simple blueprint, and then the entire experience was ready for testing.


Testing this meant little more than bringing friends into the Rift, making sure each button worked as needed and checking for gaps and poorly stitched sections of the panoramas. I ended up needing to go back into Photoshop to make some tweaks so that L/R eye elements would line up better, but that was about it. Once all was said and done, I packed the machines up for Dublin, set it all up and got to watch BC alumni jog their memories and enjoy a virtual visit to their college glory days.

Follow Me Dragon is a virtual pet application I helped develop at VRC. It was built in Unreal Engine for ARKit-enabled iOS devices.

For the Spring 2015 exhibit at the McMullen Museum of Art on campus at Boston College, my research advisor Prof. Nugent asked me to look at innovative ways to bring digital humanities into the mix. The art exhibit was centered around the Irish Arts and Crafts movement for the 100th anniversary of Irish Independence. I spitballed a few ideas, but the one we were most excited about was using Virtual Reality to let users visit the areas the pieces of art they were looking at came from. In the end, I shot, stitched and built my first complete project in VR. I budgeted out costs for an Oculus DK2, a new PC and some rental camera equipment, with which I travelled throughout the Irish countryside twice shooting panoramas.


Next I stitched everything together in PTGui and Photoshop and brought each panorama into Unity as skyboxes in separate scenes. I wrote a script to change the scene and wired up some buttons on the sides of the Oculus, sitting in 3D-printed housings, to let users change what they were seeing. I soldered the buttons to a mini keyboard mounted to the front of the Oculus so that one button on the right of the headset mapped to the character 'n' for next, and the other button mapped to 'p' for previous.

Control button sitting in a 3D-printed mount.


Although this project was technically simple in scope, it was exciting to watch users inside of the experience: it was the first VR experience for nearly every one of them. Given that the VR installation sat inside of an art museum, the majority of the users were over the age of 60. The project didn't need to be done in 3D or offer any gaze-based interactivity. The medium was so foreign to them that just instructing them to move their heads once they put the headset on was enough. Regardless of how simple this experience was, however, it proved to me the power of Virtual Reality as a tool for storytelling.


- Joycestick Official Website -


- In the Associated Press -




Joycestick is reimagining James Joyce's Ulysses in Virtual Reality. The project is under active development by a class of nearly 30 students from Boston College, MIT, Northeastern and Berklee College of Music. As the teaching assistant of the course developing Joycestick, I had the plan and vision for this project over a year before the class first met. Once development began, I served as the Engineering Lead and shipped both the MVP and second version of Joycestick. The project is currently on tour, stopping at various Digital Humanities and Joyce conferences around the world including Rome, Dublin, Toronto and Singapore, and will be coming to Steam this year.

Why Ulysses?



My research advisor at Boston College in Digital Humanities, Prof. Joe Nugent, is a Joyce scholar, so we've always had an interest in bringing the intricate world of Ulysses to a medium that holds so much promise for telling complex stories. Furthermore, Ulysses is very much a story about experience and imagery, aspects that make it an excellent candidate to try in VR. The current direction of Joycestick breaks the story up logically by chapter and represents each chapter through objects and images. These are translated into scenes that vary in degree of interactivity, and as the player moves through the scenes, an aspect of the story is uncovered and they feel rewarded.

Developing Joycestick


Working on Joycestick at "Reality, Virtually" Hackathon at the Media Lab at MIT.


The dev team for Joycestick comes from a diverse background. We have experience building applications for mobile, web, analytics and even embedded systems. But of course, no one (with the slight exception of myself) had ever touched Unity or built a video game. It was clear we had our work cut out for us. Building the early versions of the game relied on our flexibility as a team and on being unafraid to scrap our progress at any time in favor of a more scalable and extensible approach.


Early on, this happened a number of times. As we began to explore more object-oriented game architectures through design patterns, branching off in a new direction became essential.


CalypsoMgr.cs manages the scenes in Joycestick modeled after the Calypso chapter.


Interacting with JoyceObjects is the key to progressing in Joycestick.


Coin.cs is an example of one JoyceObject.


SoftFlicker.cs gives the gas lamps in the scene a realistic flicker.


Lessons Learned


The process of building an independent game (especially for a platform as new as Virtual Reality; we're developing for HTC Vive) is rarely a smooth one. Hiccups along the way are to be expected, and with Joycestick, we certainly hit our fair share of roadblocks. Working with a team of students who are enrolled in multiple other [hard] classes in addition to balancing social lives, clubs and sports is inherently difficult. We managed to make it work by scheduling small deadlines, delegating tasks and staying on top of our Trello boards. That being said, as hard deadlines rolled around, little sleep was had.


But whether you are shipping an ambitious project in Virtual Reality with CS students who have no work or game engine experience, or a feature for a well-known piece of software with an experienced team of developers, similar issues arise. Even in its MVP state, I was extremely proud of the work that I saw done on Joycestick, and I look forward to seeing the work that continues to be done with it. In the meantime, I will be working on porting a lite version of Joycestick to mobile for Google Daydream.


Read more about Joycestick here.

Machine Learning and AI

In the spring semester of my sophomore year at Boston College, I took an Information Systems course with Prof. Ransbotham called Analytics and Business Intelligence. At the time, I was apprehensive about my studies. I didn't have a major, and my creative pursuits felt worthless at a university driven by its business school, one of the best in the nation for undergrads. I was the odd one out in a course focused on the managerial implications of data science and full of MBAs and business students. And while I couldn't care less about the case studies, I became fascinated with the math, engineering and creative possibilities of the tech being discussed.


We used Tableau, R and Python to build models for our final project, in which we analyzed a subset of the Yelp Academic dataset. I grew close with Prof. Ransbotham while working on the project, and at the end of the semester, he strongly recommended that I study computer science (I obliged, and my GPA never recovered).

By the time my understanding of computers and how to program them reached an 'over-the-hump' feeling, I was only a few months from graduating. As a capstone of my CS education, I decided to take Machine Learning (CSCI 3345, Fall '16) with Prof. Alvarez, where I would be forced to leave my comfort zone in math and CS. That experience took my understanding of the theory, methods and applications of artificial intelligence to a whole new level. The culmination of my coursework was a project that looked at the effectiveness of Recurrent Neural Networks for text generation compared to a Markov Model. The textbook is among the favorites in my library.


Finding Similar Images with LS-Hash

The task for which this program was developed was given as follows:

  • You have to develop an algorithm that, given a query image, finds the “closest” entries to it in a dataset of images.
  • The solution will be evaluated based on how its run-time scales with the size of the input dataset, the number of input queries and the value of K. It will also be evaluated on how its memory usage scales with the input.
  • Finally, your solution will also be evaluated based on its accuracy in solving a classification task. More specifically, we will run your final code on a folder with a dataset of images and another folder with several query images. The images will be 2D arrays of 1s and 0s. We will run your code on these input folders for a value of K = 9. These images fall into several categories. We know the category in which each image falls but you do not. We will read the output file produced by your program where you stored the K files closest to each query. We will read it line by line. If the category that most frequently appears in these K files is the same category as the input file, then you get one point. Otherwise you get zero points. This will produce a total score. We will then run your code, just as described above, also for K = 1, 3, 5, 7. Your final accuracy score will be the highest score you get among the values of K tested.

To solve this, we took a hybrid approach incorporating elements of Machine Learning and Computer Vision in which we extracted features from the images, and 'clustered' them using Locality-Sensitive Hashing. On GitHub here.


Deciding on an Approach

While getting more familiar with different aspects of this project, we debated many different solutions before landing on our final implementation. The first involved comparing every query image to every database image and maintaining a list of each comparison along with its score. We planned to apply a number of operations to the database image in each comparison, such as rotations and horizontal and vertical flips, to account for potential changes to the image data.


After abandoning this idea because it would require too many passes over the data and would not scale well for multiple query images, we moved towards a computer vision approach, which relied on extracting important features from the image. After deciding on feature extraction, we debated how to plot each image in the feature space and look for nearby neighbors. We looked at methods from Machine Learning to best understand how to do this, and a naive implementation of the k-Nearest Neighbors algorithm immediately came to mind. k-NN works very well in lower-dimensional spaces and when time and scale are not the most immediate concern. Unfortunately for us, time was of the essence and we needed to find an algorithm that could scale and perform quicker than O(n*d) per query. Even though we were getting good results from just a few simple features when we tried k-NN, we decided to search for a faster alternative that would scale efficiently.
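For reference, the naive k-NN baseline mentioned above amounts to a linear scan over every database vector, which is where the O(n*d) cost per query comes from. The sketch below is illustrative only; the function name and the choice of Euclidean distance are assumptions, not details taken from the project.

```python
import heapq
import math

def naive_knn(query, database, k):
    """Return the indices of the k database vectors closest to `query`.

    Every one of the n vectors (each of dimension d) is visited once,
    so a single query costs O(n * d) distance work.
    """
    distances = ((math.dist(query, vec), i) for i, vec in enumerate(database))
    return [i for _, i in heapq.nsmallest(k, distances)]

# Usage with made-up 2-D feature vectors.
db = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [5.0, 5.0]]
nearest = naive_knn([0.0, 0.0], db, k=2)
```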


Through more research, we found two new potential solutions to our problem: a k-d tree and a locality-sensitive hash (LSH). We began to work with both and found that although neither would retrieve results as accurately as the naive k-NN, we could still perform quite well and much more quickly. The ultimate decision then came down to a matter of speed. Even though the performance of the k-d tree and LSH seemed about the same (empirically) on simple features, we went with the LSH method, which is capable of indexing and querying in higher dimensions with O(1) retrieval, because we were not sure how large our feature vector would become.

Preprocessing the Images


We took a number of steps to preprocess the images before generating a feature vector so that the features extracted would be relevant and normalized. First, we de-noised the image to remove small groups of pixels that were surrounded by the opposite value. For each pixel p in the image, we take a 4x4 matrix of pixels from the original image data such that the 4x4's upper left-hand pixel is p. Near an edge or corner, the data used to generate this mini-matrix wraps around the edges of the original image. Once each 4x4 was extracted, we would check the outside 12 pixels, verify that they were all the same color, and if so, force the inner four pixels to that color as well. Our algorithm also counted the number of instances where the outside 12 pixels were either all 0 or all 1; after going through the entire image, the larger of the two counts would tell us with high likelihood what the background color of the image was.
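The denoise pass described above can be sketched as follows. This is a reconstruction from the prose rather than the original code: the function name is hypothetical, and the image is assumed to be a list of rows of 0s and 1s.

```python
def denoise(image):
    """Fill 2x2 specks enclosed by a uniform 12-pixel ring; guess the background.

    For every pixel p, a 4x4 window with p at its upper-left corner is read
    from the ORIGINAL image (wrapping around the edges). If the 12 border
    pixels all share one value, the 4 inner pixels are forced to that value.
    Returns (cleaned_image, background_value).
    """
    rows, cols = len(image), len(image[0])
    cleaned = [row[:] for row in image]
    uniform_borders = {0: 0, 1: 0}  # how often the ring was all 0 / all 1

    for r in range(rows):
        for c in range(cols):
            border = [
                image[(r + dr) % rows][(c + dc) % cols]
                for dr in range(4)
                for dc in range(4)
                if dr in (0, 3) or dc in (0, 3)
            ]
            if len(set(border)) == 1:
                value = border[0]
                uniform_borders[value] += 1
                for dr in (1, 2):
                    for dc in (1, 2):
                        cleaned[(r + dr) % rows][(c + dc) % cols] = value

    # The value that formed uniform rings most often is taken as background.
    background = 0 if uniform_borders[0] >= uniform_borders[1] else 1
    return cleaned, background
```

Reading from the original image while writing to a copy keeps one window's fill from influencing the next window's border check.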




Following the denoise function, we rotated the image around a major axis. The major axis was found by placing random lines through the image and seeing which line hit the most points in the shape. After finding this line, we calculated the angle between it and the X axis using the inverse tangent. As long as a significant rotation needed to be made (more than 5 degrees and less than 85 degrees), the array was rotated using scipy.ndimage.interpolation. We had implemented a deskew function, but since this is only really important for MNIST images, we decided to just use the rotation function because it would work on all different types of shapes, and not just hand-drawn digits.
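The angle test in this step can be illustrated on its own. The sketch below assumes the major-axis line has already been found and is given by two endpoints; the function names and the folding of the angle into the range 0 to 90 degrees are assumptions made for illustration.

```python
import math

def axis_angle_degrees(p1, p2):
    """Angle between the line through p1 and p2 and the x axis, in [0, 90]."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    angle = abs(math.degrees(math.atan2(dy, dx)))  # inverse tangent, 0..180
    return angle if angle <= 90 else 180 - angle

def needs_rotation(angle, low=5.0, high=85.0):
    """Rotate only when the tilt is significant, per the thresholds in the text."""
    return low < angle < high
```

When `needs_rotation` is true, the angle would then be handed to an image rotation routine such as `scipy.ndimage.rotate`.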


The final preprocessing step was the creation of the bounding box, which starts from the top-left and bottom-right corners of the image and works its way diagonally, checking each column and row of the image for anything other than the background color. When a searcher finds a non-background pixel, it stops searching in that direction and returns the appropriate dimension from that side. Once the searchers return, the image is resized to the new dimensions plus padding to make sure we have a little room to work with. The bounding box gave a better idea of the dimensions of the shape, which were used later. This meant that the dimensions of a longer, skinnier shape, like a 1, would be very different from those of a wider, more square shape, like a 5.
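A bounding-box crop of this kind might look like the sketch below. It uses a plain row and column scan rather than the diagonal search described above, so it should be read as an equivalent-result simplification; the function names and the default padding of one pixel are assumptions.

```python
def bounding_box(image, background=0):
    """Return (top, bottom, left, right) of the non-background pixels, or None."""
    rows = [r for r, row in enumerate(image) if any(px != background for px in row)]
    cols = [
        c for c in range(len(image[0]))
        if any(row[c] != background for row in image)
    ]
    if not rows:
        return None  # the image is entirely background
    return rows[0], rows[-1], cols[0], cols[-1]

def crop_with_padding(image, background=0, pad=1):
    """Crop to the bounding box and surround it with `pad` background pixels."""
    box = bounding_box(image, background)
    if box is None:
        return [row[:] for row in image]
    top, bottom, left, right = box
    width = right - left + 1
    blank = [background] * (width + 2 * pad)
    body = [
        [background] * pad + row[left:right + 1] + [background] * pad
        for row in image[top:bottom + 1]
    ]
    return [blank[:] for _ in range(pad)] + body + [blank[:] for _ in range(pad)]
```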

Feature Extraction


Now that we were working with clean and normalized images, we moved to feature extraction to generate the feature vector for each image. The first features we calculated deal with pixel counts: foreground pixels, horizontal symmetry, vertical symmetry, number of shape pixels in the right half, and number of shape pixels in the left half of the image. All of these features were extracted in two sweeps, each of which went through half of the pixels in the image, which equates to one sweep through the entire image. Each of these feature values was saved as a value in the range [0.0, 1.0], both to normalize the entire feature vector and to normalize the values across images that might be different sizes after the bounding box function. We decided on these features because they can tell a lot about the area that a shape encompasses.
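As a rough illustration of these pixel-count features, the sketch below computes five values in [0.0, 1.0] for a binary image. The exact definitions used in the project are not given, so the ones here are assumptions: symmetry is the fraction of mirrored pixel pairs that agree, and the left and right features are the share of foreground pixels in each half. It also makes several passes for clarity instead of the two half-image sweeps described.

```python
def _agreement(pairs):
    """Fraction of (a, b) pairs with a == b; 1.0 when there are no pairs."""
    pairs = list(pairs)
    if not pairs:
        return 1.0
    return sum(a == b for a, b in pairs) / len(pairs)

def pixel_features(image, background=0):
    """Return [foreground, h_symmetry, v_symmetry, left_share, right_share]."""
    rows, cols = len(image), len(image[0])
    fg = [[px != background for px in row] for row in image]
    foreground = sum(sum(row) for row in fg)
    half = cols // 2

    left = sum(sum(row[:half]) for row in fg)
    right = sum(sum(row[cols - half:]) for row in fg)

    # Compare each pixel with its mirror image across the vertical midline...
    h_sym = _agreement(
        (fg[r][c], fg[r][cols - 1 - c]) for r in range(rows) for c in range(half)
    )
    # ...and across the horizontal midline.
    v_sym = _agreement(
        (fg[r][c], fg[rows - 1 - r][c]) for r in range(rows // 2) for c in range(cols)
    )

    return [
        foreground / (rows * cols),
        h_sym,
        v_sym,
        left / foreground if foreground else 0.0,
        right / foreground if foreground else 0.0,
    ]
```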


Next, we used FAST corner detection to find the number of corners and the positions of the corners in the shape. We then used the positions of the corners to find their centroid: since we could not predict how many corners each image would have, we needed a generalized way to use corners while keeping our feature vector the same standard length. We had implemented a Harris Corner Detection algorithm to find the number of corners, but decided to use the FAST implementation because it was much faster and more accurate than our own implementation.
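Reducing a variable number of corners to a fixed-length feature can be sketched as below. The corner detector itself is left out; the function simply takes a list of (x, y) positions. The function name, the `max_corners` cap used to normalize the count, and the zero vector returned for images without corners are all assumptions for illustration.

```python
def corner_features(corners, width, height, max_corners=50):
    """Return [normalized_count, centroid_x, centroid_y], each in [0.0, 1.0].

    `corners` is a list of (x, y) positions from any corner detector.
    """
    if not corners:
        return [0.0, 0.0, 0.0]
    count = min(len(corners) / max_corners, 1.0)
    centroid_x = sum(x for x, _ in corners) / len(corners)
    centroid_y = sum(y for _, y in corners) / len(corners)
    return [count, centroid_x / width, centroid_y / height]
```

However many corners are detected, the result always has three entries, so it can be appended to the other features without changing the vector length.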


Locality-Sensitive Hashing (LSH)


After feature extraction, we used a pure Python implementation of LSH to hash the feature vector for each image. Whereas a typical hash function will, for instance, generate very different values for the two function calls naiveHashMap.put([.90, .82, .12]) and naiveHashMap.put([.92, .82, .11]), a locality-sensitive hash places these two vectors into very nearby buckets, which makes it a valuable structure for Machine Learning-type problems. We tried many different combinations of features, as well as different scoring techniques, but found that normalizing the features to all be a percentage (less than 1) gave us the most accurate results. After hashing all of the database images and getting the feature vector for the query image, we used the LSH.query([query image feature vector], k) method, which returns an array of k feature vectors (if there are at least k neighbors).
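To make the idea concrete, here is a minimal random-hyperplane LSH in plain Python. It is not the library the project used, and the class and method names are hypothetical; it only illustrates how vectors pointing in similar directions tend to share a bucket, so a query inspects one bucket instead of the whole database.

```python
import math
import random
from collections import defaultdict

class RandomProjectionLSH:
    """Toy LSH: one bit per random hyperplane, buckets keyed by the bit tuple."""

    def __init__(self, input_dim, hash_size=8, seed=0):
        rng = random.Random(seed)
        self.planes = [
            [rng.gauss(0.0, 1.0) for _ in range(input_dim)]
            for _ in range(hash_size)
        ]
        self.buckets = defaultdict(list)

    def _hash(self, vec):
        # Each bit records which side of a hyperplane the vector falls on.
        return tuple(
            int(sum(p * v for p, v in zip(plane, vec)) > 0)
            for plane in self.planes
        )

    def index(self, vec, label=None):
        self.buckets[self._hash(vec)].append((list(vec), label))

    def query(self, vec, k):
        """Return up to k (vector, label) pairs from the query's bucket."""
        candidates = self.buckets.get(self._hash(vec), [])
        return sorted(candidates, key=lambda item: math.dist(vec, item[0]))[:k]
```

Because only one bucket is searched, a query can come back with fewer than k results, which matches the behavior discussed for the real implementation.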


While parsing the database images, we had saved each image's file name and feature vector into two separate, parallel lists. After getting the feature vectors for the k nearest neighbors, we looked up the index of each returned feature vector in the feature list and used that index to find the actual image file name.

In some extreme cases, the LSH query function returned fewer than k neighbors. To get around this and find exactly k neighbors, we took whatever neighbors it did find and looked up their closest neighbors. In the extremely rare case that a query image returned no neighbors, we decided to just print the query image k times. We decided this was the best course of action because there were no other neighbors to look at, and we really did not want to implement a slower algorithm that could give us the correct number of neighbors.
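The lookup and fallback logic described in the last two paragraphs can be sketched as follows. The function and parameter names are hypothetical, and `lsh_query` stands for any callable that takes a vector and k and returns a list of neighbor vectors. Note one assumption: if expanding through neighbors of neighbors still yields fewer than k names, this sketch returns the shorter list.

```python
def resolve_neighbors(query_name, query_vec, lsh_query, db_vectors, db_names, k):
    """Map LSH results back to file names, expanding when fewer than k return.

    db_vectors and db_names are parallel lists built while parsing the
    database. Identical vectors resolve to the first matching file name.
    """
    results = []
    frontier = [query_vec]
    while frontier and len(results) < k:
        current = frontier.pop(0)
        for vec in lsh_query(current, k):
            name = db_names[db_vectors.index(vec)]
            if name not in results:
                results.append(name)
                frontier.append(vec)  # later, look at this neighbor's neighbors
                if len(results) == k:
                    break
    if not results:
        # No neighbors at all: repeat the query image k times.
        return [query_name] * k
    return results
```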


Generalization & Runtimes


While we worked with the MNIST data set for some time, we tried to focus our approach on a scalable implementation that could work for other data sets as well. That is why we took so much time to rotate the image, find the bounding box, and use percentages throughout the entire feature vector. As a result, our data was much more normalized, and we could find similar images that were simply scaled versions or rotations of one another. That is also why we decided to use the LSH function: we decided that it was more important to scale time-wise than to get the exact nearest neighbors.

Empirical Runtime:

  • 14.46s: average preprocessing time for 1000 db images
  • 16.5676s: average NN search time for 1000 db images and 1000 query images


Before finding the NN for each query image, we preprocess the image and then run the neighbor search. This means that, in effect, the NN search time also includes the time it takes to preprocess each of the query images. Subtracting the average preprocessing time from the search time (16.5676s - 14.46s) leaves approximately 2.1076s of query time for the full set of 1000 query images, or roughly 2ms per query. This suggests that our LSH approach should scale efficiently to even larger data sets.


This project was done by Danielle Nash, Ryan Reede and Drew Hoo.

Code Contributions

Simpsons Text Generation


For our Machine Learning (CSCI 3345, Fall 2016) final project with Prof. Alvarez, we were given the opportunity to work in a practical manner with one of the various ML approaches we studied over the course of the semester. The problem we decided to tackle was text generation. In particular, we wanted to see how well a machine could write more Simpsons scripts based on 600 episodes of text as training data. We decided to use a Long Short-term Memory Recurrent Neural Network for the procedural text generation, and due to the convincing argument put forth by this article, we also compared our results with text generated using a simpler method: a Markov Chain.


Hosted on GitHub here.



Procedural natural language generation is an interesting and novel application of modern machine learning techniques. The ability to generate natural human language has a wide range of applications including software user interfaces that offer dynamic text, natural language summaries of live data, and even the creation of entertainment and comedy media. However, creating text that feels and sounds like natural spoken language is a challenging task. Identifying and understanding the characteristics of existing natural language samples can allow us to leverage these rules in the creation of new language samples.


The aim of this project was to explore some of the available procedural techniques for generating natural language. These modern procedure-based approaches extract features from text samples in order to create new samples. To accomplish this, we wanted to acquire a large data set of language that has many strong identifiers and a specific style so that we could reliably extract and apply characteristics of the text.


With this in mind, we acquired a data set of all dialogue from the long-running television show The Simpsons. The Simpsons has been running new 30-minute episodes since 1989 and has aired 605 episodes. The Simpsons data set was large, uniquely styled, and varied enough in topic to serve as a viable and novel learning target. Using this data, we were able to experiment with and analyze two different language generation approaches, implementing Markov chain and recurrent neural network models to create new text samples in the style of The Simpsons.

Technical Introduction


Natural Language Generation (NLG) typically refers to the practice of taking raw data and transforming it into a human-familiar natural format. Natural Language Generation has the business motivation of helping employees more quickly understand data insights as they change day to day (Glascott). NLG differs from other procedural generation tasks in that it typically has a root piece of information the algorithm is communicating, and builds language around it, where the goal of many procedural generation algorithms is to create an entirely new piece of content. However, NLG problems do introduce many challenging tasks around creating syntactically correct sentences and phrasings, and need to be properly tailored and trained to match the complex syntax of human language.


Procedural generation of natural language can be broken into two main categories. Character-level generation creates sequences of text character by character. This method can uniquely introduce misspelled words, since it builds language from the characters up. Word-level generation uses only words from sample text or a dictionary. The comparison of these two models will be analyzed later in this paper.

In this project, our goal was to apply and compare two methods of Procedural Content Generation to the task of generating new and unique natural language samples. Markov Chains are considered one of the simpler approaches to procedural content generation and can be used as a benchmark for other methods, as they construct a large probability function which characterizes sequences in media in order to recreate samples (Rowe). The concept of modeling language as a Markov Chain traces its roots to a 1948 paper written by Claude Shannon which proposed a probabilistic model for describing language (Shannon 1948). More recently, approaches inspired by this original model have gained popularity in the creation of entertainment-oriented procedural text generators, including many that have built Markov Models from the works of Shakespeare to generate Shakespeare-esque language samples (Smith).
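A Markov chain text generator of the kind described above fits in a few lines. The sketch below is a generic illustration, not the project's code; the function names and the toy sentence are made up. The same functions work at the word level (pass a list of words) or the character level (pass a list of characters).

```python
import random
from collections import defaultdict

def build_chain(tokens, order=1):
    """Map each run of `order` tokens to the list of tokens that followed it."""
    chain = defaultdict(list)
    for i in range(len(tokens) - order):
        chain[tuple(tokens[i:i + order])].append(tokens[i + order])
    return chain

def generate(chain, length, seed=None):
    """Walk the chain from a random starting state, up to `length` tokens."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))
    output = list(state)
    while len(output) < length:
        followers = chain.get(tuple(output[-len(state):]))
        if not followers:
            break  # dead end: this state was only seen at the end of the text
        # Followers are stored with repeats, so frequent continuations
        # are proportionally more likely to be picked.
        output.append(rng.choice(followers))
    return output

words = "the cat sat on the mat the cat ran".split()
word_chain = build_chain(words)               # word-level model
char_chain = build_chain(list("the cat sat"), order=2)  # character-level model
sample = generate(word_chain, length=6, seed=1)
```

Raising `order` makes the output more faithful to the training text, at the cost of copying longer stretches of it verbatim.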

A second method, Recurrent Neural Networks, has gained recent attention for procedurally generating natural language and other styled media. An RNN-based character-level approach was shown to produce realistic results when trained on sets of similar text samples from Wikipedia and the New York Times (Sutskever et al. 2011). Similar to the work on Markov Models, RNNs have also been applied to the novel task of producing ... Additional refinements to the model have also been explored, including an approach we used here: the use of a Long Short-term Memory RNN for sequence generation. LSTM RNNs have been effective in Procedural Content Generation, as shown by one paper which demonstrated their ability to generate new samples of handwriting when trained on a set of human-written samples (Graves 2014).



There are many challenges involved in creating and running new implementations of these methods of generating text. Two of the most notable are finding a balance between overfitting and underfitting, and managing long running times. It was often difficult to tell when a sample had overfit to the input. Since our goal was to create coherent phrases and sentences, it often seemed like a good thing when we were able to achieve an output with a full line or two of sensible dialogue, but we found this was often the result of overfitting, and the lines had been pulled verbatim from the sample input. Finding the balance between methods that would overfit and pull entire conversations from the training samples, and underperforming models that introduced too much diversity and produced gibberish outputs, was a constant challenge.


Additionally, the long training time of our RNN models proved to be a challenge that we had to strategize around. Ideally we would have tested multiple full runs on our entire data set, but this was not feasible in the time frame or with the resources at hand. For this reason we often tested the effects of parameter adjustments on smaller subsets of our data, in runs that would take hours instead of days. This challenge is well illustrated by Sutskever et al., who trained their RNN for five days on 8 high-end Graphics Processing Units to achieve their strong results. Picking and choosing which runs and tests to perform in order to learn the most about our model and the effects of its parameters was another challenge unique to this problem.




Our preliminary research into text generation models pointed to the Recurrent Neural Network (RNN) and the Markov Chain (MC) as two models suitable for comparison in this project. Before going into the specifics of each, we will first describe the dataset, which remained constant throughout the project. Using Todd Schneider’s Rails app, from his GitHub repo film-springfield, we were able to crawl simpsonsworld.com, which hosts every Simpsons episode ever aired along with the script for that episode. The reason for using the Rails app is access to the Nokogiri library. Nokogiri is an HTML, XML, SAX, and Reader parser. It has features like the ability to search documents via XPath or CSS3 selectors, which are well suited to parsing the complicated formatting of the simpsonsworld.com website. The Rails app performed all of the heavy lifting by grabbing the relevant script data (i.e., scene location, time of day, speaker) from Simpsons World, Wikipedia, and IMDb and storing it in a well-organized, labeled PostgreSQL relational database.

The CSV file we exported from Postgres contained rows that corresponded to individual lines from the script. The columns we used varied slightly between methods (MC vs. RNN), but the major ones were, unsurprisingly, character (who is speaking the line, if it is a spoken line) and text (what was spoken, plus any sentiment directions for the voice actors in parentheses). We also had access to the scene ID within each episode, so we could determine when a conversation exchange ended. From here we scripted some text parsing in Python to generate plain text files of various lengths that we could feed into the RNN and MC for training, since all the RNN needed was raw script data.

Conceptually, the Markov Chain is fairly simple. It builds a matrix of transition probabilities between states in a state machine. Building this matrix from the training data requires iterating through the entire text: for each n-character-long sequence, we add to a hashmap the sequence and the single character that follows it in the text. Once all sequences have been fed into this map, we can calculate the probability of each character coming next given a sequence already seen as input. In essence, the hashmap contains a probability distribution function (PDF) for each sequence it has been fed. Once these probabilities have been calculated from the training data, we can use the Markov Chain generatively: we feed it a key from the hashmap and, using random values, select the next character in the string we are building from that key's PDF.


Ultimately, we implemented two versions of the Markov Chain. The first implementation was a character-based model that created chains of a given 'order.' An order three model of this design, for example, would create a PDF as described above for every sequence of three characters and the likelihood that a given single character would follow that sequence. During generation, the model would be given a seed of at least the order length and generate a character based on the stored probabilities. The generator cursor would then shift one character to the next group of characters of the order length (including the character just generated) and make another probabilistic choice, terminating after generating a predetermined length of text.
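The character-level chain described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions rather than the project's actual code: the function names, the tiny corpus and the choice to stop early on an unseen sequence are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def build_char_model(text, order):
    """Map every `order`-character sequence to counts of the character that follows it."""
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        model[text[i:i + order]][text[i + order]] += 1
    return model


def generate_chars(model, seed, order, length, rng=random):
    """Extend `seed` one character at a time by sampling from the stored counts."""
    out = seed
    for _ in range(length):
        followers = model.get(out[-order:])
        if not followers:
            # Sequence never seen in training (assumed behaviour: stop early).
            break
        chars = list(followers)
        weights = [followers[c] for c in chars]
        # Counts act as unnormalized probabilities for the next character.
        out += rng.choices(chars, weights=weights, k=1)[0]
    return out


# Hypothetical toy corpus; the real input was the exported script text.
corpus = "homer: d'oh! marge: homer! homer: mmm, donuts. "
model = build_char_model(corpus, order=3)
sample = generate_chars(model, seed="hom", order=3, length=40, rng=random.Random(0))
print(sample)
```

Raising `order` makes each key more specific, which is why higher orders drift towards copying the source text verbatim.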

The order of the model served as a measure of model complexity: lower order models would often underfit the data and produce nonwords or otherwise incomprehensible text, while higher order models would overfit the data and copy entire phrases or even lines from the source text. While we were able to reach an acceptable balance between generating mostly comprehensible and original text with orders of n = 4 and 5, we decided to try an additional approach to improve results and implemented a word-based model.

This new model splits the text on whitespace and assigns to each unique word a set of probabilities for what may follow. The word-based model has a few distinct advantages over the character-based one: misspellings and nonwords do not appear (unless they appear in the source text), and the generated text is often more comprehensible because each generated word (rather than single character) is guaranteed to follow its preceding word under some existing grammatical structure in the source text. The word-based model could certainly lend itself to overfitting (for example, when a unique word appears only once in the source text), but with enough data we found those issues to be significantly reduced.

The word-based model in its final form thus created a PDF for every word in the source text and the probabilities of the existing words that follow. Trailing punctuation was included as part of a word, such that "car" and "car," would be considered different words. This tweak preserved punctuation in the output text, which was generated by taking a seed word and inserting an empty space followed by a probabilistically chosen following word.
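A word-level version follows the same pattern. The sketch below is an assumption-laden illustration, not the project's code: it splits on whitespace so that trailing punctuation stays attached to each word, and the names `build_word_model` and `generate_words` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def build_word_model(text):
    """Map each whitespace-delimited token to counts of the token that follows it."""
    words = text.split()  # "car" and "car," remain distinct tokens
    model = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        model[current][following] += 1
    return model


def generate_words(model, seed, length, rng=random):
    """Start from a seed word and append probabilistically chosen followers."""
    out = [seed]
    for _ in range(length):
        followers = model.get(out[-1])
        if not followers:
            # The last word was never followed by anything in the source.
            break
        candidates = list(followers)
        weights = [followers[w] for w in candidates]
        out.append(rng.choices(candidates, weights=weights, k=1)[0])
    return " ".join(out)


# Hypothetical toy input.
model = build_word_model("the car, is red but the car is fast")
print(generate_words(model, "the", 6, rng=random.Random(0)))
```

A word that appears only once has exactly one possible follower, which is the overfitting risk noted earlier.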


Our Recurrent Neural Network implementation uses only NumPy and Keras on top of a Theano backend, with prior work by Keras creator Francois Chollet and by Jason Brownlee of machinelearningmastery.com as a starting point. For the word-level model, we read in the raw Simpsons script data and format it into a list of words and characters in the order in which they appear in the text. We do this with a simple regex that selects only the features we want our model to consider. We omit uncommon symbols because of the noise they add to the data, and we omit spaces because of the size-related problems we faced during training: spaces add a considerable amount of size to the data. With these features filtered out and the data in list form, we then create two dictionaries that let us translate between word IDs and the words they represent. The word IDs allow the words to be represented numerically in the RNN model, and the mapping between words and their IDs allows the model's output to be interpreted by humans.
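As a rough illustration of that preprocessing step, the sketch below tokenizes a line with a regex and builds the two lookup dictionaries. The regex pattern and the function names are assumptions made for this example; the pattern actually used in the project is not shown here.

```python
import re


def tokenize(text):
    """Keep words and a few common punctuation marks; drop spaces and rare symbols."""
    # Assumed pattern: runs of letters/apostrophes, or one common punctuation mark.
    return re.findall(r"[A-Za-z']+|[.,!?:()]", text)


def build_vocab(tokens):
    """Return dictionaries translating tokens to integer IDs and back."""
    vocab = sorted(set(tokens))
    word_to_id = {word: i for i, word in enumerate(vocab)}
    id_to_word = {i: word for word, i in word_to_id.items()}
    return word_to_id, id_to_word


tokens = tokenize("Homer: D'oh! (annoyed)")
word_to_id, id_to_word = build_vocab(tokens)
ids = [word_to_id[t] for t in tokens]          # numeric form fed to the model
decoded = [id_to_word[i] for i in ids]         # human-readable form of model output
print(tokens)
print(ids)
```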

The data is represented as a 3D matrix, with dimensions of the number of sequences we train on, the size of each sequence, and the number of words contained in the data. This one-hot representation of the data was severely space-inefficient for the word model because of the large number of words contained in the data, which restricted the amount of data we were able to train on for this model. This space inefficiency inspired our implementation of a similar character-level model that predicted sequences of characters instead of words. The character model's representation was the same as that of the word model, with the only noteworthy difference being that its dimensionality was far smaller due to the number of available characters being less than the number of available words. This change allowed the character model to be trained on a considerably larger number of sequences than the word model.
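To make the size difference concrete, here is a small pure-Python sketch of the one-hot layout and of how the number of cells grows with vocabulary size. The 69-character vocabulary comes from our character data; the 20,000-word vocabulary, the sequence count and the sequence length are hypothetical numbers chosen only for illustration.

```python
def one_hot_sequences(ids, seq_len, vocab_size):
    """Return X[n][t][v] (input windows) and y[n][v] (next-item targets) as 0/1 lists."""
    X, y = [], []
    for start in range(len(ids) - seq_len):
        window = ids[start:start + seq_len]
        target = ids[start + seq_len]
        X.append([[1 if v == i else 0 for v in range(vocab_size)] for i in window])
        y.append([1 if v == target else 0 for v in range(vocab_size)])
    return X, y


def one_hot_cells(num_sequences, seq_len, vocab_size):
    """Number of cells in the 3D input matrix."""
    return num_sequences * seq_len * vocab_size


X, y = one_hot_sequences([0, 1, 2, 1], seq_len=2, vocab_size=3)
print(X[0], y[0])

# Same number of sequences and window size, very different footprint.
char_cells = one_hot_cells(100_000, 40, 69)       # 69 unique characters
word_cells = one_hot_cells(100_000, 40, 20_000)   # hypothetical word vocabulary
print(word_cells // char_cells)
```

Because the last dimension equals the vocabulary size, the word model's matrix is larger by roughly the ratio of the two vocabularies.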

The model itself had two hidden LSTM layers with 512 nodes each. The model applies dropout for regularization, randomly setting 40% of the units feeding each dropout layer to zero at each update during training. Last, the output layer is a fully-connected dense layer with a softmax activation function, with one node per item in the vocabulary (word or character), where the value of each node is the likelihood of that item being next in the sequence.

We compile this model and train it for several epochs. The number of epochs was 50 for the parameter tuning stage, and 100+ when trying to get the best model possible after adding more data. Using 20% of the data as test data, and training on batch sizes of 64 sequences, we collect training and test scores from each epoch during training and let the model run to completion. To generate predictions, we find the best model, load its weights, and give the model a random seed from the training data. Taking this seed as a starting point, we then iteratively predict what word will come next in the sequence, generating original text.

Parameters that we varied to observe different results included nodes per layer, total LSTM layers, and sequence length (the window size of the input). We discovered that regardless of how many nodes we used, the training data loss always tended towards 0. However, increasing the number of nodes had different effects on the test data. At 128 nodes, test data showed a higher starting point for loss but only a very shallow rise in loss as we trained through epochs. For 256 and 512 nodes, the test data's loss initially rose and then dropped, followed by a steady incline starting at around epoch 10. However, 512 nodes gave a lower starting loss. The results from tuning the number of nodes were consistent with what we learned this semester: adding more nodes increases complexity, meaning higher variance and a strong likelihood of overfitting. We can see the results of overfitting in the graph, where the test data loss diverges significantly from the training data loss at epoch 10.

We also experimented with the number of layers that our LSTM used. With one layer, we again saw a steady decrease in training data loss and a steady increase in test data loss. The gradual decrease in training data loss implies a lack of sensitivity in the model, which results in high bias. Thus, the algorithm was likely overgeneralizing and missing the relevant relations between features and target outputs, resulting in a consistently high error. With three and four layers, we see a sharp dropoff in both graphs, indicative of over-approximation in the result and thus a high error rate. Both graphs affirm this intuition, as the test data loss jumps at the point where the training data loss drops off suddenly. With two layers, the training data loss is significantly less smooth than with one, but the training and test losses match at around epoch 10, where both drop off; this was an early positive sign that our model was learning correctly. However, as more epochs pass, the training data loss and test data loss again begin to diverge, possibly again due to overfitting.

High bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting). In general, it made sense to use a moderate value for every parameter and so maintain a moderate level of complexity, with 512 nodes and 2 layers. This reflected what we learned in class: strike a balance between approximation and generalization by leaning towards a generally simpler model.


We also varied the amount of data that the neural net was trained on. The first character-based neural net was trained on significantly more data, and the graph shows results consistent with our expectations. Since the number of distinct characters used in the Simpsons scripts is small, more data meant lower loss.




In comparing the Markov Chain and the Recurrent Neural Network, we found that both had their triumphs and shortcomings. For the problem of naive text generation that mimics style and structure, the straightforward Markov Chain yielded viable results. Given enough training data, and enough memory to store all the combinations it may run into in a large training text, the Markov Chain was able to reliably produce results that avoided copying dialogue verbatim while still producing coherent phrases.


That being said, the Recurrent Neural Network holds significantly more promise for generating original, authentic text based on training data. RNNs come with a steeper learning curve, but when training went well our output was typically more robust. For instance, the RNN's ability to pick up on minute details of context is something a Markov Chain would never be able to match.

Using our rewritten character LSTM, we were able to train on the most data of all of our models. The results picked up the most context, and by varying the “diversity” (a term used by Keras creator Francois Chollet), we managed to get a range of very different results from the weights saved at each epoch. The effect of this “diversity” is that it rescales the log of the probabilities generated by the LSTM before they are re-exponentiated and normalized as usual. In the end, though spelling errors abounded, this model picked up the most on the stage directions and character emotions that most scripts use to affect an actor’s tone. The sample result was generated from an 800 KB excerpt of the full scripts with a length of 818,245 characters, of which 69 are unique. It was trained on the same model architecture we had been using for our word model; the differences are how the texts were preprocessed and the maximum length of the sliding window used to read sequences into the input matrix. In the sample, the net completes the missing characters at the end of the seed string and then immediately tries to establish a scene and then dialog, as any normal script would. The results here are promising, but after 47 epochs (loss: 0.6263, val_loss: 1.8713), they still indicate that more data would result in better context and text generation.
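The diversity rescaling described above can be sketched without any deep learning library. This is a minimal illustration in plain Python, assuming the model's softmax output is available as a list of probabilities; the function names and the small epsilon guarding against log(0) are our own additions for the example.

```python
import math
import random


def apply_diversity(probs, diversity):
    """Take logs, divide by `diversity`, exponentiate and renormalize."""
    scaled = [math.log(max(p, 1e-12)) / diversity for p in probs]
    peak = max(scaled)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(probs, diversity, rng=random):
    """Draw the index of the next character from the rescaled distribution."""
    adjusted = apply_diversity(probs, diversity)
    return rng.choices(range(len(adjusted)), weights=adjusted, k=1)[0]


probs = [0.7, 0.2, 0.1]            # hypothetical softmax output over 3 characters
print(apply_diversity(probs, 0.2))  # sharper: the likeliest character dominates
print(apply_diversity(probs, 2.0))  # flatter: rarer characters get more chances
print(sample_index(probs, 1.0, rng=random.Random(0)))
```

A diversity below 1 pushes generation towards the safest choice, while a diversity above 1 produces more surprising (and more error-prone) text.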

While the RNN models did not perform as well as we had initially expected, they did show clear promise in understanding temporal structures in sequential data, picking up contextual elements in text sequences in a way that only neural models can. Given more time, we would train similar models on larger amounts of data, distribute them across multiple GPUs for faster training, train multiple models concurrently for more efficient parameter tuning, and test deeper networks over more epochs. While deep neural networks have had recent breakthroughs in image classification, machine translation, and many other areas, sequence generation is still a relatively new application of these models. This project shows that there is some promise in this area, but there is clearly room to grow in the field of neural natural language generation.


This project was done by Drew Hoo, Ryan Reede, Charles Fox, Emily Lu, James LeDoux and Cameron Lunt.

Streaming Data

For Big Data Research Day 2016 at Boston College, I built a demonstration to break down how smartphone data is organized, streamed and processed in realtime for applications such as mobile VR. Here is the resulting paper from this research. The code is on GitHub here.


Smartphones and tablets have incredible hardware built into their systems, but more often than not, the software built for the devices doesn’t tap into it. This project explores how massive amounts of data points from a mobile sensor can be reliably and quickly reported to a remote server. With our software we set out to help the layman understand the practical applications of accessing smartphone data streams by reliably transmitting them from a mobile device to a processing service. From mobile Virtual and Augmented Reality to UI/UX research, gaming, navigation and more, the applications for transmitting information from mobile devices to servers are endless. We wanted to be able to process and visualize the data coming off the device wirelessly. From the get-go, a number of problems became very clear to us.


Understanding the Data


Of the four V’s that describe the difficulty in managing and working with ‘Big Data’ (volume, variety, veracity and velocity), velocity was the biggest hurdle for us to overcome. The Android tablet we were working with was designed for SLAM applications thanks to its extremely precise sensors and a 2.3 GHz quad-core mobile processor. As a byproduct of such powerful hardware, our device was capable of producing a ton of data in very little time. In addition to handling the speed at which the data was being offloaded from the tablet, we needed to find a reliable way to transport the data stream from the Android device to a central server, where it could be processed and/or visualized so that a client with limited knowledge of data streams, linear transformations and Euler coordinates could understand what the data coming off the tablet represented. Creating a proper visualization of the 3D axes and transformations became the final major issue for us to tackle given these properties of our data.


Although the Project Tango device is capable of producing nearly 250,000 data points per second, we understood the complexity associated with processing a live data stream. Additionally, we didn’t want to clog our project with any extraneous data that wouldn’t be vital to our end goal of helping a non-technical person understand the practical applications of such sensor data. Our work focused on the rotational data, which came in from the tablet in 4 parts (a quaternion). More info on a quaternion (from cprogramming):

A quaternion represents two things. It has an x, y, and z component, which represents the axis about which a rotation will occur. It also has a w component, which represents the amount of rotation which will occur about this axis. In short, a vector, and a float. With these four numbers, it is possible to build a matrix which will represent all the rotations perfectly, with no chance of gimbal lock.
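As a worked illustration of the matrix mentioned in the quote, the sketch below converts a quaternion into a 3x3 rotation matrix using the standard unit-quaternion formula. It assumes the common convention in which w equals cos(angle / 2) and (x, y, z) is the rotation axis scaled by sin(angle / 2), and it assumes the components arrive in (w, x, y, z) order; the tablet's actual ordering is not stated here, and the function names are hypothetical.

```python
import math


def quaternion_from_axis_angle(axis, angle):
    """Build a unit quaternion (w, x, y, z) for a rotation of `angle` radians about `axis`."""
    ax, ay, az = axis
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    s = math.sin(angle / 2.0) / norm
    return (math.cos(angle / 2.0), ax * s, ay * s, az * s)


def quaternion_to_matrix(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix (list of rows)."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n  # normalize to guard against drift
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def rotate(matrix, vector):
    """Apply the rotation matrix to a 3D vector."""
    return tuple(sum(row[i] * vector[i] for i in range(3)) for row in matrix)


# A quarter turn about the z axis should carry the x axis onto the y axis.
q = quaternion_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2)
print(rotate(quaternion_to_matrix(q), (1.0, 0.0, 0.0)))
```

A matrix like this is what a 3D scene typically needs in order to orient an on-screen model from each incoming reading.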

Programming Paradigm

There were multiple layers of communication involved in reliably transmitting the sensor data from the tablet to the processing server. Due to the nature of Kafka and the size of the Kafka library, mobile devices are not recommended as Kafka producers. Kafka is intended to persist and manage messages, but not necessarily to be the communication agent between devices. For that reason, all of the sensor readings were sent through a socket from the mobile device to a host Java instance on the remote server. Once the host Java instance receives a sensor reading, it writes the message to a Kafka topic. By writing to the Kafka topic as soon as the sensor data reaches the server, we ensure that no messages get lost and that we are able to create a robust stream.
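The reason an immediate write to a topic protects against message loss is easier to see with a toy model. The sketch below is an in-memory Python stand-in, not Kafka and not our Java host: it treats a topic as an append-only log and gives each reader its own offset, so a reader that pauses simply resumes where it left off. The class and method names are hypothetical.

```python
class TopicLog:
    """In-memory stand-in for a single-partition topic: an append-only message list."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Store a message and return its offset in the log."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read_from(self, offset, max_messages=None):
        """Return messages starting at `offset` without removing them from the log."""
        end = len(self._messages) if max_messages is None else offset + max_messages
        return self._messages[offset:end]


class Consumer:
    """Tracks its own position, so pausing never drops messages."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_messages=None):
        batch = self.log.read_from(self.offset, max_messages)
        self.offset += len(batch)
        return batch


log = TopicLog()
for reading in ["q0", "q1", "q2"]:      # hypothetical sensor readings from the socket
    log.append(reading)

viewer = Consumer(log)
print(viewer.poll(max_messages=2))      # reader handles two messages, then stalls
log.append("q3")                        # producer keeps writing in the meantime
print(viewer.poll())                    # reader catches up on everything it missed
```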


The Kafka consumer is a Jetty server process running on the same remote server. The Jetty server consumes Kafka messages from the sensor data topic and relays those messages to the frontend JavaScript instance through a WebSocket. Kafka consumers cannot be implemented in frontend JavaScript, so this paradigm must be used. Once the JavaScript instance running in an observer’s browser receives the sensor data, it uses it to update a 3D graphic on the screen. The producer-consumer paradigm gives us a very generalizable solution for multiple streams, and the system could easily be extended to receive multiple different types of data from the tablet. A queue-based paradigm also allows for elasticity in the consumption of messages: if the end of the line of communication slows down or pauses, the system is safe thanks to Kafka’s persistence of data. The durability of the system ensures no messages get lost and that the system is pause tolerant. Below is the Server class that sends Kafka KeyedMessages through the socket.

Analysis and Conclusion


Although our project did not leverage machine learning algorithms or make predictions, it was still an extremely valuable exercise for a number of reasons. Primarily, we became much more proficient in working with data streams and streaming objects, and in how they should be handled when communicating between servers and layers. In many enterprise instances of Apache Spark, the input is a live data stream and rarely a backlog of data in a clean .csv file. Likewise, we familiarized ourselves with Kafka, an industry-standard tool for breaking data streams into individually analyzable and fault-tolerant chunks. Our animation ended up with a noticeable amount of lag, but this too taught us valuable lessons: Kafka is not optimized for realtime data visualization, and the flow of data through a system should pass through as few layers as possible. The point in our system architecture where the data is sent to the browser is the most appropriate place to add an instance of Spark into the mix, as the data queuing from Kafka was already in place. Additionally, the animation smoothness would benefit from the data coming directly from the tablet instead of through the Kafka channel.


Project by Ryan Reede and Cam Lunt.

Yelp Academic Dataset

For Business Intelligence and Analytics (Fall '14) with Prof. Ransbotham, our semester-long group project was tasked with analyzing and making data-based predictions on the Yelp Academic dataset. Below is the portion of the project I worked on that decided how we would implement our own rankings feature for restaurants and attractions.


(disclaimer: I knew nothing about Machine Learning at the time)


Good or Popular?


We needed to determine how to factor in both the popularity of a restaurant and its average review to find the [quantitatively speaking] best food in Austin. With 5 possible stars (in 0.5-star increments) to judge an overall dining experience, a simple upvote formula such as those found on Instagram or Reddit would not work well. Although heavily generalized, the idea there is to divide the number of upvotes by the post's view count to determine how good a post is, but this only works with a binary like or non-like voting system. We have 5 stars (or really 10 levels, due to the 0.5 increments) to determine how ‘good’ a restaurant is, so we searched the web for some sort of weighted rank that factors this in, which turned up some great answers.



On Math Stack Exchange, an answer showed how to use the Bayesian approach to determine this sort of weighted rank. We first applied this formula to our Yelp data as-is for testing, then tweaked the weights to make the results more reasonable for our data. For instance, in one test the popularity of a restaurant did not get enough recognition: restaurants with a 5.00 star average and under 5 reviews were ranked higher than restaurants with a 4.5 star average and over 50 reviews. Even after some heavy modification, we struggled to find a system that generalized well for all of our data. It was still clear to us, however, that an aggregate rank score based on multiple factors was the right way to approach this. Using R, we computed these scores and adjusted the dataframe to store the new values. Once we had this value computed, we looked for ways to visualize it all in a meaningful way.
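For illustration, one common form of Bayesian weighted rank blends a restaurant's own average with a prior average, weighted by how many reviews it has. The Python sketch below uses that form as an assumption; it is not necessarily the exact formula from the Stack Exchange answer or the weights we settled on in R, and the prior mean of 3.7 and prior weight of 10 are hypothetical values.

```python
def bayesian_rank(avg_stars, num_reviews, prior_mean, prior_weight):
    """Pull a restaurant's average towards `prior_mean`; more reviews means less pull.

    `prior_weight` behaves like a number of imaginary reviews at the prior mean.
    """
    return (num_reviews * avg_stars + prior_weight * prior_mean) / (num_reviews + prior_weight)


PRIOR_MEAN = 3.7     # hypothetical city-wide average rating
PRIOR_WEIGHT = 10    # hypothetical weight given to that average

new_spot = bayesian_rank(5.0, 3, PRIOR_MEAN, PRIOR_WEIGHT)      # perfect score, few reviews
established = bayesian_rank(4.5, 60, PRIOR_MEAN, PRIOR_WEIGHT)  # slightly lower, many reviews

print(round(new_spot, 2), round(established, 2))
print(established > new_spot)
```

Raising the prior weight gives popularity more influence, which is the kind of tweak that addresses the 5-star, few-review problem described above.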


Data Visualization


Since we were working with data that was so heavily dependent on location, finding a way to present it with mapping in mind was key. We needed a simple way to let the valuable lat./long. data show how location can affect rankings. Tableau wound up being the tool we used for this. Tableau was incredibly intuitive: it recognized our input .csv file as geographic and defaulted to a map view. To make this visualization even more telling, we adjusted the parameters of the map so that both the scale and the color temperature of the points reflected our data properly. The scale of a point showed how many people had reviewed the location, and a darker red indicated a higher score on the Bayesian weighted rank scale.


The first pieces of software I got to know well were non-linear editing systems. After my Dad had shown me the student films he had produced as a kid, I was eager to start producing content myself. Using iMovie, then Final Cut Express, I began to understand the world of digital video, video processing and compression.

I started with stop-motion animation at my desk:



While in college, I produced Mod of Cards, a 6-episode web series based on Netflix's House of Cards. Here is some of the other video work I've done recently, which includes motion graphics and timelapse work.

Motion Graphics, Timelapse etc.




Timelapse I shot in December of 2015 in Hong Kong and Dubai using a Sony A7r. Post was done in After Effects, Premiere and Illustrator.

HK x Dubai from Ryan Reede on Vimeo.

Motion Graphics


Kinetic typography I did in 2015 based on House of Cards. The text was put together in Illustrator and the individual elements were animated in After Effects.

Frank Underwood: Kinetic Typography from Ryan Reede on Vimeo.

Mod of Cards

During my Summer 2014 internship at Sync On Set (they just won an Emmy!), I got to know Derek Switaj, the other intern also from Boston College. As it would turn out, Derek was an aspiring writer and he came to me at the end of the Summer with a pilot of a series he was looking to bring to life. I was skeptical at the time because I know as well as anyone in Hollywood that it takes more than just a script and a camera to bring something incredible to life. The premise was House of Cards, but taking place on a college campus. I liked it a lot, but doing a full eight-episode series was not how I envisioned spending my junior year of college. Come the start of the school year, I told my friend Max Prio (we ran a production company on campus at the time) and he felt the same way I did. It was clear that Derek had the fire in him to make this happen, but it would simply take too much time when we had a backlog of paid projects to get cracking on as our production company was finally gaining traction. Initially the plan was this: We’d help Derek make a legit pilot, teach him everything we knew about the technical stuff (cinematography, audio, post etc.) and then he could use that knowledge and the pilot as a bargaining chip to bring in a new crew from the film department.

Myself, Max and Derek.

This is Actually Happening


The pilot was to be about 30 minutes long and had around 20 scenes in it. Getting it shot and cut before the semester break deadline we set for ourselves was a tall order, but we managed it by shooting around 4x per week and cutting scenes together in our spare time. It was exhausting but a total blast. We released the first episode during finals and expected that to be it. Derek would find a new crew to take it over and [somehow] get five more episodes produced.

Mod of Cards- Episode 1 from Mod of Cards on Vimeo.


The pilot was good but not great. Somehow it still managed to get seen over 10,000 times in just a few months, and Max and I realized that without us behind the camera this project was about to die off. Without ever stating it explicitly, Derek knew all along that we wouldn't just do the pilot and bail. The cast was great, the crew was loyal and we were working on something awesome that had never been done before. Max and I dropped whatever paid work we had and spent the rest of our junior year working to bring a six-episode Mod of Cards to life, making certain that each new episode was more technically sound and artistically ambitious than what came before.


The rest was history

There's not much more to say about what followed. The core crew of about five people spent about 30-50 hours per week from October 2014 to June 2015 making Mod of Cards. My day became pretty routine: swim practice, class, then shoot and edit. I was a ballboy for the Celtics still somehow. The year flew by and in the end we had produced the first ever drama series on a college campus. I've been fortunate enough to work on some pretty cool projects at Boston College like Joycestick, but nothing will ever compare to Mod of Cards. The process of seeing anything (software, a movie, a TV show or even a business) come together is a thing of beauty, especially when everyone you're working with is as passionate about the product of the work as you are.


The late nights that became early mornings on the set of Mod of Cards are among my favorite memories of college. Spring semester 2015 I wound up with a 1.9 GPA and got two D's. I eventually had to leave the swim team to keep Mod of Cards from derailing, one of the toughest decisions I've made. Anyway, all six episodes of Mod of Cards got completed and there are few things I'm more proud of than it.


You can watch the entire series here.


