Commit 45c59d8
feat: migrate over projects to new hugo format
Signed-off-by: Matt Struble <4325029+mattstruble@users.noreply.github.com>
1 parent ff85735

19 files changed: +210 −8 lines
config/_default/config.toml — Lines changed: 2 additions & 1 deletion

@@ -4,9 +4,10 @@ languageName = "English"
 copyright = "© 2020 - 2024 Matt Struble"
 title = "Matt Struble"
 #paginate = 10
+timeout = "60s"
+
 enableRobotsTXT = true
 enableGitInfo = true
-
 # theme = "FixIt"

 [build]
New file — Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
---
title: Heineken® AR Cheers Campaign
date: "2019-08-13T00:00:00Z"
description: A web app that blended augmented reality and artificial intelligence to create an interactive user experience for the Heineken® Formula 1 campaign. The campaign was the first time a brand had used web-based AR technology to power a live competition globally.
cover:
  src: preview.png
  alt: An image separated into three columns. The first column shows two Heineken F1 bottles with a white bottle icon overlaid on top. The second column shows an F1 car rising out of a table with the racing lights above. The third column shows an image of winning two tickets to a concert.
---

A web app that blended augmented reality and artificial intelligence to create an interactive user experience for the [Heineken® Formula 1](https://www.heineken.com/formula-1) campaign.
The campaign was the first time a brand had used web-based AR technology to power a live competition globally.

---

## Project Goal

In June I was contracted to develop the image recognition component of the [Heineken® AR Cheers Campaign](https://www.justaftermidnight247.com/case-study/heineken-ar-cheers-campaign/).
I was given six weeks to create the Heineken® logo detection logic, which needed to meet the following criteria:

1. Be lightweight and quick enough to operate over a mobile browser without exceeding user bandwidth limits
2. Accurately detect multiple variant Heineken® logos in the same picture
3. Recognize the following logos:

![A collage of a variety of different Heineken branded products.](collage.jpg "Heineken® Standard, Heineken® Stein Glass, Heineken® F1 Singapore, Heineken® Zero, Heineken® Cup")

## Challenges

##### Recognizing Each Heineken® Logo

One of the most important, and most challenging, aspects of any machine learning project is the training data. The more varied data you have, the more accurate the end result will be
in a production environment. Finding photos of beer online isn't an issue: [Untappd](https://untappd.com/) has an active user base logging each beer they drink with an accompanying photo, and after some quick
web scraping and cleaning, I was able to generate a training set containing thousands of unique Heineken® photos. The issue was that the Heineken® F1 Singapore
bottle was unreleased at the time, which meant it wasn't on Untappd, so all of its training data had to be created from scratch.

Luckily, we were shipped sample F1 bottles to train with. However, only so many images can be generated in-house, and they certainly wouldn't come close to replicating the production environment.
We did our best, creating around 500 images of the F1 bottle in different scenarios (outside, inside, on the street, in nature, etc.) with different viewing angles and liquid levels.
Still, 500 images pales in comparison to the thousands of other Heineken® samples, which would've led to an unbalanced model leaning heavily towards everything except the F1 bottle.
Somehow we needed to come up with more training data to round out the final model, but production was running tight and we could only spend so long collecting data.

The solution was to utilize OpenCV to programmatically introduce randomness into the photos we'd already taken, creating a close-enough representation of different environments.
The following features were randomly changed, with an allowance of 5% variance, in order to create the rest of the F1 dataset: horizontal flip, crop, scale, brightness, contrast, hue, saturation,
color, and noise. The end result was a dataset closely matching what was scraped from Untappd, both in size and image variance.
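
The project used OpenCV for this; as a minimal NumPy-only sketch of the same augmentation idea (flip, brightness, contrast, and noise jitter within a small variance — the actual pipeline and parameters aren't shown in the post):

```python
import numpy as np

rng = np.random.default_rng(42)

def augment(image: np.ndarray, variance: float = 0.05) -> np.ndarray:
    """Return a randomly jittered copy of an HxWx3 uint8 image: optional
    horizontal flip, then brightness/contrast scaling and Gaussian noise,
    each bounded by +/- `variance` (a stand-in for the 5% allowance)."""
    img = image.astype(np.float32)
    if rng.random() < 0.5:                       # random horizontal flip
        img = img[:, ::-1, :]
    contrast = 1.0 + rng.uniform(-variance, variance)
    brightness = 1.0 + rng.uniform(-variance, variance)
    mean = img.mean()
    img = (img - mean) * contrast + mean         # contrast around the mean
    img = img * brightness                       # overall brightness scale
    img = img + rng.normal(0.0, 255 * variance * 0.1, img.shape)  # noise
    return np.clip(img, 0, 255).astype(np.uint8)

# Generate several augmented variants from a single (here synthetic) photo
photo = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
variants = [augment(photo) for _ in range(5)]
```

Repeating this over the ~500 in-house photos is how a small seed set can be stretched toward the size of the scraped dataset.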

##### Detecting Multiple Logos

![Animation showing two Heineken bottles being placed next to each other on a table, with the object detection bounding box shifting between the two.](multi-detect-fail.gif)

After training on the initial dataset for a hundred thousand iterations, I ran the model through some test videos and immediately noticed some issues.
With our training data primarily consisting of single objects, the model had no exposure to detecting multiple logos at the same time. The good news was that there was already a pipeline set up
for generating fake-real data, making the solution as simple as capturing, and labeling, more photos with a focus on presenting as many permutations of the bottles and glasses as we could.

##### Limiting False Positives

Once the multiple-detection issue was fixed, the model was close to being finalized and entered beta testing within the overall development team. The expanded user testing revealed that the model was incorrectly
identifying other labels that closely resembled the target Heineken® logos. This required going back to Untappd and scraping more images of conflicting brands to feed into the model during training. The actual
brand labels didn't matter; all that mattered was that the model didn't falsely identify them as Heineken®. This allowed me to simply group them all as "other" and ignore them in the final model's output.

Now that all the required data to train the Heineken® detection model had been created, all that was left was to retrain the model and hope for the best come launch.

## Results

![Shows two Heineken bottles on a table. The Heineken graphic appears overhead with an animation of two bottle icons cheering. Then an F1 car comes out of the table and explodes with a message stating that the user didn't win anything from the raffle.](heineken-f1.gif)

The final model utilized the MobileNet architecture, allowing it to fit within the speed constraints, and was served from a sharded Flask instance on AWS. This ensured that any client, anywhere, could query the custom Flask API with an image,
and almost immediately receive back where each logo was in the image, labeled by type, as well as the total logo count.

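The post doesn't document the API contract, but the response it describes (per-logo location, label, and a total count) might be packaged along these lines; every field name below is hypothetical:

```python
def build_detection_response(detections):
    """Package raw detector output into a JSON-serializable payload.

    `detections` is a list of (label, confidence, box) tuples, with `box`
    as (x, y, width, height) in pixels. The field names are illustrative
    guesses, not the real Heineken® API schema.
    """
    return {
        "logos": [
            {"label": label,
             "confidence": round(conf, 3),
             "box": {"x": x, "y": y, "w": w, "h": h}}
            for label, conf, (x, y, w, h) in detections
        ],
        "count": len(detections),
    }

# Example: two hypothetical detections from one frame
response = build_detection_response([
    ("heineken_standard", 0.97, (120, 80, 60, 150)),
    ("heineken_f1_singapore", 0.91, (260, 75, 58, 148)),
])
```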
![A work desk with a dozen different Heineken bottles and glasses on it. The camera pans across each one showing the detection boxes staying put on each object as they are picked up and moved around.](multi-detect.gif)

After all was said and done, the final model was able to accurately detect any variation of the logos listed above. One small caveat was that it performed much better when the bottles or glasses were closer to the camera and in hand.
This can be attributed to the training set, both Untappd's and ours, being heavily focused on objects held in hand and close to the camera.
Ultimately, the requirement for the logos to be closer to the camera didn't detract from the final product, seeing as that was exactly the experience the final product was going for.
The customer wanted the app to be focused on the Heineken® brand, so the expectation was that their products would be front and center for each raffle submission.

The gifs below highlight some more example environments we ran the model through. You can notice the false-positive issue in the liquor store example, where the model makes a few false detections as it flashes
past some Corona bottles. This is due to motion blur, and was mostly fixed as described above, but as an added precaution the app required users to hold the phone steady prior to making a prediction.

![Shows the object detection in use in a standard kitchen, with only the Heineken bottle being detected and not any other beer can or bottle.](kitchen-positive.gif)

![A small gif looking through the glass door of a liquor store. We see the object detection flash over the Corona bottles as the camera pans quickly, but when it slows down on a specific Presidente bottle there is no detection.](liquor_store_false.gif)

## Summary

This project was the first time I'd released a machine learning model into a live production environment.
In doing so I was introduced to the entire machine learning development pipeline, from data retrieval, to user testing, and finally release and maintenance.

One of the key takeaways was how reflective the training data needs to be of the final environment. In the beginning I thought that training on individual logos would be enough to meet all the model requirements.
However, that proved not to be the case, and the overabundance of single-logo training data actually impeded the model development timeline, pushing back beta testing, which in turn limited the time between beta and release.

I am now fully aware of the long turnaround time in fixing a model and how, if it isn't properly accounted for, it can easily push back the development timeline.
Moving forward I will be more cognizant of including model retraining, and additional data collection, costs in my initial estimates.

New file — Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
---
title: Analyzing Climate Change Stance Through Twitter Data
date: "2019-12-01T00:00:00Z"
description: With 22% of US adults indicating they use Twitter, the platform has become a key stage where the climate change conversation unfolds. As such, this project hoped to understand—and visualize—Americans’ views of climate change as seen through the lens of Twitter.
---

With 22% of US adults indicating they use Twitter, the platform has become a key stage where the climate change conversation unfolds. As such, this project hoped to understand—and visualize—Americans’ views of climate change as seen through the lens of Twitter.

### Method

The approach was two-pronged:

1. Develop a multi-layered predictive model trained with labeled data.
2. Create interactive visualizations housed on a dedicated webpage that facilitate comprehension and boost engagement.

An important distinguishing characteristic of this project is that it aimed to look past the accuracy of an analytical product and relate the sentiment data to demographic characteristics. It also cast a wider net when collecting raw data, incorporating both critical keywords (e.g., “climate change” and “global warming”) and popular hashtags (e.g., #parisagreement, #climatehoax) that represent both sides of the conversation.

Like previous approaches, this one relies heavily on extracting sentiment from Twitter. Uniquely, the sentiment measurement system operates by combining previous NLP analysis research and hashtag inference. By layering several models, it effectively creates a boosting network that relies on the weighted sum of all of its sentiment learners to form one strong learner capable of more accurately predicting sentiment.

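The weighted-sum combination can be sketched in a few lines; the learner names, weights, and scores below are illustrative stand-ins, not the project's actual values:

```python
def weighted_sentiment(scores: list[float], weights: list[float]) -> float:
    """Combine per-learner sentiment scores (-1 = skeptic .. +1 = believer)
    into a single strong prediction via a normalized weighted sum."""
    assert len(scores) == len(weights) and sum(weights) > 0
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

# Hypothetical learners: bag-of-words model, BERT, hashtag inference
combined = weighted_sentiment(
    scores=[0.2, 0.6, 1.0],    # each learner's sentiment estimate
    weights=[1.0, 2.0, 0.5],   # trust placed in each learner
)
```

Normalizing by the weight total keeps the combined score inside the same [-1, 1] range as the individual learners.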
##### 1. Labeled Data Collection

This project utilized the following labeled datasets for training the predictive models:

1. [Sentiment Analysis – Global Warming/Climate Change](https://www.figure-eight.com/data-for-everyone/)
2. [Climate Change Sentiment](https://github.com/edwardcqian/climate_change_sentiment)
3. [Twitter Climate Change Sentiment Dataset](https://www.kaggle.com/edqian/twitter-climate-change-sentiment-dataset)

##### 2. Preprocessing

To prepare the data for the bag-of-words sentiment prediction and visualization stages, the raw tweets were cleaned to extract any useful components (e.g., isolate the tweet from the full text pulled in with the Twitter API; extract location). A pipeline was created that removed non-standard characters, tokenized the words in a tweet, removed stop words, and stemmed the resulting set of words. A vocabulary was generated from this set of words to vectorize tweets and apply a TF-IDF transformation. The data was then enriched using LDA topics and hashtag encodings.
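
A stdlib-only sketch of that pipeline (the stop-word list and suffix-stripping "stemmer" are simplified stand-ins for the real NLP tooling, which the post doesn't name):

```python
import math
import re
from collections import Counter

STOP_WORDS = {"the", "is", "a", "an", "of", "to", "and", "in", "on"}  # tiny stand-in list

def preprocess(tweet: str) -> list[str]:
    """Strip non-standard characters, tokenize, drop stop words, crudely stem."""
    text = re.sub(r"[^a-z0-9#\s]", "", tweet.lower())
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    # Naive suffix stripping -- a stand-in for a real Porter-style stemmer.
    return [re.sub(r"(ing|ed|s)$", "", t) if len(t) > 4 else t for t in tokens]

def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Term frequency x inverse document frequency over tokenized docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    out = []
    for doc in docs:
        tf = Counter(doc)
        out.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return out

docs = [preprocess(t) for t in [
    "Climate change is real and the science is clear #climatechange",
    "Global warming is a hoax #climatehoax",
]]
vectors = tfidf(docs)
```

Each tweet ends up as a sparse term-weight mapping, which is the representation the bag-of-words classifiers below consume.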
##### 3. Modeling

Leveraging the labeled climate change dataset, different algorithms were tested for the bag-of-words model, which used an ensemble voting model composed of the following classifiers:

* Multinomial Naïve Bayes
* Multi-layer perceptron classifier
* Linear support vector classifier

The voting classifier used a hard-voting system with mostly default hyperparameters. Some adjustments were made to the number of iterations for the multi-layer perceptron, and to the number of hidden neurons.

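Hard voting just takes the majority class across the base learners for each sample. A minimal sketch of the mechanism (the base-learner predictions are made up; the project's actual classifiers aren't reproduced here):

```python
from collections import Counter

def hard_vote(predictions_per_model: list[list[str]]) -> list[str]:
    """Majority vote across models; ties broken by model order.

    `predictions_per_model[m][i]` is model m's label for sample i.
    """
    voted = []
    for sample_preds in zip(*predictions_per_model):
        counts = Counter(sample_preds)
        top = max(counts.values())
        # First model's prediction among the tied leaders wins the tie.
        voted.append(next(p for p in sample_preds if counts[p] == top))
    return voted

# Three hypothetical base learners (Naive Bayes, MLP, linear SVC stand-ins)
nb  = ["positive", "neutral", "negative"]
mlp = ["positive", "negative", "negative"]
svc = ["neutral",  "neutral", "positive"]
print(hard_vote([nb, mlp, svc]))  # → ['positive', 'neutral', 'negative']
```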
The BERT model leveraged a pretrained model and trained over it, tweaking hyperparameters as needed.

### Model Evaluations

##### Bag of Words Model

###### Model Comparison - Accuracy

| Classifier | Training Set | Test Set |
| ------------------------- | ------------- | --------- |
| Logistic Regression | 73.1% | 56.8% |
| Linear SVC | 81.0% | 56.4% |
| Multinomial Naïve Bayes | 70.0% | 55.3% |
| Multi-layer Perceptron | 88.5% | 57.3% |
| Ensemble | 81.4% | 59.1% |

###### F1 Score - Test Set

| Classifier | Negative | Neutral | Positive |
|:---|:---:|:---:|:---:|
| Logistic Regression | 0.61 | 0.49 | 0.59 |
| Linear SVC | 0.59 | 0.52 | 0.59 |
| Multinomial Naïve Bayes | 0.60 | 0.46 | 0.59 |
| Multi-layer Perceptron | 0.61 | 0.54 | 0.57 |
| Ensemble | 0.59 | 0.51 | 0.61 |

The ensemble model performed best when using the bag-of-words approach. However, the large gap between in-sample and out-of-sample performance indicates that overfitting may have occurred.
This had to be contended with when working with a limited training sample size, particularly when it came to obtaining enough "negative" tweets. Even still, the confusion matrix shows a strong performance while maintaining an evenly divided error rate.

![Confusion matrix of the ensemble bag-of-words model on the test set.](bow_conf_test.png)

##### BERT Model

###### BERT Accuracy

| Iterations | Training | Test |
| :- | :-: | :-: |
| 1 | 72.5% | 66.9% |
| 2 | 92.2% | 65.6% |
| 3 | 95.1% | 67.5% |

###### BERT F1 Scoring

| Class | Precision | Recall | F1 | Support |
| :--- | :---: | :---: | :---: | :---: |
| Neutral | .68 | .60 | .64 | 1251 |
| Believer | .69 | .72 | .71 | 1274 |
| Skeptic | .68 | .73 | .70 | 1224 |
| Accuracy | | | .68 | 3749 |

The model consists of two parts. The first is a text encoder that transforms tweets into vectors, which are then used as features by the second part: a deep transformer network that comes "out of the box" with pretrained weights, which are adjusted as new data is applied during training. As can be seen, the test accuracy remained roughly constant throughout training while training accuracy climbed, which is a sign of severe overfitting to the training data.

![Confusion matrix of the BERT model on the test set.](bert_conf_test.png)

### Results

![](results.png)

The BERT model was used to classify close to 200,000 tweets relating to global warming, geotagged within the United States over the last ten years. At present, Americans overwhelmingly embrace anthropogenic climate change.
A possible explanation is that Twitter amplifies the voices of a non-representative part of the American population. Specifically, while 22% of Americans as a whole use Twitter, 32% of college-educated adults, 26% of urban residents, and 38% of adults between the ages of 18 and 29 do so. This shows a clear skew towards a segment of the population that one would expect to endorse human-caused climate change.

The results of the experiment reveal that the reach of the climate change discussion has been following a drastic upward trend since the start of 2018, with hundreds of tweets being sent out a day. The more people discuss climate change on Twitter, the more “positive” the overall sentiment has become, with AGW believers outweighing the skeptics. This is evident in seeing previously fully skeptic states transition to nearly full believers within the last year.

go.mod — Lines changed: 2 additions & 2 deletions

@@ -3,6 +3,6 @@ module struble.dev
 go 1.23.1

 require (
-	github.com/mattstruble/hugo-theme-catpuccin v0.0.0-20240914163845-c9c76f33762b // indirect
-	github.com/schnerring/hugo-mod-json-resume v0.0.0-20240912013022-d0a6933840c5 // indirect
+	github.com/mattstruble/hugo-theme-catpuccin v0.0.0-20240915022133-fdd5aefc83a5 // indirect
+	github.com/schnerring/hugo-mod-json-resume v0.0.0-20240915013917-03fb24755059 // indirect
 )

go.sum — Lines changed: 8 additions & 0 deletions

@@ -1,4 +1,12 @@
 github.com/mattstruble/hugo-theme-catpuccin v0.0.0-20240914163845-c9c76f33762b h1:Hcfqk+raK6598tUMqidpoaC6fntNaBDvKLBK1RK3TM8=
 github.com/mattstruble/hugo-theme-catpuccin v0.0.0-20240914163845-c9c76f33762b/go.mod h1:r9wZ+bhxo8I4mCrAKRU5k7HRzRRGfXYT10Vstpf3hSA=
+github.com/mattstruble/hugo-theme-catpuccin v0.0.0-20240915012930-edaf46200e41 h1:Ldh4EB1jtN7JAEJp6mmwx4OJoxFFyf9+1TCFvpf8UyY=
+github.com/mattstruble/hugo-theme-catpuccin v0.0.0-20240915012930-edaf46200e41/go.mod h1:r9wZ+bhxo8I4mCrAKRU5k7HRzRRGfXYT10Vstpf3hSA=
+github.com/mattstruble/hugo-theme-catpuccin v0.0.0-20240915014350-99946dec0123 h1:AIEjFfvPGEJf5mg2GpbwMo6ZmlP1YNfKvVzbBJOZMjQ=
+github.com/mattstruble/hugo-theme-catpuccin v0.0.0-20240915014350-99946dec0123/go.mod h1:r9wZ+bhxo8I4mCrAKRU5k7HRzRRGfXYT10Vstpf3hSA=
+github.com/mattstruble/hugo-theme-catpuccin v0.0.0-20240915022133-fdd5aefc83a5 h1:PLBkTjW6abJfuRVy7UUWyRkTkxVuLuV1ANa2Ntcycxs=
+github.com/mattstruble/hugo-theme-catpuccin v0.0.0-20240915022133-fdd5aefc83a5/go.mod h1:+QMjNkJVIrbnlGq+hhaLs6BGU2uvhURuEh+tsl50yRw=
 github.com/schnerring/hugo-mod-json-resume v0.0.0-20240912013022-d0a6933840c5 h1:AV/aZDOq1Y9yS9DlK137/xPhfLgzwg0w+IlxBWw5OS4=
 github.com/schnerring/hugo-mod-json-resume v0.0.0-20240912013022-d0a6933840c5/go.mod h1:Pc0QvpaSvbHMym3crVDxSe9O2VpL89TSWEHK3JqPRjQ=
+github.com/schnerring/hugo-mod-json-resume v0.0.0-20240915013917-03fb24755059 h1:VnCxbvjKiRklX8uMyc2yqJC6AmaiHz5uYnXCpvO9MFk=
+github.com/schnerring/hugo-mod-json-resume v0.0.0-20240915013917-03fb24755059/go.mod h1:Pc0QvpaSvbHMym3crVDxSe9O2VpL89TSWEHK3JqPRjQ=

package-lock.json — Lines changed: 4 additions & 4 deletions (generated file, not rendered)

package.json — Lines changed: 1 addition & 1 deletion

@@ -42,7 +42,7 @@
 "normalize.css": "^8.0.1",
 "prism-themes": "^1.9.0",
 "prismjs": "^1.29.0",
-"simple-icons": "^13.9.0",
+"simple-icons": "^13.10.0",
 "typeface-fira-code": "^1.1.13",
 "typeface-roboto-slab": "^1.1.13"
 },
