Commit b5dc21d

committed
July 31 post
1 parent 585129f commit b5dc21d

11 files changed

+100
-130
lines changed

404.html

Lines changed: 1 addition & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -4,10 +4,4 @@
44
permalink: /404.html
55
---
66

7-
<div class="text-center">
8-
<h1>Whoops, this page doesn't exist.</h1>
9-
<h1>Move along. (404 error)</h1>
10-
<br/>
11-
12-
<img src="{{ 'assets/img/404-southpark.jpg' | relative_url }}" />
13-
</div>
7+
<h1 class="text-center">404</h1>

_config.yml

Lines changed: 21 additions & 20 deletions
@@ -1,22 +1,22 @@
11
# --- Basic options --- #
22

33
# Name of website
4-
title: My website
4+
title: PJC
55

66
# Short description of your site
7-
description: A virtual proof that I'm awesome
7+
description: Notes
88

99
# Your name to show in the footer
10-
author: Some Person
10+
author: JJ
1111

1212
# --- List of links in the navigation bar --- #
1313

1414
navbar-links:
15-
About Me: "aboutme"
16-
Resources:
17-
- Beautiful Jekyll: "https://beautifuljekyll.com"
18-
- Learn markdown: "https://www.markdowntutorial.com/"
19-
Author's home: "https://deanattali.com"
15+
# About Me: "aboutme"
16+
# Resources:
17+
# - Beautiful Jekyll: "https://beautifuljekyll.com"
18+
# - Learn markdown: "https://www.markdowntutorial.com/"
19+
# Author's home: "https://deanattali.com"
2020

2121
# --- Logo --- #
2222

@@ -38,11 +38,11 @@ round-avatar: true
3838
# Uncomment the links you want to show and add your information to each one.
3939
# If you don't want to show a link to an RSS feed, set rss to "false".
4040
social-network-links:
41-
email: "someone@example.com"
42-
facebook: deanattali
43-
github: daattali
44-
twitter: daattali
45-
rss: true
41+
# email: "someone@example.com"
42+
# facebook: deanattali
43+
github: pjchungmd
44+
# twitter: daattali
45+
# rss: true
4646
# reddit: yourname
4747
# linkedin: daattali
4848
# xing: yourname
@@ -64,15 +64,15 @@ social-network-links:
6464
# --- General options --- #
6565

6666
# Select which social network share links to show in posts
67-
share-links-active:
68-
twitter: true
69-
facebook: true
70-
linkedin: true
71-
vk: false
67+
#share-links-active:
68+
# twitter: true
69+
# facebook: true
70+
# linkedin: true
71+
# vk: false
7272

7373
# How to display the link to your website in the footer
7474
# Remove this if you don't want a link in the footer
75-
url-pretty: "MyWebsite.com"
75+
# url-pretty: "MyWebsite.com"
7676

7777
# Create a "tags" index page and make tags on each post clickable
7878
link-tags: true
@@ -168,7 +168,8 @@ date_format: "%B %-d, %Y"
168168
# --- You don't need to touch anything below here (but you can if you want) --- #
169169

170170
# Output options (more information on Jekyll's site)
171-
timezone: "America/Vancouver"
171+
#timezone: "America/Vancouver"
172+
timezone: "America/New_York"
172173
markdown: kramdown
173174
highlighter: rouge
174175
permalink: /:year-:month-:day-:title/

_posts/2020-02-26-flake-it-till-you-make-it.md

Lines changed: 0 additions & 17 deletions
This file was deleted.

_posts/2020-02-28-test-markdown.md

Lines changed: 0 additions & 78 deletions
This file was deleted.
Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
1+
---
2+
layout: post
3+
title: Extracting Symptom Data from Text
4+
subtitle: How can we ensure that models are reproducible?
5+
#cover-img: /assets/img/path.jpg
6+
#thumbnail-img: /assets/img/thumb.png
7+
#share-img: /assets/img/path.jpg
8+
tags: [Feinstein, NLP]
9+
---
10+
11+
Extracting symptom data from text presents multiple challenges. The first is
12+
identifying which conceptual entities, which may span multiple words, should be
13+
labeled as a symptom, as opposed to physical exam findings, or other objective
14+
findings. However, prior to this, other considerations include dealing with
15+
very messy free text data that can contain (1) misspelled words,
16+
(2) abbreviations (which may or may not be standard), and (3) non-standard language
17+
to describe symptoms.
18+
19+
As part of the pipeline, the text must undergo a series of transformations,
20+
most of which result in changes to the original text such that it cannot be
21+
reconstructed from the new text. Also, the transformation operators themselves
22+
can be subject to changes as their dependencies change over time.
23+
24+
This leads to the problem of **reproducibility**. Perhaps a reason why being
25+
able to reproduce results has lagged behind other aspects of machine learning
26+
is that just being able to get things to work well for yourself is difficult
27+
enough, let alone getting it to work for *someone else*.
28+
29+
To tackle this problem in the context of the project to extract symptoms data,
30+
let's consider how we might deal with transformed data and the transformation
31+
operations (which can include human programmed functions or models trained from
32+
data). Because we are using machine learning techniques, there are three items
33+
to consider: (1) data, (2) code, and (3) artifacts. While there are mature
34+
tools available to version control code (e.g. Git), the process by which data
35+
and artifacts can and should be versioned is less straightforward.
36+
37+
Let's consider what happens when we version control code using a tool like Git.
38+
Once a change has been staged, it is then committed. The commit process involves
39+
the programmer providing a useful comment stating what the commit was for. In
40+
the background, Git keeps track of the diffs between each version of the code
41+
and the comments provide some background information (although of varying
42+
quality) as to what the change is about. Furthermore, each version of the code
43+
is immutable.
44+
45+
Currently, the methods for version-controlling data are haphazard. The most
46+
commonly employed technique seems to be incorporating metadata into the
47+
filename of the data. While a system that utilizes filename metadata might be
48+
usable, the main issue is that the data is mutable: files can be accidentally
49+
overwritten. From a data storage point of view, this can also be very
50+
inefficient since different files may have much in common, leading to redundant
51+
storage.
52+
53+
An issue that code versioning systems do not have to deal with is the problem
54+
of attributing the source of creation. The vast majority of code is written by
55+
humans, and therefore the source information is often found in the commit
56+
comment. Otherwise, if the code is forked, a clear trail exists as to where the
57+
code originated from. However, data can come from many sources. It can be raw in
58+
the sense of being pulled from some outside system. It can be processed from an
59+
original source, and have undergone many subsequent transformations, such as
60+
having been processed by either code, or code with a model.
61+
62+
The fact that data now is often processed by models brings us back to the
63+
original difficulty. Models are artifacts that are the by-products of both code
64+
and data. Without the reproducibility problem in data solved, data that is
65+
transformed by models propagates this problem to the models as well (since they
66+
are partial offspring of data as well).
67+

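Note on the post added above: its point that filename-based data versioning is mutable and stores redundant copies can be illustrated with content-addressed storage, where each blob is saved under the hash of its own bytes. The following is only a minimal sketch of that general idea, not something the post or this commit implements. The helper names `put` and `get`, the flat directory layout, and the sample strings are all hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def put(store: Path, data: bytes) -> str:
    """Store data under its SHA-256 digest and return the digest."""
    digest = hashlib.sha256(data).hexdigest()
    target = store / digest
    if not target.exists():  # identical content is written only once
        target.write_bytes(data)
    return digest


def get(store: Path, digest: str) -> bytes:
    """Read data back and verify that it still matches its digest."""
    data = (store / digest).read_bytes()
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError(f"content for {digest} was modified")
    return data


# Hypothetical sample data: a raw note and a cleaned version of it.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)
    raw = b"pt c/o SOB x3 days"
    cleaned = b"patient complains of shortness of breath for 3 days"

    raw_id = put(store, raw)
    cleaned_id = put(store, cleaned)
    duplicate_id = put(store, raw)  # same bytes, so same identifier

    stored_files = len(list(store.iterdir()))
    round_trip = get(store, raw_id)
```

Because the identifier is derived from the content, a changed file gets a new identifier instead of silently replacing the old one, and byte-identical data is stored once. Whole-file hashing does not deduplicate files that merely overlap, which the post also raises; that would need chunking or a dedicated data-versioning tool.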
assets/css/main.css

Lines changed: 6 additions & 6 deletions
@@ -31,13 +31,13 @@ h1,h2,h3,h4,h5,h6 {
3131
line-height: 1.1;
3232
}
3333
h1 {
34-
font-size: 2.25rem;
34+
font-size: 1.75rem;
3535
}
3636
h2 {
37-
font-size: 1.875rem;
37+
font-size: 1.5rem;
3838
}
3939
h3 {
40-
font-size: 1.5rem;
40+
font-size: 1.25rem;
4141
}
4242
h4 {
4343
font-size: 1.125rem;
@@ -602,15 +602,15 @@ footer .footer-custom-content {
602602
}
603603
.intro-header .page-heading h1 {
604604
margin-top: 0;
605-
font-size: 3.125rem;
605+
font-size: 1.75rem;
606606
}
607607
.intro-header .post-heading h1 {
608608
margin-top: 0;
609-
font-size: 2.1875rem;
609+
font-size: 1.5rem;
610610
}
611611
.intro-header .page-heading .page-subheading,
612612
.intro-header .post-heading .post-subheading {
613-
font-size: 1.6875rem;
613+
font-size: 1.5rem;
614614
line-height: 1.1;
615615
display: block;
616616
font-family: 'Open Sans', 'Helvetica Neue', Helvetica, Arial, sans-serif;

assets/img/avatar-icon.png

827 KB

assets/img/avatar-icon2.png

18.6 KB

favicon.ico

198 Bytes
Binary file not shown.

index.html

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
11
---
22
layout: home
3-
title: My website
4-
subtitle: This is where I will tell my friends way too much about me
3+
title: Mind Dump
4+
subtitle: Notes and Ramblings
55
---
66

run.sh

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
#!/bin/sh
2+
3+
bundle exec jekyll serve
