
Commit 20fb08c
First commit

0 parents, commit 20fb08c
20 files changed: +4147, -0 lines

.DS_Store (6 KB)
Binary file not shown.

Assignment 1.pdf (130 KB)
Binary file not shown.

code/.DS_Store (6 KB)
Binary file not shown.

code/README.txt

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
Name: Dorin Keshales
ID: 313298424

I added a file called mlpn_main.py so that you can choose how many hidden layers to use and what their sizes should be.

In train_mlpn, the number of hidden layers and their sizes are hard-coded to the values I found to give the highest accuracy on the validation set.

So you can choose between the two.
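
mlpn_main.py itself is not shown in this commit, so the snippet below is a purely hypothetical illustration of the idea (the variable names are not the file's real interface): describing an MLP-N architecture by a flat list of layer sizes, from which the weight-matrix shapes follow.

# Hypothetical illustration only; not the actual mlpn_main.py interface.
hidden_sizes = [128, 64]                  # an example choice of two hidden layers
dims = [600] + hidden_sizes + [6]         # input dim, hidden dims, output dim
layer_shapes = list(zip(dims, dims[1:]))  # [(600, 128), (128, 64), (64, 6)]
print(layer_shapes)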

code/answers.txt

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
Name: Dorin Keshales
ID: 313298424

1. I got about the same accuracy with both models - around 85%-86.6%.
Sometimes the log-linear model even reached 87% accuracy.
In my opinion, when the linear model already reaches such high accuracy, there is not much left for an MLP with one hidden layer to improve; a linear model is enough in this case to solve the language identification task well.


2. The best accuracy I can get with the MLP1 model on the letter-unigram features is 69%-70%, and the best I can get with the log-linear model on these features is 72%.
In my opinion, the reason the letter-unigram features give lower accuracy than the letter-bigram features is that there are far fewer unigram features than bigram features. When looking at the probability assigned to each language after the softmax (which reflects the frequencies of these features in that language), the prediction is much less certain, because there are far fewer features to rely on. As a result, we get more wrong predictions with the letter-unigram features, and the accuracy is lower.
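
To make the feature-count argument concrete, here is a small illustration (not the assignment's actual feature-extraction code) of how many distinct letter-unigram versus letter-bigram features even a short, arbitrary text yields:

from collections import Counter

# Illustration only: counting letter unigrams vs. letter bigrams of one text.
text = "hello world"
unigrams = Counter(text)                                  # at most ~26 letter features
bigrams = Counter(a + b for a, b in zip(text, text[1:]))  # up to ~26 * 26 features
print(len(unigrams), len(bigrams))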


3. In each execution of train_mlp1, the number of iterations needed to solve the XOR problem correctly was different. In my opinion, this is caused by the random initialization of the weight matrices and bias vectors. Moreover, the perceptron gives no guarantee that, after seeing an example, it will classify that same example correctly the next time it sees it, which is another possible reason for the difference between runs.
To still be able to answer the question, I averaged over 5 runs, which gives an approximate number of iterations it takes mlp1 to solve the XOR problem correctly.
Averaged over 5 runs, the XOR problem was solved at around the 34th iteration.
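
The averaging procedure described above can be sketched as follows; this is an illustrative sketch rather than the actual train_mlp1 code, and the hyperparameters (learning rate, hidden size, iteration cap) are arbitrary assumptions (run from the code/ directory):

import numpy as np
import mlp1

# The four XOR examples as (label, input) pairs.
xor_data = [(0, np.array([0., 0.])), (1, np.array([0., 1.])),
            (1, np.array([1., 0.])), (0, np.array([1., 1.]))]


def iterations_to_solve_xor(max_iters=200, lr=0.5, hid_dim=4):
    # Fresh random initialization each run, so the result varies between runs.
    params = mlp1.create_classifier(2, hid_dim, 2)
    for it in range(1, max_iters + 1):
        for y, x in xor_data:
            loss, grads = mlp1.loss_and_gradients(x, y, params)
            for p, g in zip(params, grads):
                p -= lr * g  # plain SGD update
        if all(mlp1.predict(x, params) == y for y, x in xor_data):
            return it  # first iteration at which all four examples are correct
    return max_iters


runs = [iterations_to_solve_xor() for _ in range(5)]
print(sum(runs) / len(runs))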

code/grad_check.py

Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
import numpy as np

STUDENT = {'name': 'Dorin Keshales',
           'ID': '313298424'}


def gradient_check(f, x):
    """
    Gradient check for a function f
    - f should be a function that takes a single argument and outputs the cost and its gradients
    - x is the point (numpy array) to check the gradient at
    """
    fx, grad = f(x)  # Evaluate function value at original point
    h = 1e-4

    # Iterate over all indexes in x
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        ix = it.multi_index

        ### modify x[ix] with h defined above to compute the numerical gradient.
        ### if you change x, make sure to return it back to its original state for the next iteration.
        ### YOUR CODE HERE:
        x_plus = x.copy()
        x_plus[ix] = x_plus[ix] + h

        x_minus = x.copy()
        x_minus[ix] = x_minus[ix] - h

        fx_plus, grad_plus = f(x_plus)
        fx_minus, grad_minus = f(x_minus)

        # Central difference: (f(x + h) - f(x - h)) / (2h)
        numeric_gradient = (fx_plus - fx_minus) / (2.0 * h)
        ### END YOUR CODE

        # Compare gradients
        reldiff = abs(numeric_gradient - grad[ix]) / max(1, abs(numeric_gradient), abs(grad[ix]))
        if reldiff > 1e-5:
            print("Gradient check failed.")
            print("First gradient error found at index %s" % str(ix))
            print("Your gradient: %f \t Numerical gradient: %f" % (grad[ix], numeric_gradient))
            return

        it.iternext()  # Step to next index

    print("Gradient check passed!")


def sanity_check():
    """
    Some basic sanity checks.
    """
    quad = lambda x: (np.sum(x ** 2), x * 2)

    print("Running sanity checks...")
    gradient_check(quad, np.array(123.456))  # scalar test
    gradient_check(quad, np.random.randn(3, ))  # 1-D test
    gradient_check(quad, np.random.randn(4, 5))  # 2-D test
    print("")


if __name__ == '__main__':
    # If these fail, your code is definitely wrong.
    sanity_check()
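
Beyond the built-in sanity checks, gradient_check works with any function that returns a (value, gradient) pair. As an illustrative sketch only (not part of the submission), run from the code/ directory:

import numpy as np
from grad_check import gradient_check

# f(x) = sum(x^3), whose analytic gradient is 3 * x^2.
cube = lambda x: (np.sum(x ** 3), 3 * x ** 2)

gradient_check(cube, np.random.randn(4, 5))  # expected to print "Gradient check passed!"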

code/loglinear.py

Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
import numpy as np

STUDENT = {'name': 'Dorin Keshales',
           'ID': '313298424'}


def softmax(x):
    """
    Compute the softmax vector.
    x: a n-dim vector (numpy array)
    returns: an n-dim vector (numpy array) of softmax values
    """
    # YOUR CODE HERE
    # Your code should be fast, so use a vectorized implementation using numpy,
    # don't use any loops.
    # With a vectorized implementation, the code should be no more than 2 lines.
    #
    # For numeric stability, use the identity you proved in Ex 2 Q1.

    # Shift by the max for numeric stability; avoid modifying the caller's array in place.
    x = x - x.max()
    x = np.exp(x) / np.sum(np.exp(x))

    return x


def classifier_output(x, params):
    """
    Return the output layer (class probabilities)
    of a log-linear classifier with given params on input x.
    """
    W, b = params
    # YOUR CODE HERE.

    # Compute the linear scores and pass them through the softmax.
    result = np.dot(x, W) + b
    probs = softmax(result)

    return probs


def predict(x, params):
    """
    Returns the prediction (highest scoring class id) of a
    log-linear classifier with given parameters on input x.

    params: a list of the form [(W, b)]
    W: matrix
    b: vector
    """
    return np.argmax(classifier_output(x, params))


def loss_and_gradients(x, y, params):
    """
    Compute the loss and the gradients at point x with given parameters.
    y is a scalar indicating the correct label.

    returns:
        loss,[gW,gb]

    loss: scalar
    gW: matrix, gradients of W
    gb: vector, gradients of b
    """
    W, b = params
    # YOUR CODE HERE

    # Cross-entropy loss of the predicted distribution against the gold label.
    model_output = classifier_output(x, params)
    loss = -np.log(model_output[y])

    # Derivative of the loss w.r.t. b: softmax output minus the one-hot gold vector.
    gb = model_output.copy()
    gb[y] -= 1

    # Derivative of the loss w.r.t. W: outer product of the input with (softmax - one-hot).
    copy_output = model_output.copy()
    gW = np.outer(x, copy_output)
    gW[:, y] -= x

    return loss, [gW, gb]


def create_classifier(in_dim, out_dim):
    """
    returns the parameters (W,b) for a log-linear classifier
    with input dimension in_dim and output dimension out_dim.
    """
    W = np.zeros((in_dim, out_dim))
    b = np.zeros(out_dim)
    return [W, b]


if __name__ == '__main__':
    # Sanity checks for softmax. If these fail, your softmax is definitely wrong.
    # If these pass, it may or may not be correct.
    test1 = softmax(np.array([1, 2]))
    print(test1)
    assert np.amax(np.fabs(test1 - np.array([0.26894142, 0.73105858]))) <= 1e-6

    test2 = softmax(np.array([1001, 1002]))
    print(test2)
    assert np.amax(np.fabs(test2 - np.array([0.26894142, 0.73105858]))) <= 1e-6

    test3 = softmax(np.array([-1001, -1002]))
    print(test3)
    assert np.amax(np.fabs(test3 - np.array([0.73105858, 0.26894142]))) <= 1e-6

    # Sanity checks. If these fail, your gradient calculation is definitely wrong.
    # If they pass, it is likely, but not certainly, correct.
    from grad_check import gradient_check

    W, b = create_classifier(3, 4)


    def _loss_and_W_grad(W):
        global b
        loss, grads = loss_and_gradients([1, 2, 3], 0, [W, b])
        return loss, grads[0]


    def _loss_and_b_grad(b):
        global W
        loss, grads = loss_and_gradients([1, 2, 3], 0, [W, b])
        return loss, grads[1]


    for _ in range(10):
        W = np.random.randn(W.shape[0], W.shape[1])
        b = np.random.randn(b.shape[0])
        gradient_check(_loss_and_b_grad, b)
        gradient_check(_loss_and_W_grad, W)
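
The training script is not included in this excerpt. As an illustrative sketch only, a plain SGD loop over the API above could look like the following, where the dimensions and the randomly generated train_data are placeholders rather than the assignment's real features (run from the code/ directory):

import numpy as np
import loglinear as ll

# Placeholder dimensions and data, for illustration only.
in_dim, out_dim = 600, 6
params = ll.create_classifier(in_dim, out_dim)
train_data = [(np.random.randint(out_dim), np.random.rand(in_dim)) for _ in range(100)]

learning_rate = 0.01
for epoch in range(10):
    total_loss = 0.0
    for y, x in train_data:
        loss, (gW, gb) = ll.loss_and_gradients(x, y, params)
        total_loss += loss
        # Plain SGD step: move each parameter against its gradient.
        params[0] -= learning_rate * gW
        params[1] -= learning_rate * gb
    print(epoch, total_loss / len(train_data))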

code/mlp1.py

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
import numpy as np

STUDENT = {'name': 'Dorin Keshales',
           'ID': '313298424'}


def classifier_output(x, params):
    # YOUR CODE HERE.

    W, b, U, b_tag = params

    # Pre-activation of the hidden layer.
    result = np.dot(x, W) + b

    # Save a copy of the hidden layer input (before the tanh) for use in backprop.
    global z1
    z1 = result.copy()

    # tanh activation.
    result = np.tanh(result)

    # Save a copy of the hidden layer output (after the tanh) for use in backprop.
    global h1
    h1 = result.copy()

    # Output layer scores followed by a numerically stable softmax.
    result = np.dot(result, U) + b_tag
    result -= result.max()
    probs = np.exp(result) / np.sum(np.exp(result))

    return probs


def predict(x, params):
    """
    params: a list of the form [W, b, U, b_tag]
    """
    return np.argmax(classifier_output(x, params))


def loss_and_gradients(x, y, params):
    """
    params: a list of the form [W, b, U, b_tag]

    returns:
        loss,[gW, gb, gU, gb_tag]

    loss: scalar
    gW: matrix, gradients of W
    gb: vector, gradients of b
    gU: matrix, gradients of U
    gb_tag: vector, gradients of b_tag
    """
    # YOUR CODE HERE

    W, b, U, b_tag = params

    # Cross-entropy loss of the predicted distribution against the gold label.
    model_output = classifier_output(x, params)
    loss = -np.log(model_output[y])

    # Derivative of the loss w.r.t. b_tag: softmax output minus the one-hot gold vector.
    gb_tag = model_output.copy()
    gb_tag[y] -= 1

    # Derivative of the loss w.r.t. U.
    copy_output = model_output.copy()
    copy_h1 = h1.copy()
    gU = np.outer(copy_h1, copy_output)
    gU[:, y] -= copy_h1

    # Derivative of the loss w.r.t. h1, the hidden vector after the tanh.
    ds_dh1 = np.dot(U, model_output) - U[:, y]

    # Derivative of the vector after the tanh (h1) w.r.t. the vector before the tanh (z1).
    copy_z1 = z1.copy()
    dh1_dz1 = 1 - np.square(np.tanh(copy_z1))

    # Derivative of the loss w.r.t. b.
    gb = ds_dh1 * dh1_dz1
    # Derivative of the loss w.r.t. W.
    gW = np.outer(x, gb.copy())

    return loss, [gW, gb, gU, gb_tag]


# Xavier/Glorot-style uniform initialization for the weight matrices and the bias vectors.
def my_random(size1, size2=None):
    t = 1 if size2 is None else size2
    eps = np.sqrt(6.0 / (size1 + t))
    return np.random.uniform(-eps, eps, (size1, size2)) if size2 is not None else np.random.uniform(-eps, eps, size1)


def create_classifier(in_dim, hid_dim, out_dim):
    """
    returns the parameters for a multi-layer perceptron,
    with input dimension in_dim, hidden dimension hid_dim,
    and output dimension out_dim.

    return:
    a flat list of 4 elements, W, b, U, b_tag.
    """

    W = my_random(in_dim, hid_dim)
    b = my_random(hid_dim)
    U = my_random(hid_dim, out_dim)
    b_tag = my_random(out_dim)

    params = [W, b, U, b_tag]
    return params
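
Unlike loglinear.py, this file has no gradient-check block of its own. A minimal sketch of verifying its W and U gradients with grad_check.gradient_check, following the pattern at the bottom of loglinear.py (the dimensions here are arbitrary):

import numpy as np
import mlp1
from grad_check import gradient_check

np.random.seed(0)
in_dim, hid_dim, out_dim = 3, 5, 4
W, b, U, b_tag = mlp1.create_classifier(in_dim, hid_dim, out_dim)
x, y = np.array([1.0, 2.0, 3.0]), 0


def _loss_and_W_grad(W):
    loss, grads = mlp1.loss_and_gradients(x, y, [W, b, U, b_tag])
    return loss, grads[0]


def _loss_and_U_grad(U):
    loss, grads = mlp1.loss_and_gradients(x, y, [W, b, U, b_tag])
    return loss, grads[2]


gradient_check(_loss_and_W_grad, W)
gradient_check(_loss_and_U_grad, U)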

0 commit comments
