Hello, I would like to test the MNIST code with Katib.
I tested it with the code below, but it does not produce any results.
What is the problem? Please tell me why.
P.S. It works fine when I run the same image directly in Docker.
mnist.yaml
apiVersion: "kubeflow.org/v1alpha1"
kind: StudyJob
metadata:
  namespace: kubeflow
  labels:
    controller-tools.k8s.io: "1.0"
  name: mnist
spec:
  studyName: mnist
  owner: ylim
  optimizationtype: maximize
  objectivevaluename: "A"
  optimizationgoal: 1.0
  requestcount: 2
  metricsnames:
  parameterconfigs:
    - name: --t
      parametertype: double
      feasible:
        min: "0.1"
        max: "0.9"
  workerSpec:
    goTemplate:
      rawTemplate: |-
        apiVersion: batch/v1
        kind: Job
        metadata:
          name: {{.WorkerID}}
          namespace: kubeflow
        spec:
          template:
            spec:
              containers:
              - name: {{.WorkerID}}
                image: radics93/mnist:1.0
                command:
                - "python"
                - "mnist.py"
                {{- with .HyperParameters}}
                {{- range .}}
                - "{{.Name}}={{.Value}}"
                {{- end}}
                {{- end}}
              restartPolicy: Never
  suggestionSpec:
    suggestionAlgorithm: "random"
    requestNumber: 2
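As far as I understand, the goTemplate above expands each suggested hyperparameter into a single --name=value argument, so a worker should end up running something like "python mnist.py --t=0.42" (0.42 is just a made-up example value, not real Katib output). A quick local check that the argparse setup in mnist.py accepts that form:

import argparse

# Reproduce the argument parsing from mnist.py and feed it a hand-rendered
# example of what the goTemplate should produce.
# "--t=0.42" is a made-up value, not an actual Katib suggestion.
parser = argparse.ArgumentParser()
parser.add_argument("--t", type=float, help="learning rate")
args = parser.parse_args(["--t=0.42"])
print(args.t)  # prints 0.42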
mnist.py
import argparse

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

parser = argparse.ArgumentParser()
parser.add_argument("--t", type=float, help="learning rate")
args = parser.parse_args()
t = args.t

nb_classes = 10

# MNIST data image of shape 28 * 28 = 784
X = tf.placeholder(tf.float32, [None, 784])
# 0 - 9 digits recognition = 10 classes
Y = tf.placeholder(tf.float32, [None, nb_classes])

W = tf.Variable(tf.random_normal([784, nb_classes]))
b = tf.Variable(tf.random_normal([nb_classes]))

# Hypothesis (using softmax)
hypothesis = tf.nn.softmax(tf.matmul(X, W) + b)
cost = tf.reduce_mean(-tf.reduce_sum(Y * tf.log(hypothesis), axis=1))
train = tf.train.GradientDescentOptimizer(learning_rate=t).minimize(cost)

# Test model
is_correct = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(Y, 1))
# Calculate accuracy
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))

# Parameters
num_epochs = 15
batch_size = 100
num_iterations = int(mnist.train.num_examples / batch_size)

sess = tf.Session()
# Initialize TensorFlow variables
sess.run(tf.global_variables_initializer())

# Training cycle
for epoch in range(num_epochs):
    avg_cost = 0
    for i in range(num_iterations):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        _, cost_val = sess.run([train, cost], feed_dict={X: batch_xs, Y: batch_ys})
        avg_cost += cost_val / num_iterations

# Test the model using the test set
A = accuracy.eval(session=sess, feed_dict={X: mnist.test.images, Y: mnist.test.labels})
sess.close()
print("Accuracy: %f" % A)
Dockerfile
FROM python:3.6
FROM tensorflow/tensorflow
MAINTAINER "AAA"
ENV PYTHONUNBUFFERED=0
ADD . ./
WORKDIR ./