HELP WANTED: MNIST Error #670

Closed
@warf34

Description

Hello, I would like to test the MNIST code with Katib.
I ran it with the StudyJob and code below, but it does not produce any results.
What is the problem, and why does it happen?

P.S. The same code runs fine in Docker.

mnist.yaml

apiVersion: "kubeflow.org/v1alpha1"
kind: StudyJob
metadata:
  namespace: kubeflow
  labels:
    controller-tools.k8s.io: "1.0"
  name: mnist
spec:
  studyName: mnist
  owner: ylim
  optimizationtype: maximize
  objectivevaluename: "A"
  optimizationgoal: 1.0
  requestcount: 2
  metricsnames:
  parameterconfigs:
    - name: --t
      parametertype: double
      feasible:
        min: "0.1"
        max: "0.9"
  workerSpec:
    goTemplate:
        rawTemplate: |-
          apiVersion: batch/v1
          kind: Job
          metadata:
            name: {{.WorkerID}}
            namespace: kubeflow
          spec:
            template:
              spec:
                containers:
                - name: {{.WorkerID}}
                  image: radics93/mnist:1.0
                  command:
                  - "python"
                  - "mnist.py"
                  {{- with .HyperParameters}}
                  {{- range .}}
                  - "{{.Name}}={{.Value}}"
                  {{- end}}
                  {{- end}}
                restartPolicy: Never
  suggestionSpec:
    suggestionAlgorithm: "random"
    requestNumber: 2

mnist.py

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--t", type=float, help="???")
args = parser.parse_args()

t = args.t

nb_classes = 10

# MNIST data image of shape 28 * 28 = 784
X = tf.placeholder(tf.float32, [None, 784])
# 0 - 9 digits recognition = 10 classes
Y = tf.placeholder(tf.float32, [None, nb_classes])

W = tf.Variable(tf.random_normal([784, nb_classes]))
b = tf.Variable(tf.random_normal([nb_classes]))

# Hypothesis (using softmax)
hypothesis = tf.nn.softmax(tf.matmul(X, W) + b)

cost = tf.reduce_mean(-tf.reduce_sum(Y * tf.log(hypothesis), axis=1))
train = tf.train.GradientDescentOptimizer(learning_rate=t).minimize(cost)

# Test model
is_correct = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(Y, 1))
# Calculate accuracy
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))

# parameters
num_epochs = 15
batch_size = 100
num_iterations = int(mnist.train.num_examples / batch_size)

sess = tf.Session()
# Initialize TensorFlow variables
sess.run(tf.global_variables_initializer())
# Training cycle
for epoch in range(num_epochs):
    avg_cost = 0

    for i in range(num_iterations):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        _, cost_val = sess.run([train, cost], feed_dict={X: batch_xs, Y: batch_ys})
        avg_cost += cost_val / num_iterations

# Test the model using test sets
A = accuracy.eval(session=sess, feed_dict={X: mnist.test.images, Y: mnist.test.labels})
sess.close()
print("Accuracy: %f" % A)
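
One thing worth checking, which is not confirmed anywhere in this issue: the script prints `Accuracy: ...`, while mnist.yaml names the objective `A`. Assuming the default Katib metrics collector reads the worker's stdout and only recognizes lines of the form `name=value`, the final print would never be picked up. A minimal sketch of emitting the metric in that assumed format follows; `format_metric` is a hypothetical helper, not part of Katib or of the original script, and 0.9 is a placeholder value.

```python
def format_metric(name: str, value: float) -> str:
    """Format a metric as a `name=value` line.

    Assumption: the metrics collector matches on this exact shape and
    on a name equal to `objectivevaluename` in the StudyJob spec.
    """
    return "%s=%f" % (name, value)


# "A" mirrors objectivevaluename in mnist.yaml; 0.9 stands in for the
# accuracy the real script computes.
print(format_metric("A", 0.9))
```

In mnist.py this would replace the last line, for example `print(format_metric("A", A))`. Renaming the objective to `Accuracy` in the YAML and printing `Accuracy=...` would be an equivalent choice under the same assumption.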

Dockerfile

FROM python:3.6
FROM tensorflow/tensorflow

MAINTAINER "AAA"

ENV PYTHONUNBUFFERED=0

ADD . ./
WORKDIR ./
