A demo is 1.5x faster in Flux than TensorFlow when both use the CPU, but 3.0x slower when using CUDA #1694
Comments
You'd want to avoid globals, and perhaps turn off the logging. If you want to amortize the cost of copying data to the GPU, which can add up quickly and silently with every iteration, you may want to check out #1530. Besides that, these questions may be better suited to the JuliaLang Slack or Discourse, perhaps?
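A minimal sketch of what that advice looks like, assuming the Flux 0.12-era API used in the report below (the function name train_demo is hypothetical, and the model is shortened for brevity):

using Flux, CUDA

# Wrap everything in a function so the training loop touches no non-const
# globals, and move the data to device memory once, before the loop starts.
function train_demo(data, y; epochs = 100)
    model = Chain(Dense(2, 10, relu), Dense(10, 1)) |> gpu
    ps = Flux.params(model)
    opt = ADAM(1e-3)
    loss(x, yt) = Flux.Losses.mse(model(x), yt)
    dl = Flux.DataLoader((data, y), batchsize = 500, shuffle = true)
    for _ in 1:epochs
        Flux.train!(loss, ps, dl, opt)  # no logging callback inside the hot loop
    end
    return model
end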
Yes, there is a lot we can talk about, but the issue tracker isn't a great place for it. Please open a thread on Discourse (see https://discourse.julialang.org/t/psa-how-to-quote-code-with-backticks/7530 about formatting) and we can pick up there.
@DhairyaLGandhi Also, I tested "X = randn(Float32, 2000, 100000) |> gpu", and the memory for X was allocated in local (host) memory, not GPU memory.
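One quick way to check where the array actually ends up (a sketch, assuming CUDA.jl is installed; note that gpu is a silent no-op when CUDA.functional() returns false, which would explain a host allocation):

using Flux, CUDA

CUDA.functional()    # must be true; otherwise `gpu` silently returns the CPU array
X = randn(Float32, 2000, 100000) |> gpu
typeof(X)            # CuArray{Float32, 2, ...} if X lives on the device, Matrix{Float32} if not
CUDA.memory_status() # prints used/total device memory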
@ToucheSir
Flux layers should be exploiting parallelism through BLAS and other libraries as well, so I don't believe that is the culprit. Anyhow, the offer still stands to open a Discourse thread if you're having trouble closing the performance gap. I can provide a whole laundry list of recommendations :)
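As a quick sanity check on the BLAS point (not part of the original exchange), the thread count Julia's BLAS uses on the CPU can be inspected and tuned:

using LinearAlgebra

BLAS.get_num_threads()  # how many threads BLAS matrix multiplies will use
BLAS.set_num_threads(8) # e.g. set to the machine's physical core count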
As the title says, I use Julia 1.6.2, TensorFlow 2.3.0, and CUDA 11.0. My code is as follows:
Flux:
using Flux
using CUDA

data = randn(Float32, 2, 100000) |> gpu
y = reshape(sin.(data[1, :] .* data[2, :]), (1, size(data)[2])) |> gpu
model = Chain(
    Dense(2, 10, relu),
    Dense(10, 10, relu),
    Dense(10, 10, relu),
    Dense(10, 10, relu),
    Dense(10, 10, relu),
    Dense(10, 10, relu),
    Dense(10, 10, relu),
    Dense(10, 1),
) |> gpu
opt = ADAM(0.001, (0.9, 0.999))
loss(x, y) = Flux.Losses.mse(model(x), y)
ps = Flux.params(model)
dl = Flux.DataLoader((data, y), batchsize=500, shuffle=true) |> gpu
Flux.@epochs 100 Flux.train!(loss, ps, dl, opt; cb = Flux.throttle(() -> @show(loss(data, y)), 10))
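For a measurement that matches the TensorFlow timing below, the Julia run could be wrapped like this (a sketch, not part of the original report; CUDA.@sync matters because GPU kernels launch asynchronously, and a warm-up run keeps compilation time out of the measurement):

using CUDA

Flux.train!(loss, ps, dl, opt)  # warm-up pass so compilation is not timed
t = @elapsed CUDA.@sync begin
    Flux.@epochs 100 Flux.train!(loss, ps, dl, opt)
end
println("average time per epoch: ", t / 100, " s")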
TensorFlow:
def test_tf():
    import tensorflow as tf
    import numpy as np
    from tensorflow import keras
    # tf.config.experimental.set_visible_devices(gpu[0], 'GPU')
    with tf.device("/gpu:0"):
        model = tf.keras.Sequential([
            keras.layers.Dense(units=10, activation='relu', input_shape=[2]),
            keras.layers.Dense(units=10, activation='relu'),
            keras.layers.Dense(units=10, activation='relu'),
            keras.layers.Dense(units=10, activation='relu'),
            keras.layers.Dense(units=10, activation='relu'),
            keras.layers.Dense(units=10, activation='relu'),
            keras.layers.Dense(units=10, activation='relu'),
            keras.layers.Dense(units=1),
        ])
        model.compile(optimizer=keras.optimizers.Adam(1e-3), loss="mean_squared_error")
        xs = np.random.randn(100000, 2).astype(np.float32)
        ys = np.sin(xs[:, 0] * xs[:, 1]).astype(np.float32)
        model.fit(xs, ys, epochs=100, batch_size=500)
if name == "main":
import time
t0 = time.time()
test_tf()
print("everage time of epoch is {}".format((time.time()-t0)/100))