custom loss symbol in R/Python #3368
If the goal is to do linear regression, maybe mxnet.symbol.LogisticRegressionOutput can be considered.
Thank you for the suggestion. In my case the objective function could be complex, and I just want an example for MakeLoss. I believe the rcnn example is the only official one using MakeLoss, and it is still too complex to show how to create an MLP with a custom loss function. In tflearn this is quite simple: you can provide a function directly. I also read the example on creating a new operator, but that looks very complex and unnecessary.
Well, simplicity comes at the price of slow speed and high memory cost.
I don't get this part. For a customized objective function, using MakeLoss is easier than writing a new layer. Anyway, I have figured out how to use it now, but the performance looks weird. I tried the regression sample in http://mxnet.readthedocs.io/en/latest/packages/r/fiveMinutesNeuralNetwork.html and changed the loss to a custom one.
How about trying something simpler, like an L2 loss as MakeLoss(square(fc - label)), to see whether other components of the learning process are affecting performance?
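For concreteness, here is a minimal sketch (not code from the thread) of how that suggestion composes in the R symbol API; the variable names and the single-output FullyConnected layer are assumptions:

```r
library(mxnet)

# Hypothetical composition of the suggested L2 loss
data  <- mx.symbol.Variable("data")
label <- mx.symbol.Variable("label")
fc    <- mx.symbol.FullyConnected(data, num_hidden = 1)
# Reshape flattens fc so its shape matches the label; MakeLoss marks the
# squared difference as the quantity to minimise during training.
l2 <- mx.symbol.MakeLoss(mx.symbol.square(mx.symbol.Reshape(fc, shape = 0) - label))
```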
I tried a square loss and it doesn't work. This is another test and it doesn't work either.
Then perhaps try symbol.LinearRegressionOutput to make sure other parts, like IO or symbol composition, are correct? If that works, the problem could be the composed loss. It would be most helpful if you could give an example where the loss produces a wrong gradient. Refer to symbol.simple_bind for this purpose.
Here is the full script with the two models.
Hi, I'm also trying to implement mean squared error as a custom loss function, as a path to understanding how to implement custom loss functions in general. I have run the example code you posted above and got the same results. As far as I can tell, something which is missing is attaching the label/response values to the loss symbol, unless giving the variable the name "label" does this implicitly.
First, your usage of MakeLoss needs to be corrected. Second, you can try different settings, such as the initializer and learning rate; I just randomly picked one combination and the result is shown below.

data <- mx.symbol.Variable("data")
label <- mx.symbol.Variable("label")
fc1 <- mx.symbol.FullyConnected(data, num_hidden=1)
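# Reshape(fc1, shape = 0) flattens the fc1 output so it matches the shape of the label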
lro <- mx.symbol.MakeLoss(mx.symbol.square(mx.symbol.Reshape(fc1, shape = 0) - label))
mx.set.seed(0)
model <- mx.model.FeedForward.create(lro, X=train.x, y=train.y,
initializer=mx.init.uniform(0.002),
ctx=mx.cpu(), num.round=50, array.batch.size=20,
learning.rate=2e-12, momentum=0.9,
eval.metric=mx.metric.rmse)
preds = predict(model, test.x)
sqrt(mean((preds-test.y)^2))
## [1] 23.88418
I think we might need to write a small vignette on how to use MakeLoss.
@thirdwing
Apologies for cross-posting, but I think the issue I have raised here is relevant to this discussion also. I have given full details in this SO question, but basically even with the standard LinearRegressionOutput layer I see poor regression performance. Note that I'm only looking at training performance, so it should be possible to get very low errors, and this is the case with other neural network tools, but I have been unable to get good regression performance from mxnet.
Thanks to some help provided on the other issue I have been able to resolve the problems I had with the regression performance of mxnet. However, I have noticed something unexpected (to me) when using MakeLoss. To illustrate this, consider the example below, in which I train 4 networks to perform simple polynomial regression. The first uses LinearRegressionOutput, while the others use MakeLoss with custom loss functions. On all plots I have included a y=x (perfect fit) line (red), and on the third plot I have included a y=x^2 line (blue). As you can see, the model trained with a square loss function appears to output the square of the response. This does not make sense to me; is it expected behaviour? If so, my understanding of how MakeLoss works must be wrong. RMSE results:
Produced from this code:
@piiswrong Do you have any idea about @khalida's question?
I guess MakeLoss was used as a grad function rather than a loss function?
One possible explanation, from what I can see (keep in mind I don't understand the internal workings of mxnet), is that the output of a network trained with MakeLoss is the loss itself rather than a predicted value; at prediction time no label is supplied, so it presumably defaults to zero, which would explain why the square-loss model appears to output the square of the response. This is made a little clearer if we normalize (to zero mean and unit variance) the data of the excellent BostonHousing example. If this is the case, then what we really want when we call predict is the output of the layer feeding into MakeLoss, not the MakeLoss output itself. The plots from this example are shown below, and the code used to produce these plots (based on the BostonHousing example provided by @thirdwing) is pasted below.
OK, I have managed to work this out by following the example for extracting the data from internal layers in R given here. Below is some example code which trains a linear regression model using both LinearRegressionOutput and a custom MakeLoss loss. For the MakeLoss model, predictions are obtained by extracting the output of the final FullyConnected layer rather than the loss output. If someone with some more mxnet experience (@thirdwing, @piiswrong, @Lodewic) could give the code below a quick check, I would be happy to update the documentation. A few things about my implementation below still feel a little hacky.
Code:
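A minimal sketch of the extraction step described above, assuming a model2 already trained with a MakeLoss symbol whose final FullyConnected layer is named "fc2" (the names follow the later example in this thread):

```r
# A MakeLoss model outputs the loss value, so predictions have to come
# from the internal fc2 layer instead of the network output.
internals <- internals(model2$symbol)
fc_symbol <- internals[[match("fc2_output", outputs(internals))]]

# Rebuild a feed-forward model around the internal symbol, reusing the
# trained weights, and predict from that.
model3 <- list(symbol = fc_symbol,
               arg.params = model2$arg.params,
               aux.params = model2$aux.params)
class(model3) <- "MXFeedForwardModel"
pred3 <- predict(model3, test.x)
```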
@khalida Thank you for what you have done. I have to admit I never used MakeLoss in R myself. If we can confirm your solution, let's fix the documentation first. Then I will try to provide some helper functions to make it easy.
Hi @thirdwing, many thanks for the response. Could you provide some links to the appropriate parts of the documentation? In the meantime, the example below might be able to form the basis of some updated documentation on the use of MakeLoss. It's a bit long, but essentially it attempts to train 4 neural networks: one with LinearRegressionOutput and three with MakeLoss using different custom loss functions.
Each of the networks is then used for prediction (on both the in-sample training set and a held-out test set) and their responses are assessed using 3 error metrics. As might be hoped, each network performs well on the metric it has been trained to minimize (in the results, errors were normalized to the errors of the LRO model).
The code:
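The full script is not reproduced here; as a hedged sketch of the pattern it describes, below is one MakeLoss network trained on a mean absolute error loss together with a custom evaluation metric built with mx.metric.custom. The toy data, all names, and the use of mx.symbol.abs are illustrative assumptions:

```r
library(mxnet)

# Illustrative toy regression data
set.seed(0)
train.x <- matrix(runif(200 * 5), nrow = 200, ncol = 5)
train.y <- rowSums(train.x) + rnorm(200, sd = 0.1)

data  <- mx.symbol.Variable("data")
label <- mx.symbol.Variable("label")
fc1   <- mx.symbol.FullyConnected(data, num_hidden = 10, name = "fc1")
act1  <- mx.symbol.Activation(fc1, act_type = "tanh", name = "act1")
fc2   <- mx.symbol.FullyConnected(act1, num_hidden = 1, name = "fc2")

# Network trained to minimise mean absolute error via MakeLoss
mae.loss <- mx.symbol.MakeLoss(
  mx.symbol.abs(mx.symbol.Reshape(fc2, shape = 0) - label), name = "mae_loss")

# With MakeLoss the network output *is* the per-sample loss, so this custom
# metric simply averages the output rather than comparing it to the labels.
mae.metric <- mx.metric.custom("train_mae", function(label, pred) {
  mean(as.array(pred))
})

mx.set.seed(0)
model <- mx.model.FeedForward.create(mae.loss,
                                     X = train.x, y = train.y,
                                     ctx = mx.cpu(), num.round = 20,
                                     array.batch.size = 20,
                                     array.layout = "rowmajor",
                                     eval.metric = mae.metric)
```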
As pointed out here (regarding training to minimize custom loss functions in Julia), the above works but is rather limited. In particular, I have been unable to find a way to log the training and validation error during training. I tried to follow the example here, but doing so runs into the problem identified above (the output of a network with a custom loss is the loss itself, not a predicted value). I have two questions:
Any pointers would be greatly appreciated.
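Not an answer from the thread, but one way the R package can log per-epoch metric values is the mx.metric.logger / mx.callback.log.train.metric pair. A hedged sketch follows, reusing the names from the sketch above; note that for a MakeLoss network the logged value is the loss output, not an RMSE on actual predictions:

```r
# Record training/evaluation metric values during training
logger <- mx.metric.logger$new()

mx.set.seed(0)
model <- mx.model.FeedForward.create(mae.loss,
                                     X = train.x, y = train.y,
                                     eval.data = list(data = train.x, label = train.y),
                                     ctx = mx.cpu(), num.round = 20,
                                     array.batch.size = 20,
                                     array.layout = "rowmajor",
                                     eval.metric = mae.metric,
                                     epoch.end.callback = mx.callback.log.train.metric(5, logger))

# Per-epoch values accumulated by the callback
logger$train
logger$eval
```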
I was also confused about how to add custom loss functions in Python, the way one can in TensorFlow. Sometimes I feel that some functions in MXNet are a black box...
@khalida I have updated the document using your example.

# Network config
optimizer <- "rmsprop"
batchSize <- 60
nRounds <- 50
nHidden <- 14
verbose <- FALSE
array.layout <- "rowmajor"
library(mxnet)
data(BostonHousing, package="mlbench")
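# Convert factor columns to numeric so the whole data frame can be scaled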
BostonHousing[, sapply(BostonHousing, is.factor)] <-
as.numeric(as.character(BostonHousing[, sapply(BostonHousing, is.factor)]))
BostonHousing <- data.frame(scale(BostonHousing))
test.ind = seq(1, 506, 5) # 1 pt in 5 used for testing
train.x = data.matrix(BostonHousing[-test.ind, -14])
train.y = BostonHousing[-test.ind, 14]
test.x = data.matrix(BostonHousing[test.ind, -14])
test.y = BostonHousing[test.ind, 14]
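# Model 1: standard network trained with LinearRegressionOutput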
data <- mx.symbol.Variable("data")
label <- mx.symbol.Variable("label")
fc1 <- mx.symbol.FullyConnected(data, num_hidden=nHidden, name="fc1")
tanh1 <- mx.symbol.Activation(fc1, act_type="tanh", name="tanh1")
fc2 <- mx.symbol.FullyConnected(tanh1, num_hidden=1, name="fc2")
lro <- mx.symbol.LinearRegressionOutput(fc2, name="lro")
mx.set.seed(0)
model <- mx.model.FeedForward.create(lro,
X=train.x, y=train.y,
eval.data=list(data=test.x, label=test.y),
ctx=mx.cpu(), num.round=nRounds,
array.batch.size=batchSize,
eval.metric=mx.metric.rmse,
optimizer=optimizer, verbose=verbose,
array.layout=array.layout)
pred <- predict(model, test.x)
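# Model 2: the same architecture, trained with an explicit squared-error loss via MakeLoss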
lro2 <- mx.symbol.MakeLoss(mx.symbol.square(mx.symbol.Reshape(fc2, shape = 0) - label), name="lro2")
mx.set.seed(0)
model2 <- mx.model.FeedForward.create(lro2,
X=train.x, y=train.y,
eval.data=list(data=test.x, label=test.y),
ctx=mx.cpu(), num.round=nRounds,
array.batch.size=batchSize,
eval.metric=mx.metric.rmse,
optimizer=optimizer, verbose=verbose,
array.layout=array.layout)
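# The MakeLoss model outputs the loss itself, so extract the internal fc2 output to obtain predictions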
internals = internals(model2$symbol)
fc_symbol = internals[[match("fc2_output", outputs(internals))]]
model3 <- list(symbol = fc_symbol,
arg.params = model2$arg.params,
aux.params = model2$aux.params)
class(model3) <- "MXFeedForwardModel"
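# Predictions now come from the fc2 layer rather than from the loss output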
pred3 <- predict(model3, test.x)
# Plotting of fits
par(mfrow=c(1,2))
# Train fits
plot(test.y, pred[1,], main="nnet Train Fit", xlab="Target", ylab="Response")
abline(0,1, col="red", lwd=2)
plot(test.y, pred3[1,], main="nnet MakeLoss square Train Fit", xlab="Target", ylab="Response")
abline(0,1, col="red", lwd=2)

The output of the MakeLoss model is the loss itself rather than the prediction, which is why the internal fc2 output is extracted before calling predict. So currently the eval metric (mx.metric.rmse here) doesn't work with MakeLoss.
I tried to create a custom loss symbol in R or Python. I found an example using MakeLoss in Python at https://zhuanlan.zhihu.com/p/21725762?refer=xlvector. I tried to create a network to minimize MSE for linear regression but could never get it to work. Could anyone please provide an example? Thanks.