Update README.md
pemami4911 authored Oct 30, 2017
1 parent 96802e5 commit 74e3917
Showing 1 changed file with 3 additions and 1 deletion.

**WORK IN PROGRESS**

**UPDATE 10/30/17** I was unable to get the RL pretraining model with greedy decoding to learn on the TSP10 or TSP20 environments. I tried both a critic network and an exponential moving average baseline. Training on a single NVIDIA GTX 1080 for 1-2 days, the variance of the actor loss appears to remain too high even with these baselines. Please create an Issue and let me know if you get this to work.
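For reference, a minimal sketch of a REINFORCE update with an exponential moving average baseline, one of the two baselines mentioned above (function names and the `beta` value here are illustrative, not the actual code in this repo):

```python
import torch

def reinforce_step(sum_log_probs, tour_lengths, baseline, beta=0.8):
    """One REINFORCE update with an exponential moving average baseline.

    sum_log_probs: (batch,) sum of log-probs of each sampled tour
    tour_lengths:  (batch,) length of each sampled tour (reward = -length)
    baseline:      scalar EMA of past tour lengths
    """
    # EMA baseline drifts toward the current batch's mean tour length
    baseline = beta * baseline + (1.0 - beta) * tour_lengths.mean().item()
    # Shorter-than-baseline tours get negative advantage, so minimizing
    # this loss increases the probability of short tours
    advantage = tour_lengths - baseline
    loss = (advantage.detach() * sum_log_probs).mean()
    return loss, baseline
```

Even with such a baseline, the advantage term `tour_lengths - baseline` can stay noisy, which is consistent with the high actor-loss variance observed above.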

PyTorch implementation of [Neural Combinatorial Optimization with Reinforcement Learning](https://arxiv.org/abs/1611.09940).

I have implemented the basic RL pretraining model from the paper. An implementation of the supervised learning baseline model is available [here](https://github.com/pemami4911/neural-combinatorial-rl-tensorflow).

During training, my implementation uses a stochastic decoding policy in the pointer network, sampling cities via PyTorch's `torch.multinomial()`. At test time, it decodes with beam search (not yet finished; currently only 1 beam, i.e., greedy decoding, is supported). I have tried to use the same hyperparameters as in the paper but have not yet been able to replicate the TSP results.
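The stochastic (training) vs. greedy (1-beam test) decoding step can be sketched as follows; this is a hypothetical illustration, not the repo's actual decoder:

```python
import torch

def decode_step(logits, visited_mask, greedy=False):
    """Pick the next city from pointer-network logits for one decode step.

    logits:       (batch, n_cities) unnormalized scores
    visited_mask: (batch, n_cities) bool, True where a city was already visited
    """
    # Masked (visited) cities get -inf so softmax assigns them zero probability
    logits = logits.masked_fill(visited_mask, float('-inf'))
    probs = torch.softmax(logits, dim=-1)
    if greedy:
        # 1-beam "beam search": always take the most likely city
        idx = probs.argmax(dim=-1, keepdim=True)
    else:
        # Stochastic policy during training: sample from the distribution
        idx = torch.multinomial(probs, num_samples=1)
    log_prob = probs.gather(-1, idx).log()
    return idx.squeeze(-1), log_prob.squeeze(-1)
```

The sampled log-probabilities are what the REINFORCE gradient is taken with respect to; the visited mask enforces that each city appears exactly once in the tour.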
