-
Notifications
You must be signed in to change notification settings - Fork 1
Description
First of all, I would like to thank the authors for such inspiring work and for sharing it with the community. The paper is also very enjoyable to read. Congratulations!
Second, I have been reading the code and trying to reproduce some results in order to be able to cite it / reference it in my own work. I have noticed that the code for reproducing the Llama results are completely missing in the current main branch. Is there a timeline for providing it? This is central to the referencing I am making in my article.
Third, I am quite sure that there are 3 problems with the code in scripts/training/reconstruct.py.
1
173 ... category_maps = read_npy(args.input+'/category_map.npy').item()should read category_maps.npy. Which is a minor mistake.
2
194 for each in category_maps['id2category'][target_iid]:
195 for e in each:
196 target.append(e)Here, each is a string and therefore, in the second loop, the characters of this string will be looped, adding artifacts to the list target, which will contain non-sense like ['p', 'i', 'z', 'z', 'a', ' ' , 'p', 'l', 'a', 'c', 'e']. This gets fed forward and ends up in the final train.target file, which will contain something like
## p @@
## i @@
## z @@
## z @@
## a @@
...
3
The same problem above appears again in
204 for query_iid in i_ids:
205 candidates_list = category_maps['id2category'][query_iid]
206 candidates = random.sample(candidates_list,1)[0]
207 titles.append(', '.join(candidates))here, candidates is already a string. When doing ', '.join(candidates) python will interpret it as a iterable and will produce artifacts in the list titles which will have something like ['p,i,z,z,a' , 'p,l,a,c,e']. This error propagates and ends up contaminating the train.source file.
Hence, I wonder if we have an old version in the main branch.
Thank you for your work, I'd appreciate any input on that.