Skip to content

Bug reconstruct.py #4

@yrcmpbll

Description

@yrcmpbll

First of all, I would like to thank the authors for such inspiring work and for sharing it with the community. The paper is also very enjoyable to read. Congratulations!

Second, I have been reading the code and trying to reproduce some results in order to be able to cite it / reference it in my own work. I have noticed that the code for reproducing the Llama results are completely missing in the current main branch. Is there a timeline for providing it? This is central to the referencing I am making in my article.

Third, I am quite sure that there are 3 problems with the code in scripts/training/reconstruct.py.

1

173 ... category_maps = read_npy(args.input+'/category_map.npy').item()

should read category_maps.npy. Which is a minor mistake.

2

194                    for each in category_maps['id2category'][target_iid]:
195                        for e in each:
196                            target.append(e)

Here, each is a string and therefore, in the second loop, the characters of this string will be looped, adding artifacts to the list target, which will contain non-sense like ['p', 'i', 'z', 'z', 'a', ' ' , 'p', 'l', 'a', 'c', 'e']. This gets fed forward and ends up in the final train.target file, which will contain something like

## p @@
## i @@
## z @@
## z @@
## a @@
...

3

The same problem above appears again in

204                    for query_iid in i_ids:
205                        candidates_list = category_maps['id2category'][query_iid]
206                        candidates = random.sample(candidates_list,1)[0]
207                        titles.append(', '.join(candidates))

here, candidates is already a string. When doing ', '.join(candidates) python will interpret it as a iterable and will produce artifacts in the list titles which will have something like ['p,i,z,z,a' , 'p,l,a,c,e']. This error propagates and ends up contaminating the train.source file.

Hence, I wonder if we have an old version in the main branch.

Thank you for your work, I'd appreciate any input on that.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions