Bug reconstruct.py

First of all, I would like to thank the authors for such inspiring work and for sharing it with the community. The paper is also very enjoyable to read. Congratulations!

Second, I have been reading the code and trying to reproduce some results in order to be able to cite it / reference it in my own work. I have noticed that the code for reproducing the Llama results are completely missing in the current main branch. Is there a timeline for providing it? This is central to the referencing I am making in my article.

Third, I am quite sure that there are 3 problems with the code in `scripts/training/reconstruct.py`.

### 1

```python
173 ... category_maps = read_npy(args.input+'/category_map.npy').item()
```

should read `category_maps.npy`. Which is a minor mistake.

### 2

```python
194                    for each in category_maps['id2category'][target_iid]:
195                        for e in each:
196                            target.append(e)
```

Here, `each` is a string and therefore, in the second loop, the characters of this string will be looped, adding artifacts to the list `target`, which will contain non-sense like `['p', 'i', 'z', 'z', 'a', '  ' , 'p', 'l', 'a', 'c', 'e']`. This gets fed forward and ends up in the final `train.target` file, which will contain something like

```text
## p @@
## i @@
## z @@
## z @@
## a @@
...
```

### 3

The same problem above appears again in

```python
204                    for query_iid in i_ids:
205                        candidates_list = category_maps['id2category'][query_iid]
206                        candidates = random.sample(candidates_list,1)[0]
207                        titles.append(', '.join(candidates))
```
here, `candidates` is already a string. When doing `', '.join(candidates)` python will interpret it as a iterable and will produce artifacts in the list `titles` which will have something like  `['p,i,z,z,a' , 'p,l,a,c,e']`. This error propagates and ends up contaminating the `train.source` file.


Hence, I wonder if we have an old version in the main branch.

Thank you for your work, I'd appreciate any input on that.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug reconstruct.py #4

1

2

3

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Bug reconstruct.py #4

Description

1

2

3

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions