
Improve feature embeddings implementation #1200

Closed
@flauted

Description

Per discussion with @bpopeters and @vince62s on #1196

In the long run, by the time [data is] batched it shouldn't be necessary to know the data type. It's just a batch of data that the model should be able to handle. It only comes up right now because our way of handling feature embeddings (a feature almost no one uses) is super hacky.

The question is just when, where, and how the features get numericalized. I think the best approach would be some kind of multilevel field whose input is words plus arbitrarily many features and which, when numericalized, produces the sort of stacked tensor that the embeddings module expects. The NestedField in torchtext almost, but not quite, fits the bill.

It would save the trainer and translator from having to reason about the types of the source and target data.
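For concreteness, here is a rough sketch of what such a multilevel field could look like, assuming the legacy torchtext `Field` API; the name `MultiLevelField`, its interface, and the `<blank>` pad token are assumptions for illustration, not the final design:

```python
import torch
from torchtext.data import Field


class MultiLevelField(object):
    """Hypothetical multilevel field (sketch, not the final design).

    Examples are sequences of (word, feat_1, ..., feat_n) tuples.
    Each level gets its own sub-Field and vocab; process() pads and
    numericalizes the levels separately and stacks them along a
    trailing dimension, producing the (seq_len, batch, n_feats + 1)
    tensor the embeddings module expects, so the stacking no longer
    has to live in inputters.make_features.
    """

    def __init__(self, n_feats, pad_token="<blank>"):
        # level 0 is the words, levels 1..n_feats are the features
        self.fields = [Field(pad_token=pad_token) for _ in range(n_feats + 1)]

    def build_vocab(self, examples, **kwargs):
        # examples: list of token-tuple sequences; each sub-field
        # builds its vocab from its own level only
        for i, field in enumerate(self.fields):
            field.build_vocab([[tok[i] for tok in ex] for ex in examples],
                              **kwargs)

    def process(self, batch, device=None):
        # batch: list of examples, each a list of token tuples
        levels = []
        for i, field in enumerate(self.fields):
            level = [[tok[i] for tok in ex] for ex in batch]
            # Field.process pads and numericalizes -> (seq_len, batch)
            levels.append(field.process(level, device=device))
        # every level is padded to the same length, so stacking gives
        # (seq_len, batch, n_feats + 1)
        return torch.stack(levels, dim=2)
```

A batch built this way comes out in the same shape that make_features currently produces, so the embeddings module itself wouldn't need to change.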

I made a first pass at designing a multilevel field that handles the batching of features so that it no longer happens in inputters.make_features. I opened a PR from that branch against my branch for #1196 so that you can review the diff and guide the design further before I open one here (also because #1196 isn't merged yet). It's here. I checked the PR script and all seems well.
