Generating Pre-Training Data for TAPAS #174

Hello,

I am trying to reproduce the whole training process with German data.
I have already collected data for fine-tuning, but I struggle to understand how the pre-training data is obtained.
Based on this link (https://github.com/google-research/tapas/blob/9f2163958d1a6ffa15b9ac346eebe0a140460fb9/PRETRAIN_DATA.md), I understand that one has to extract the data in the proto text format and then convert it into TF examples with the "tapas/create_pretrain_examples_main.py" script.
What I don't understand is how the proto text data itself was produced, in particular how the question keys should be filled with values.
Am I missing something? Thanks in advance.

