quick fix on concatenating text to support more datasets (huggingface…
zeyuyun1 authored and fabiocapsouza committed Nov 15, 2020
1 parent 4b03705 commit 677c6d9
Showing 3 changed files with 3 additions and 3 deletions.
2 changes: 1 addition & 1 deletion examples/language-modeling/run_clm.py
@@ -254,7 +254,7 @@ def tokenize_function(examples):
         tokenize_function,
         batched=True,
         num_proc=data_args.preprocessing_num_workers,
-        remove_columns=[text_column_name],
+        remove_columns=column_names,
         load_from_cache_file=not data_args.overwrite_cache,
     )

2 changes: 1 addition & 1 deletion examples/language-modeling/run_mlm.py
@@ -292,7 +292,7 @@ def tokenize_function(examples):
         tokenize_function,
         batched=True,
         num_proc=data_args.preprocessing_num_workers,
-        remove_columns=[text_column_name],
+        remove_columns=column_names,
         load_from_cache_file=not data_args.overwrite_cache,
     )

2 changes: 1 addition & 1 deletion examples/language-modeling/run_plm.py
@@ -279,7 +279,7 @@ def tokenize_function(examples):
         tokenize_function,
         batched=True,
         num_proc=data_args.preprocessing_num_workers,
-        remove_columns=[text_column_name],
+        remove_columns=column_names,
         load_from_cache_file=not data_args.overwrite_cache,
     )
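
A plausible reading of why the change above helps, sketched under stated assumptions: if the later text-concatenation step joins every remaining column with list addition, any non-token column left over after tokenization would break it. The `group_texts` function below is a simplified stand-in written for illustration, not the scripts' exact code, and the `title` column and sample token IDs are hypothetical.

```python
def group_texts(examples, block_size=4):
    """Concatenate each column's per-example lists, then split into fixed-size blocks."""
    # sum(list_of_lists, []) flattens a column; it only works when every value is a list.
    concatenated = {k: sum(examples[k], []) for k in examples}
    total_length = len(concatenated[next(iter(examples))])
    # Drop the remainder so every block has exactly block_size items.
    total_length = (total_length // block_size) * block_size
    return {
        k: [v[i:i + block_size] for i in range(0, total_length, block_size)]
        for k, v in concatenated.items()
    }


# Case 1: only the text column was removed, so a hypothetical "title"
# column of strings is still in the batch.
batch_with_extra = {
    "input_ids": [[1, 2, 3], [4, 5, 6, 7, 8]],
    "title": ["first", "second"],
}
try:
    group_texts(batch_with_extra)
    leftover_column_failed = False
except TypeError:
    # A list cannot be concatenated with a str, so flattening "title" fails.
    leftover_column_failed = True

# Case 2: all original columns were removed, leaving only tokenizer outputs.
batch_clean = {"input_ids": [[1, 2, 3], [4, 5, 6, 7, 8]]}
grouped = group_texts(batch_clean)
```

With only tokenizer outputs left in the batch, the concatenation no longer depends on which extra columns a given dataset happens to ship with, which would be consistent with the commit title's "support more datasets".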

