I'm trying to build a NER model based on the setup explained here: https://aphp.github.io/edsnlp/latest/tutorials/training-ner/
I have a pandas DataFrame with note_id and note_text columns and, for each note, a set of annotations that I made with INCEpTION. I merged the annotations into the DataFrame, so each row now holds the text together with its entities:
```python
>>> z = df.iloc[840]
>>> z
note_id          315
note_text        b'Some text blablabla more blablabla'.
text             b'Some text blablabla more blablabla'.
note_datetime    ANNOTATION-IN-PROGRESS
entities         [{'start_char': 188, 'end_char': 203, 'ent_tex...
```
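The note_text and text values above are bytes literals (b'...'), whereas spaCy-based tokenizers expect str. A minimal sketch, assuming the column holds UTF-8 encoded bytes, that decodes the text before anything else uses it; decode_note_text is a hypothetical helper, not part of edsnlp:

```python
def decode_note_text(value, encoding="utf-8"):
    """Return the note text as str, decoding bytes if necessary.

    Hypothetical helper; assumes the bytes are UTF-8 encoded.
    """
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode(encoding)
    if isinstance(value, str):
        return value
    raise TypeError(f"unsupported note_text type: {type(value).__name__}")


# Example row shaped like the one shown above (the text is a placeholder)
row = {"note_id": 315, "note_text": b"Some text blablabla more blablabla"}
row["note_text"] = decode_note_text(row["note_text"])
print(type(row["note_text"]).__name__)  # str
```

On the DataFrame itself this would be df["note_text"] = df["note_text"].map(decode_note_text), and likewise for the text column.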
The entities column looks like this (one entity per line):

```python
[{'start_char': 188, 'end_char': 203, 'ent_text': '14 januari 2011', 'ent_label': '03-Datum', 'label': '03-Datum', 'note_nlp_source_value': '03-Datum', 'text': '14 januari 2011'},
 {'start_char': 199, 'end_char': 203, 'ent_text': '2011', 'ent_label': '03-Datum', 'label': '03-Datum', 'note_nlp_source_value': '03-Datum', 'text': '2011'},
 {'start_char': 211, 'end_char': 231, 'ent_text': 'MAMA MIA Jan Julien', 'ent_label': '01-Naam', 'label': '01-Naam', 'note_nlp_source_value': '01-Naam', 'text': 'MAMA MIA Jan Julien'}]
```
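Two properties of these entities are worth checking before training. First, the slice text[start_char:end_char] should equal ent_text: the third entity covers 231 - 211 = 20 characters, while 'MAMA MIA Jan Julien' has 19, so either the offsets or ent_text is off by one. Second, the first two spans (188 to 203 and 199 to 203) overlap, and spaCy's doc.ents cannot hold overlapping spans (span groups in doc.spans can). A standard-library sketch of both checks; check_entities is a hypothetical helper, and the text below is synthetic, padded with spaces so the offsets line up with the listing:

```python
def check_entities(text, entities):
    """Hypothetical helper: return (mismatched, overlapping) entities.

    Offsets are assumed to be character offsets with an exclusive end,
    which is what the first two entities in the listing imply.
    """
    mismatched = [
        ent for ent in entities
        if text[ent["start_char"]:ent["end_char"]] != ent["ent_text"]
    ]
    ordered = sorted(entities, key=lambda e: (e["start_char"], e["end_char"]))
    overlapping = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b["start_char"] >= a["end_char"]:
                break  # sorted by start, so no later span can overlap a
            overlapping.append((a["ent_text"], b["ent_text"]))
    return mismatched, overlapping


# Synthetic text: spaces pad the entities to the offsets in the listing
text = " " * 188 + "14 januari 2011" + " " * 8 + "MAMA MIA Jan Julien" + " " * 5
entities = [
    {"start_char": 188, "end_char": 203, "ent_text": "14 januari 2011"},
    {"start_char": 199, "end_char": 203, "ent_text": "2011"},
    {"start_char": 211, "end_char": 231, "ent_text": "MAMA MIA Jan Julien"},
]
mismatched, overlapping = check_entities(text, entities)
print([e["ent_text"] for e in mismatched])  # ['MAMA MIA Jan Julien']
print(overlapping)  # [('14 januari 2011', '2011')]
```

Running it over every row (after decoding the text) shows which annotations need fixing before they are turned into documents.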
I then tried to plug this data into the training code from the tutorial's "From a script or a notebook" section, where I replace
When I launch the train command, it fails, complaining that it cannot find text (traceback below).
It's unclear from the docs what the input data should look like. Could you elaborate on that?
```python
>>> train(
...     nlp=nlp,
...     max_steps=max_steps,
...     validation_interval=max_steps // 10,
...     train_data=TrainingData(
...         data=train_data,
...         batch_size="4096 tokens",  # 32 * 128 tokens
...         pipe_names=["ner"],
...         shuffle="dataset",
...     ),
...     val_data=val_data,
...     scorer={"ner": ner_metric},
...     optimizer=optimizer,
...     grad_max_norm=1.0,
...     output_dir="artifacts",
...     logger=loggers,
...     # Do preprocessing in parallel on 1 worker
...     num_workers=1,
...     # Enable on Mac OS X or if you don't want to use available GPUs
...     # cpu=True,
... )
Trainable components: ner
Training phases:
- 1: ner
```
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\jwijf\AppData\Local\Programs\Python\Python312\Lib\site-packages\confit\registry.py", line 393, in wrapper_function
    raise e.with_traceback(remove_lib_from_traceback(e.__traceback__))
  File "C:\Users\jwijf\AppData\Local\Programs\Python\Python312\Lib\site-packages\pydantic\deprecated\decorator.py", line 227, in execute
    return self.raw_function(**d, **var_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jwijf\AppData\Local\Programs\Python\Python312\Lib\site-packages\edsnlp\training\trainer.py", line 651, in train
    val_docs = list(chain.from_iterable(val_data))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jwijf\AppData\Local\Programs\Python\Python312\Lib\site-packages\edsnlp\processing\simple.py", line 104, in process
    for item in items:
                ^^^^^
  File "C:\Users\jwijf\AppData\Local\Programs\Python\Python312\Lib\site-packages\edsnlp\core\stream.py", line 168, in __call__
    yield from res
  File "C:\Users\jwijf\AppData\Local\Programs\Python\Python312\Lib\site-packages\edsnlp\pipes\misc\split\split.py", line 163, in __call__
    for sub_doc in self.split_doc(doc):
                   ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jwijf\AppData\Local\Programs\Python\Python312\Lib\site-packages\edsnlp\pipes\misc\split\split.py", line 200, in split_doc
    for m in self.regex.finditer(doc.text)
                                 ^^^^^^^^
AttributeError: 'dict' object has no attribute 'text'
```
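The last frames show what goes wrong: the split pipe (edsnlp/pipes/misc/split/split.py) reads doc.text from each validation item, and that item is a plain dict, so the rows reached the pipeline without first being converted into Doc objects. The toy model below reproduces this mechanism with the standard library only; ToyDoc, dict_to_doc and split_doc are hypothetical stand-ins, not edsnlp code, and the "\n\n" split pattern is an arbitrary choice. In edsnlp itself the conversion is presumably the job of the data reader's converter (for a DataFrame, something along the lines of edsnlp.data.from_pandas with an OMOP-style converter), but the exact parameters should be checked in the edsnlp data documentation.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ToyDoc:
    """Stand-in for a spaCy Doc: only the attributes used below."""
    text: str
    note_id: object = None
    ents: list = field(default_factory=list)


def dict_to_doc(record):
    """Hypothetical converter: turn a dict row into an object with .text."""
    ents = [
        (e["start_char"], e["end_char"], e["note_nlp_source_value"])
        for e in record.get("entities", [])
    ]
    return ToyDoc(text=record["note_text"], note_id=record["note_id"], ents=ents)


def split_doc(doc, regex=re.compile(r"\n\n")):
    """Mimic a regex splitter; yields text chunks rather than sub-docs.

    Like the real pipe in the traceback, it reads doc.text.
    """
    start = 0
    for m in regex.finditer(doc.text):
        yield doc.text[start:m.start()]
        start = m.end()
    yield doc.text[start:]


record = {"note_id": 315, "note_text": "first part\n\nsecond part", "entities": []}

try:
    list(split_doc(record))  # a raw dict, as in the traceback
except AttributeError as err:
    print(err)  # 'dict' object has no attribute 'text'

print(list(split_doc(dict_to_doc(record))))  # ['first part', 'second part']
```

The point of the toy is only the ordering: conversion from dict to document has to happen before any pipe that touches .text.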