This repository holds the test data and the evaluation script for evaluating the predictions of a model on the posology structuration task.
You can install using uv with:
uv sync
Otherwise, you can install the dependencies listed in requirements.txt with another installation method.
Provided that you have a JSON file with predictions (see data format below), you can run the following script:
uv run evaluate.py --predictions <YOUR FILE>
There is a --output_file option to store errors in a txt file.
The expected data format is a list of dictionaries containing two keys:
- query: the same query as the corresponding sample in the test set
- entities: the prediction of the model, it can be a JSON itself as in the test set or the raw text LLM prediction that the evaluation script will try to parse.
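As a minimal sketch of the format described above, the following Python snippet builds a small predictions list and writes it to a JSON file. The queries, the entity values, and the helper name `save_predictions` are illustrative assumptions, not taken from the test set. Whether `entities` is a single object or a list in the test set is not stated here, so the sketch assumes a list.

```python
import json

# Illustrative samples only: the queries and entity values are made up.
predictions = [
    {
        "query": "1 comprimé toutes les 6 heures",
        # Structured prediction (assumed here to be a list of entity dicts).
        "entities": [
            {
                "designation": "1 comprimé",
                "quantity_and_rate": {"value": 1.0, "unit": "comprimé"},
                "timing": {
                    "frequency": 1,
                    "frequency_texts": ["toutes les 6 heures"],
                    "period": 6,
                    "period_unit": "hours",
                },
            }
        ],
    },
    {
        "query": "2 gélules matin et soir",
        # Raw LLM output kept as text; the evaluation script will try to parse it.
        "entities": '[{"designation": "2 gélules"}]',
    },
]


def save_predictions(samples: list, path: str) -> None:
    """Write the list of {query, entities} dictionaries to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(samples, f, ensure_ascii=False, indent=2)
```

Calling `save_predictions(predictions, "predictions.json")` would produce a file that can then be passed to the `--predictions` option.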
The structuration schema contains the following fields:
- as_needed: dict (optional), contains information about whether the drug must be taken only under a certain condition (e.g. if "as needed during headache" is written)
  - as_needed: bool, True if the drug must be taken under a certain condition
  - as_needed_for: str, the specific condition (e.g. "during headache")
- designation: str, the minimal string to which the dosage corresponds; it does not contain the whole instruction, only the unit and the type of intake (e.g. "1 comprimé"), which makes it possible to differentiate several posology instructions when needed
- max_dose_per_period: dict (optional), contains information about a potential maximum dosing
  - dose: int, the maximum amount of units
  - dose_unit: str (optional), the type of unit (e.g. "comprimé")
  - code: str (optional), the SNOMED code of the dose_unit
- quantity_and_rate: dict, contains the amount of intake units to take at once
  - value: float, the amount of units to take at once
  - unit: str (optional), the type of unit
  - code: str (optional), the SNOMED code for the unit
- timing: dict, contains the timing with which each intake must take place
  - bounds_duration: dict (optional), the duration during which the treatment occurs
    - max_value: float (optional), the maximum amount of time units the treatment must last
    - value: float, the amount of time units the treatment must last (if max_value is not null, value represents the minimal value)
    - unit: str, the unit for measuring the treatment duration ("hours", "day", "week", or "month")
  - bounds_duration_text: str, the text from which bounds_duration is inferred
  - bounds_period: dict (optional), an alternative to bounds_duration where exact dates have been provided
    - start_date: str, start date in format YYYY/MM/DD
    - end_date: str, end date in format YYYY/MM/DD
  - day_of_week: list (optional), list of weekday abbreviations indicating when the treatment must be taken (e.g. "mon" or "sat")
  - frequency: int, the number of intakes for the given period (e.g. the 2 in "twice a day")
  - frequency_max: int (optional), the maximum number of intakes for the given period (frequency is then the minimum)
  - frequency_texts: list, a list of elements from the original text from which the timing was inferred
  - number_repeats_allows: int (optional), the number of times the treatment can be renewed
  - offset: str (optional), substring extracted from the query that indicates the offset of the intake ("30 minutes" in "30 minutes before meals")
  - period: int, the number of time periods to observe before repeating the intake (6 in "3 tablets every 6 hours")
  - period_unit: str, the time unit used for measuring the period between intakes ("hours" in "3 tablets every 6 hours")
  - sequence: int (optional), when several posology instructions are given, sequence indicates their order
  - time_of_day: list (optional), indicates precise intake hours in the format HH:mm:ss; time_of_day and when cannot both be non-empty
  - when: list (optional), indicates the moment of intake relative to the patient's activity (e.g. "AC" for before a meal); the list of possible values is given in the FHIR standard
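To make the schema more concrete, here is a hedged sketch of one structured entity written as a Python dictionary, for a made-up instruction ("1 comprimé toutes les 6 heures pendant 5 jours si douleur"). The nesting is inferred from the field list above and the values are illustrative, so they may not match the test set's annotation conventions exactly. The helper `time_fields_are_exclusive` is a hypothetical name, not part of the evaluation script; it only checks the stated rule that time_of_day and when cannot both be non-empty.

```python
# Illustrative entity; optional fields that do not apply are simply omitted.
entity = {
    "as_needed": {"as_needed": True, "as_needed_for": "si douleur"},
    "designation": "1 comprimé",
    "quantity_and_rate": {"value": 1.0, "unit": "comprimé"},
    "timing": {
        "bounds_duration": {"value": 5.0, "unit": "day"},
        "bounds_duration_text": "pendant 5 jours",
        "frequency": 1,
        "frequency_texts": ["toutes les 6 heures"],
        "period": 6,
        "period_unit": "hours",
        "time_of_day": [],
        "when": [],
    },
}


def time_fields_are_exclusive(entity: dict) -> bool:
    """Return True unless time_of_day and when are both non-empty."""
    timing = entity.get("timing", {})
    return not (timing.get("time_of_day") and timing.get("when"))
```

A check like this can be run on predictions before evaluation to catch entities that fill both fields at once.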