Probe matcher #22

kaitlyn-sharo · 2024-03-21T20:32:04Z

Matches user actions from simulator JSON files to the probes/action-mappings in the yaml files. Maybe this should be in a different repository, but I built it here for now since these are isolated Python scripts and I was putting the yaml-to-json script here as well. If it should be somewhere else, let me know and I will move it.

To test, download participant data

Then, run python3 probe_matcher.py -i [path-to-dir-you-just-downloaded]

You should see the data output in a directory called output. Consider hand-checking some of the outputs against the json and yamls to make sure all is good.

For reviewing code, the only applicable file is the probe_matcher.py Python script.

brianpippin · 2024-03-27T23:13:44Z

_metrics_eval_human_data.zip

This is all the data I was able to download from Confluence, arranged to match the folder structure. I noticed some of the files give a lot of warnings. This might be expected; please review and let me know how it works for you.

kaitlyn-sharo · 2024-03-28T01:17:06Z

I downloaded this zip and put it in the format the program needs. I'm seeing 26 files with no errors at all and 6 others with very few errors, which I would expect. That comes out to 32/2 = 16 valid participants, assuming each participant has 2 valid files and there aren't several participants with only one valid file. Two files did not run, probably due to formatting issues, and I'm not 100% sure I have all of the participant data transferred to the right format, but I have most of it. 13 other files have far more errors. I've looked at each of those files to see why and have listed the file names and reasons below:

7e23cc31-422a-42e1-acb5-964c661750f4_2024215.json - desert - did not take any actions (probably had to restart for some reason once booted up)

92c78261-0f01-4d4f-8ae3-7aa875d6cd7b_2024219.json - urban - running the tutorial scenario

cbbf410f-4657-428e-9616-8a777cc4704d_2024204.json - sub - ended early, before Adept happened

0fb9b0c5-21bb-4f2a-b97f-a01a5e67e492_2024214.json - urban - answered first soartech probe, then quit out, no more actions recorded

2e8f6555-a7fa-4b54-8132-c030d697b4ad_2024212.json - jungle - I'm not sure about this one. It looks good but is having trouble latching onto the soartech probes. I'm going to take another look at it

a1bcac92-c01f-485a-9ce0-8198739b2c27_2024212.json - urban - another tutorial

8cc40587-ff7d-4ebd-98ee-6441baedf7e0_2024205.json - jungle - tutorial

66c2b794-b3b7-4d8a-a9fa-0e3338931eab_2024207.json - urban - tutorial

c23df057-ab53-4276-b791-ebe38c431df7_2024217.json - sub - tutorial

12988238-24b6-4ac2-9f33-398321e82ae0_2024214.json - urban - I will have to take a second look at this one as well, it looks alright

56d09dbd-9f4b-41e8-bfb3-143832e61d94_2024215.json - urban - tutorial

9d0a4033-4482-4a17-833b-e410bf0d178e_2024220.json - urban - tutorial

0c0cc880-b3fb-488d-a468-0e67c17ca176_2024214.json - urban - action list empty, no recorded actions

So the files that error all the way through are mostly tutorial scenarios, which should be removed from the dataset (or I can throw in something to look for "tutorial" and ignore the file if it's there), or runs where no actions were taken. There are only 2 that I can look at to try to get slightly better matches from them. Below are the participant ids and the environments I have data for. An asterisk next to an environment name means that's the data I have to look at to see why we aren't finding matches. An asterisk next to a participant id means they do not have a full set of 2 environments' worth of data. So I'm counting 14 full sets of participant data.

2024215 sub jungle
2024217 desert urban
2024201 desert urban
2024202 urban desert
2024206 urban desert
2024216 sub jungle
2024222 desert urban
2024203 sub jungle
2024207 jungle sub
2024219 jungle sub
2024221 urban desert
*2024205 urban
*2024209 desert
2024218 urban desert
2024223 jungle sub
*2024208 sub
2024212 sub jungle(*)
2024220 sub jungle
2024214 urban urban(*)
*2024204 sub (missing adept) jungle
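
The tally above could also be reproduced mechanically. This is a minimal sketch, not part of probe_matcher.py: it assumes file names follow the `<uuid>_<participant id>.json` pattern seen in the error list, and that the environment for each file is already known. The `files` mapping, `needs_review` argument, and both helper names are hypothetical.

```python
from collections import defaultdict


def participant_id(filename):
    """Return the participant id from '<uuid>_<participant id>.json'."""
    stem = filename.rsplit(".", 1)[0]
    return stem.rsplit("_", 1)[1]


def count_full_sets(files, needs_review=()):
    """Count participants with two files, none of them flagged for review.

    files: mapping of file name -> environment name (hypothetical input).
    needs_review: file names that are still being investigated.
    """
    envs = defaultdict(list)
    flagged = set()
    for name, env in files.items():
        pid = participant_id(name)
        envs[pid].append(env)
        if name in needs_review:
            flagged.add(pid)
    return sum(
        1 for pid, found in envs.items()
        if len(found) == 2 and pid not in flagged
    )
```

Files whose name ends in `_` with no participant id come back with an empty id, so they would need to be resolved by hand before counting.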

brianpippin · 2024-03-28T12:00:04Z

Alright so I think we should definitely add something to protect against the tutorial (just to future proof this script when we move it to AWS), but also remove them from this dataset as well in case we want to give it to Big Bear.
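
A rough sketch of what that guard might look like. It assumes the simulator JSON has a field holding the scenario name and a field holding the list of recorded actions; the key names `scenario` and `actions` are placeholders and should be checked against a real file before use.

```python
def skip_reason(sim_data, scenario_key="scenario", actions_key="actions"):
    """Return why a simulator file should be skipped, or None to keep it.

    scenario_key and actions_key are assumed names, not confirmed
    fields of the simulator output.
    """
    name = str(sim_data.get(scenario_key, ""))
    if "tutorial" in name.lower():
        return "tutorial scenario"
    if not sim_data.get(actions_key):
        return "no recorded actions"
    return None
```

Returning a reason instead of a bare boolean makes it easy to log why each file was dropped, which helps when hand-checking the cleaned dataset.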

The other thing we want to try to do is figure out what the three json files without participant ids are and if we can figure out who they belong to. I am going to work on that detective work this morning.

I want to say this is really awesome, and I am looking forward to getting the human data into the database. I haven't looked at the code that you added for mongo, but I want to make sure we add the fields "evalNumber" and "evalName". For this evaluation, evalNumber=3 and evalName="Metrics Evaluation". If you already added those, cool; if not, we can put them at the root level.
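
For illustration, adding those two fields at the root level could look like the sketch below. The helper name is hypothetical and the rest of the document shape is whatever the script already produces; only the two field names and their values come from this comment.

```python
def with_eval_fields(doc, eval_number=3, eval_name="Metrics Evaluation"):
    """Return a copy of doc with evalNumber/evalName set at the root level."""
    tagged = dict(doc)  # shallow copy so the caller's dict is left untouched
    tagged["evalNumber"] = eval_number
    tagged["evalName"] = eval_name
    return tagged
```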

brianpippin · 2024-03-28T12:31:43Z

Alright, here is my guess for the 'unknown json id' files, based on looking at timestamps. Let me know if you agree:

aecfcd56-2262-40a8-9bb8-088f57d46f3f_ = 2024206
de6d297c-23d6-4f85-a873-f48e90b01542_ = 2024208

The others I couldn't find a match for. I am also going to go dig through the zips to see if I missed anything. Let me know what you think about those two though?
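
If this kind of timestamp detective work comes up again, it could be scripted along these lines. This is only a sketch of a nearest-timestamp guess: the `known` mapping, the two-hour window, and the function name are all assumptions, and any match it suggests would still need a manual check.

```python
from datetime import datetime, timedelta


def guess_participant(unknown_ts, known, max_gap=timedelta(hours=2)):
    """Return the participant id whose known timestamp is closest to unknown_ts.

    known: mapping of participant id -> datetime of a file known to be theirs.
    Returns None when no known timestamp falls within max_gap.
    """
    best_id, best_gap = None, max_gap
    for pid, ts in known.items():
        gap = abs(ts - unknown_ts)
        if gap <= best_gap:
            best_id, best_gap = pid, gap
    return best_id
```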

brianpippin · 2024-03-28T12:38:54Z

Sorry for all the comments. Also for 2024222 I do see two files in the JSON that looked good to me, and I don't see either of them listed in the errors list you had. Could you check that one to see if we have a full data set for that one? I am just trying to see if we can get to 15 full data sets.

kaitlyn-sharo · 2024-03-28T13:27:57Z

2024208's file does not have any actions. 2024206's is great! Only one missing match, as usual for soartech scenarios. The missing 2024222 file is also good, with only one missing match. Updating my comment with the full list of valid ids & scenarios accordingly.

Also, none of the others have valuable data in them

brianpippin · 2024-03-28T15:35:34Z

So looking at the records in mongo, I think adding a field called "scenario_id" in the same spot where you have "ta1" and "env", holding that ID, would make looking up and linking the ADM/Human data easier. Everything worked great though!
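
To show why a shared "scenario_id" helps, here is a small sketch that pairs human records with ADM records in plain Python. The record shapes and the function name are hypothetical; only the "scenario_id" field name comes from the suggestion above, and a real lookup would go through the database instead.

```python
def link_by_scenario(human_docs, adm_docs):
    """Pair each human record with the ADM records sharing its scenario_id."""
    adm_by_scenario = {}
    for doc in adm_docs:
        sid = doc.get("scenario_id")
        if sid is not None:
            adm_by_scenario.setdefault(sid, []).append(doc)
    # Human records without a scenario_id get an empty list of matches.
    return [
        (human, adm_by_scenario.get(human.get("scenario_id"), []))
        for human in human_docs
    ]
```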

kaitlyn-sharo · 2024-03-28T17:11:35Z

Moved to ingest repo

kaitlyn-sharo added 3 commits March 21, 2024 11:21

soartech probe matcher second attempt

08590ae

finished soartech and adept probe matching

62ccf82

added documentation

48f647d

kaitlyn-sharo requested review from brianpippin, nextcen-dgemoets, jaudick, phile-caci, dereknop and joshuale-caci March 21, 2024 20:32

kaitlyn-sharo added 2 commits March 22, 2024 12:55

swapped patients x and w due to mismatch between yaml and json

ac6e499

send data to mongo

8251a25

kaitlyn-sharo added 2 commits March 28, 2024 10:44

added clean dataset and additional mongo data

f255466

fixed re-run errors; added timestamp to mongo

0b0762f

added scenario id from yaml to mongo

dbbd03b

kaitlyn-sharo mentioned this pull request Mar 28, 2024

initial commit with probe matcher script NextCenturyCorporation/itm-ingest#1

Merged

kaitlyn-sharo closed this Mar 28, 2024
