Probe matcher #22

Closed
wants to merge 8 commits into from

@kaitlyn-sharo (Contributor) commented Mar 21, 2024

Matches user actions from simulator JSON files to the probes/action mappings in the YAML files. Maybe this should live in a different repository, but I built it here for now since these are isolated Python scripts and I was putting the yaml-to-json script here as well. If it should be somewhere else, let me know and I will move it.
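To make the matching idea concrete, here is a minimal Python sketch. It assumes, hypothetically, that each YAML probe has already been parsed (for example with PyYAML's `yaml.safe_load`) into a dict with a `probe_id` and a list of `action_mappings`, where each mapping names an `action_type`, a `casualty`, and the `choice` it corresponds to, and that each simulator JSON action carries the same `action_type` and `casualty` keys. These field names and action types are illustrative only, not the actual schema used by the script in this PR.

```python
from typing import Any, Optional

# Hypothetical shapes (not the real schema):
#   probe  = {"probe_id": str, "action_mappings": [{"action_type": ..., "casualty": ..., "choice": ...}, ...]}
#   action = {"action_type": ..., "casualty": ..., ...}
Probe = dict[str, Any]
Action = dict[str, Any]


def mapping_matches(mapping: dict[str, Any], action: Action) -> bool:
    """True if every criterion in the mapping (all keys except 'choice')
    equals the corresponding value in the user action."""
    return all(action.get(key) == value
               for key, value in mapping.items() if key != "choice")


def match_probes(probes: list[Probe], actions: list[Action]) -> dict[str, Optional[str]]:
    """For each probe, return the choice of the first user action that satisfies
    one of its mappings, or None when nothing matches (a 'missing match')."""
    results: dict[str, Optional[str]] = {}
    for probe in probes:
        results[probe["probe_id"]] = None
        for action in actions:  # actions are assumed to be in chronological order
            hit = next((m for m in probe["action_mappings"]
                        if mapping_matches(m, action)), None)
            if hit is not None:
                results[probe["probe_id"]] = hit["choice"]
                break
    return results
```

Taking the first matching action per probe is just one possible policy; a real matcher may need to handle repeated or out-of-order actions differently.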

To test, download the participant data.

Then run python3 probe_matcher.py -i [path-to-dir-you-just-downloaded]

The results are written to a directory called output. Consider hand-checking some of the outputs against the JSON and YAML files to make sure everything matches up.
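A minimal sketch of the command-line flow described above, assuming the script takes the input directory via `-i`, reads every `*.json` file beneath it, and writes one result file per input into an `output` directory. The per-file step here is a hypothetical placeholder (it only counts entries under an assumed `actions` key), not the actual matching logic.

```python
import argparse
import json
from pathlib import Path


def process_file(path: Path) -> dict:
    """Placeholder for the real matching step. Assumes the file holds a JSON
    object; 'actions' is a hypothetical key name."""
    with path.open(encoding="utf-8") as f:
        data = json.load(f)
    return {"source": path.name, "num_actions": len(data.get("actions", []))}


def run(input_dir: Path, output_dir: Path = Path("output")) -> list[Path]:
    """Process every *.json file under input_dir and write one result per file
    to output_dir (files with the same name in different subfolders collide)."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(input_dir.rglob("*.json")):
        out_path = output_dir / path.name
        out_path.write_text(json.dumps(process_file(path), indent=2), encoding="utf-8")
        written.append(out_path)
    return written


def main(argv=None) -> None:
    """Parse '-i <dir>' and run; call this from an `if __name__ == "__main__":` guard."""
    parser = argparse.ArgumentParser(description="Match simulator actions to probes.")
    parser.add_argument("-i", dest="input_dir", required=True, type=Path,
                        help="directory containing the downloaded participant data")
    args = parser.parse_args(argv)
    for out_path in run(args.input_dir):
        print(f"wrote {out_path}")
```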

For code review, the only relevant file is the probe_matcher.py Python script.

@brianpippin (Contributor)

_metrics_eval_human_data.zip

This is all the data I was able to download from Confluence and match to the folder structure. I noticed that some of the files produce a lot of warnings; this might be expected. Please review and let me know how it works for you.

@kaitlyn-sharo (Contributor, Author) commented Mar 28, 2024

I downloaded this zip and put it in the format the program needs. I'm seeing 26 files with no errors at all, and 6 others with a few errors that I would expect. That comes out to 32 / 2 = 16 valid participants, assuming each participant has 2 valid files and there aren't several participants with only one valid file. Two files did not run, probably due to formatting issues, and I'm not 100% sure I have transferred all of the participant data into the right format, but I have most of it. Thirteen others have far more errors. I've looked at each of these files to see why and have listed the file names and reasons below:

7e23cc31-422a-42e1-acb5-964c661750f4_2024215.json - desert - did not take any actions (probably had to restart for some reason once booted up)

92c78261-0f01-4d4f-8ae3-7aa875d6cd7b_2024219.json - urban - running the tutorial scenario

cbbf410f-4657-428e-9616-8a777cc4704d_2024204.json - sub - ended early, before the Adept portion started

0fb9b0c5-21bb-4f2a-b97f-a01a5e67e492_2024214.json - urban - answered the first SoarTech probe, then quit; no further actions recorded

2e8f6555-a7fa-4b54-8132-c030d697b4ad_2024212.json - jungle - I'm not sure about this one. It looks good but is having trouble latching onto the SoarTech probes. I'm going to take another look at it

a1bcac92-c01f-485a-9ce0-8198739b2c27_2024212.json - urban - another tutorial

8cc40587-ff7d-4ebd-98ee-6441baedf7e0_2024205.json - jungle - tutorial

66c2b794-b3b7-4d8a-a9fa-0e3338931eab_2024207.json - urban - tutorial

c23df057-ab53-4276-b791-ebe38c431df7_2024217.json - sub - tutorial

12988238-24b6-4ac2-9f33-398321e82ae0_2024214.json - urban - I will have to take a second look at this one as well; it looks all right

56d09dbd-9f4b-41e8-bfb3-143832e61d94_2024215.json - urban - tutorial

9d0a4033-4482-4a17-833b-e410bf0d178e_2024220.json - urban - tutorial

0c0cc880-b3fb-488d-a468-0e67c17ca176_2024214.json - urban - action list empty, no recorded actions

So most of the files that error all the way through are either tutorial scenarios, which should be removed from the dataset (or I can add a check that looks for "tutorial" and skips the file if it's there), or files where no actions were taken. There are only 2 that I can investigate to try to get slightly better matches. Below are the participant IDs and the environments I have data for. An asterisk next to an environment name marks data I still need to look at to see why we aren't finding matches. An asterisk next to a participant ID means that participant does not have a full set of 2 environments' worth of data. So I'm counting 14 full sets of participant data.

2024215 sub jungle
2024217 desert urban
2024201 desert urban
2024202 urban desert
2024206 urban desert
2024216 sub jungle
2024222 desert urban
2024203 sub jungle
2024207 jungle sub
2024219 jungle sub
2024221 urban desert
*2024205 urban
*2024209 desert
2024218 urban desert
2024223 jungle sub
*2024208 sub
2024212 sub jungle*
2024220 sub jungle
2024214 urban urban*
*2024204 sub (missing Adept) jungle
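As a quick way to re-tally the list whenever it changes, here is a small Python sketch. It parses lines in the format above (participant ID followed by environments), treats a leading asterisk on the ID as an incomplete set per the convention described in this comment, and drops any parenthesized notes. The parsing rules are assumptions based on this comment's layout, not part of the probe matcher.

```python
import re


def parse_participant_line(line: str) -> tuple[str, bool, list[str]]:
    """Parse a line such as '*2024204 sub (missing Adept) jungle'.

    Returns (participant_id, is_full_set, environments). A leading '*' on the
    ID marks an incomplete set; parenthesized notes are removed first.
    """
    tokens = re.sub(r"\(.*?\)", "", line).split()
    raw_id, envs = tokens[0], tokens[1:]
    return raw_id.lstrip("*"), not raw_id.startswith("*"), envs


def count_full_sets(lines: list[str]) -> int:
    """Count non-blank lines whose participant ID has no leading asterisk."""
    return sum(1 for line in lines
               if line.strip() and parse_participant_line(line)[1])
```

On the 20 lines listed above, this counts 16 entries without an asterisk.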

@brianpippin (Contributor)

Alright, I think we should definitely add a guard against tutorial files (just to future-proof this script for when we move it to AWS), and also remove them from this dataset in case we want to give it to Big Bear.
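One way to add that guard, sketched in Python under assumptions: the scenario name is read from a hypothetical `scenario_id` field and the user actions from a hypothetical `actions` list, and the real simulator JSON may use different keys. A file is skipped when its scenario name contains "tutorial" (case-insensitive) or when no actions were recorded, which covers most of the failing files listed earlier.

```python
from typing import Any, Optional

Record = dict[str, Any]


def skip_reason(record: Record) -> Optional[str]:
    """Return why a participant file should be skipped, or None to keep it.

    'scenario_id' and 'actions' are hypothetical key names.
    """
    scenario = str(record.get("scenario_id", ""))
    if "tutorial" in scenario.lower():
        return "tutorial scenario"
    if not record.get("actions"):
        return "no recorded actions"
    return None


def filter_records(records: list[Record]) -> tuple[list[Record], dict[int, str]]:
    """Split records into the ones to keep and an {index: reason} map of skipped ones."""
    kept: list[Record] = []
    skipped: dict[int, str] = {}
    for i, record in enumerate(records):
        reason = skip_reason(record)
        if reason is None:
            kept.append(record)
        else:
            skipped[i] = reason
    return kept, skipped
```

Returning the skip reason rather than a bare boolean keeps the log useful when the script runs unattended.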

The other thing we want to do is figure out what the three JSON files without participant IDs are and whether we can determine who they belong to. I am going to do that detective work this morning.

I want to say this is really awesome, and I'm looking forward to getting the human data into the database. I haven't looked at the code you added for Mongo yet, but I want to make sure we add the fields "evalNumber" and "evalName"; for this evaluation, evalNumber=3 and evalName="Metrics Evaluation". If you've already added those, great; if not, we can put them at the root level.
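If the fields are not there yet, here is a minimal sketch of stamping them onto each document at the root level before insertion. It uses plain dicts so it runs without a database, and the helper name is hypothetical; values already present on a document are left untouched.

```python
from typing import Any

# Values given in this thread for the current evaluation.
EVAL_FIELDS: dict[str, Any] = {"evalNumber": 3, "evalName": "Metrics Evaluation"}


def add_eval_fields(doc: dict[str, Any],
                    fields: dict[str, Any] = EVAL_FIELDS) -> dict[str, Any]:
    """Return a copy of doc with root-level eval fields added where missing."""
    stamped = dict(doc)  # shallow copy so the caller's dict is not modified
    for key, value in fields.items():
        stamped.setdefault(key, value)
    return stamped
```

For records already in Mongo, pymongo's `Collection.update_many(filter, {"$set": {...}})` could apply the same fields in one call; the right collection and filter depend on how the ingest code stores the human data.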

@brianpippin (Contributor)

Alright, here is my guess for the 'unknown JSON id' files, based on timestamps. Let me know if you agree:

aecfcd56-2262-40a8-9bb8-088f57d46f3f_ = 2024206
de6d297c-23d6-4f85-a873-f48e90b01542_ = 2024208

I couldn't find a match for the others. I am also going to dig through the zips to see if I missed anything. Let me know what you think about those two.
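The timestamp matching described here can be sketched as a nearest-neighbour lookup: for each unlabelled file, pick the participant whose known session time is closest, and decline to guess when the gap exceeds a tolerance. Where the timestamps come from (file contents, file metadata, or session logs) and the one-hour tolerance are assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional


def guess_participant(
    file_time: datetime,
    sessions: dict[str, datetime],
    tolerance: timedelta = timedelta(hours=1),
) -> Optional[str]:
    """Return the participant ID whose session time is closest to file_time,
    or None if there are no sessions or none falls within the tolerance."""
    if not sessions:
        return None
    pid, start = min(sessions.items(), key=lambda item: abs(item[1] - file_time))
    return pid if abs(start - file_time) <= tolerance else None
```

Returning None instead of always picking the nearest participant keeps files like the unmatched ones here flagged for manual review.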

@brianpippin (Contributor)

Sorry for all the comments. Also, for 2024222 I see two JSON files that looked good to me, and I don't see either of them in your error list. Could you check whether we have a full data set for that participant? I'm just trying to see if we can get to 15 full data sets.

@kaitlyn-sharo (Contributor, Author) commented Mar 28, 2024

2024208's file does not have any actions, but 2024206's is great! Only one missing match, as usual for SoarTech scenarios. The missing 2024222 file is also good, with only one missing match. I'm updating my earlier comment with the full list of valid IDs and scenarios accordingly.

Also, none of the others contain valuable data.

@brianpippin (Contributor)

Looking at the records in Mongo, I think adding a field called "scenario_id", holding that ID, at the same level as "ta1" and "env" would make looking up and linking the ADM and human data easier. Everything worked great, though!
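A sketch of why a shared `scenario_id` helps: once human and ADM records carry the same key next to `ta1` and `env`, linking them becomes a simple group-by. The record shapes and the ADM side are assumptions; only the field names `scenario_id`, `ta1`, and `env` come from this thread.

```python
from collections import defaultdict
from typing import Any

Record = dict[str, Any]


def link_by_scenario(human: list[Record],
                     adm: list[Record]) -> dict[str, dict[str, list[Record]]]:
    """Group human and ADM records under their shared 'scenario_id'.

    Records without a scenario_id are skipped.
    """
    linked: defaultdict[str, dict[str, list[Record]]] = defaultdict(
        lambda: {"human": [], "adm": []})
    for source, records in (("human", human), ("adm", adm)):
        for record in records:
            scenario_id = record.get("scenario_id")
            if scenario_id is not None:
                linked[scenario_id][source].append(record)
    return dict(linked)
```

In Mongo itself, an index on the field (pymongo's `Collection.create_index("scenario_id")`) would keep those lookups fast as the collection grows.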

@kaitlyn-sharo (Contributor, Author)

Moved to the ingest repo.
