This use case analyzes anonymized therapy data from a repository of Applied Behavior Analysis sessions and cases to draw insights which can be used by individual families to help understand the effectiveness of their programs.
- Python
- SQL
- Azure Databricks
- Sketch
- sklearn
- matplotlib
- pandas
- numpy
- databricks MLFlow
UI based portal for therapists, parents and clients to track the progress of goals, the journey and the metrics of the session.
- Evaluation of 80% as a set goal for indication of future successes
- Evaluation of the effect of treatment intensity on trial outcome
- Analysis on the correlation of therapist/author based descriptors with outcomes
Find the mean goal length per goal Assessment type and see the deviation per data point
Encoded the gender field and pre processed the age field
To find the total time taken to reach the goal by multiplying the (Count of new sessions graphed (TrialGroup) per day per year) * (Bins of target periods)
We have trained a random forest classifier to predict the final goal status.
Given below is the feature importance distribution from the model and observations are :
- Age has a significant amount of impact on the model.
- Gender too has impact but comes second to age
- Given the other features, we have found that
- TrialTarget Count per trial is another feature of importance.
- Missing values are marked as 0
- Conversion of string columns to datetime for fields containing 'Date'
- Conversion of categorical fields to encoded ones using Pandas Label Encoding
- sessionCount_byGoal_byMonthYear : Count of sessions graphed (TrialGroup) per month per year
- sessionCount_byGoal_byDayYear : Count of sessions graphed (TrialGroup) per day per year
- sessionCount_byGoal_byWeekYear : Count of sessions graphed (TrialGroup) per week per year
- Gender_Encoded : Demographic Data
- Age : Demographic Data, calculated with the difference of TrialDataDate and BirthDateYear
- TrialTargetId_Encoded : IDs of the target Goals
- GoalDomain_Encoded : Encoded goal domain value of Adaptive, Communication and Language
- goalAssessment_encoded : Encoded goal ABA assessment
- encodedTrialPhase : Encoded Trial Phase (Baseline' -1,'Intervention' -2,'Generalization' -3,'Maintenance'-4)
- goalForced_80Percent : The target value of outcome to 80% set goal
- Random Forest
- Decision Tree
-
The initial data analysis shows that the session count descriptors of days and weeks have a negative correlation with the goalForced_80Percent outcome, which can be interpreted as less number of sessions or more spread out sessions increase the chances of a successful outcome whereas a high number of sessions might lead to an unsuccessful outcome.
-
The feature importance from the Supervised Models show that Session Count/ Treatment Intensity descriptors are more useful in predicting a successful outcome than the demographic features that implies that the treatment intensity/ frequency plays a more important role in the result of an outcome when compared to the gender or age of the client.
-
The Goal ID is also a useful feature but has limitations due to high cardinality which implies that if the goals are restricted or categorized, then they will be important in determining the outcome but if they aren't categorical, they might not be a suitable feature for the Machine Learning models due to their high cardinality.
-
Age is a more useful predictor for the outcome of a trial than the goal domain or trial phase which implies that the age of the clients is a contributing factor to the success outcome of the goal.
- Younger clients are more involved in the trials than the older ones.
- Age group from 7-15 is the most active in the trials.
This analysis is done on the basis of distribution of number of observations taken for a trial on a day.
The median number of observations taken is 15. Based on the above distribution, the intensity is chunked into 3 categories:
1. sessioncount_bygoal_bydayyear between 8 and below --> low intensity
2. sessioncount_bygoal_bydayyear between 8 and 25 --> medium intensity
3. sessioncount_bygoal_bydayyear between 25 and above --> high intensity
Further analysis is done on the basis of age groups.
Age groups are divided into 3 categories:
1. 0-15
2. 15-25
3. 25 and above
Based on the above categories, further analysis is done for understanding how the graphing intensity affects the success of the trials.
For 0-15 age group, the success of the trials is higher for higher intensity graphing method while for low and medium intensity, the success rate is only slightly higher than unsuccessful trials
For 16-25 age group, most of the clients are faring well in all the categories of graphing intensity with a good success rate. It must be highlighted that the high intensity graphing method is helping perform the best among the others.
For 26 and above age group, more than 50% of the clients are performing well in all the 3 categories of the graphing intesity. But the graphing intensity is not much helpful in considering which method is best for this age group.
Higher intensity graphing method showed better results for the age groups between 0-15 and 16-25 but the graphing intensity did not affect much for the success of the trials for the older age group (26 and above)