A prototype application for healthcare analysis.
A capstone project in the Master of Engineering program at the University of California, Berkeley
- Landscape
- Database Schema
- Preprocessing
- Architectures and Results
4.1. Lung Disease Prediction
4.2. Brain Cancer Prediction
4.3. Alzheimer Prediction
4.4. CNN Model Scores
4.5. Gene Test
4.6. Phenotype Model - High-Level Prototype in Figma
- User Manual
- Team
The healthcare industry is shifting rapidly, with AI and machine learning driving demand for precision medicine and personalized care. Providers face pressure to reduce errors, optimize workflows, and improve outcomes, while startups challenge traditional methods. Complex medical data remains underutilized without efficient analysis tools. This application bridges that gap, offering actionable insights that enhance clinicians' decision-making. It augments, NOT replaces, doctors by providing predictions based on patient data to improve accuracy and reduce false positives. Unlike generalized databases, it delivers specialized insights tailored to clinical needs.
To construct the database effectively, a schema diagram was drawn first so that the code implementation could reference it.
Figure 1: Database schema of the user records
In Figure 1, arrows with a star at one end indicate a one-to-many relationship between each test and user. For example, while each user can have multiple brain tests, one brain test can only belong to one user.
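This one-to-many relationship can be sketched with foreign keys. The table and column names below are illustrative only (the actual schema is the one shown in Figure 1), using Python's built-in sqlite3 module:

```python
import sqlite3

# In-memory database; names are hypothetical stand-ins for the Figure 1 schema.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    )
""")
# Each brain test row points back to exactly one user (one-to-many).
conn.execute("""
    CREATE TABLE brain_tests (
        test_id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(user_id),
        result  TEXT
    )
""")

conn.execute("INSERT INTO users VALUES (1, 'Alice')")
conn.executemany(
    "INSERT INTO brain_tests VALUES (?, ?, ?)",
    [(10, 1, "glioma"), (11, 1, "no tumor")],
)

# One user can own many brain tests, but each test has exactly one user_id:
rows = conn.execute(
    "SELECT COUNT(*) FROM brain_tests WHERE user_id = 1"
).fetchone()
print(rows[0])  # → 2
```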
Figure 2: Feature sets of phenotype model
In addition, Figure 3 shows the amount of data used for each class in the different MRI and X-ray models:
Figure 3: The amount of data used to build the model for each disease prediction.
Figure 2 shows the hierarchical view of the database. The fields were separated by disease, and the leaf nodes indicate the prefixes of the fields' identification codes. The same fields may be related to multiple diseases.
Key datasets used in our analysis include:
● DEMO (Demographic Data)
● BMX (Body Measures)
● BPX (Blood Pressure)
● LBX (Laboratory Values: Glucose, Insulin, HbA1c)
● DIQ (Diabetes Questionnaire)
These datasets provided variables such as age, gender, race/ethnicity, BMI, fasting glucose,
insulin levels, blood pressure, and HbA1c—all of which are established markers relevant to
diabetes risk.
Because the class distributions varied within each dataset, class weights had to be generated so that the model treats every class equally. If one class has more data than the others, the model will predict that class more often than the rest; in other words, there will be a bias toward the classes with more training data.
Figure 4: Class frequencies of every CNN model. It shows a high variation, which can cause biased predictions
To solve this problem, a normalization formula was used to compute the class weights. In addition, because each image is processed independently, the images can be preprocessed in multiple threads in parallel instead of in one thread (the normal program flow). By doing that, we reduced the time spent on preprocessing by 83%.
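One common normalization, shown below as an assumption (the report does not state the exact formula used), weights each class by the inverse of its frequency so that rare classes count proportionally more in the loss. The class counts are illustrative, not the real figures from Figure 4:

```python
def class_weights(counts):
    """weight_c = n_samples / (n_classes * count_c): an inverse-frequency
    normalizer (assumed here) so minority classes get weights > 1."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}

# Hypothetical class counts for an imbalanced image dataset:
counts = {"normal": 1200, "pneumonia": 400, "covid": 400}
weights = class_weights(counts)
print(weights)  # minority classes receive the larger weights
```

Passing such a dictionary as `class_weight` during training makes misclassifying a rare class cost more, counteracting the bias shown in Figure 4.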
Figure 5: The amount of time spent in preprocessing images with and without threads.
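The parallel preprocessing step can be sketched with a thread pool from the standard library. The `preprocess` function below is a placeholder for the real image pipeline (resize, normalize, etc.):

```python
from concurrent.futures import ThreadPoolExecutor

def preprocess(path):
    """Stand-in for the real per-image preprocessing step."""
    return path.upper()  # placeholder transformation

paths = [f"img_{i}.png" for i in range(8)]

# Sequential version (the normal program flow):
sequential = [preprocess(p) for p in paths]

# Parallel version: each image is independent, so a thread pool can
# preprocess several at once; map() preserves the input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(preprocess, paths))

assert parallel == sequential  # same results, less wall-clock time
```

Note that for pure-Python CPU work the GIL limits the speedup; the gains reported above come when the underlying image operations (e.g., in NumPy or PIL) release the GIL or wait on disk I/O.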
Figure 6: The model architecture for the brain cancer prediction
Figure 7: The architecture of the model for the Alzheimer prediction
The architecture used in the Alzheimer's prediction model differs somewhat because of the nature of the data: given the size of the dataset, the model should be less complex so that it learns the underlying trends instead of memorizing the data.
Figure 8: The model architecture for the lung disease prediction
The model architecture for lung disease prediction is almost identical to the one used in the brain tumor prediction model. The only difference is the last fully connected layer, which in the lung disease model contains 3 nodes instead of 4, matching the number of possible outcomes.
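The effect of that final layer can be illustrated with plain NumPy: the width of the last dense layer fixes the length of the softmax probability vector, so it must equal the number of outcome classes. The feature size and weights below are arbitrary, not the real trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    """Convert raw scores into a probability distribution over classes."""
    e = np.exp(z - z.max())
    return e / e.sum()

features = rng.standard_normal(64)      # output of the shared earlier layers

W_brain = rng.standard_normal((4, 64))  # brain model: 4 possible outcomes
W_lung = rng.standard_normal((3, 64))   # lung model: 3 possible outcomes

p_brain = softmax(W_brain @ features)
p_lung = softmax(W_lung @ features)

print(p_brain.shape, p_lung.shape)  # (4,) (3,)
```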
Each model was trained multiple times in order to reach the highest accuracy. The test and training data were kept separate so that, at the end of training, each model could be evaluated on data it had never seen before, which tests its generalizability. Test results of the three image-based models are as follows:
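The held-out evaluation described above amounts to a shuffled train/test split. A minimal sketch using only the standard library (the 80/20 ratio and seed are assumptions, not values stated in the report):

```python
import random

def train_test_split(samples, test_ratio=0.2, seed=42):
    """Shuffle, then hold out a fraction of the data that the model
    never sees during training, to measure generalizability."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(100))
train, test = train_test_split(data)
print(len(train), len(test))       # 80 20
assert not set(train) & set(test)  # no leakage between the splits
```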
Figure 9: The CNN model's test results
Simulated Polygenic Risk Scores (PRS) added approximately 12% improvement in
AUC (from 0.75 to 0.84), confirming the added predictive power of genetic information.
● The TCF7L2 variant (Transcription Factor 7-Like 2), a well-established diabetes risk
gene, was identified as significantly associated with increased risk, showing a 1.4x
higher risk in modeled populations.
● Additional key genes identified via differential expression analysis included:
○ INSR (insulin receptor): central to the insulin signaling pathway
○ IRS1 (insulin receptor substrate 1): modulates insulin response
○ PPARG (peroxisome proliferator-activated receptor gamma): involved in
adipocyte differentiation and glucose metabolism
○ SLC2A4 (GLUT4): glucose transporter gene regulating cellular uptake
● KEGG enrichment analysis revealed overrepresentation of insulin signaling, AMPK
pathway, and type 2 diabetes mellitus pathways, further supporting biological relevance.
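The AUC comparison behind the PRS result can be reproduced in miniature. The function below computes AUC via the rank-sum (Mann-Whitney) identity; the labels and scores are toy values for illustration, not the study's data:

```python
def auc(labels, scores):
    """AUC = probability that a randomly chosen positive case is
    scored above a randomly chosen negative case (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [0, 0, 0, 1, 1, 0, 1, 1]
clinical_only = [0.2, 0.4, 0.3, 0.5, 0.8, 0.6, 0.4, 0.9]   # toy risk scores
with_prs      = [0.1, 0.3, 0.2, 0.7, 0.9, 0.4, 0.6, 0.95]  # + genetic info

print(auc(labels, clinical_only), auc(labels, with_prs))
```

Comparing the two AUC values in this way is how the jump from 0.75 to 0.84 reported above would be measured.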
Figure 10: The percentage of each target gene
● Some individuals with low-risk clinical profiles were predicted as high-risk due to their genetic load, underscoring the importance of genomic screening.
● Ethnic disparities were observed in healthcare access and risk exposure, with certain minority populations showing underrepresentation in available genetic reference data, which may affect risk calibration.
Figure 11: The AUC curve of each target gene
Figure 12: ROC curves comparing classification performance across different machine learning models
This plot shows Receiver Operating Characteristic (ROC) curves, which visualize the trade-off between the True Positive Rate (TPR, also known as sensitivity or recall: the proportion of actual positives correctly identified by the model) and the False Positive Rate (FPR: the proportion of actual negatives incorrectly classified as positive) across different classification thresholds.
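Each point on such a curve is one (FPR, TPR) pair at a particular threshold, and sweeping the threshold traces out the curve. A small sketch with made-up labels and scores:

```python
def roc_point(labels, scores, threshold):
    """TPR and FPR when everything scored >= threshold is called positive."""
    tp = sum(y == 1 and s >= threshold for y, s in zip(labels, scores))
    fp = sum(y == 0 and s >= threshold for y, s in zip(labels, scores))
    p = sum(labels)
    n = len(labels) - p
    return tp / p, fp / n  # (TPR, FPR)

labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]  # toy model outputs

# Lowering the threshold catches more true positives but admits
# more false positives, which is the trade-off the ROC curve shows:
for t in (0.8, 0.5, 0.2):
    tpr, fpr = roc_point(labels, scores, t)
    print(f"threshold={t}: TPR={tpr:.2f}, FPR={fpr:.2f}")
```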
● Some individuals with non-obese BMI and normal glucose levels still exhibited
high predicted risk due to the presence of multiple coexisting social and clinical
risk indicators.
● Ethnic disparities were observed, particularly among Mexican American and
non-Hispanic Black subgroups, where risks were elevated even after adjusting
for lifestyle factors. This highlights a potential intersection of genetic susceptibility
and healthcare access.
Because changing the design after implementation has started would be challenging, a Figma prototype was developed first to settle on the application's user interface. After carefully considering human-computer interaction heuristics, we decided on the following user interface:
Figure 13: The first Figma prototype of the desktop application models
The application itself is ready to use: clone the repository locally, then execute the test_panels.py file in the App/main/ directory.
...User Manual is Coming...
Cagin Tunc: UC Berkeley, Master of Engineering / Bioengineering
Haoyu Zhao: UC Berkeley, Master of Engineering / Bioengineering
Bikramjeet Singh: UC Berkeley, Master of Engineering / Industrial Engineering and Operations Research
Shuo Li: UC Berkeley, Master of Engineering / Bioengineering
Jiachen Xi: UC Berkeley, Master of Engineering / Bioengineering