As the number of responses to a survey increases, average values become more accurate, because errors and falsifications tend to cancel out. Extreme (maximum/minimum) values, however, become unreliable, because mistakes and trolls become almost certain.
So, how do we solve this when, for instance, polling applicants to determine cut-off scores for university courses? Besides their score and allotted branch, have them also submit the branches they were denied. For each branch, at each possible score, tally the reported acceptances (
Idealized graph of the tallied acceptances and rejections at each score, for a particular course.
The y-axis represents the tally and the x-axis the exam score. 280 is the cut-off for this particular course.
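The tallying step can be sketched in a few lines of Python. This is only an illustration under some assumptions: each response is taken to be a `(score, allotted_branch, denied_branches)` tuple, an acceptance counts as +1 and a rejection as -1 at the respondent's score (a sign convention chosen here so that both fit on one y-axis), and the function name, branch names, and sample data are all made up.

```python
from collections import defaultdict

def tally_responses(responses):
    """Return {branch: {score: signed tally}}.

    Each response is (score, allotted_branch_or_None, [denied branches]).
    Assumed convention: an acceptance adds +1 and a rejection adds -1
    at the respondent's score.
    """
    tallies = defaultdict(lambda: defaultdict(int))
    for score, allotted, denied in responses:
        if allotted is not None:
            tallies[allotted][score] += 1
        for branch in denied:
            tallies[branch][score] -= 1
    return tallies

# Made-up sample data, for illustration only.
responses = [
    (300, "A", []),
    (285, "A", []),
    (275, "B", ["A"]),       # wanted A, got B
    (270, "B", ["A"]),
    (250, None, ["A", "B"]), # rejected everywhere
]
tallies = tally_responses(responses)
```

Plotting `tallies["A"]` against score would give a (very small) version of the graph above for branch A.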
The key property of the cut-off score is that there are a lot of positive y-values to its right (let their sum be
Now let the scores be
When you go from
Here's a survey for the institute BITS Pilani, India, which has three campuses – Pilani, Goa, and Hyderabad, and where admission is via the BITSAT examination. The survey contains three questions:
- What is your final moderated BITSAT score?
- Which campus and branch were you assigned? Skip this question if rejected or waitlisted.
- In the preference form, which courses did you place ABOVE your allocated one?
The survey is conducted via Google Forms, which lets you export the results as a `.csv` file. The file may come zipped, in which case unzip it first. Then run the Python script `cutoffs.py` in the same directory: download it to the same folder as the CSV file, open a terminal there, and run `python cutoffs.py`.
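To give a feel for the kind of processing involved, here is a minimal sketch. It is not the actual `cutoffs.py`. It assumes hypothetical column headers `score`, `allotted`, and `denied` (a real Google Forms export uses the full question text as headers), assumes multiple denied branches are separated by `;`, and uses made-up branch names and rows. The cut-off rule is also an assumption: pick the score that maximizes the signed tally at or above it minus the signed tally below it, with acceptances counted as +1 and rejections as -1.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export; headers, branch names, and rows are made up.
SAMPLE_CSV = """score,allotted,denied
300,Pilani CS,
285,Pilani CS,
275,Goa CS,Pilani CS
270,Goa CS,Pilani CS
250,,Pilani CS;Goa CS
"""

def read_tallies(csv_file):
    """Build {branch: {score: signed tally}} from the survey rows."""
    tallies = defaultdict(lambda: defaultdict(int))
    for row in csv.DictReader(csv_file):
        score = int(row["score"])
        if row["allotted"]:  # blank if rejected or waitlisted
            tallies[row["allotted"]][score] += 1
        for branch in filter(None, row["denied"].split(";")):
            tallies[branch.strip()][score] -= 1
    return tallies

def estimate_cutoff(by_score):
    """Assumed heuristic: maximize (tally at or above c) - (tally below c)."""
    total = sum(by_score.values())
    best_score, best_agreement = None, None
    below = 0  # running sum of tallies strictly below the candidate score
    for s in sorted(by_score):
        agreement = (total - below) - below
        if best_agreement is None or agreement > best_agreement:
            best_score, best_agreement = s, agreement
        below += by_score[s]
    return best_score

tallies = read_tallies(io.StringIO(SAMPLE_CSV))
cutoffs = {branch: estimate_cutoff(t) for branch, t in tallies.items()}
```

On this toy data the heuristic settles on the lowest score at which each branch was reported as allotted. A single stray report far from the true cut-off would shift the agreement score only slightly, which is the point of tallying instead of taking a raw minimum.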