As part of a small team, I sent a survey to most students taking redesigned courses at California State University, the largest university system in the United States, to look at how frequently they used various technologies and how effective they perceived them to be. I've included some general tips for survey analysis along the way.
💡 Tools used: SurveyGizmo, Excel, SPSS, QDAMiner, NVivo, Tableau
I analyzed and helped create surveys to evaluate a system-wide course redesign with technology effort. The Course Redesign with Technology (CRT) Team supports the CSU 2025 Graduation Initiative by redesigning high-demand, high-failure-rate courses and improving access to these courses. CRT funds a comprehensive program assisting faculty across the CSU system to redesign their courses for student success. At the end of the term in which the redesigned course is offered, CRT collects student survey data to measure which teaching and learning tools have been used and their perceived impact on student learning, student opinions on various course dimensions, and whether student confidence levels change as a result of completing a redesigned course.
To participate in the CRT program, CSU faculty members outline their ideas for changes in the courses they teach in a request for proposal process. CRT encourages them to adopt and adapt "proven practices" that faculty in previous cohorts established using narrative ePortfolios about their redesign processes and outcomes. Faculty selected to participate in CRT are funded and attend the Summer Institute where they learn about new technologies, discuss various pedagogical approaches, and develop learning communities focused on a specific discipline or technology. These courses are then offered to students in the fall and spring who are asked to provide feedback about their experiences through a survey.
The analysis focused on these goals:
- Which activities and technologies do students use?
- How frequently do students use them?
- Which do they perceive to be effective?
- Do students rate their course highly across several dimensions?
- Does a student's perceived confidence level change by the end of the redesigned course?
- How does re-offering a redesign affect the outcomes?
- How does delivery format affect the outcomes?
I determined these goals from the questions the team had already decided to ask, the information available, the analyses possible, and a vote I elicited from the executive team, and I tested additional emergent hypotheses as they came up.
The CRT team requested the course ID from faculty who delivered a redesigned course in spring 2017. Using this ID, they acquired the emails of most of the target population through an internal system-wide process. They emailed 3,607 students a link to the survey and received 610 responses. Each student received a single-use URL to ensure the survey could only be completed once per student. The survey was deployed May 1st, before final exams, and remained active for 40 days. Instructors were asked to encourage their students to participate, and students were incentivized to complete the survey by having their names entered in a prize raffle. To simplify the analysis, 154 incomplete submissions were removed, leaving 456 complete responses. The survey received a response rate of 16.91%.
ℹ️ See the csv files for the raw data after removing the 154 incomplete submissions. 25 rows are provided as an example, but all columns are included. Separate surveys sent to quarter and semester students produced the two files. All identifying information, including instructor names and classes, has been removed or anonymized.
See the student survey word doc for an explanation of most of the columns. I combined the quarter and semester sheets, creating a column for that variable, to produce a single sheet with 144 columns and 458 rows, then removed leftover data from testing the survey. A separate version of the survey was sent to each school so students could select their course from a shorter list. SurveyGizmo spit out a separate column for each university's list of redesigned courses in the sample (fewer than 10 per campus). Since each student should only have been allowed to select and review one course, I collapsed all these columns into one (a pandas sketch of this step follows the tip below). Unfortunately, in a previous iteration of this survey, we did not test for this adequately.
🔥 Test your survey like a maniac! Try to select everything, and if you are able to select mutually exclusive boxes, like 'agree' and 'disagree', change your survey settings. I see this mistake all the time.
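If I were scripting this step today rather than doing it by hand in Excel, the merge and column collapse might look like the pandas sketch below. The file names, the 'term_type' column, and the 'course_list_' prefix are hypothetical stand-ins for the actual SurveyGizmo export.

```python
import pandas as pd

# Hypothetical export file names for the two SurveyGizmo surveys
quarter = pd.read_csv("quarter_raw.csv")
semester = pd.read_csv("semester_raw.csv")

# Record which survey each row came from, then stack the two sheets
quarter["term_type"] = "quarter"
semester["term_type"] = "semester"
combined = pd.concat([quarter, semester], ignore_index=True)

# Collapse the per-campus course columns (one per university's course list)
# into a single 'course' column; the column prefix here is illustrative
course_cols = [c for c in combined.columns if c.startswith("course_list_")]
combined["course"] = combined[course_cols].bfill(axis=1).iloc[:, 0]
combined = combined.drop(columns=course_cols)
```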
Next I created some new independent variables for the analysis.
| variable | description |
|---|---|
|  | additional step for keeping track of rows |
|  | pulled from our internal data: 'first-time-offered' for a brand-new redesigned course or 're-offered' for subsequent iterations |
|  | internal course redesign classifications: 'proven adopting' for redesigns following an approved model, 'promising' for experimental redesigns, 'sustaining' for a repeated effort, or 'virtual labs' for redesigns featuring online labwork to increase access to courses requiring labs |
|  | broader logical categories I created for the courses: 'Engineering' for IE 2A, 'Computer science' for CS 101, and so on |
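Since the actual column names aren't shown here, the sketch below only illustrates how derived variables like these could be added with simple lookups in pandas; every name and mapping in it is a placeholder.

```python
import pandas as pd

combined = pd.read_csv("combined_working.csv")  # hypothetical working copy

# Row ID to keep track of rows through later sorting and edits
combined["row_id"] = range(1, len(combined) + 1)

# Illustrative lookups; in the real workflow these came from internal CRT records
discipline_map = {"IE 2A": "Engineering", "CS 101": "Computer science"}
offering_map = {"IE 2A": "re-offered", "CS 101": "first-time-offered"}

combined["discipline"] = combined["course"].map(discipline_map)
combined["offering"] = combined["course"].map(offering_map)
```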
Most of these changes required sorting the rows, which sometimes causes Excel to splice a row, reordering some columns but not others and corrupting the data. Since I saved the file at each step of the process, I was able to frequently compare random samples of my current file against the raw data to check for consistency. If Jon Chou was suddenly female and making completely different comments than in the raw file, I would retreat to the latest uncorrupted version. And of course I used several random samples at each check. I liked to think of the people who bore deep holes into a wheel of cheese to check for consistency.
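A scripted version of that spot-check might look like this sketch, assuming a stable respondent ID column (the 'response_id' name is hypothetical) and that the working copy still contains every sampled respondent.

```python
import pandas as pd

raw = pd.read_csv("combined_raw.csv")      # untouched copy (hypothetical name)
working = pd.read_csv("working_copy.csv")  # file after sorting and editing

# Pull the same random respondents from both files and line them up by ID
sample_ids = raw["response_id"].sample(10, random_state=1)
raw_rows = raw[raw["response_id"].isin(sample_ids)].set_index("response_id").sort_index()
work_rows = working[working["response_id"].isin(sample_ids)].set_index("response_id").sort_index()

# Any spliced rows show up as cell-level differences in the shared columns
shared_cols = raw_rows.columns.intersection(work_rows.columns)
diffs = raw_rows[shared_cols].compare(work_rows[shared_cols])
print(diffs if not diffs.empty else "sampled rows match the raw export")
```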
Getting to the structure of the survey: programmed question pairs made up the bulk, so we could expect the sample sizes for each follow-up question to be all over the board. I'm extremely glad someone on the team was willing to program the surveys, since it allowed us to capture a lot of information without asking irrelevant questions and turning an already monster survey into a behemoth. I renamed and numbered all the survey questions, with paired questions such as 12A1, 12A2, 12B1, and 12B2 belonging to question 12, for example.
Then I moved all the independent variables (facts about the course and student) together at the front for ease of use.
🔥 It is recommended to put demographic and otherwise sensitive survey questions at the end of a survey so as not to scare off the respondent.
Next, I converted blanks to 'Null' or 'NA' (when N/A was actually selected) to help with analysis and created a separate file without comments for use in SPSS. I shaped another version for use in Tableau; this is where the real witchcraft comes in.
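In pandas, a rough equivalent of that cleanup could look like the sketch below; the file name and the '_comments' suffix are illustrative, and it assumes the export writes not-applicable selections as 'N/A'.

```python
import pandas as pd

# keep_default_na=False keeps blank cells as empty strings instead of NaN
combined = pd.read_csv("combined_working.csv", keep_default_na=False)

# Distinguish an explicit 'N/A' choice from a question the student never saw
combined = combined.replace({"N/A": "NA", "": "Null"})

# SPSS copy without the open-ended comment columns
comment_cols = [c for c in combined.columns if c.endswith("_comments")]
combined.drop(columns=comment_cols).to_csv("crt_for_spss.csv", index=False)
```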
Tableau requires the data to be normalized, with one dependent variable's value and descriptors per row and everything else in the row repeated. I used an Excel add-in for the actual reshaping. This move rocketed the rows in the file up over 31,000, which I found very entertaining. Example:
| independent_variable | last_independent_variable | Dep_var_ID | Dep_Ans_Label | Dep_Ans_Group |
|---|---|---|---|---|
| semester | mostly in-person | clickers_22A | Sometimes | Frequency |
| semester | 100% in-person | clickers_22A | NA | Frequency |
And so on for all of the dependent variables.
This requires creating new metadata variables: one to identify the value (the dependent variable name), one to contain the value itself, and one to classify the dependent variables, which I did by the name of the group of questions in the survey (a pandas version of this reshape follows the table below).
| variable | description |
|---|---|
| Dep_var_ID | the column name, named after each question, like clickers_22A |
| Dep_Ans_Label | the response, ranging from 'Sometimes' to 'Agree' to 'No difference' |
| Dep_Ans_Group | Frequency, Efficacy, Confidence, Learning Experience: the types of closed-response answers received by our survey |
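If I were redoing the reshape in pandas rather than an Excel add-in, it might look like this sketch, which reuses the hypothetical column names from the earlier examples and a placeholder question-to-group mapping.

```python
import pandas as pd

wide = pd.read_csv("crt_cleaned.csv")  # hypothetical cleaned, wide-format file

# Independent variables are repeated on every row of the long format
id_cols = ["row_id", "term_type", "course", "discipline", "offering"]
question_cols = [c for c in wide.columns if c not in id_cols]

long_form = wide.melt(
    id_vars=id_cols,
    value_vars=question_cols,
    var_name="Dep_var_ID",       # e.g. clickers_22A
    value_name="Dep_Ans_Label",  # e.g. Sometimes, Agree, No difference
)

# Classify each question by its survey section; the mapping here is a placeholder
group_map = {"clickers_22A": "Frequency"}
long_form["Dep_Ans_Group"] = long_form["Dep_var_ID"].map(group_map)
long_form.to_csv("crt_for_tableau.csv", index=False)
```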
I produced visuals of the descriptive statistics, looking for any issues, and then moved on to the goals.
- Which activities and technologies do students use?
- How frequently do students use them?
For each item, we provided room for comments and let students choose between ( ) Always ( ) Often ( ) Sometimes ( ) Rarely ( ) Never ( ) N/A. I could have numericized this for a median and interquartile range, but decided the nontechnical audience would better appreciate the percentage of respondents choosing the top two ratings, always and often. This is called top 2 box.
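As a sketch, a top 2 box summary computed from the long-format table above might look like this, again using the hypothetical column and file names from the earlier examples.

```python
import pandas as pd

long_form = pd.read_csv("crt_for_tableau.csv")  # long-format file from the earlier sketch

# Keep only the frequency questions and drop non-answers
freq = long_form[long_form["Dep_Ans_Group"] == "Frequency"]
freq = freq[~freq["Dep_Ans_Label"].isin(["NA", "Null"])]

# Top 2 box: percentage of respondents choosing 'Always' or 'Often' per tool
top2 = (
    freq.assign(top_box=freq["Dep_Ans_Label"].isin(["Always", "Often"]))
        .groupby("Dep_var_ID")["top_box"]
        .mean()
        .mul(100)
        .round(1)
        .sort_values(ascending=False)
)
print(top2)
```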
🔥 Do not generate a mean from a Likert scale.
ℹ️ Note the description of how I created this chart in Tableau, which I left in as an example. I don't crop it until I place it in its final display setting, since the description of the active filters helps you trace back your work and avoid mistakes. It also illustrates why I shaped the data with metadata in the previous steps, as that makes viz creation and documentation possible.
There is lots of variation, with instructor lecture the most commonly used learning tool of all. It could be confusing to refer to all of these things as learning tools. I could have instead referred to learning activities and technologies separately, but I'd like to change the conversation around learning and hold all methods to the same standard of accountability: does it work? Plus, is 'instructor audio/video' an activity or a technology? However, it's not good to perturb one's stakeholders.
These are simple counts of mentions in the comments, done with QDAMiner or NVivo.
YouTube is off the charts! Someone should take note.
- Which do they perceive to be effective?
They think almost everything is effective. I am emphatic that we refer to this as 'perceived efficacy'. I wondered about a relationship between frequency of use and perceived efficacy and checked:
For 18 of the 22 learning tools (82%), frequency was at least moderately correlated with perceived efficacy. Gamma was used for the correlations, given that it is appropriate for ordinal-by-ordinal correlation in a nonparametric, non-random sample. I believe we should probably measure something else next time, since these appear to be related. This should reinforce my assertion that the efficacy is 'perceived' and subject to bias; in this case it could be the familiarity heuristic.
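Goodman and Kruskal's gamma is easy to compute directly from a cross-tabulation of concordant and discordant pairs. A minimal sketch, assuming the two ratings are coded as ordered numbers (the column names in the usage comment are hypothetical):

```python
import pandas as pd

def goodman_kruskal_gamma(x, y):
    """Gamma for two ordinal variables coded as ordered numbers (e.g. Never=1 ... Always=5)."""
    table = pd.crosstab(x, y).to_numpy()
    concordant = discordant = 0
    for i in range(table.shape[0]):
        for j in range(table.shape[1]):
            concordant += table[i, j] * table[i + 1:, j + 1:].sum()  # pairs agreeing in order
            discordant += table[i, j] * table[i + 1:, :j].sum()      # pairs in reversed order
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical usage with numeric codes for one tool's frequency and perceived efficacy
# print(goodman_kruskal_gamma(df["clickers_freq_code"], df["clickers_eff_code"]))
```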
- Do students rate their course highly across several dimensions?
- Does a student's perceived confidence level change by the end of the redesigned course?
- How does re-offering a redesign affect the outcomes?
Relatively inconclusive. There were not enough re-offered courses to tell.
- How does delivery format affect the outcomes?
Running this produced a comparison similar to the one above and showed little promise of a relationship between any of the formats and the outcomes. This could be seen as positive if you were worried about poor quality in online courses.
While the purpose of this post is the analysis process, I may include further commentary on the results here in the future.
In the future I intend to program my analyses in Python and standardize my use of Tableau to produce more consistent visuals. I'm pleased with my level of organization but could still improve my naming conventions, and I will continue improving my statistics knowledge. I'm grateful for your feedback.