As a UX Research intern at Tyler Technologies, I was tasked with conducting a usability study on a prototype of a records management system (RMS) supporting fire agencies. This product was moving from native software to a browser-based web application and had never before gone through UX research at the company.
The goals for this research included:
- Test the navigation workflow of the new design
- Better understand the mental models associated with the visual and interaction design (e.g. how information is presented and what users must do to see it).
WHAT I DID
- Planned and designed the usability test, including selecting appropriate participants, creating test scenarios, and identifying key metrics to measure usability.
- Conducted the usability test remotely, providing participants with clear instructions on what they were expected to do.
- Observed the participants as they interacted with the prototype, noting any issues or points of confusion.
- Gathered feedback from participants on their experience, through surveys and interviews.
- Analyzed the data collected from the usability test, identifying patterns and trends in participant behavior and feedback.
- Reported the usability test results to the relevant stakeholders, highlighting areas of success and improvement.
With goals clarified, I transformed them into the following research questions:
- Can the user complete a report with the prototype?
- Can the user navigate with the stepper?
- Is the dynamic form builder discoverable or learnable?
I then planned a remote usability test around five tasks related to completing an incident report:
- Begin a new report
- Write a report
- Manage errors
- Use the stepper to navigate through the report
- Submit a completed report
I facilitated the remote tests through Maze, a service designed for usability testing. Maze provided heat maps, misclick rates, and time on task.
For qualitative data analysis, I used Dovetail to generate insights from the testing transcripts.
Metrics and scoring
To quantify the success rate, I used success criteria scoring: I broke each task down into steps and scored user performance on each step of each task. Participants could receive one of three scores. If they completed the step without any issues, they received a 1. If they struggled but didn't need help, they received a 0. If they failed the attempt or I had to step in to help them, they received a -1.
To better understand where users struggled, I then calculated the differential (the sum of scores minus the count of scores) for each step.
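The scoring scheme above is easy to sketch in code. A minimal example, assuming one list of step scores per participant (the score values below are illustrative, not the study's data):

```python
def step_differential(scores):
    """Differential for one step: sum of the scores minus the count of scores.

    Each score is 1 (completed cleanly), 0 (struggled but unaided),
    or -1 (failed or needed facilitator help).
    """
    return sum(scores) - len(scores)

# A step where all 8 participants succeeded cleanly has a differential of 0;
# every struggle or failure pulls the differential further below zero.
clean_step = [1, 1, 1, 1, 1, 1, 1, 1]
rough_step = [1, 1, 1, 1, 1, 1, -1, -1]

print(step_differential(clean_step))  # 0
print(step_differential(rough_step))  # -4
```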
From the SCS chart above, we can see exactly where test participants struggled and where they had no trouble.
I filtered the data further to outline the specific opportunity areas in which to prioritize improvements for future iterations. These areas included the steps in which participants had the least success (differential score of -4 to -5).
The beauty of success criteria scoring is that the same data also feeds Jakob Nielsen's (2001) success-rate formula: (S + (P × 0.5)) / O, where S is the count of 1's, P is the count of 0's, and O is the number of possible scores, or opportunities.
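Nielsen's formula credits partial successes at half weight. A small sketch, using the same 1/0/-1 step scores described above:

```python
def success_rate(scores):
    """Nielsen (2001) success rate: (S + 0.5 * P) / O.

    S = count of full successes (1), P = count of partial successes (0),
    O = total opportunities (all recorded scores, including failures).
    """
    s = scores.count(1)
    p = scores.count(0)
    o = len(scores)
    return (s + 0.5 * p) / o

# Two clean passes, one partial, one failure -> (2 + 0.5) / 4 = 0.625
print(success_rate([1, 1, 0, -1]))  # 0.625
```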
Ease of use
I opted for the single ease question (SEQ) to quantify ease of use. After 3 of the 5 tasks (begin incident report, complete report, submit report), I asked users to rate, on a scale of 0–6 (0 being very difficult, 6 being very easy), how difficult or easy the task was to complete. Since we had no internal benchmark from previous usability tests to compare our scores against, we referenced the historical average of 5.5 (Sauro, 2012).
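Comparing a task's average SEQ rating against the benchmark is a one-liner. A sketch, assuming the 0–6 scale used in this study (the response values are placeholders, not the study's data):

```python
SEQ_BENCHMARK = 5.5  # historical average SEQ (Sauro, 2012)

def mean_seq(responses):
    """Average SEQ rating for a single task."""
    return sum(responses) / len(responses)

def below_benchmark(responses, benchmark=SEQ_BENCHMARK):
    """True if this task's average SEQ falls short of the benchmark."""
    return mean_seq(responses) < benchmark

# Illustrative ratings for one task on the 0-6 scale
ratings = [3, 3, 4]
print(round(mean_seq(ratings), 2))   # 3.33
print(below_benchmark(ratings))      # True
```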
As we can see from the chart above, our first task scored the worst in terms of ease of use, with an average of 3.33. Although participants struggled just as much with completing and submitting the report, they did not view these aspects of the system as difficult. Completing a report received an average SEQ score of 5, and submitting the report received an average of 5.5, exactly matching the historical benchmark.
UX and usability
You can’t adequately conduct a usability test unless you are testing for usability. Various industry-recognized usability scoring methods exist, but the standard is still the System Usability Scale. This is a 10-question survey given after a test, and the responses are then aggregated into a SUS score. The average SUS score from years of historical data is 68 (Sauro, 2013).
However, a 10-question survey is too much to expect good feedback on from participants at the end of a usability test. Instead, researchers developed the Usability Metric for User Experience (UMUX), a 5-question survey designed as a more efficient means of generating a similar result. Researchers at IBM then went even further, studying the efficacy of the 5-question survey (Lewis, Utesch, & Maher, 2013). They determined that they could garner a similar score by simply asking participants to rate their level of agreement with two positively framed UMUX statements:
This system’s capabilities meet my requirements.
This system is easy to use.
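This two-item version is commonly known as the UMUX-Lite. A scoring sketch, assuming each item is rated on a 1–7 agreement scale and rescaled to 0–100, with the regression adjustment commonly cited from Lewis, Utesch, & Maher (2013) to approximate a SUS-comparable score:

```python
def umux_lite(item1, item2):
    """Raw UMUX-Lite: rescale two 1-7 agreement ratings to a 0-100 score."""
    return (item1 + item2 - 2) / 12 * 100

def umux_lite_sus_estimate(item1, item2):
    """SUS-comparable estimate via the regression adjustment reported by
    Lewis et al. (2013): 0.65 * raw + 22.9 (coefficients as commonly cited)."""
    return 0.65 * umux_lite(item1, item2) + 22.9

# A participant who strongly agrees (7) with both statements
print(umux_lite(7, 7))                        # 100.0
print(round(umux_lite_sus_estimate(7, 7), 1))  # 87.9
```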
Can the user complete a report with the prototype?
8/8 (100%) of participants completed a report with the prototype.
Can the user navigate with the stepper?
50% (4/8) of participants struggled to some degree, and one participant needed help. However, all participants agreed the stepper was learnable, so the lack of initial predictability isn't a cause for alarm.
Is the dynamic form builder discoverable or learnable?
88% (7/8) of participants had no issue locating the form builder.
Overall, the prototype was successful enough to go live. My primary recommendation from this study was to incorporate an autosave feature so users could pick up where they left off. I also recommended that the team put more effort into conducting usability tests with the firefighters who create and file reports, because a key limitation of this study centers on the panel review process.
Instead of firefighters who spend time in the field, Tyler Technologies relies on fire chiefs and other administrative personnel to provide feedback, because it is ultimately these administrators who sign off on a particular design. However, given that they are not the actual end users of the Fire RMS workflows I was testing, the results may not generalize across the broader user group.
Ultimately, given the practicality and familiarity of Google's Material Design system, upon which Tyler Technologies bases its own design system, we were able to express confidence in the new design, backed by user data, and launch on time and on budget.
Read more about the test metrics and analysis in my article over at UX Collective: "UX Scorecards: Quantifying and Communicating the User Experience"