There isn’t a usability thermometer. And if there were one, what would it measure? Unlike temperature, usability is not a property of a person or thing.

The term usability has certainly been in the vernacular for decades, and people seem to know when something isn’t usable.

But what exactly does it mean to someone whose job is to improve usability?

To properly measure and manage the usability of interfaces, we need to first agree on a definition of usability.

Through the 1980s and 1990s, the field struggled to find a definition. For example:

1985: “One of the most important issues is that there is, as yet, no generally agreed definition of usability and its measurement.” (Shackel, p. 17)

1998: “Attempts to derive a clear and crisp definition of usability can be aptly compared to attempts to nail a blob of Jell-O to the wall.” (Gray & Salzman, p. 242)

This need for a shared definition of usability led the International Organization for Standardization (ISO) to publish the first edition of ISO 9241 Part 11 in 1998. In this article, we cover where ISO 9241 Part 11 came from and where it seems to be going.

Early History of Usability

The term usability existed well before the 1980s; for example, it appeared as a key feature in a refrigerator advertisement in the Palm Beach Post on March 8, 1936.

The modern industrial use of the term started in the late 1970s. Related terms from that time were user friendliness and ease of use, both of which usability has since displaced in professional and technical writing on the topic.

There are two major conceptions of usability: meeting measurement-based goals (summative evaluation) and detecting and eliminating problems (formative or diagnostic evaluation).

It’s one thing to define a concept such as usability; it’s quite another to get a definition embodied in an international standard. Where did the defining begin?

Before ISO there was MUSiC

Before the ISO 9241 Part 11 standard was formally published, there was MUSiC. The European MUSiC project, which started in the early 1990s, aimed to develop specific usability measurement methods for the high-level constructs of effectiveness, efficiency, and satisfaction.

The literature is not consistent regarding the source words of the acronym, sometimes using “Measuring the Usability of Systems in Context” and sometimes “Metrics for Usability Standards in Computing.”

Attempts to measure these constructs did not start with MUSiC. Systematic collection of task completion rates, task completion times, and (sometimes) satisfaction had been common in industrial usability testing since the early 1980s.

In the late 1980s, IBM conducted an internal competitive usability testing project named SUMS (System Usability MetricS). The project collected usability benchmarks of success rates, completion times, and satisfaction for three office software suites. The SUMS data were then used to develop one of the first standardized usability questionnaires, the PSSUQ.

The MUSiC project was, however, one of the first, if not the first, comprehensive public investigations into the systematic collection of what we now think of as prototypical usability metrics. This empirical work was conducted to support the development of ISO 9241-11, which had its first draft in 1988.

One of the early work efforts in the MUSiC project (Rengger, 1991) was the production of a list of potential usability measurements. The list was based on 87 papers that had described some quantification of usability, with the measures categorized into four classes:

Class 1: Goal achievement indicators (such as success rate and accuracy)

Class 2: Work rate indicators (such as speed and efficiency)

Class 3: Operability indicators (such as error rate and function usage)

Class 4: Knowledge acquisition indicators (such as learnability and learning rate)

These objective usability metrics are familiar to most modern usability and UX practitioners, but note the absence of any subjective metrics. With this list as a starting point, the MUSiC usability measures proposed in 1994 included:

Effectiveness: Measures related to the accuracy and completeness with which task goals are achieved. For example, if the task is to transcribe a document into a specified format, effectiveness measures would include transcription accuracy, number of deviations from the specified format, and completeness of the transcription.

Efficiency: Measures related to the expenditure of mental or physical resources. Task time is one such measure, as are those that combine task time (or another measure of effort) with effectiveness.

Satisfaction: Measures of perceived usability and acceptability, including direct measures from the SUMI standardized usability questionnaire or indirect measures derived from ratios of positive and negative user comments.

From these fundamental measures, additional measures were derived, such as learnability (changes in usability over time, or comparisons of experienced and inexperienced users) and breakdowns of task time into productive and unproductive periods. Recommended measures of workload included the NASA Task Load Index (TLX) and the Subjective Mental Effort Questionnaire (SMEQ).
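To make the idea of combined and derived measures more concrete, here is a minimal Python sketch. The formulas are illustrative assumptions, not the project's published definitions: efficiency is treated as effectiveness per minute of task time, the productive period as the share of task time not spent unproductively, and learnability as a novice's efficiency relative to an expert's. All function names and sample numbers are hypothetical.

```python
def efficiency(effectiveness_pct: float, task_minutes: float) -> float:
    """Effectiveness (0-100) achieved per minute of task time (assumed formula)."""
    if task_minutes <= 0:
        raise ValueError("task_minutes must be positive")
    return effectiveness_pct / task_minutes


def productive_period(task_minutes: float, unproductive_minutes: float) -> float:
    """Percentage of task time spent productively (assumed formula)."""
    if task_minutes <= 0:
        raise ValueError("task_minutes must be positive")
    if not 0 <= unproductive_minutes <= task_minutes:
        raise ValueError("unproductive_minutes must be between 0 and task_minutes")
    return 100.0 * (task_minutes - unproductive_minutes) / task_minutes


def relative_efficiency(novice_eff: float, expert_eff: float) -> float:
    """Novice efficiency as a percentage of expert efficiency (assumed formula)."""
    if expert_eff <= 0:
        raise ValueError("expert_eff must be positive")
    return 100.0 * novice_eff / expert_eff


# Hypothetical sample values, for illustration only.
novice = efficiency(effectiveness_pct=80.0, task_minutes=10.0)
expert = efficiency(effectiveness_pct=80.0, task_minutes=5.0)
print(novice, expert)
print(productive_period(task_minutes=10.0, unproductive_minutes=2.5))
print(relative_efficiency(novice, expert))
```

Expressing each measure as a rate or percentage keeps results comparable across tasks of different lengths, which is one reason ratio-style measures are attractive for benchmarking.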

Usability Becomes a Standard in 1998

During the course of the MUSiC project, Part 11 of ISO 9241 had gone through numerous drafts, with the first edition published on March 15, 1998. Much of the standard’s text on the topic of measurement was taken from the MUSiC publications, but there were some differences.

The measures of overall usability continued to be effectiveness, efficiency, and satisfaction, with the following examples:

Effectiveness: Percentage of goals achieved; percentage of users completing a task; average accuracy of completed tasks

Efficiency: Time to complete a task; tasks completed per unit time; monetary costs of performing the task

Satisfaction: Rating scale for satisfaction; frequency of discretionary use; frequency of complaints
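As a rough illustration of how a few of these example measures could be computed from usability test data, the Python sketch below summarizes a small made-up data set. The record fields (`completed`, `seconds`, `rating`), the 1-to-7 rating scale, and the helper name `summarize` are assumptions for illustration and do not come from the standard.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass
class Attempt:
    """One participant's attempt at one task (hypothetical record layout)."""
    completed: bool   # did the participant complete the task?
    seconds: float    # time spent on the attempt
    rating: int       # post-task satisfaction rating, assumed 1 (low) to 7 (high)


def summarize(attempts: Sequence[Attempt]) -> dict:
    """Compute example effectiveness, efficiency, and satisfaction measures."""
    if not attempts:
        raise ValueError("at least one attempt is required")
    completed = [a for a in attempts if a.completed]
    total_minutes = sum(a.seconds for a in attempts) / 60
    if total_minutes <= 0:
        raise ValueError("total time must be positive")
    return {
        # Effectiveness: percentage of attempts that ended in task completion
        "completion_rate_pct": 100 * len(completed) / len(attempts),
        # Efficiency: mean time for completed attempts (None if there were none)
        "mean_time_completed_s": mean(a.seconds for a in completed) if completed else None,
        # Efficiency: tasks completed per minute of total time spent
        "tasks_completed_per_minute": len(completed) / total_minutes,
        # Satisfaction: mean rating across all attempts
        "mean_rating": mean(a.rating for a in attempts),
    }


# Made-up sample data, for illustration only.
sample = [
    Attempt(completed=True, seconds=60.0, rating=6),
    Attempt(completed=True, seconds=90.0, rating=5),
    Attempt(completed=False, seconds=120.0, rating=2),
    Attempt(completed=True, seconds=30.0, rating=7),
]
for name, value in summarize(sample).items():
    print(f"{name}: {value}")
```

Whether task time should be averaged over all attempts or only the successful ones is a design choice; this sketch uses only completed attempts for the mean time and all attempts for the throughput figure.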

The standard also included a list of modifications of these key measures for special situations, including:

Meet needs of trained users

Meet needs to walk up and use

Meet needs for infrequent or intermittent use

Minimization of support requirements

Learnability

Error tolerance

Legibility

The standard included workload as a subcategory of efficiency measurement but did not make any specific recommendations. The list of recommended standardized questionnaires continued to include the SUMI and was expanded to include the PSSUQ and QUIS. Notably absent was any recommendation for what has since become the most widely used questionnaire, the SUS, although that might be due to its relatively late publication in 1996.

The 2018 Revision and Related Standards

It’s ISO’s practice to review standards every five years to see whether they require revision. After remaining unrevised for two decades, ISO 9241-11:1998 was withdrawn and replaced with ISO 9241-11:2018, thirty years after its first draft. Bevan et al. (2016) described the new version, noting it retained earlier concepts but was extended to include systems and services. Other key changes included:

Consideration of a wider range of goals, including personal and organizational outcomes.

Addition of potential negative consequences of use (e.g., health, safety, security, privacy, and trust issues).

Clarification of satisfaction to include a wider range of issues.

The most surprising change in the revised draft was that it no longer provided specific guidance on how to measure usability. In 2009, Nigel Bevan, a central figure in the development of ISO usability standards, wrote an essay for the Journal of Usability Studies that provided some insight on the changes that were to come.

In 2001, the American National Standards Institute (ANSI) published the Common Industry Format for Usability Test Reports (CIF), which included information from ISO 9241-11:1998 on how to measure usability and notably added the SUS to the list of standardized usability questionnaires. In 2006, ISO adopted the CIF as ISO/IEC 25062, and in 2016 it published a related standard for usability evaluation reports (ISO/IEC 25066) and a standard for the specification of a broad set of measures, including effectiveness, efficiency, and satisfaction (ISO/IEC 25022). The specification of how to measure usability hasn’t gone away from the ISO standards; it’s just been moved.

Note on Nigel Bevan

As mentioned above, Nigel Bevan was an important leader in the development of the ISO usability standards, working on them for about three decades before his untimely death in 2018. We both knew Nigel and appreciated his dedication to the usability/UX community. In 2018, a special issue of the Journal of Usability Studies was dedicated to him. We encourage you to read about his remarkable life and career.

Summary

The term usability has been in general use for close to a century but wasn’t officially codified into a standard until the late 1990s. The original definition of usability was embodied in ISO 9241 Part 11 as a combination of effectiveness, efficiency, and satisfaction. The ISO standard was heavily influenced by the European MUSiC project, which was a publicly funded investigation into the systematic collection of summative usability data, both objective and subjective.