Reliable research

Dan Seward
January 28, 2011

Happy New Year! We hope this newsletter finds you well - we're back after a long hiatus. Our New Year's resolution: bring you a few more Peak Newsabilities in 2011.

The first article of 2011 discusses language choice in research construction - a task certainly fraught with peril, but manageable for the conscientious.

(short on time? skip to article summary)

We all know that words have the power to persuade, influence, and emotionally affect people. Good copywriters build great ads - the word "Priceless" built one of the greatest and longest-running marketing campaigns in history for MasterCard. We all (hopefully!) pay close attention to the content on our websites and carefully choose words to help our sites accomplish our goals. But it's easy to forget about the little details of wording as we build our research test plans, stimuli, and surveys. This article is a strong reminder to pay attention to the words you use in research, along with some practical, research-based advice on how to build and conduct studies.

Strong words push people away

Quantitative usability guru Jeff Sauro recently put up an informative blog article about the wording of questions in the System Usability Scale (SUS) - a common post-test survey used to score the overall usability of websites and software. The SUS asks people to rate their agreement with evaluative statements (e.g. "I think that I would like to use this website frequently") on a Likert scale (example PDF). Jeff and his colleagues asked a bunch of people to rate a website using the SUS, but the participants received different versions of the questionnaire. A third of the participants received the standard version, while the remaining two thirds received a version with either a strong positive slant (e.g. "I think that this is one of my all-time favorite web sites") or a strong negative slant to the questions asked.
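For readers who haven't scored a SUS questionnaire before, here's a minimal sketch of the standard scoring procedure (this is the published SUS formula, not part of Jeff's study): odd-numbered items are positively worded and contribute their rating minus one; even-numbered items are negatively worded and contribute five minus their rating; the sum is multiplied by 2.5 to give a 0-100 score.

```python
def sus_score(responses):
    """Compute a System Usability Scale score (0-100) from ten Likert
    responses, each an integer from 1 (strongly disagree) to 5
    (strongly agree), in questionnaire order."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS expects ten responses in the range 1-5")
    total = 0
    for i, r in enumerate(responses):
        if i % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded
            total += r - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded
            total += 5 - r
    return total * 2.5

# A fairly positive set of responses scores well above the midpoint:
print(sus_score([5, 1, 5, 2, 4, 1, 5, 1, 4, 2]))  # 90.0
```

The alternation of positive and negative statements is deliberate - it forces respondents to read each item rather than tick the same column all the way down, which is exactly why the wording of those statements matters so much.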

What happened was interesting - people rated away from agreement with these emotionally-charged statements. In other words, people rated the site more negatively when the statements were strongly positive, and more positively when the statements were strongly negative. And the difference was statistically significant. The lesson here is that people don't like taking extreme positions.

Implication: When gathering customer feedback via scales, understand that small variations in degree of commitment can have a large impact on your data. Users are likely to shy away from strong statements of opinion.

The mutability of memory and impact of descriptive words

Human memory - including recollections of experiences - is highly mutable and suggestible. A cool article referencing a classic psychology investigation into suggestibility got us thinking about how the words used at key points in an interaction can substantially shape people's impressions. In this experiment, participants were shown a photograph of a car accident along with descriptive text. One cohort of participants read a description saying the cars had "hit" each other, and one cohort read an identical description with one word changed - the cars had "smashed" into each other. A week later, participants were asked to remember the details of the picture. In the "smashed" condition, significantly more people remembered seeing shattered glass (there was none) and overestimated the speed at which the accident had happened. What's noteworthy here isn't that people interpreted "smashed" to be more serious than "hit", but that changing this one word affected the way people recalled their experience of viewing the picture 140+ hours after the fact.

It's almost frightening how such a tiny change to the experiment could have a statistically significant effect on the way people in the experiment reacted to simple questions after so much time had passed. This really drives home the point that word choice is exceptionally important in experiment design.

Implication: The choice of descriptive words in task construction, and even the words moderators speak to participants, can influence the way people respond in a usability test. Leave out adjectives unless they're necessary, so you don't inadvertently influence participants.

Accuracy and consistency in test and survey construction

Here's an interesting blog entry illustrating the dangers of inaccurate and inconsistent wording in survey questions. A Pew Research exercise was meant to track the frequency of use of online dating sites - in 2005 respondents were asked "Have you ever gone to an online dating website or other site where you can meet people online?" In 2009 the study was repeated, but respondents were asked "Please tell me if you ever use the internet to do any of the following things. Do you ever use the internet to… Use an online dating site. Did you happen to do this yesterday, or not?" Not surprisingly, the results were dramatically different - in 2005, 40% of young adults replied that they had used dating sites, but in 2009 only 10% had used dating sites… yesterday.

This is a dramatic example of poor research design, but it serves to illustrate an important point: changes in wording will change response. When conducting any sort of usability benchmarking research, consistency is really important.

We need to be intelligent about which variables we change in our research. If you're looking for true benchmarking results, don't change the wording of instructions or questions between rounds of testing. To draw valid conclusions about causality in our benchmarks, we have to be disciplined about how we vary these stimuli. Testing a web page with changed content wording AND navigational labels AND graphical layout is fine for measuring overall task success and subjective impressions - but because all of these variables were changed at once, we won't be able to attribute any effect to a single cause.

  • Implication 1: Changes in wording of research questions and instructions - even small changes - will yield different outputs, making benchmark results less valid.
  • Implication 2: Changes in wording of research stimuli - content, but also navigational items - will also yield different outputs, making straight comparisons invalid.

In summary...

For those who don't have the time to read this whole article (it was kinda long), what we're saying is:

  • Just as we have a responsibility to think through the content and presentation of the websites we design, we also have an obligation to be conscientious about the way we build our research tasks and activities.
  • Research participants will avoid making strong opinion statements, so build and evaluate your questions and survey scales accordingly. When using strong language, expect cautious results.
  • Descriptive words have the power to influence participant responses, even after long periods of time. Choose the words you use in your tasks and verbal questions carefully!
  • For benchmarking studies, changing the wording of questions will render comparisons less valid.
  • Changes in wording of research stimuli - content and navigational items - will also yield different outputs.

Usability tip

Bulleted lists are easy to read and quickly communicate key concepts

Here at Peak we commonly test information-heavy sites for findability of particular pieces of data. We all know that people scan pages to find information of interest, and good headers help with that. But sometimes just locating information isn't enough - heavy paragraphs of text can be hard to process. Often it's easier to digest a list of bullet points.

Putting together a bullet list of key information can help you increase the readability of your site and also help remind you what your site is about at a fundamental level. Opt for bullet lists instead of paragraphs when key concepts can realistically be communicated in one pithy sentence (as per the article summary above).

Categories: User research