Usability Testing Quickstart

Usability testing is the most direct way to see how people actually use your product. This guide helps you run your first test—or improve tests you're already doing.

Last updated: October 2023

What usability testing is (and isn't)

Usability testing means watching representative users attempt realistic tasks with your product (or prototype) while you observe.

What it reveals:

  • Whether users can complete tasks
  • Where users get confused or stuck
  • How users interpret your interface
  • Unexpected behaviors and workarounds

What it doesn't reveal:

  • Whether users want your product
  • Whether users will pay for your product
  • Which features users want (at least not reliably)
  • Statistical significance (with typical sample sizes)

Usability testing complements other research methods—it doesn't replace them.

The basic process

  1. Prepare: Define tasks, recruit participants, set up the environment
  2. Facilitate: Guide participants through tasks while observing
  3. Analyze: Review findings and identify patterns
  4. Report: Share insights and recommendations

Preparing for the test

Define what you're testing

Be specific:

  • Which part of the product?
  • Which user flows?
  • What questions are you trying to answer?

You can't test everything in one session. Focus on the areas with the most uncertainty or the highest stakes.

Create realistic tasks

Tasks should:

  • Reflect real user goals (not feature demonstrations)
  • Be specific enough to observe
  • Avoid revealing the answer in the wording

Weak task: "Use the filter feature to narrow results"

Better task: "You're looking for Italian restaurants within 2 miles that are open now. Find some options."

The weak version tells users where to go. The better version gives a realistic goal and lets you see if users find and use the filter.
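If you keep tasks in a shared doc or tracker, a small structure can nudge authors toward goal-first wording. A minimal sketch in Python (the field names and the leak check are illustrative, not a standard):

    from dataclasses import dataclass, field

    @dataclass
    class Task:
        """One usability-test task, worded as a user goal."""
        goal: str                # realistic scenario read to the participant
        success_criteria: str    # what the observer counts as completion
        ui_terms_to_avoid: list[str] = field(default_factory=list)

        def leaks_answer(self) -> bool:
            # Flag tasks whose wording names the UI element under test.
            return any(term.lower() in self.goal.lower()
                       for term in self.ui_terms_to_avoid)

    restaurant_task = Task(
        goal="You're looking for Italian restaurants within 2 miles "
             "that are open now. Find some options.",
        success_criteria="Reaches a filtered result list without assistance.",
        ui_terms_to_avoid=["filter"],
    )
    assert not restaurant_task.leaks_answer()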

Plan 5-7 tasks per session

More than that exhausts participants and dilutes focus. Fewer may not provide enough insight. Prioritize ruthlessly—you can always run more sessions.

Write a test script

Your script should include:

  • Introduction (purpose, think-aloud instructions, reassurance)
  • Background questions (brief)
  • Tasks in order
  • Follow-up questions for each task
  • Wrap-up questions
  • Thanks and next steps

Having a script ensures consistency across sessions and keeps you from forgetting important elements.
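A script doesn't have to live in code, but storing the outline as data makes it easy to keep sessions consistent and print a fresh checklist each time. A minimal sketch, with section names mirroring the list above (the rendering function is illustrative):

    SCRIPT_SECTIONS = [
        ("Introduction", "Purpose, think-aloud instructions, reassurance."),
        ("Background questions", "Brief; participant's role and experience."),
        ("Tasks", "One at a time, in a fixed order."),
        ("Follow-up questions", "After each task, while the attempt is fresh."),
        ("Wrap-up questions", "Overall impressions; anything we missed."),
        ("Thanks and next steps", "Incentive details; what happens with findings."),
    ]

    def render_checklist() -> str:
        """Render the outline as a plain-text facilitator checklist."""
        return "\n".join(f"[ ] {name}: {notes}" for name, notes in SCRIPT_SECTIONS)

    print(render_checklist())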

Think-aloud instructions

Ask participants to say what they're thinking as they work. "Please think out loud—tell me what you're looking at, what you're thinking, what you're trying to do." This gives insight into their mental model, not just their clicks.

Recruit participants

Who to recruit:

  • People who match your actual user profile
  • Mix of characteristics relevant to your product
  • Not colleagues, friends, or family if avoidable

How many:

  • 5 participants often reveal most major issues (a rough model is sketched below)
  • 6-8 provide more confidence
  • Testing more than 10 rarely justifies the time unless you're testing multiple segments
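The "5 participants" rule of thumb traces back to problem-discovery models such as Nielsen and Landauer's, which assume each participant independently uncovers a fixed share of the problems (roughly 31% in their data). Under that assumption, the expected share found after n participants is 1 - (1 - 0.31)^n:

    # Problem-discovery curve: found(n) = 1 - (1 - L)**n,
    # where L is the per-participant discovery rate (~0.31 in
    # Nielsen and Landauer's data; your product may differ).
    L = 0.31

    for n in (1, 3, 5, 8, 10):
        found = 1 - (1 - L) ** n
        print(f"{n:2d} participants -> ~{found:.0%} of problems")
    # 5 participants -> ~84%; the curve flattens quickly after that.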

How to recruit:

  • In-app invitations to existing users
  • Customer lists (with appropriate consent)
  • Recruiting services
  • Community outreach (forums, social media)

Plan for a 20-30% no-show/cancellation rate.
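That rate translates directly into how many people to schedule. A quick back-of-envelope helper, assuming you want a fixed number of completed sessions:

    import math

    def recruits_needed(completed_sessions: int, no_show_rate: float) -> int:
        """Participants to schedule so enough show up, rounding up to be safe."""
        return math.ceil(completed_sessions / (1 - no_show_rate))

    print(recruits_needed(6, 0.25))  # schedule 8 to expect ~6 completed sessions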

Set up the environment

You need:

  • A quiet space (or reliable remote tool)
  • The product/prototype ready to test
  • Recording method (with consent)
  • Note-taking system
  • Backup plan for technical issues

During the test

Facilitator role

Your job is to observe, not to help, teach, or defend:

  • Present the task, then be quiet
  • Don't answer questions about how to use the product
  • Redirect questions: "What would you do if I weren't here?"
  • Observe body language and hesitation, not just success/failure

Note-taker role

If possible, have a separate note-taker. They should capture the following (a lightweight record format is sketched after the list):

  • Exact quotes
  • Behavioral observations
  • Timestamps of key moments
  • Non-verbal cues
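Even a plain spreadsheet works, but a consistent row shape pays off during analysis. A minimal sketch of one timestamped entry (the field names are illustrative):

    from dataclasses import dataclass

    @dataclass
    class Observation:
        """One note-taker entry; one row per moment worth revisiting."""
        participant_id: str
        task_id: str
        timestamp: str   # offset into the recording, e.g. "00:14:32"
        kind: str        # "quote", "behavior", or "nonverbal"
        note: str        # exact quote or what was observed

    log = [
        Observation("P3", "task-2", "00:14:32", "quote",
                    "I expected this to be under Settings."),
        Observation("P3", "task-2", "00:15:10", "behavior",
                    "Opened and closed the menu three times before scrolling."),
    ]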

Common facilitator mistakes

Leading participants: "Did you notice the search button?" → This points them to what you wanted them to find.

Helping stuck users: Jumping in to explain. Instead, note that they were stuck and what they tried.

Defending the design: "Well, that feature is supposed to..." → The test isn't about being right.

Talking too much: Every word you say is a word they're not saying. Embrace silence.

When participants struggle

It's uncomfortable to watch someone struggle, but that struggle is data.

If someone is completely stuck:

  1. Note how long they struggled and what they tried
  2. Offer a hint only if needed to continue to other tasks
  3. Mark this task as "required assistance"

Session timing

  • Introduction: 5 minutes
  • Background questions: 5 minutes
  • Tasks: 30-45 minutes
  • Wrap-up: 5-10 minutes

Total: 45-65 minutes. Don't schedule back-to-back—you need buffer for overruns and decompression.

After the test

Analyze while it's fresh

Review notes and recordings soon after sessions. You'll remember context that doesn't make it into notes.

For each task, note the following (a tally sketch appears after the list):

  • Success/failure
  • Time taken
  • Errors made
  • Confusion points
  • Quotes
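If you record outcomes consistently, the tally is mechanical. A minimal sketch, assuming results are logged as (participant, task, outcome) tuples with outcomes of "success", "assisted", or "failure":

    from collections import Counter

    results = [
        ("P1", "task-1", "success"), ("P2", "task-1", "success"),
        ("P3", "task-1", "assisted"),
        ("P1", "task-2", "failure"), ("P2", "task-2", "failure"),
        ("P3", "task-2", "success"),
    ]

    by_task: dict[str, Counter] = {}
    for _participant, task, outcome in results:
        by_task.setdefault(task, Counter())[outcome] += 1

    for task, counts in sorted(by_task.items()):
        total = sum(counts.values())
        print(f"{task}: {counts['success']}/{total} unassisted successes")
    # Tasks that several participants failed feed the pattern review below.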

Look for patterns

Individual sessions reveal individual experiences. Patterns across sessions reveal design issues.

Focus on:

  • Problems multiple participants encountered
  • Points where participants did something unexpected
  • Confusion that even successful participants expressed

Severity assessment

Not all problems are equal. Assess severity:

Critical: Prevents task completion. Users cannot work around it.

Serious: Significantly impairs task completion. Users can work around it with difficulty.

Minor: Causes confusion or minor delay. Users recover easily.

Cosmetic: Noted by users but doesn't affect task success.

Create actionable recommendations

For each finding, suggest:

  • What to do about it (specific action)
  • Why that would help (connection to observed problem)
  • Priority (based on severity and frequency)

Recommendations without evidence are opinions. Link everything back to what you observed.
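One way to make "priority based on severity and frequency" concrete is a simple score: a severity weight multiplied by the number of participants affected. A minimal sketch (the weights are illustrative, not a standard scale):

    SEVERITY_WEIGHT = {"critical": 4, "serious": 3, "minor": 2, "cosmetic": 1}

    findings = [
        {"issue": "Filter control not discovered", "severity": "serious", "affected": 4},
        {"issue": "Label wording questioned", "severity": "minor", "affected": 2},
    ]

    def priority(finding: dict) -> int:
        # Higher score = fix sooner; ties broken by judgment, not the number.
        return SEVERITY_WEIGHT[finding["severity"]] * finding["affected"]

    for f in sorted(findings, key=priority, reverse=True):
        print(f"{priority(f):2d}  {f['severity']:<8}  {f['issue']}")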

Reporting findings

Who needs what

Executives: Summary of major findings, key recommendations, impact

Designers: Detailed findings, video clips, specific recommendations

Developers: Implementation-relevant details, edge cases discovered

Report format

A useful report includes:

  1. Executive summary (1 page)
  2. Methodology (brief: who, how many, what you tested)
  3. Key findings (organized by theme or severity)
  4. Detailed findings (task by task if helpful)
  5. Recommendations (prioritized)
  6. Appendix (screener, script, raw data)

Video clips

Short clips showing problems in action are worth pages of description. Edit clips to 30-60 seconds showing the moment of confusion.

Common questions

What if users keep asking me questions during the test?

Redirect with "What do you think?" or "What would you try?" If they ask about your intentions, defer: "I'm interested in your experience. I can answer questions after we're done."

Can I test with colleagues if I can't recruit real users?

It's better than nothing, but expect biased results. Colleagues know too much about the product and company context. Use them for catching obvious issues, not for validating that the design works.

How do I test something that isn't built yet?

Prototypes. They can range from paper sketches to clickable mockups to functional prototypes. Test with the lowest-fidelity prototype that lets users attempt realistic tasks.

What if stakeholders want to attend sessions?

Great—observation builds empathy. But establish ground rules: observers don't talk, don't help, and save discussion for after. Consider having observers in another room watching a stream.

How do I know if a finding is a real problem or just one user's experience?

Multiple participants encountering the same issue is strong evidence. A single occurrence might be an anomaly or might be a sign of something others would hit too. Use judgment, and note confidence level in your reports.

Remote or in-person testing?

Both work. In-person provides richer observation (body language, physical context). Remote is easier logistically and may better represent real-world use. Choose based on what you need to learn and practical constraints.