Usability Testing Quickstart
Usability testing is the most direct way to see how people actually use your product. This guide helps you run your first test—or improve tests you're already doing.
Last updated: October 2023
What usability testing is (and isn't)
Usability testing means watching representative users attempt realistic tasks with your product (or prototype) while you observe.
What it reveals:
- Whether users can complete tasks
- Where users get confused or stuck
- How users interpret your interface
- Unexpected behaviors and workarounds
What it doesn't reveal:
- Whether users want your product
- Whether users will pay for your product
- What features users want (necessarily)
- Statistical significance (with typical sample sizes)
Usability testing complements other research methods—it doesn't replace them.
The basic process
- Prepare: Define tasks, recruit participants, set up the environment
- Facilitate: Guide participants through tasks while observing
- Analyze: Review findings and identify patterns
- Report: Share insights and recommendations
Preparing for the test
Define what you're testing
Be specific:
- Which part of the product?
- Which user flows?
- What questions are you trying to answer?
You can't test everything in one session. Focus on the areas with the most uncertainty or the highest stakes.
Create realistic tasks
Tasks should:
- Reflect real user goals (not feature demonstrations)
- Be specific enough to observe
- Avoid revealing the answer in the wording
Weak task: "Use the filter feature to narrow results."
Better task: "You're looking for Italian restaurants within 2 miles that are open now. Find some options."
The weak version tells users where to go. The better version gives a realistic goal and lets you see if users find and use the filter.
Plan 5-7 tasks per session
More than that exhausts participants and dilutes focus. Fewer may not provide enough insight. Prioritize ruthlessly—you can always run more sessions.
Write a test script
Your script should include:
- Introduction (purpose, think-aloud instructions, reassurance)
- Background questions (brief)
- Tasks in order
- Follow-up questions for each task
- Wrap-up questions
- Thanks and next steps
Having a script ensures consistency across sessions and keeps you from forgetting important elements.
Ask participants to say what they're thinking as they work. "Please think out loud—tell me what you're looking at, what you're thinking, what you're trying to do." This gives insight into their mental model, not just their clicks.
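If it helps keep sessions consistent, the script can also live as simple structured data rather than a document. The sketch below is only an illustration of the elements listed above; the section names and all wording are examples, not a prescribed format.

```python
# A minimal sketch of a session script as plain data, so every facilitator
# runs the same structure. All wording below is illustrative.
script = {
    "introduction": (
        "Thanks for joining. We're testing the product, not you. "
        "Please think out loud: what you're looking at, what you're "
        "thinking, and what you're trying to do."
    ),
    "background_questions": [
        "How often do you look for restaurants online?",  # example only
    ],
    "tasks": [
        {
            "prompt": "You're looking for Italian restaurants within 2 miles "
                      "that are open now. Find some options.",
            "follow_ups": ["What did you expect to happen when you did that?"],
        },
        # 5-7 tasks per session
    ],
    "wrap_up": ["What was the most frustrating part?", "Anything else to add?"],
}
```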
Recruit participants
Who to recruit:
- People who match your actual user profile
- Mix of characteristics relevant to your product
- Not colleagues, friends, or family if avoidable
How many:
- 5 participants often reveal most major issues (a simple discovery model below sketches why)
- 6-8 provides more confidence
- More than 10 rarely justifies the time unless you're testing multiple segments
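A commonly cited problem-discovery model (from Nielsen and Landauer) gives a rough sense of why small samples work: it estimates the share of problems found after n participants. The 31% per-participant detection rate used below is the often-quoted average and is an assumption, not a measurement of your product.

```python
# Rough problem-discovery model: proportion of problems found by n participants,
# assuming each participant independently hits a given problem with probability p.
# p = 0.31 is the often-cited average; treat it as an assumption.
def proportion_found(n: int, p: float = 0.31) -> float:
    return 1 - (1 - p) ** n

for n in (3, 5, 8, 10):
    print(f"{n} participants: ~{proportion_found(n):.0%} of problems found")
# With p = 0.31 this gives roughly 67%, 84%, 95%, and 98%.
```

The curve flattens quickly, which is the usual argument for several small rounds of testing rather than one large study.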
How to recruit:
- In-app invitations to existing users
- Customer lists (with appropriate consent)
- Recruiting services
- Community outreach (forums, social media)
Plan for a 20-30% no-show/cancellation rate.
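A quick sketch of the over-recruiting arithmetic this implies; the target of six completed sessions and the 25% rate are illustrative.

```python
import math

# How many participants to schedule so that, after no-shows and cancellations,
# you still get the sessions you need. Target and rate below are illustrative.
target_sessions = 6       # completed sessions you want
no_show_rate = 0.25       # plan for 20-30% no-show/cancel

to_schedule = math.ceil(target_sessions / (1 - no_show_rate))
print(f"Schedule {to_schedule} participants to expect ~{target_sessions} completed sessions.")
# 6 / 0.75 = 8, so schedule 8 to end up with roughly 6.
```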
Set up the environment
You need:
- A quiet space (or reliable remote tool)
- The product/prototype ready to test
- Recording method (with consent)
- Note-taking system
- Backup plan for technical issues
During the test
Facilitator role
Your job is to observe, not to help, teach, or defend:
- Ask the task, then be quiet
- Don't answer questions about how to use the product
- Redirect questions: "What would you do if I weren't here?"
- Observe body language and hesitation, not just success/failure
Note-taker role
If possible, have a separate note-taker. They should capture:
- Exact quotes
- Behavioral observations
- Timestamps of key moments
- Non-verbal cues
Common facilitator mistakes
Leading participants: "Did you notice the search button?" → This points them to what you wanted them to find.
Helping stuck users: Jumping in to explain. Instead, note that they were stuck and what they tried.
Defending the design: "Well, that feature is supposed to..." → The test isn't about being right.
Talking too much: Every word you say is a word they're not saying. Embrace silence.
When participants struggle
It's uncomfortable to watch someone struggle, but that struggle is data.
If someone is completely stuck:
- Note how long they struggled and what they tried
- Offer a hint only if needed to continue to other tasks
- Mark this task as "required assistance"
Session timing
- Introduction: 5 minutes
- Background questions: 5 minutes
- Tasks: 30-45 minutes
- Wrap-up: 5-10 minutes
Total: 45-65 minutes. Don't schedule back-to-back—you need buffer for overruns and decompression.
After the test
Analyze while it's fresh
Review notes and recordings soon after sessions. You'll remember context that doesn't make it into notes.
For each task, note (one way to record this is sketched after the list):
- Success/failure
- Time taken
- Errors made
- Confusion points
- Quotes
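One lightweight way to capture these notes is a per-task record, sketched below with illustrative field names rather than a standard template.

```python
from dataclasses import dataclass, field

# A minimal per-task observation record; field names are illustrative.
# One instance per participant per task.
@dataclass
class TaskObservation:
    participant: str
    task: str
    completed: bool
    assisted: bool = False            # mark "required assistance" explicitly
    time_seconds: int | None = None
    errors: list[str] = field(default_factory=list)
    confusion_points: list[str] = field(default_factory=list)
    quotes: list[str] = field(default_factory=list)

obs = TaskObservation(
    participant="P3",
    task="Find Italian restaurants within 2 miles that are open now",
    completed=True,
    time_seconds=140,
    confusion_points=["Did not notice the filter icon at first"],
    quotes=["I expected the filters to be at the top of the list."],
)
```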
Look for patterns
Individual sessions reveal individual experiences. Patterns across sessions reveal design issues.
Focus on:
- Problems multiple participants encountered
- Points where participants did something unexpected
- Confusion that even successful participants expressed
Severity assessment
Not all problems are equal. Assess severity:
Critical: Prevents task completion. Users cannot work around it.
Serious: Significantly impairs task completion. Users can work around it with difficulty.
Minor: Causes confusion or minor delay. Users recover easily.
Cosmetic: Noted by users but doesn't affect task success.
Create actionable recommendations
For each finding, suggest:
- What to do about it (specific action)
- Why that would help (connection to observed problem)
- Priority (based on severity and frequency)
Recommendations without evidence are opinions. Link everything back to what you observed.
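If you want to make "severity and frequency" concrete when prioritizing, a simple scoring pass over the findings can help. The ranks, scoring rule, and example findings below are illustrative assumptions, not a standard scale.

```python
# A rough prioritization sketch: combine severity with how many participants
# hit the problem. Severity ranks and the scoring rule are illustrative.
SEVERITY_RANK = {"critical": 4, "serious": 3, "minor": 2, "cosmetic": 1}

findings = [
    {"issue": "Filter icon not noticed", "severity": "serious", "participants_affected": 4},
    {"issue": "Typo in results header", "severity": "cosmetic", "participants_affected": 2},
]

def priority(finding, total_participants=6):
    frequency = finding["participants_affected"] / total_participants
    return SEVERITY_RANK[finding["severity"]] * frequency

for f in sorted(findings, key=priority, reverse=True):
    print(f"{f['issue']}: severity={f['severity']}, "
          f"seen by {f['participants_affected']} participants, score={priority(f):.2f}")
```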
Reporting findings
Who needs what
Executives: Summary of major findings, key recommendations, impact
Designers: Detailed findings, video clips, specific recommendations
Developers: Implementation-relevant details, edge cases discovered
Report format
A useful report includes:
- Executive summary (1 page)
- Methodology (brief: who, how many, what you tested)
- Key findings (organized by theme or severity)
- Detailed findings (task by task if helpful)
- Recommendations (prioritized)
- Appendix (screener, script, raw data)
Video clips
Short clips showing problems in action are worth pages of description. Edit clips to 30-60 seconds showing the moment of confusion.
Common questions

What if participants ask me questions during tasks?
Redirect with "What do you think?" or "What would you try?" If they ask about your intentions, defer: "I'm interested in your experience. I can answer questions after we're done."

Can I test with colleagues instead of real users?
It's better than nothing, but expect biased results. Colleagues know too much about the product and company context. Use them for catching obvious issues, not for validating that the design works.

What can I test if the product isn't built yet?
Prototypes. They can range from paper sketches to clickable mockups to functional prototypes. Test with the lowest-fidelity prototype that lets users attempt realistic tasks.

Can stakeholders observe sessions?
Great—observation builds empathy. But establish ground rules: observers don't talk, don't help, and save discussion for after. Consider having observers in another room watching a stream.

How many participants need to hit a problem before it counts as a finding?
Multiple participants encountering the same issue is strong evidence. A single occurrence might be an anomaly or might be a sign of something others would hit too. Use judgment, and note confidence level in your reports.

Should sessions be remote or in person?
Both work. In-person provides richer observation (body language, physical context). Remote is easier logistically and may better represent real-world use. Choose based on what you need to learn and practical constraints.