Saturday, June 20, 2009

Eyetracking research and forms design

Formulate: Eye tracking research and forms design

In recent years, a number of eye tracking studies related to form design seem to have captured popular attention among the web community. These include a study by Matteo Penzo on label placement and another, more recent study by cxPartners.

Not only have these studies been widely circulated on the Internet but, in the case of Penzo's eye tracking study, the findings have also formed the basis of some parts of Luke Wroblewski's popular book “Web Form Design: Filling in the Blanks”.

We think it's great that the unique design challenge forms represent is getting more attention from the web and user experience community. However, we are a little unsettled about the increasing use of these articles as the basis for best practice. Our concern stems from what we see as fairly major flaws in the methodology that these and similar eye tracking research studies contain.

Two methodological problems lie behind much past forms-related eye tracking research
In many cases, eye tracking studies that have examined different options for the design of forms have suffered from two main shortcomings: insufficient recognition that seeing is not equivalent to attention, and drawing inferences from an inadequate sample.

Seeing does not equate to attention

It is one thing to know that someone has directed their gaze in a particular place. It is another thing entirely to know what they were attending to, or thinking, at that time.
Zimmerman estimates that at any one time, the eyes take in 10,000,000 bits per second of information, yet we pay conscious attention to only 40 bits per second¹. That's 40 bits out of 10 million, or attention going to only 0.0004% of what we see.
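The percentage above is a one-line calculation. This sketch simply redoes it using the two figures quoted from Zimmerman's estimate.

```python
# Figures quoted from Zimmerman's estimate (bits per second).
visual_input_bps = 10_000_000  # information taken in by the eyes
conscious_bps = 40             # information we consciously attend to

# Share of visual input that receives conscious attention, as a percentage.
attended_percent = conscious_bps / visual_input_bps * 100
print(f"{attended_percent:.4f}%")  # prints 0.0004%
```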

If you're not convinced about this phenomenon, try to remember what colour shirt the person you share an office with was wearing yesterday, or even the colour of their eyes. You probably look at both things many times in a working day, but you don't necessarily attend to them.
The implication for eye tracking research is that such studies give us only part of the picture of what's going on when someone interacts with a form. To draw truly informed conclusions, we need to supplement this picture with information from other sources. This might include error and task analysis of the completed forms and/or probing the participant, using protocols such as concurrent “think aloud” or retrospective discussion.

An adequate sample is a prerequisite for drawing inferences

Our second and equally significant concern relates to the design of the samples used to conduct these eye tracking studies.
As an example, the cxPartners study involved only 8 participants: 6 female and 2 male, all of whom were in their 20s or 30s and reasonably web savvy. Without considering any other aspects of the study's design, this alone is enough to make a statistician break out in a cold sweat.
A statistician would react this way because the sample used by cxPartners is highly likely to have been skewed. By skewed we mean that the sample probably doesn't accurately reflect the wider population of people who fill in web forms. At the very least, it would have been preferable to include both younger and older participants, not to mention more males.

Furthermore, the sample size—8 people—is so small that it is likely to be highly influenced by the nature of the particular 8 participants that were involved. Pick a different 8 people and there is a good chance that the findings from the research would be very different.
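To make the point about small samples concrete, the sketch below repeatedly draws random samples of 8 and of 30 people from a hypothetical population and measures how much the sample mean moves around from one sample to the next. The population is an assumption made purely for illustration: 10,000 form-completion times drawn from a normal distribution with a mean of 60 seconds and a standard deviation of 20 seconds. It is not data from any of the studies discussed here.

```python
import random
import statistics

def spread_of_sample_means(population, sample_size, trials, rng):
    """Standard deviation of the means of `trials` random samples of size `sample_size`."""
    means = [statistics.mean(rng.sample(population, sample_size)) for _ in range(trials)]
    return statistics.stdev(means)

rng = random.Random(42)  # fixed seed so runs are repeatable
# Hypothetical population: 10,000 form-completion times in seconds (assumed, not real data).
population = [rng.gauss(60, 20) for _ in range(10_000)]

spreads = {}
for n in (8, 30):
    spreads[n] = spread_of_sample_means(population, n, trials=2_000, rng=rng)
    print(f"n={n:>2}: sample means typically vary by about {spreads[n]:.1f} s")
```

The spread shrinks roughly in proportion to one over the square root of the sample size. For this assumed population that is about 20/√8 ≈ 7.1 s versus 20/√30 ≈ 3.7 s. This shows how easily a different 8 people can produce different findings.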

This is why 30 is the recommended minimum sample size for any study from which inferences for a general population are to be drawn². While there's a lot more to designing a good sample than having a minimum of 30 participants, this will at least get you into the space where you might be able to calculate statistical significance.

Statistical significance is about knowing which differences are likely to be due to just the particular sample that was selected as opposed to reflecting a true difference in the underlying population. With a small sample size, we cannot calculate statistical significance and thus have no real indication of the reliability of our findings. (For more on statistical significance and user research, see Caroline Jarrett's recent article on Usability News titled "Statistically significant usability testing".)
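A permutation test is one common way to check whether a difference between two design variants reflects the population rather than the particular people sampled. It asks how often a difference at least as large would appear if participants were shuffled between the groups at random. The sketch below applies one to completion times for two hypothetical label layouts. The numbers are invented purely for illustration and do not come from any study mentioned here. A t-test would be another suitable choice.

```python
import random
import statistics

def permutation_p_value(a, b, trials=10_000, seed=0):
    """Two-sided permutation test for a difference in means between groups a and b."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(trials):
        rng.shuffle(pooled)  # randomly reassign participants to the two groups
        diff = abs(statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (extreme + 1) / (trials + 1)

# Hypothetical completion times in seconds for 8 participants per layout.
top_aligned = [52, 61, 48, 70, 55, 66, 59, 63]
left_aligned = [58, 72, 50, 75, 61, 69, 64, 60]

p = permutation_p_value(top_aligned, left_aligned)
print(f"p ≈ {p:.3f}")
```

With these invented numbers, the layouts differ by about 4 seconds on average. However, random shuffling alone often produces a gap that large, so the data give no real reason to prefer either layout. With groups this small, even a genuine difference of this size would often go undetected.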

Being transparent about sample design is important

One thing cxPartners did well in their article was to describe the sample that formed the basis of their research. Providing this information empowers readers to make their own judgement about how to use the findings presented.

Conversely, Matteo Penzo's article doesn't give many specifics about the design of his sample. He says that the sample included both expert users—primarily designers and programmers, but also some usability experts—and novice users. But we are not given any more detail nor told how many participants there were. One hopes, given the immense popularity of his article, that Penzo's sample was both representative and large.
Better not to report at all?

To be fair to the team at cxPartners, their eyetracking forms article did begin with a note about the potential invalidity of the study. Isn't it enough that readers were duly warned? Unfortunately, we think not.

It is our impression that web designers and developers are hungry for guidelines based on research. This hunger is a great thing: it means we all want to know more and create the best sites we can. However, it also means that readers are likely to latch on to the findings of a study and pay little regard to the caveats about methodology placed around it. This is just human nature. We can work with a guideline; we need a guideline. The perhaps-flimsy basis behind the guideline is all too often seen as the spoil-sport at the party and pushed to one side.

So what should researchers do with findings based on an inadequate sample? Perhaps controversially, we suggest that rather than report findings with caveats around them, it may be better to not report such findings at all. That way widespread inappropriate use can be prevented.

This is a hard position for many people to accept. Surely it is better to have some findings than no findings?

The problem is that the “some” findings may be pointing in completely the wrong direction. If we have no data, there's nothing to suggest one course of action is better than another. But if we have bad data, it can lead us astray, all the while with a false sense of confidence in our decision because, after all, it is based on research findings.
We raise these issues to help progress the field
We did not write this article to embarrass or shame anyone, nor to discourage people from doing forms research. We know from direct experience how unbelievably hard it is to design a sound research study.

Moreover, we think both Matteo Penzo and cxPartners should be congratulated for taking the (not insignificant) time and effort to actually do some research and share their findings with the community. A lot of people make demands of such individuals (“Why didn't you do X?”, “What would have happened if you had tested Y?” and so on), but very few people actually take up the gauntlet and run such studies themselves.

Having said that, in the future we would like the web community to have a higher awareness of what makes for quality research, and to approach published studies with a more critical eye. Formulate has been, and will continue to be, on the receiving end of such critique (see, for example, the comments on our recent research article on A List Apart). However, as long as that critique is informed and considered, we believe it can only help to advance the field.

In the end we hope the web industry will recognise the importance of the sort of rigour that has been commonplace for decades in other fields such as psychology and social research. Not only will this lead to better design decisions, but we believe it will help the industry mature, in turn generating respect for the web as a serious vehicle for communication, transaction and information.

1 Zimmerman, M. (1989) "The Nervous System in the Context of Information Theory". In Schmidt, R. F. & Thews, G. (eds) Human Physiology, pp. 166-173.
2 This minimum of 30 can be found in almost any statistics or sampling textbook, e.g. Howell, D.C. (1982) Statistical Methods for Psychology p. 149. The number comes from the fact that given a large population, the greater the sample size, the closer the distribution of means from samples of that size comes to approximating the normal distribution. This in turn makes various sample estimates—including statistical significance—valid (provided some other conditions also hold, but we won't go into that here!).

Eye-tracking studies: more than meets the eye

Official Google blog: Eye-tracking studies: more than meets the eye
Imagine that you need a refresher on how to tie a tie. So you decide to type [how to tie a tie] into the Google search box. Which of these results would you choose?

Where did your eyes go first when you saw the results page? Did they go directly to the title of the first result? Did you first check the terms in boldface to see if the results really talk about tying a tie? Or maybe the images captured your attention and drew your eyes to them?

You might find it difficult to answer these questions. You probably did not pay attention to where you were looking on the page, and you most likely spent only a few seconds visually scanning the results. Our User Experience (eye tracking) Research team has found that people evaluate the search results page so quickly that they make most of their decisions unconsciously. To get some insight into this split-second decision-making process, we use eye-tracking equipment in our usability labs. This lets us see how our study participants scan the search results page, and it is the next best thing to actually being able to read their minds. Of course, eye-tracking does not really tell us what they are thinking, but it gives us a good idea of which parts of the page they are thinking about.

To see what the eye-tracking data we collect looks like, let's go back to the results page we got for the query [how to tie a tie]. The following video clip shows in real time how a participant in our study scanned the page. And yes, seriously, the video is in real time! That's how fast the eyes move when scanning a page. The larger the dot gets, the longer the user's eyes pause on that specific location.

Based on eye-tracking studies, we know that people tend to scan the search results in order. They start from the first result and continue down the list until they find a result they consider helpful and click it, or until they decide to refine their query.
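The growing dot in the video encodes fixation duration, which is how long the eyes rest on one spot. Heatmaps aggregate the same quantity across many participants. The sketch below shows roughly how such a heatmap can be built: it bins fixation records into a coarse grid and sums their durations per cell. The `Fixation` record, the 100-pixel cell size and the sample fixations are all hypothetical. Real eye-tracking tools usually also smooth the grid (for example with a Gaussian kernel) before colouring it. This is not a description of Google's own pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Fixation:
    """One hypothetical fixation: screen position in pixels and dwell time in ms."""
    x: int
    y: int
    duration_ms: int

def duration_heatmap(fixations, cell_px=100):
    """Sum fixation durations into cells of cell_px by cell_px pixels.

    Returns a dict mapping (column, row) to total milliseconds spent in that cell.
    """
    grid = defaultdict(int)
    for f in fixations:
        cell = (f.x // cell_px, f.y // cell_px)
        grid[cell] += f.duration_ms
    return dict(grid)

# Invented fixations from two participants looking at the top of a results page.
fixations = [
    Fixation(120, 150, 300), Fixation(180, 160, 250),  # participant 1, first result
    Fixation(130, 260, 200),                            # participant 1, second result
    Fixation(110, 140, 400), Fixation(150, 270, 180),  # participant 2
]

heat = duration_heatmap(fixations)
# Cells with the largest totals would be drawn darkest.
for cell, total in sorted(heat.items(), key=lambda kv: -kv[1]):
    print(cell, total)
```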
The heatmap below shows the activity of 34 usability study participants scanning a typical Google results page. The darker the pattern, the more time they spent looking at that part of the page. This pattern suggests that the order in which Google returned the results was successful: most users found what they were looking for among the first two results and never needed to go further down the page.

When designing the user interface for Universal Search, the team wanted to incorporate thumbnail images to better represent certain kinds of results. For example, in the [how to tie a tie] example above, we have added thumbnails for Image and Video results.
However, we were concerned that the thumbnail images might be distracting and disrupt the well-established order of result evaluation.

We ran a series of eye-tracking studies in which we compared how users scan the search results pages with and without thumbnail images. Our studies showed that the thumbnails did not strongly affect the order in which participants scanned the results, and they seemed to make it easier for participants to find the result they wanted.
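One simple way to quantify whether participants scan results in order, and whether a change such as adding thumbnails disturbs that order, is a rank correlation. This compares each result's position on the page with the order in which it was first fixated. The sketch below computes Kendall's tau for two invented scan sequences. Both the sequences and the choice of Kendall's tau are assumptions made for illustration; this is not the method the Google post describes.

```python
from itertools import combinations

def kendall_tau(order_a, order_b):
    """Kendall's tau (no ties) between two orderings of the same items.

    Returns 1.0 for identical order and -1.0 for fully reversed order.
    """
    pos_b = {item: i for i, item in enumerate(order_b)}
    concordant = discordant = 0
    for x, y in combinations(order_a, 2):
        # x precedes y in order_a; check whether it also precedes y in order_b.
        if pos_b[x] < pos_b[y]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

page_order = [1, 2, 3, 4, 5]               # results as ranked on the page
scan_without_thumbnails = [1, 2, 3, 5, 4]  # hypothetical first-fixation order
scan_with_thumbnails = [1, 3, 2, 4, 5]     # hypothetical first-fixation order

print(kendall_tau(page_order, scan_without_thumbnails))
print(kendall_tau(page_order, scan_with_thumbnails))
```

Values close to 1 mean a participant looked at the results in roughly the order they were ranked. To see whether thumbnails shift scanning order, you would compare the distribution of tau with and without thumbnails across many participants.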
For the Universal Search team, this was a successful outcome. It showed that we had managed to design a subtle user interface that gives people helpful information without getting in the way of their primary task: finding relevant information.

In addition to search research, we also use eye-tracking to study the usability of other products, such as Google News and Image Search.
For these products, eye-tracking helps us answer questions such as "Is the 'Top Stories' link discoverable on the left of the Google News page?" or "How do users typically scan the image results: in rows, in columns or in some other way?"

Eye-tracking gives us valuable information about our users' focus of attention: information that would be very hard to come by any other way and that we can use to improve the design of our products. However, in our ongoing quest to make our products more useful, usable and enjoyable, we always complement our eye-tracking studies with other methods, such as interviews, field studies and live experiments.

Results of Eye Tracking Study: Google Versus Bing

Bill Hartzer: User Centric Releases Results of Eye Tracking Study: Google Versus Bing

User Centric has released the results of an eye tracking study that compares data between Google and the new Microsoft search engine, Bing. Interestingly, sponsored links on Bing are attracting more attention than they are on Google, which suggests that Bing users are more aware of the sponsored links.

User Centric used eye tracking technology to capture 21 participants’ eye movements as they completed two informational (e.g., “Learn about eating healthy”) and two transactional (e.g., “Book a last minute vacation”) search tasks in each engine.

According to User Centric, “Preliminary findings revealed comparable amount of visual attention on organic search results and top sponsored links across both search engines. Sponsored links on the right, however, attracted more attention on Bing than they did on Google. On average, across all four tasks, 42% of participants looked at Bing’s sponsored links on the right; by contrast, only 25% of participants looked at Google’s right rail links.”
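With only 21 participants, each percentage in this quote carries considerable uncertainty. The sketch below computes a 95% Wilson score interval for a proportion. It makes a simplifying assumption: it treats 42% and 25% as simple proportions of 21 people, whereas User Centric's figures are averages across four tasks. The intervals are therefore only indicative.

```python
import math

def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for an observed proportion p_hat out of n (z=1.96 gives 95%)."""
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

n_participants = 21
for label, share in (("Bing right rail", 0.42), ("Google right rail", 0.25)):
    low, high = wilson_interval(share, n_participants)
    print(f"{label}: {share:.0%} (95% CI roughly {low:.0%} to {high:.0%})")
```

Under these assumptions, the two intervals overlap: roughly 24% to 63% for Bing and 11% to 46% for Google. This is a reminder of how much uncertainty a sample of 21 carries.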

When it came to the amount of attention paid to the organic search results, Bing and Google did not differ: on both engines, users spent an average of seven seconds looking at the organic results.

During a search, 90 percent of users tested looked at the “sponsored links”. User Centric also reports that during “transactional searches” (searches that involved someone buying something or completing a transaction), “participants would spend more time looking at the sponsored results on top (~2.5 seconds) than they did on informational searches (~1.5 seconds).” On Bing, however, participants paid more attention to the sponsored links on the right: about 42 percent looked at them on Bing, compared with about 25 percent on Google.

Eye-tracking 2.0: it's about users, not science

Mihkel Jäätma is co-founder of eye-tracking company Realeyes

Eye-tracking has been used in web design for many years. However, the widespread preconception is that it takes PhD-level technicians, plus long consulting hours, to make any sense or use of people’s eye gaze data.

The value from eye-tracking has been directly related to consultancy skills, but shouldn’t it be more about real users?
Eye-tracking has been used mainly as a qualitative tool for technical reasons. The hardware was difficult to operate, and only a few consulting houses had the access and capability to run tests in their labs. Operating labs and recruiting people from consumer panels is expensive, which forced consultancies to stick with small sample sizes to fit clients’ budgets. Small sample sizes have held back the wider acceptance of eye-tracking analysis in web design. In 'Eye-tracking 2.0', devices will be taken to users, not users to devices. This fundamentally changes both the speed of testing and the number of users who can be eye-tracked for analysis.
Quantitative eye-tracking speaks with the voice of the customer, not of the consultant. Testing about 50 people is needed to achieve reliability in the visual analysis of any piece of media. At this sample size, statistically significant results allow real user design preferences to be conveyed in a neutral and objective manner. Visual metrics and animations of user interaction gain their own stand-alone value as models of real user behavior on designers’ and marketers’ desktops. Consultant expertise and qualitative insights are extremely valuable; quantitative eye-tracking complements them with sound numbers. Technological progress has made data collection procedures and packaging standard issue. Professionals can now spend their time on more demanding, value-adding activities than running user tests or analyzing gigabytes of eye-tracking data.
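A standard way to reason about how many participants a quantitative measurement needs is the sample-size formula for estimating a proportion, n = z²·p(1 - p)/E². Here, E is the desired margin of error. This formula fits a question such as "what share of users look at this banner?". The article does not say how Realeyes arrived at 50, so the sketch below only illustrates the general calculation. The margins of error are chosen arbitrarily.

```python
import math

def required_sample_size(margin_of_error, p=0.5, z=1.96):
    """Participants needed to estimate a proportion within +/- margin_of_error.

    p=0.5 is the most conservative (largest) case; z=1.96 gives 95% confidence.
    Ignores the finite-population correction.
    """
    return math.ceil(z**2 * p * (1 - p) / margin_of_error**2)

for e in (0.10, 0.15, 0.20):
    print(f"margin ±{e:.0%}: about {required_sample_size(e)} participants")
```

Because n grows with 1/E², halving the margin of error roughly quadruples the number of participants needed.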
Reliable quantitative visual analysis is now available as a ready-made tool, so consultants can focus more on fundamentals, much as fund managers use outsourced data feeds but still make their own investment decisions.
Eye-tracking was often oversold in the past, creating well-deserved skepticism towards the technology. The 'before-and-after' case studies of websites redesigned based only on heatmaps of 10 people rightly upset many industry professionals.
As eye-tracking hardware improves and operational models for analysis develop there will be less ‘magic’ and more of the real stuff: identifying user preferences and employing that knowledge to achieve better web design.