Thursday, February 02, 2006

Screen Readers and Contextual Information

The topic of contextual information in user interfaces, and the lack thereof in screen reader interfaces, creates a lot of controversy and discussion among people who think about such things.  A few years ago, ATIA held an AT/IT compatibility meeting which kicked off its AT/IT compatibility committee as well as its AT/AT compatibility committee.  The membership of the AT/IT group included people from mainstream companies (Microsoft, IBM, Adobe, etc.) and AT companies (FS, AI^2, Medentech and others).  Its agenda included helping define new methods of communicating between AT products and mainstream software packages in order to improve the user experience.

Well, a few years have passed, and lots of new versions of JAWS, Window-Eyes and HAL have shipped.  A new version of MS Office (2003) went out the door, and these releases demonstrate little progress in bringing the screen reader user more contextual information.  Apple has released its screen reader, and Microsoft's next OS release will contain a new API called "UI Automation" which claims to hold enough information to move the art forward.  Through a lot of terrific work by our friends at Sun Microsystems and IBM, a new AT/IT compatibility protocol was published by the AT/IT committee; it looks a lot like the GNOME accessibility API with some extensions and, to my knowledge, hasn't been used in any product anywhere.

Other than direct "screen scraping" (taking information written to the screen and delivering it verbatim to the user), the two most commonly used techniques for getting data out of an application and delivering it to a blind user have been MSAA and Microsoft's Object Model interface.  As a Microsoft manager once said, "an MSAA object is like an amoeba, it knows a bit about itself and its purpose but nothing about its surroundings," which makes the context problem impossible to solve using MSAA as the sole source of information.  The MS Object Model provides much more information about context but remains so limited that a screen reader cannot give its users information rich enough to perform at an even level with their sighted counterparts.

What do I mean by contextual information?

When a sighted person views a spreadsheet, they can focus their attention on a specific cell but, by seeing the surrounding cells, they can infer certain pieces of information about the cell they care most about.  A screen reader user with JAWS can learn a lot about the cell and its context by using some of the more advanced JAWS features, but the limits on those features slow their data gathering far more than they slow a sighted person, who can learn a lot about a single data point simply by diverting their gaze.  I use JAWS as the example because it leads the pack in delivering contextual information in important applications; I will criticize it a bit in this article only by suggesting that more can be done in the future.  Window-Eyes and Freedom Box System Access have both done some catching up in this area in word processors and should be encouraged to continue to innovate as well.
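As a rough illustration of what I mean by contextual information, here is a toy sketch in Python.  Every name in it is hypothetical; no screen reader exposes this exact API.  It summarizes a spreadsheet cell together with its row and column headers and its nearest neighbors, roughly the way a sighted glance takes in a cell's surroundings:

```python
# Hypothetical sketch: the kind of cell context a screen reader could
# summarize, beyond just speaking the focused cell's value.

def cell_context(grid, row, col):
    """Return a spoken-style summary of a cell plus its surroundings.

    grid is a list of rows; row 0 and column 0 hold headers.
    All names here are illustrative, not any screen reader's real API.
    """
    value = grid[row][col]
    parts = [f"{grid[0][col]} for {grid[row][0]}: {value}"]
    # Mimic a sighted glance at the neighboring cells in row and column.
    if col + 1 < len(grid[row]):
        parts.append(f"next column ({grid[0][col + 1]}): {grid[row][col + 1]}")
    if row + 1 < len(grid):
        parts.append(f"next row ({grid[row + 1][0]}): {grid[row + 1][col]}")
    return "; ".join(parts)

sheet = [
    ["",      "Q1", "Q2"],
    ["Sales", 1200, 1500],
    ["Costs",  800,  900],
]
print(cell_context(sheet, 1, 1))
# prints: Q1 for Sales: 1200; next column (Q2): 1500; next row (Costs): 800
```

A sighted reader gets all of that in one glance; a speech user gets it only by issuing several separate commands, which is the efficiency gap I am describing.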

Screen readers present information in a one dimensional manner.  They either deliver a long string of syllables through a speech synthesizer or deliver a few characters on a single line of a refreshable Braille display.  Multi-line Braille displays are cost prohibitive and have made few inroads into the market.  The ViewPlus tactile display, while expensive, does an amazing job of delivering contextual information and, for users who can gain access to such products, it may be the feel of the future.

How can screen readers improve?

Three dimensional audio is now cheap and ubiquitous.  On Tuesday, I paid $99 for a Creative Labs audio card for my laptop that supports all of the latest DirectX features, Dolby 5.1 and 7.1, as well as a few other goodies.  So, as I have mentioned before, the screen reader vendors should start researching two and three dimensional interfaces in audio and, using tactile displays like the one from ViewPlus, two dimensional touch interfaces.
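To make the idea concrete, here is a toy Python sketch of one way a two dimensional audio interface might encode a spreadsheet cell's location: column mapped to stereo pan, row mapped to pitch.  The function name and the frequency range are arbitrary choices for this sketch, not anything an existing screen reader does:

```python
# Illustrative only: turning a grid position into two audible dimensions.

def position_to_audio(row, col, n_rows, n_cols):
    """Map a (row, col) grid position to (pan, frequency_hz).

    pan: -1.0 (hard left) to +1.0 (hard right), driven by the column.
    frequency: top rows sound higher-pitched; 200-800 Hz is an
    arbitrary range chosen for this sketch.
    """
    pan = (-1.0 + 2.0 * col / (n_cols - 1)) if n_cols > 1 else 0.0
    frequency = (800 - (800 - 200) * row / (n_rows - 1)) if n_rows > 1 else 500
    return pan, frequency

# Top-left cell: hard left, high pitch; bottom-right: hard right, low pitch.
print(position_to_audio(0, 0, 10, 10))   # prints: (-1.0, 800.0)
print(position_to_audio(9, 9, 10, 10))   # prints: (1.0, 200.0)
```

With a mapping like this, a user could hear roughly where they are in the sheet without issuing a single "where am I" command.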

Such interfaces will make the products we blinks already have, like word processors and spreadsheets, more efficient to use, and will open up accessibility to very graphical programs like Microsoft Visio and Visual Studio.

Why is this important?

Because we blinks need to compete in the workplace.  We desire career advancement and hope to move up in organizations.  Today, it is impossible to provide "reasonable accommodations" for programs from which screen readers cannot gather enough information to make them usable.  The technology is available in DirectX, in the tactile displays from ViewPlus and in Microsoft's Object Models.  Now, we need the AT companies to be creative and discover ways to deliver it to us.  JAWS has set a pretty high bar with the information it can expose in Word, Excel, PowerPoint and MS Project but still has a way to go.  The other players need to catch up to JAWS and then start considering adding more dimensions to the reality of screen reader users like me.  This is one of my real hot button topics, so expect to read more about contextual information and multi-dimensional interfaces at Blind Confidential in the future.

Now, back to the ICADI conference to learn more about smart homes and how we blinks can use them to improve our standard of living.  Thus far, this conference has been very cool.


Anonymous Will Pearson said...


I'll add to a couple of the key observations you raise in your article.

1. "Screen readers present information in a one dimensional manner. They either deliver a long string of syllables through a speech synthesizer or deliver
a few characters on a single line of a refreshable Braille display."

At present, this is the biggest divide between blinks and sighted people, at least in terms of efficiency. If you consider semantics, which is what we as humans actually intend to communicate, it has to be encoded somehow in order for the semantic content to be transmitted. Visual communication uses the frequency property of lightwaves as well as three spatial dimensions (x, y and z) plus the temporal dimension to encode the semantic content. Speech, on the other hand, commonly uses only frequency and the temporal dimension. Therefore, just based on the number of properties available to encode semantic information, speech is far slower than visual communication, at least in screen reader applications of speech technology.

2. "Three dimensional audio is now cheap and ubiquitous."

It can even be free. You don't need 3D-capable hardware in order to obtain 3D audio. All the leading audio engines, at least on the PC, such as DirectSound, FMOD and the offering from Creative, are capable of mixing sounds in software to render them in three dimensions using a technique known as the Head Related Transfer Function, or HRTF for short. Even if someone has a sound card that is 3D capable, I would still suggest using software buffers to render 3D sounds rather than the 3D-capable buffers on the sound card. The justification for this is that a sound card has only a limited number of 3D-capable hardware buffers, and when these run out, the remaining sounds are usually placed in software buffers. That means two different methods of creating 3D sounds are in use at once, and because the two methods generally present sounds in slightly different places, the result is a heck of a mess in terms of the Gestalt perceptual laws. Additionally, you can do tricks with 3D software buffers that you can't do with hardware 3D-capable buffers.
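To sketch roughly what a software spatializer computes, here is a simplified Python approximation. This is not a full HRTF (which filters the whole spectrum per ear); it keeps only the two strongest cues, interaural time difference via Woodworth's classic approximation and interaural level difference via a constant-power pan law, and the head-size constant is a typical textbook value:

```python
import math

# Simplified software spatialization: per-ear gain (ILD) and delay (ITD)
# for a source at a given azimuth. Real engines such as DirectSound or
# FMOD apply full HRTF filtering; this keeps only the two strongest cues.

HEAD_RADIUS_M = 0.09        # typical adult head radius in meters
SPEED_OF_SOUND = 343.0      # meters per second in air

def spatial_cues(azimuth_deg):
    """Return ((left_gain, right_gain), (left_delay_s, right_delay_s)).

    azimuth_deg: 0 = straight ahead, +90 = hard right, -90 = hard left.
    """
    az = math.radians(azimuth_deg)
    # Interaural time difference (Woodworth's approximation): a source on
    # the right reaches the left ear late, and vice versa.
    itd = (HEAD_RADIUS_M / SPEED_OF_SOUND) * (az + math.sin(az))
    left_delay = max(itd, 0.0)
    right_delay = max(-itd, 0.0)
    # Interaural level difference via a constant-power pan law.
    pan = math.sin(az)                        # -1 .. +1
    left_gain = math.cos((pan + 1) * math.pi / 4)
    right_gain = math.sin((pan + 1) * math.pi / 4)
    return (left_gain, right_gain), (left_delay, right_delay)

gains, delays = spatial_cues(90)              # source hard right
print(round(gains[0], 3), round(gains[1], 3))  # prints: 0.0 1.0
```

Doing this mixing in software for every voice, as suggested above, guarantees that all sounds are positioned by the same method, which sidesteps the hardware/software buffer mismatch.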

1:44 PM  
Anonymous Chris Westbrook said...

Has any research been done as to how 3d interfaces would affect the deaf blind? I find I can't play most of the 3d games because I can't really tell a difference between left and right in those games. I'm pretty sure my neckloop that I use only renders things in mono. I don't want to lose my screen reading experience to some kind of 3d thing I won't be able to understand.

8:40 PM  
