Enabling Gtk+ and Gnome for the blind

Abstract

Graphical user interfaces are a challenge for blind people to use because a bit-mapped display is not easily and effectively mapped to non-visual interfaces. However working at the toolkit level it's possible to have access to the application internals and to provide a satisfactory auditory and braille interface.
The flexibility of the object model of the Gtk+ graphical library allows the development of an (application independent) module (GSpeech) that enables the blind to use (most) of the Gtk+ and Gnome applications.
Programmability and application-specific support can further expand the usability of such applications.
Additionally, with the use of auditory icons we try to give the visually-impaired some of the benefits that GUIs give to sighted people.

Why a blind would want to use a GUI application?

Blind people can effectively use a computer through a text-only interface that can be easily mapped to a braille device or a speech synthesizer.
There are several applications and drivers that enable a blind the use of an un*x-like operating system, most notably BrlTTY and Emacspeak.
The advent of GUI interfaces and applications poses a threat to the ability of the blind to use computers: a bit-mapped display is not easily mapped to a braille device or to synthesized voice.
Moreover, an effective mapping needs to know the structure of the user interface, so it can relate, for example, a label and a text entry next to it.
Blind people don't want to be left behind now that the major applications are written only for graphical user interfaces and they want to continue to collaborate with their colleagues that are switching away from text-only applications. Using graphical applications means that the blind can get assistance from relatives or friends not particularly good in using computers.
Also, research is ongoing on trying to provide to the visually-impaired actual benefits in the use of GUIs using auditory icons.
Being able to use GUIs is also a challenge that many blind are willing to undertake.

Mapping between visual and non-visual interfaces

An application using a graphical user interface appears on the screen in a 2-dimension space; however the internal structure is a tree of components that represent windows, buttons, text entries and the like.
The mapping from the visual interface to an auditory interface can be made using synthesized voice and sound samples.
Of some help can be also braille devices and tactile feedback.
To render some of the properties of the interface the sounds can be spatialized or changed in other manners (pitch, volume, ...).
The use of different voices can help distinguish interface spoken feedback from entered text.

Requirements for an effective approach

For a tool that does the mapping from visual interfaces to non-visual ones to be really effective there are several requirements:

access to the internal structures
The tool must have access to the internal tree structure of the interface to provide proper navigation and hierarchical feedback.
work with all or most of the applications
In an era of quick development of new applications, we need to tackle the problem at the root (the toolkit level).
The module I developed works only with programs using the Gtk+ library (so also all of the Gnome applications): this may seam a drawback, but there are a lot of graphical applications written with this toolkit. There exists a similar tool for Xt and Motif applications (Ultrasonix), but it's not free software.
AFAIK Qt doesn't have the hooks needed to develop something like this.
work without needing changes to the applications
There are thousands of applications, we cannot hope to be able to change them all. Moreover an application-independent approach allows the use of applications that come without source-code.
be customized to suit the user's experience and ability
An advanced user may want a different kind of audio feedback from a novice (who needs a verbose approach), or may want to add more cleverness, so the tool needs to be programmable.

The object model of Gtk+

The Gtk+ library provides a single-inheritance object oriented programming environment in which objects emit signals that can be hooked by the object's user or by a descendant. The base object is called GtkObject.
Signals describe events or changes to the state of an object and the like and are described by a string.
For example the GtkButton object emits the "clicked" signal when a user activates it; likewise a GtkItem emits a "select" signal when a GtkListItem (descendant from GtkItem) is selected in a list.
Signals can carry additional information such as the text that is going to be inserted in the GtkEditable's "insert_text" signal.
Additionally objects can have attributes accessed through a standard interface: this attributes can be the title of the window for a GtkWindow widget, or the border width for a container widget and so on. This simple but powerful model enables the access to the internal state of the user interface in an application independent manner.
Attributes and additional arguments to signals are described with the GtkArg system: we not only know the value, but also the type of an attribute (a string, an integer, a floating point value...).
Gtk+, also, provides the hooks to listen to selected signals emitted by a class: this means we can register a callback to be called when, for example, a window is mapped on the screen, or when a text entry receives the focus etc.
As all the information needed to hook to a signal or to read an attribute, is string-based (the name of the class, the name of the signal, the name of the attribute) or has type information, we can describe it without having to write additional code or link to additional object files.
The possibility of loading a module (implemented as a shared library) in the applications's memory space using command-line options or environment variables offered by Gtk+, is the last bit that makes the development of GSpeech possible.

GSpeech

GSpeech is a module that can be loaded by any application using the Gtk+ library (version 1.2 and above), that enables the mapping from the graphical interface to an auditory one.
The are several components in GSPeech:

configuration file parsing and handling
speech server interface
sound handling
widget specific routines

The configuration file contains information on all of the features available: the speech server to use, the sound samples to load, the signals to hook to.
Hooks are specified with the Hook directive, a hook name, the object class name and the signal name. Follows a list of actions, that can be simple text to synthesize, sound samples to play or special functions.
Example 1 (plays the sample "text" whenever a GtkEditable derived widget receives the focus):

hook "text_focus" "GtkEditable" "focus_in_event" {
        sound "text"
}

Example 2 (synthesize the title of the window and play the sample "window"):

hook "window_opens" "GtkWindow" "map" {
        "Opening window " 
        arg "title"
        sound "window"
}

The speech server abstraction interface is important because it lets use many of the available hardware and software synthesizers (more on this later).
The sound handling routines let the module play sound samples during speech synthesis using the Esound daemon (a program that mixes different audio sources and plays the result).
Widget specific routines are functions that deal with particular widgets or signals. These routines can be loaded from additional modules using the "load" directive.
An example is the function used to read the text entry's contents word by word or a letter at a time.

Speech servers

To allow for the broad range of hardware and software synthesizers available, the module includes a speech server abstraction.
Each speech server can support different languages and different voices, change the rate or the speech mode (normal, punctuation, letter by letter).
The supported speech servers are:

a simple command line synthesizer (like rsynth)
festival
Emacspeak's speech servers

More can be added and developed independently.

Application-specific modules and customization

The GSpeech module comes with an additional sub-module that handles the Gnome-specific widgets (most notably the Zvt terminal emulator widget).
Likewise, other library or application-dependent sub-modules can be developed and distributed independently from GSpeech: this way custom developed widgets with unusual requirements can be mapped to the auditory interface.
When it's not possible to know the internals of an application that we want to fine tune or customize for the auditory interface, it may be useful the Gle Gtk+ module available from www.gnome.org. This module lets inspect the widget structure of an application and the signals and attributes supported by the custom widgets.
The module lets define many hooks and then enable only the needed ones according to the experience and ability of the user.
An advanced user must also be able to program the auditory interface with an interpreted language.

Auditory icons and the possible benefits

The research done for the Mercator project explains some of the reasons graphical user interfaces are easier to use: the visual representation helps the user remember the options and the functionality, exploiting the interaction of visual feedback and short-term memory.
The use of auditory icons (sound samples that represent not only the user interface components, but also their state) may have the same effect on the visually-impaired.
The module needs to be able to change the samples to represent widget's state (sensitive, insensitive) or properties (large text area or long list opposed to a little text entry or a short list).

TODD and problems

Interaction with the window manager
Braille interface (mostly for the text entry widgets)
Embedding perl or another interpreter
Internationalization and multiple language support.
Icons, images and other non-text interface components.

References

Gtk+: http://www.gtk.org
Gnome: http://www.gnome.org
Emacspeak:
Mercator:
Ultrasonix:
Festival:
Esound:

Paolo Molaro <lupus@debian.org>