Enabling Gtk+ and Gnome for the blind
Abstract
Graphical user interfaces are a challenge for blind people to use
because a bit-mapped display is not easily and effectively
mapped to non-visual interfaces. However, working at the toolkit
level it's possible to gain access to the application internals
and to provide a satisfactory auditory and braille interface.
The flexibility of the object model of the Gtk+ graphical library
allows the development of an application-independent module
(GSpeech) that enables the blind to use most of the Gtk+ and Gnome
applications.
Programmability and application-specific support can further expand
the usability of such applications.
Additionally, with the use of auditory icons we try to give the
visually-impaired some of the benefits that GUIs give to sighted people.
Why would a blind person want to use a GUI application?
Blind people can effectively use a computer through a text-only
interface that can be easily mapped to a braille device or a speech
synthesizer.
There are several applications and drivers that enable a blind
person to use a un*x-like operating system, most notably BrlTTY
and Emacspeak.
The advent of GUI interfaces and applications poses a threat to
the ability of the blind to use computers: a bit-mapped display
is not easily mapped to a braille device or to synthesized voice.
Moreover, an effective mapping needs to know the structure of the
user interface, so it can relate, for example, a label and a text
entry next to it.
Blind people don't want to be left behind now that the major
applications are written only for graphical user interfaces,
and they want to continue to collaborate with their colleagues
who are switching away from text-only applications. Using
graphical applications also means that the blind can get assistance
from relatives or friends who are not particularly skilled with computers.
Also, research is ongoing on providing the visually-impaired with
actual benefits in the use of GUIs through auditory icons.
Being able to use GUIs is also a challenge that many blind people
are willing to undertake.
Mapping between visual and non-visual interfaces
An application using a graphical user interface appears on the screen
in a two-dimensional space; however, its internal structure is a tree
of components that represent windows, buttons, text entries and the like.
The mapping from the visual interface to an auditory one can be made
using synthesized voice and sound samples.
Braille devices and tactile feedback can also be of some help.
To render some of the properties of the interface, the sounds can
be spatialized or altered in other ways (pitch, volume, ...).
The use of different voices can help distinguish spoken interface
feedback from entered text.
Requirements for an effective approach
For a tool that does the mapping from visual interfaces to non-visual
ones to be really effective, there are several requirements:
- access to the internal structures
The tool must have access to the internal tree structure
of the interface to provide proper navigation and
hierarchical feedback.
- work with all or most of the applications
In an era of quick development of new applications, we need
to tackle the problem at the root (the toolkit level).
The module I developed works only with programs using the
Gtk+ library (and therefore all of the Gnome applications): this may
seem a drawback, but there are a lot of graphical applications written
with this toolkit. A similar tool exists for Xt and Motif
applications (Ultrasonix), but it's not free software.
AFAIK Qt doesn't have the hooks needed to develop
something like this.
- work without needing changes to the applications
There are thousands of applications: we cannot hope to be able
to change them all. Moreover, an application-independent approach
allows the use of applications that come without source code.
- be customized to suit the user's experience and ability
An advanced user may want a different kind of audio feedback
from a novice (who needs a verbose approach), or may want to add
more cleverness, so the tool needs to be programmable.
The object model of Gtk+
The Gtk+ library provides a single-inheritance object-oriented
programming environment in which objects emit signals that can be
hooked by the object's user or by a descendant. The base object is
called GtkObject.
Signals describe events or changes to the state of an object and
are identified by a string.
For example the GtkButton object emits the "clicked" signal when a
user activates it; likewise a GtkItem emits a "select" signal
when a GtkListItem (descendant from GtkItem) is selected in a list.
Signals can carry additional information such as the text that is going
to be inserted in the GtkEditable's "insert_text" signal.
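As a minimal sketch (ordinary application code, not part of GSpeech),
this is how a Gtk+ 1.2 program hooks the "clicked" signal of a button
by its string name:

#include <gtk/gtk.h>

/* Callback invoked when the button emits the "clicked" signal. */
static void
on_clicked (GtkButton *button, gpointer data)
{
  g_print ("button clicked\n");
}

int
main (int argc, char *argv[])
{
  GtkWidget *window, *button;

  gtk_init (&argc, &argv);

  window = gtk_window_new (GTK_WINDOW_TOPLEVEL);
  button = gtk_button_new_with_label ("Press me");
  gtk_container_add (GTK_CONTAINER (window), button);

  /* The signal is identified by its string name. */
  gtk_signal_connect (GTK_OBJECT (button), "clicked",
                      GTK_SIGNAL_FUNC (on_clicked), NULL);

  gtk_widget_show_all (window);
  gtk_main ();
  return 0;
}

GSpeech relies on the same string names, but attaches to the signals
from outside the application, as described below.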
Additionally, objects can have attributes accessed through
a standard interface: these attributes can be the title of the window
for a GtkWindow widget, the border width for a container widget and so on.
This simple but powerful model enables the access to the internal
state of the user interface in an application independent manner.
Attributes and additional arguments to signals are described with the
GtkArg system: we not only know the value, but also the type of an
attribute (a string, an integer, a floating point value...).
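For example, reading the "title" attribute of a window by name can
look like this (a sketch assuming the Gtk+ 1.2 gtk_object_get()
interface):

#include <gtk/gtk.h>

/* Read the "title" attribute of a GtkWindow by name; the GtkArg
   machinery underneath also tells us that the value is a string. */
static gchar *
window_title (GtkWidget *window)
{
  gchar *title = NULL;

  gtk_object_get (GTK_OBJECT (window), "title", &title, NULL);
  return title;
}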
Gtk+ also provides hooks to listen to selected signals emitted by
a class: this means we can register a callback to be called when, for
example, a window is mapped on the screen, or when a text entry
receives the focus, etc.
As all the information needed to hook to a signal or to read an
attribute is string-based (the name of the class, the name of the signal,
the name of the attribute) or carries type information, we can describe it
without having to write additional code or link to additional object files.
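A minimal sketch of such a class-wide hook, assuming the Gtk+ 1.2
emission-hook interface (gtk_signal_lookup() and
gtk_signal_add_emission_hook()); note that everything is looked up
by its string name:

#include <gtk/gtk.h>

/* Called for every emission of "map"; the hook is attached to the
   signal id, so we filter for the class we actually care about. */
static gboolean
map_hook (GtkObject *object, guint signal_id,
          guint n_params, GtkArg *params, gpointer data)
{
  if (GTK_IS_WINDOW (object))
    g_print ("a window was mapped\n");
  return TRUE;   /* TRUE keeps the hook installed */
}

static void
install_map_hook (void)
{
  /* The class must already be registered for the lookup to succeed. */
  GtkType type = gtk_type_from_name ("GtkWindow");
  guint signal_id = gtk_signal_lookup ("map", type);

  gtk_signal_add_emission_hook (signal_id, map_hook, NULL);
}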
The possibility of loading a module (implemented as a shared library) into
the application's memory space, using command-line options or environment
variables offered by Gtk+, is the last bit that makes the development of
GSpeech possible.
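The skeleton of such a module is small; this sketch assumes the usual
gtk_module_init() entry point and a hypothetical library called
libhello.so:

#include <gtk/gtk.h>

/* Entry point called by Gtk+ after the shared library is loaded
   into the application. */
void
gtk_module_init (gint *argc, gchar ***argv)
{
  g_print ("module loaded\n");
  /* here a GSpeech-like module would parse its configuration
     and install its emission hooks */
}

The module can then be loaded into an unmodified application with
something like some-app --gtk-module=hello (or the corresponding
environment variable).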
GSpeech
GSpeech is a module that can be loaded by any application using the
Gtk+ library (version 1.2 and above) and that enables the mapping from
the graphical interface to an auditory one.
There are several components in GSpeech:
- configuration file parsing and handling
- speech server interface
- sound handling
- widget specific routines
The configuration file contains information on all of the features
available: the speech server to use, the sound samples to load, the
signals to hook to.
Hooks are specified with the hook directive, followed by a hook name,
the object class name and the signal name. A list of actions follows,
which can be simple text to synthesize, sound samples to play or
special functions.
Example 1 (plays the sample "text" whenever a GtkEditable derived
widget receives the focus):
hook "text_focus" "GtkEditable" "focus_in_event" {
sound "text"
}
Example 2 (synthesizes the title of the window and plays the sample
"window" whenever a GtkWindow is mapped):
hook "window_opens" "GtkWindow" "map" {
"Opening window "
arg "title"
sound "window"
}
The speech server abstraction interface is important because it lets us
use many of the available hardware and software synthesizers (more on this
later).
The sound handling routines let the module play sound samples
during speech synthesis using the Esound daemon (a program that
mixes different audio sources and plays the result).
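For instance, playing a sample through the daemon is a single call
to the Esound client library (a sketch; the sample file name is just
a placeholder):

#include <esd.h>

/* Ask the Esound daemon to play (and mix) a sample file;
   link with the esound library. */
static void
play_window_sample (void)
{
  esd_play_file ("gspeech", "/usr/share/sounds/window.wav", 1);
}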
Widget specific routines are functions that deal with particular
widgets or signals. These routines can be loaded from additional modules
using the "load" directive.
An example is the function used to read the text entry's contents
word by word or a letter at a time.
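A sketch of such a widget-specific routine, spelling a GtkEntry's
contents one letter at a time (speak_letter() is a placeholder for
the real speech server call):

#include <gtk/gtk.h>

/* Placeholder for the real speech server call. */
static void
speak_letter (gchar c)
{
  g_print ("%c\n", c);
}

/* Widget-specific routine: spell the entry's text letter by letter. */
static void
spell_entry (GtkEntry *entry)
{
  gchar *text = gtk_entry_get_text (entry);
  gchar *p;

  for (p = text; *p != '\0'; p++)
    speak_letter (*p);
}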
Speech servers
To allow for the broad range of hardware and software synthesizers
available, the module includes a speech server abstraction.
Each speech server can support different languages and different voices,
change the rate or the speech mode (normal, punctuation, letter by letter).
The supported speech servers are:
- a simple command line synthesizer (like rsynth)
- festival
- Emacspeak's speech servers
More can be added and developed independently.
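A sketch of what the abstraction can look like in C (the structure
and function names are illustrative, not the actual GSpeech
interface):

#include <glib.h>

/* Each speech server backend fills in this table of operations. */
typedef struct {
  const gchar *name;                       /* e.g. "festival" */
  gboolean (*init)     (void);
  void     (*say)      (const gchar *text);
  void     (*set_rate) (gint words_per_minute);
  void     (*shutdown) (void);
} SpeechServer;

/* The rest of the module only talks to the selected backend. */
static SpeechServer *current_server;

static void
speak (const gchar *text)
{
  if (current_server != NULL && current_server->say != NULL)
    current_server->say (text);
}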
Application-specific modules and customization
The GSpeech module comes with an additional sub-module that handles
the Gnome-specific widgets (most notably the Zvt terminal emulator
widget).
Likewise, other library or application-dependent sub-modules can
be developed and distributed independently from GSpeech: this
way custom developed widgets with unusual requirements can be mapped
to the auditory interface.
When it's not possible to know the internals of an application that
we want to fine tune or customize for the auditory interface, the Gle
Gtk+ module available from www.gnome.org may be useful. This
module lets us inspect the widget structure of an application and
the signals and attributes supported by its custom widgets.
The module lets the user define many hooks and then enable only the needed
ones according to the experience and ability of the user.
An advanced user must also be able to program the auditory
interface with an interpreted language.
Auditory icons and the possible benefits
The research done for the Mercator project explains some of the
reasons graphical user interfaces are easier to use: the visual
representation helps the user remember the options and the
functionality, exploiting the interaction of visual feedback
and short-term memory.
The use of auditory icons (sound samples that represent not only
the user interface components, but also their state) may have the
same effect on the visually-impaired.
The module needs to be able to change the samples to represent a
widget's state (sensitive, insensitive) or properties (a large
text area or a long list as opposed to a little text entry or a short
list).
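A sketch of how a sample can be chosen according to the widget's
sensitive state (the sample names are placeholders and play_sample()
stands in for the sound routines shown earlier):

#include <gtk/gtk.h>

/* Placeholder for the sound playing routine. */
static void
play_sample (const gchar *name)
{
  g_print ("playing %s\n", name);
}

/* Choose an auditory icon that also conveys the widget's state. */
static void
play_widget_icon (GtkWidget *widget)
{
  if (GTK_WIDGET_IS_SENSITIVE (widget))
    play_sample ("button");
  else
    play_sample ("button_disabled");
}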
TODO and problems
- Interaction with the window manager
- Braille interface (mostly for the text entry widgets)
- Embedding perl or another interpreter
- Internationalization and multiple language support.
- Icons, images and other non-text interface components.
References
Gtk+: http://www.gtk.org
Gnome: http://www.gnome.org
Emacspeak:
Mercator:
Ultrasonix:
Festival:
Esound:
Paolo Molaro <lupus@debian.org>