Sound On The Internet

From 1997

The conventional wisdom among IT professionals is that audio is a toy, for entertainment purposes only. To date, the best use of audio in user interfaces has been in video games and CD-ROMs. But thanks to faster processors, sophisticated new algorithms, and well-thought-out new APIs, user interface designers stand poised to take advantage of perhaps the most subtly sophisticated sensing devices human beings have: our ears.

The elements of hearing
Human hearing is often overlooked by hardware and software developers, but it’s a rich and subtle sense. It is also always with us. We cannot close our ears the way we close our eyes. We even hear in our sleep, as anyone who has been awakened at 6 a.m. by a noisy garbage truck can tell you. Unlike vision, which focuses on one image at a time, hearing can be “focused” on several things simultaneously: it is possible to hold a conversation while listening to music, and even to follow several conversations going on at once.
People have been trained by evolution to respond to very subtle sound cues. We can:
-hear in all directions at once.
-hear around corners.
-hear through doors.
-hear through walls.
-identify the location of the source of a sound by hearing alone.
-hear loud noises, like thunder, from miles away.
-identify the dimensions of a space by acoustic clues.
-identify movement behind us via subtle changes in the sound field.

Problems and potential for audio
Currently, many companies allow no audible sound from computers in their workplace. Sound is regarded as intrusive and unnecessary. Perhaps the problem is not with sound per se, but with the way it has been used in the past. Until recently, the audio-processing capability of a standard PC was fairly low. Unpleasant bleeps, bloops and quacks were the only types of sounds that could be produced. For the most part, audio alerts have been used only to notify the user of an error, and nobody wants to share a mistake with the entire office.
Recent innovations are changing this trend. Standard-issue computers are now able to produce high-quality sound. Most systems come with at least a 16-bit sound card. Sun’s JavaSoft division, Microsoft, Intel and Elemedia, a division of Lucent, have all released high-quality, software-based audio processing tools that can run on most standard machines. Separately, a number of very good software-based text-to-speech systems have been released for several platforms.
Other companies are beginning to take tentative steps toward incorporating sound into their interfaces. Sun has licensed the Headspace sound engine (see below) and is releasing it as part of JDK 1.2, as the basis for the Java Sound API. Sun’s Java Media Framework API, out in public beta now, has a very thorough object-oriented approach to rendering multimedia files that makes the incorporation of audio events in Java applications both simple and flexible. Microsoft’s Windows 95 and Internet Explorer 4.0 offer limited options for using sounds as feedback for certain actions. Qualcomm’s Eudora Pro allows the user to select an audio alert instead of an alert box.

Immediate solutions with audio
Without waiting for new products and technologies, sound can easily be added to Web sites through the use of JavaScript. The same programming logic behind the use of mouseOver to create graphic “rollovers” can be applied to sound files to create spoken captions and links, or to associate a musical theme with a certain link. Other user actions, such as clicking on links and clicking the forward or back browser buttons, can be associated with sounds as well.
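The rollover idea can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions: the link ids and file names are hypothetical, and playback is delegated to the browser's `Audio` constructor, which is a present-day stand-in rather than anything described in this article. The playback function is passed in so that the wiring does not depend on any one plug-in or browser.

```javascript
// Map each link id to a sound file. Ids and file names are hypothetical.
const rolloverSounds = {
  home: "sounds/home-caption.wav",
  news: "sounds/news-theme.wav",
};

// Attach a mouseover handler to every element that has an entry in the map.
// `play` is injected so the wiring can be exercised without a browser.
// Returns the number of elements that received a handler.
function attachRolloverSounds(elements, sounds, play) {
  let attached = 0;
  for (const el of elements) {
    const url = sounds[el.id];
    if (!url) continue; // no sound registered for this link
    el.addEventListener("mouseover", () => play(url));
    attached += 1;
  }
  return attached;
}

// Assumption: in a browser, playback is handed to the Audio constructor.
function playWithAudioElement(url) {
  const clip = new Audio(url);
  return clip.play();
}

// Browser usage (not run here):
// attachRolloverSounds(document.querySelectorAll("a[id]"), rolloverSounds, playWithAudioElement);
```

The same function could serve click sounds by registering a second map under the "click" event instead of "mouseover".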
Sounds provide valuable feedback in the Web environment. Web sites often feature frustrating delays that may indicate either a process in the works or a failed action. Because of delayed reactions to form submissions or clicked hyperlinks, users often click several times, unsure whether their click has “taken.” An audible “click” can reassure users that a response will be forthcoming. Added assurance would come if the browser supplied audio feedback when it opened a connection to a remote server and began downloading data. Yet another sound could signal the completion of the download. During long downloads, sound can also be used as the equivalent of the music played over the telephone to people on hold; this is perhaps not a major feature, but it is a courtesy to the user. onClick can be used both to initiate the file download and to start a musical sequence or streaming audio file. When the download is complete, onLoad can stop playback of the music.
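The "hold music" pattern might look roughly like the sketch below. The `player` object is an assumption: anything with `play()` and `pause()` methods would do, whether a plug-in wrapper or a browser audio element. The counter is an added detail, not something the article specifies; it keeps the music running until every pending download has finished.

```javascript
// Coordinates "hold music" with downloads.
// `player` is any object exposing play() and pause() (assumption).
function createHoldMusic(player) {
  let pending = 0; // number of downloads still in flight
  return {
    // Call from the click handler that starts a download.
    start() {
      pending += 1;
      if (pending === 1) player.play(); // first download: start the music
    },
    // Call from the load (or error) handler when a download finishes.
    finish() {
      if (pending === 0) return; // nothing in flight; ignore stray events
      pending -= 1;
      if (pending === 0) player.pause(); // last download done: stop the music
    },
    isPlaying() {
      return pending > 0;
    },
  };
}

// Browser usage (not run here; element names are hypothetical):
// const hold = createHoldMusic(musicElement);
// downloadLink.addEventListener("click", () => hold.start());
// targetFrame.addEventListener("load", () => hold.finish());
```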

Browser plug-ins
Sound is still in the early stages online, but several products are currently available to exploit the “gee-whiz” factor. Beatnik is designed to make Web page “sonification” relatively simple to accomplish. The Beatnik plug-in contains a wavetable synthesis engine that allows for the rendering of most standard digital audio files such as .WAV, .AIFF and .AU. It will also play MIDI files and Headspace’s own RMF files. An RMF file can contain both MIDI sequences and digital audio samples. RMF files are highly compressed and download quickly. The contents of an RMF file can be addressed individually via JavaScript. This is particularly useful because it allows all of the sounds and musical sequences used on a page, or even an entire site, to be downloaded in one small package that can be cached. It also allows the audio elements of a site to be updated very simply.
Beatnik gives users unprecedented control over the playback itself. Users can not only control volume but also change the tempo, pitch and even the instruments used to play musical sequences. A Java applet that watches the number and speed of user clicks could use that information to create a customized soundtrack. A user who hopped around a site quickly would hear an uptempo version of the site’s theme music, while a more leisurely surfer would get a soundtrack to reflect that.
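The click-driven soundtrack can be illustrated with a small tempo calculation. The article imagines a Java applet; the sketch below expresses the same idea in JavaScript for consistency with the scripting discussed earlier. The window length, the tempo range and the linear mapping are all assumptions chosen for illustration, and handing the resulting tempo to Beatnik is left out because its scripting interface is not shown here.

```javascript
// Tracks recent click timestamps and maps click rate to a tempo in BPM.
// All numeric defaults are illustrative assumptions.
function createTempoTracker({
  windowMs = 10000,        // how far back clicks are counted
  minBpm = 80,             // tempo for a leisurely visitor
  maxBpm = 160,            // tempo for a very active visitor
  maxClicksPerWindow = 10, // click count that maps to maxBpm
} = {}) {
  let clicks = [];
  return {
    // Call from a click handler with the current time in milliseconds.
    recordClick(nowMs) {
      clicks.push(nowMs);
    },
    // Returns the tempo implied by the clicks seen in the last windowMs.
    tempoAt(nowMs) {
      // Keep only clicks that fall inside the sliding window.
      clicks = clicks.filter((t) => nowMs - t <= windowMs);
      const rate = Math.min(clicks.length / maxClicksPerWindow, 1);
      return Math.round(minBpm + rate * (maxBpm - minBpm));
    },
  };
}
```

With these defaults, a visitor who has not clicked recently gets the minimum tempo, and the tempo rises linearly with the number of clicks in the last ten seconds until it reaches the maximum.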
Sseyo is promoting the use of what it calls “generative music” with its Koan plug-in for Windows. The Koan plug-in reads a very small file, sometimes as small as 1K, that acts as a “seed” for an ever-evolving piece of music. The “seed” file specifies things like feel, tempo, key, basic melodies and instruments used. The plug-in takes it from there and “improvises” a new composition each time. This eliminates the use of repetitious “loops” that can quickly become annoying. Because the files are so small, this technology offers very fast download times.
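To show how a handful of parameters can stand in for a recorded piece, here is a generic illustration of the seed idea. It is not Koan's file format or algorithm, neither of which is described in this article. The seed fields, the linear congruential random generator and the random walk over a scale are all assumptions made for the example.

```javascript
// Small pseudo-random generator (linear congruential) so that a given
// numeric seed always produces the same sequence.
function makeRandom(seed) {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // value in [0, 1)
  };
}

// A hypothetical "seed" description: a few parameters, no recorded audio.
const seedPiece = {
  tempo: 96,
  rootMidiNote: 60,       // middle C
  scale: [0, 2, 4, 7, 9], // major pentatonic, in semitones above the root
  length: 16,             // number of notes to generate
};

// Walk up and down the scale at random, producing MIDI note numbers.
function improvise(piece, randomSeed) {
  const random = makeRandom(randomSeed);
  const notes = [];
  let degree = 0;
  for (let i = 0; i < piece.length; i++) {
    const step = Math.floor(random() * 3) - 1; // -1, 0, or +1 scale steps
    degree = Math.max(0, Math.min(piece.scale.length - 1, degree + step));
    notes.push(piece.rootMidiNote + piece.scale[degree]);
  }
  return notes;
}
```

Calling `improvise(seedPiece, Date.now())` would give a different melody on each visit, while the description itself stays a few dozen bytes.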
Microsoft has an ActiveX control for Win32 and IE called Interactive Music Control that has similarities to both Beatnik and Koan. As with Beatnik, sound events contained in a single file can be scripted, and downloadable sounds are supported. As with Koan, ever-changing compositions are generated on the fly based on very simple initial parameter settings. Interactive Music Control can also combine elements of both plug-ins, so that a user’s interaction shapes the qualities of the music.
Sound is still a rarity on the Internet, often used for its novelty value. As technology progresses and ideas catch up with PCs’ new abilities, the power of one of our most important senses is likely to play a growing role. The role that sound already plays in computer games may be an indicator of the future — imagine Quake silent. The challenge to software developers today is to bring that quality of sound design to productivity and communications software.