Sound component

This component provides sound support. AudioClip defines a sound file and lets you start and stop playback. Sound positions the sound in the 3D world and configures sound parameters. Sounds may be spatialized (3D) or not.

See also the X3D specification of the Sound component.

1. Demos

For demos and tests of these features, see the sound subdirectory inside our VRML/X3D demo models.

2. Supported nodes

There are two nodes dealing with sound:

  1. The AudioClip node is a buffer for sound data. Basically, it's a wrapper around sound data loaded from .wav, .ogg and similar files.
  2. The Sound node is a 3D sound: it has a 3D position and can move (by animating Sound.location or the parent transformation). Naturally, Sound references AudioClip through the Sound.source field.

Sound (Pascal API: TSoundNode) node (3D sound) supported fields / events:

  • intensity
  • location (correctly transformed by Sound transformation; animating the location field or the transformation of the Sound node works perfectly)
  • priority (we have a smart sound allocator, and the priority really matters when you have many sounds playing at once)
  • source (to indicate AudioClip node)
  • spatialize. Note that multi-channel (e.g. stereo) sound files are never spatialized. Be sure to convert your sound files to mono if you want to use them for 3D spatialized sound.
  • minFront and maxFront values are handled.
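
To illustrate why priority matters when many sounds play at once, here is a minimal sketch of a priority-based allocator. It is not the engine's actual allocator: the names SoundRequest and choose_playing, and the idea of a fixed number of playback slots (max_sources), are assumptions made only for this illustration.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class SoundRequest:
    name: str        # hypothetical label, used only in this illustration
    priority: float  # value of the Sound.priority field


def choose_playing(requests, max_sources):
    """Return the requests that get a playback slot, highest priority first.

    Assumption: only the priority field is considered, and there are
    max_sources slots available for simultaneous playback.
    """
    return heapq.nlargest(max_sources, requests, key=lambda r: r.priority)


requests = [
    SoundRequest("ambient", 0.1),
    SoundRequest("explosion", 0.9),
    SoundRequest("footsteps", 0.4),
]
# With only two slots, the lowest-priority sound ("ambient") is not played.
playing = choose_playing(requests, max_sources=2)
```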

TODO:

  • Our sound attenuation model is a little simpler than VRML/X3D requirements. We have an inner sphere, of radius minFront, where sound gain is at maximum. And we have an outer sphere, of radius maxFront, where sound gain drops to zero. Between them sound gain drops linearly.

    Contrast this with the VRML/X3D spec, which requires two ellipsoids (not just spheres). In our implementation, sounds behind you are attenuated just the same as sounds in front. We simply ignore the direction, minBack and maxBack fields. As far as I know, the ellipsoid model is not possible in OpenAL, so we're probably not the only viewer that doesn't support it.

  • It's unclear from the specification whether playing should be limited only to the sounds in the active graph part (the subset of children chosen in LOD / Switch and similar nodes). To be on the safe side, always place your sounds in the active graph part, although our implementation will usually also play sounds in the inactive part (exceptions may happen for complicated situations with PROTOs).

    Reports on how other browsers handle this are welcome.

  • Right now, sounds are not sorted exactly as the specification says. In short, we only look at the "priority" field you set, disregarding the current distance from the sound to the player and similar factors. This will definitely be fixed some day; please speak up on the forum if you need it.
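
The simplified attenuation model described in the first item above (two spheres, linear falloff between them) can be sketched as follows. This is only an illustration of the described behavior, not the engine code; the function name attenuation_gain is hypothetical, and the result is a relative factor between 0 and 1 (how it combines with the intensity field is not covered here).

```python
def attenuation_gain(distance, min_front, max_front):
    """Relative gain for a listener at the given distance from the sound.

    Inside the inner sphere (radius min_front) the gain is at maximum (1.0),
    outside the outer sphere (radius max_front) it is 0.0, and in between
    it drops linearly. Direction, minBack and maxBack are ignored.
    """
    if distance <= min_front:
        return 1.0
    if distance >= max_front:
        return 0.0
    return (max_front - distance) / (max_front - min_front)


# Halfway between the two spheres the gain is half of the maximum.
halfway = attenuation_gain(5.5, min_front=1.0, max_front=10.0)
```

Checking the inner sphere first also keeps the sketch safe when minFront equals maxFront, since the division is then never reached.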

AudioClip (Pascal API: TAudioClipNode) node (the sound file to be played, basic playback properties, events to start/stop playing of the sound) supported fields / events:

  • url (allowed sound file formats are OggVorbis and (uncompressed) WAV).
  • duration_changed
  • loop (we loop without any glitches between)
  • description (is simply ignored, this is valid behavior)
  • pitch (correctly changes both the sound pitch and speed of playing)
  • all time-dependent fields (start/stop/pause/resumeTime, elapsedTime, isActive, isPaused).

    TODO: But we don't really control our position (offset) within the sound. When we detect that we should be playing but are not, we just start playing, always from the beginning.

TODO: There's no streaming implemented yet. So having many long music files in one world may consume a lot of memory, and the first play of a long sound may cause a noticeable loading delay.

TODO: Only AudioClip works as sound source for now. You cannot use MovieTexture as sound source.

3. DEF / USE on sounds

VRML/X3D define the play/stop events at the AudioClip node (not at the higher-level Sound node, which would be more useful IMO). This means that USEing the same AudioClip or Sound node many times doesn't make much sense. All you can achieve is the same sound playing the same thing simultaneously from multiple 3D sources. Since such simultaneous sounds are usually useless, we don't even implement them (so don't reUSE AudioClip or Sound nodes, or an arbitrary one will be playing). If you want to use the same sound file many times, you will usually want to just add many AudioClip nodes and set their url fields to the same value. Our implementation is optimized for this case: an internal cache loads the sound file only once, even when it's referenced by many AudioClip.url values.
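
The idea of loading a sound file only once for many AudioClip.url values can be sketched like this. It is a minimal illustration, not the engine's actual cache: the class SoundBufferCache, its loader callback and the load_count counter are hypothetical names introduced only for this example.

```python
class SoundBufferCache:
    """Hypothetical cache that maps a sound URL to its loaded data."""

    def __init__(self, loader):
        self._loader = loader   # function that actually loads a URL
        self._buffers = {}      # url -> loaded sound data
        self.load_count = 0     # how many real loads happened

    def get(self, url):
        # Load only on the first request for this URL, then reuse the result.
        if url not in self._buffers:
            self._buffers[url] = self._loader(url)
            self.load_count += 1
        return self._buffers[url]


# Stand-in loader for the illustration; a real one would decode the file.
cache = SoundBufferCache(lambda url: bytearray(url.encode()))
first = cache.get("sound.wav")   # loads the data
second = cache.get("sound.wav")  # reuses the same data, no second load
```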

More detailed explanation:

The question is: where do we put the start/stop events? At the Sound node, or at the AudioClip node?

More precisely: which node has X3DTimeDependentNode as an ancestor? (X3DTimeDependentNode contains startTime, stopTime and a lot of other playing-related fields and events.)

  1. The decision of the X3D specification was to put them at AudioClip.

    The downside: DEF/USE for AudioClip doesn't make much sense, usually. You can write this:

      Sound DEF S1 { source DEF A1 AudioClip { url "sound.wav" } }
      Sound DEF S2 { source USE A1 }
    

    but it's not useful: you can only send startTime to A1, which makes both sound sources play the same thing simultaneously. To be able to start/stop the sounds on S1 and S2 independently, you have to give up DEF/USE and write

      Sound DEF S1 { source DEF A1 AudioClip { url "sound.wav" } }
      Sound DEF S2 { source DEF A2 AudioClip { url "sound.wav" } }
    

    So you need two AudioClip nodes, even though their contents are equal.

    The upside of the X3D specification's choice is that MovieTexture, which also descends from X3DTimeDependentNode, can be used inside Sound nodes. This way you can play the audio track from a movie.

  2. The inverse decision would be to make the Sound node an X3DTimeDependentNode. Then you could write

      Sound DEF S1 { source DEF A1 AudioClip { url "sound.wav" } }
      Sound DEF S2 { source USE A1 }
    

    and independently start/stop playing sounds S1 and S2.

    The downside would be that playing audio tracks from MovieTexture would be ugly, and probably should not be allowed by the specification. If both MovieTexture and Sound descended from X3DTimeDependentNode, it would be unclear which node controls the playing in a case like this:

      Sound DEF S1 { source DEF M1 MovieTexture { url "movie.avi" } }
    

    Probably, the idea of playing sounds from MovieTexture should be just dropped in this case, otherwise it gets messy.

Personally, Michalis would choose option 2. (But it's too late for that now, and we implement the spec-compliant decision 1.) I don't think that playing audio tracks from movie files is a useful or common use case. It's wasteful, anyway, to reference a movie just to play an audio track, so authors are well advised to avoid this. If you want to play an audio track from a movie, consider just extracting the audio track to a separate .wav/.ogg file and playing it using an AudioClip node. This way we will not have to download the whole movie just to play its audio.

4. Plans for new X3D 4.0 sound nodes

X3D version 4 introduced a number of new sound nodes and capabilities. See the Sound in X3D 4.0 specification.

They have been designed to match the Web Audio API (see Web Audio API at Mozilla Developer Network and W3C Recommendation). This API is available in major web browsers (at least Firefox and Google Chrome support it). On other platforms (desktop and mobile), LabSound is a nice C++ library that provides the equivalent.

To be clear, we applaud this move in X3D. Basing the sound design on an open standard that is already implemented in web browsers makes total sense from the X3D point of view, especially if you consider the web to be the major platform for X3D.

That said, implementing these nodes/capabilities in CGE/view3dscene is non-trivial work, and it's admittedly not our priority, at least for now. It means doing 2 things:

  1. Implementing a cross-platform sound backend exposing the Web Audio API. It would likely use LabSound under the hood on non-web platforms (on the web, we can just use browser support).

    To explain, we already have a number of sound "backends" in Castle Game Engine, such as OpenAL and FMOD.

    We will likely have another in 2023: Audiokinetic's Wwise, another popular solution in the gamedev domain.

    Note: a common practice in gamedev is to use "sound middleware". This means that the sound designer/musician uses a sound middleware provided by FMOD ("FMOD Studio") or Wwise. It's a special application, independent of the game engine. Such an application can export a "sound bank" (which may even be optimized for a given platform), and then the game engine (like CGE) uses the FMOD / Wwise API to issue events. These events control the sounds indirectly, following the encoded instructions from the sound bank. E.g. an "event" may just play a sound, but it can also change the volume or pitch of something.

    All this means that support for the WebAudio API is not our priority. It does not seem to be common practice in gamedev for sound designers to target WebAudio concepts. Improving our support for FMOD and Wwise, to enable using "sound middleware", has higher priority because it is a standard workflow in gamedev.

    Note: While OpenAL, FMOD and Wwise have concepts similar to WebAudio, finding a match between them and WebAudio seems to be quite a lot of work. That's why, for a complete implementation of X3D 4 sound nodes, I think we just need a WebAudio/LabSound backend.

    A possible argument in favor of LabSound and WebAudio: maybe LabSound will simply become more popular than OpenAL with time. This would be a strong reason to switch to it, and to start recommending the LabSound (and thus WebAudio) backend over OpenAL. I mention this because the development of OpenAL as a specification has unfortunately stagnated; it looks like Creative is no longer interested. That said, the OpenAL Soft implementation continues to be very active.

    Of course on the (planned) web platform, the situation is more straightforward, as there WebAudio API is just available in the web browser. But we're a cross-platform game engine, whatever we do -- we want to have consistent support on all platforms (desktop, mobile, consoles, web).

  2. Implementing X3D 4 sound nodes on top of the Web Audio backend.

    Admittedly this is low priority for CGE, because for developers using "Castle Game Engine" the X3D sound nodes do not matter much. We recommend using our sound components instead, which are easy to set up in the CGE editor and have a convenient OOP API in Pascal. These components are different from X3D nodes. They have been modelled on what our users want and on what other game engines offer.

    This situation is not set in stone. For various things, CGE uses X3D under the hood (e.g. for all TCastleScene rendering as well as light sources). It is possible that CGE sound components will employ X3D sound nodes at some point, if that results in a useful API expected by game developers.

This priority is not "set in stone", of course. It may be that Web Audio will become the de facto standard for how you do sound everywhere. And a dedicated contributor, interested in upgrading our code to support X3D 4 sound capabilities, is absolutely welcome. If you are interested in seeing a Web Audio sound backend in CGE, and support for X3D 4 sound nodes/capabilities in CGE, please