MPEG-4 Part 3 or MPEG-4 Audio (formally ISO/IEC 14496-3) is the third part of the ISO/IEC MPEG-4 international standard developed by the Moving Picture Experts Group.[1] It specifies audio coding methods. The first version of ISO/IEC 14496-3 was published in 1999.[2]
MPEG-4 Audio does not target a single application such as real-time telephony or high-quality audio compression. It applies to every application which requires the use of advanced sound compression, synthesis, manipulation, or playback.
MPEG-4 Audio is a new type of audio standard that integrates numerous different types of audio coding: natural sound and synthetic sound, low bitrate delivery and high-quality delivery, speech and music, complex soundtracks and simple ones, traditional content and interactive content.[7]
MPEG-4 Audio includes a system for handling a diverse group of audio formats in a uniform manner. Each format is assigned a unique Audio Object Type to represent it.[18][19] The object type distinguishes between different coding methods and directly determines the MPEG-4 tool subset required to decode a specific object. The MPEG-4 profiles are based on the object types, and each profile supports a different list of object types.[19]
Used in the "AAC Profile". MPEG-4 AAC LC Audio Object Type is based on the MPEG-2 Part 7 Low Complexity profile (LC) combined with Perceptual Noise Substitution (PNS) (defined in MPEG-4 Part 3 Subpart 4).[4][22]
3 – AAC SSR (Scalable Sample Rate) – 1999 – MPEG-4 AAC SSR Audio Object Type is based on the MPEG-2 Part 7 Scalable Sampling Rate profile (SSR) combined with Perceptual Noise Substitution (PNS) (defined in MPEG-4 Part 3 Subpart 4).[4][22]
It is also known as "Fine Granule Audio" or the fine grain scalability tool. It is used in combination with the AAC coding tools and replaces the noiseless coding and the bitstream formatting of the MPEG-4 Version 1 GA coder. Error Resilient
Also known as MPEG Spatial Audio Coding (SAC), it is a type of spatial audio coding[31][32] (MPEG Surround was also defined in ISO/IEC 23003-1 in 2007[33])
This object type conveys Low Delay MPEG Surround Coding side information (defined in MPEG-D Part 2 – ISO/IEC 23003-2[43]) in the MPEG-4 Audio framework.
45 – SAOC-DE – 2013 – Spatial Audio Object Coding Dialogue Enhancement
46 – Audio Sync – 2015 – The audio synchronization tool provides the capability of synchronizing multiple contents across multiple devices.
Audio Profiles
Hierarchical structure of AAC Profile, HE-AAC Profile and HE-AAC v2 Profile, and compatibility between them. The HE-AAC Profile decoder is fully capable of decoding any AAC Profile stream. Similarly the HE-AAC v2 decoder can handle all HE-AAC Profile streams as well as all AAC Profile streams. Based on the MPEG-4 Part 3 technical specification.[21]
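The nesting described in this caption can be sketched as an ordered list of profiles. This is only an illustration: the names `PROFILE_ORDER` and `can_fully_decode` are hypothetical, and the sketch assumes that a decoder for a smaller profile cannot fully decode a stream of a larger one, which the caption implies but does not state.

```python
# Profiles ordered from the smallest toolset to the largest, following the
# hierarchy in the caption: AAC Profile, HE-AAC Profile, HE-AAC v2 Profile.
PROFILE_ORDER = ["AAC Profile", "HE-AAC Profile", "HE-AAC v2 Profile"]


def can_fully_decode(decoder_profile: str, stream_profile: str) -> bool:
    """Return True if the decoder's profile contains the stream's profile.

    Assumption: a decoder handles its own profile and every profile nested
    inside it, and nothing larger.
    """
    return PROFILE_ORDER.index(decoder_profile) >= PROFILE_ORDER.index(stream_profile)


print(can_fully_decode("HE-AAC Profile", "AAC Profile"))        # True
print(can_fully_decode("HE-AAC v2 Profile", "HE-AAC Profile"))  # True
print(can_fully_decode("AAC Profile", "HE-AAC v2 Profile"))     # False
```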
The MPEG-4 Audio standard defines several profiles. These profiles are based on the object types, and each profile supports a different list of object types. Each profile may also have several levels, which limit some parameters of the tools present in a profile. These parameters are usually the sampling rate and the number of audio channels decoded at the same time.
AAC LC, AAC LTP, AAC Scalable, CELP, ER AAC LC, ER AAC LTP, ER AAC Scalable, ER CELP
2000
Low Delay Audio Profile – CELP, HVXC, TTSI, ER AAC LD, ER CELP, ER HVXC – 2000
Natural Audio Profile – AAC Main, AAC LC, AAC SSR, AAC LTP, AAC Scalable, TwinVQ, CELP, HVXC, TTSI, ER AAC LC, ER AAC LTP, ER AAC Scalable, ER TwinVQ, ER BSAC, ER AAC LD, ER CELP, ER HVXC, ER HILN, ER Parametric – 2000
Mobile Audio Internetworking Profile – ER AAC LC, ER AAC Scalable, ER TwinVQ, ER BSAC, ER AAC LD
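Because each profile is simply a list of supported object types, the three named profiles above can be modelled as sets. The following is a minimal sketch rather than anything defined by the standard: the dictionary and function names are hypothetical, and only the profiles whose names appear above are included.

```python
# Object types per profile, copied from the profile rows listed above.
PROFILE_OBJECT_TYPES = {
    "Low Delay Audio Profile": {
        "CELP", "HVXC", "TTSI", "ER AAC LD", "ER CELP", "ER HVXC",
    },
    "Natural Audio Profile": {
        "AAC Main", "AAC LC", "AAC SSR", "AAC LTP", "AAC Scalable", "TwinVQ",
        "CELP", "HVXC", "TTSI", "ER AAC LC", "ER AAC LTP", "ER AAC Scalable",
        "ER TwinVQ", "ER BSAC", "ER AAC LD", "ER CELP", "ER HVXC", "ER HILN",
        "ER Parametric",
    },
    "Mobile Audio Internetworking Profile": {
        "ER AAC LC", "ER AAC Scalable", "ER TwinVQ", "ER BSAC", "ER AAC LD",
    },
}


def profiles_supporting(object_type: str) -> list:
    """Return the names of the listed profiles that include object_type, sorted."""
    return sorted(
        name
        for name, object_types in PROFILE_OBJECT_TYPES.items()
        if object_type in object_types
    )


print(profiles_supporting("TTSI"))
# ['Low Delay Audio Profile', 'Natural Audio Profile']

# Among the listed profiles, the Natural Audio Profile contains the other two.
natural = PROFILE_OBJECT_TYPES["Natural Audio Profile"]
print(PROFILE_OBJECT_TYPES["Low Delay Audio Profile"] <= natural)  # True
```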
There is no single standard for the transport of elementary streams over a channel, because the delivery requirements of the broad range of MPEG-4 applications are too varied to be easily covered by one solution.
Transport in Real-time Transport Protocol is defined in RFC 3016 (RTP Payload Format for MPEG-4 Audio/Visual Streams), RFC 3640 (RTP Payload Format for Transport of MPEG-4 Elementary Streams), RFC 4281 (The Codecs Parameter for "Bucket" Media Types) and RFC 4337 (MIME Type Registration for MPEG-4).
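As an illustration of the codecs parameter mentioned above, MPEG-4 audio is commonly signalled with strings of the form `mp4a.40.N`, where `40` is the hexadecimal object type indication for MPEG-4 Audio and `N` is the decimal Audio Object Type. The sketch below parses such a string. The function names are hypothetical, and the small name table covers only object type 2 (AAC LC) and the identifiers that appear in this article.

```python
from typing import Tuple

MPEG4_AUDIO_OTI = 0x40  # object type indication for ISO/IEC 14496-3 audio

# Partial table: 3, 45 and 46 are listed in this article; 2 is AAC LC.
KNOWN_OBJECT_TYPES = {2: "AAC LC", 3: "AAC SSR", 45: "SAOC-DE", 46: "Audio Sync"}


def parse_mp4a_codec(codec: str) -> Tuple[int, int]:
    """Split 'mp4a.<hex OTI>.<decimal AOT>' into (oti, audio_object_type)."""
    parts = codec.split(".")
    if len(parts) != 3 or parts[0] != "mp4a":
        raise ValueError("not an mp4a codec string with an object type: %r" % codec)
    return int(parts[1], 16), int(parts[2], 10)


def describe(codec: str) -> str:
    """Return a readable name for the Audio Object Type in a codec string."""
    oti, aot = parse_mp4a_codec(codec)
    if oti != MPEG4_AUDIO_OTI:
        raise ValueError("not MPEG-4 Audio: OTI 0x%02X" % oti)
    return KNOWN_OBJECT_TYPES.get(aot, "object type %d" % aot)


print(parse_mp4a_codec("mp4a.40.2"))  # (64, 2)
print(describe("mp4a.40.3"))          # AAC SSR
```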
LATM and LOAS were defined for natural audio applications, which do not require sophisticated object-based coding or other functions provided by MPEG-4 Systems.
The Advanced Audio Coding in MPEG-4 Part 3 (MPEG-4 Audio) Subpart 4 was enhanced relative to the previous standard MPEG-2 Part 7 (Advanced Audio Coding), in order to provide better sound quality for a given encoding bitrate.
Any remaining differences between Part 3 and Part 7 are expected to be resolved by the ISO standards body in order to avoid future bitstream incompatibilities. Because the standard is new, no player or codec incompatibilities are currently known.
The MPEG-2 Part 7 standard (Advanced Audio Coding) was first published in 1997 and offers three default profiles:[49][50] Low Complexity profile (LC), Main profile and Scalable Sampling Rate profile (SSR).
The MPEG-4 Part 3 Subpart 4 (General Audio Coding) combined the profiles from MPEG-2 Part 7 with Perceptual Noise Substitution (PNS) and defined them as Audio Object Types (AAC LC, AAC Main, AAC SSR).[4]
AAC Scalable Sample Rate was introduced by Sony to the MPEG-2 Part 7 and MPEG-4 Part 3 standards.[citation needed] It was first published in ISO/IEC 13818-7, Part 7: Advanced Audio Coding (AAC) in 1997.[49][50] The audio signal is first split into 4 bands using a 4-band polyphase quadrature filter (PQF) bank. Each of these 4 bands is then further split using MDCTs with a size k of 32 or 256 samples. This is comparable to normal AAC LC, which applies MDCTs with a size k of 128 or 1024 directly to the audio signal.
The advantage of this technique is that short block switching can be done separately for every PQF band. High frequencies can therefore be encoded using short blocks to enhance temporal resolution, while low frequencies can still be encoded with high spectral resolution. However, due to aliasing between the 4 PQF bands, coding efficiency around (1,2,3) * fs/8 is worse than with normal MPEG-4 AAC LC.[citation needed]
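The arithmetic behind this band layout can be sketched as follows. The helper names are hypothetical and the 48 kHz sampling rate is only an example value. The sketch shows that the four PQF bands each span fs/8, that the band boundaries fall at the (1,2,3) * fs/8 frequencies where aliasing occurs, and that four per-band MDCTs of size 32 or 256 give the same total number of spectral lines as the 128 or 1024 of AAC LC.

```python
NUM_PQF_BANDS = 4


def pqf_band_edges(fs_hz: float) -> list:
    """Return (low, high) edges in Hz of the 4 PQF bands covering 0 to fs/2."""
    width = fs_hz / (2 * NUM_PQF_BANDS)  # each band spans fs/8
    return [(i * width, (i + 1) * width) for i in range(NUM_PQF_BANDS)]


def pqf_band_boundaries(fs_hz: float) -> list:
    """Return the inner band boundaries (1, 2, 3) * fs/8 in Hz."""
    return [k * fs_hz / 8 for k in (1, 2, 3)]


def total_spectral_lines(mdct_size_per_band: int) -> int:
    """Total MDCT size across all 4 bands for a given per-band size k."""
    return NUM_PQF_BANDS * mdct_size_per_band


fs = 48000  # example sampling rate, not taken from the article
print(pqf_band_edges(fs))
# [(0.0, 6000.0), (6000.0, 12000.0), (12000.0, 18000.0), (18000.0, 24000.0)]
print(pqf_band_boundaries(fs))    # [6000.0, 12000.0, 18000.0]
print(total_spectral_lines(32))   # 128, the AAC LC short block size
print(total_spectral_lines(256))  # 1024, the AAC LC long block size
```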
MPEG-4 AAC-SSR is very similar to ATRAC and ATRAC-3.
Why AAC-SSR was introduced
The idea behind AAC-SSR was not only the advantage listed above, but also the possibility of reducing the data rate by removing 1, 2 or 3 of the upper PQF bands. A very simple bitstream splitter can remove these bands and thus reduce the bitrate and sample rate.
Note: although possible, the resulting quality is much worse than is typical for this bitrate. For normal 64 kbit/s AAC LC, a bandwidth of 14–16 kHz is achieved by using intensity stereo and reduced NMRs. This degrades audible quality less than transmitting a 6 kHz bandwidth with perfect quality.
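The effect of dropping upper PQF bands can be illustrated with a short calculation. This is a sketch under stated assumptions: the function name is hypothetical, the 48 kHz input rate is an example value, and the reduced sample rate is taken to be twice the remaining bandwidth, as for a critically sampled filter bank.

```python
def truncated_stream(fs_hz: float, bands_removed: int) -> tuple:
    """Return (bandwidth_hz, sample_rate_hz) after removing upper PQF bands.

    Each of the 4 PQF bands spans fs/8, so keeping n bands leaves a
    bandwidth of n * fs/8. The sample rate of n * fs/4 is an assumption
    (twice the remaining bandwidth).
    """
    if not 0 <= bands_removed <= 3:
        raise ValueError("bands_removed must be between 0 and 3")
    bands_kept = 4 - bands_removed
    return bands_kept * fs_hz / 8, bands_kept * fs_hz / 4


for removed in (1, 2, 3):
    print(removed, truncated_stream(48000, removed))
# 1 (18000.0, 36000.0)
# 2 (12000.0, 24000.0)
# 3 (6000.0, 12000.0)
```

With three bands removed from a 48 kHz stream, the remaining bandwidth is the 6 kHz mentioned in the note above.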
BSAC
Bit Sliced Arithmetic Coding is an MPEG-4 standard (ISO/IEC 14496-3 subpart 4) for scalable audio coding. BSAC uses an alternative noiseless coding to AAC, with the rest of the processing being identical to AAC. This support for scalability allows for nearly transparent sound quality at 64 kbit/s and graceful degradation at lower bit rates. BSAC coding is best performed in the range of 40 kbit/s to 64 kbit/s, though it operates in the range of 16 kbit/s to 64 kbit/s. The AAC-BSAC codec is used in Digital Multimedia Broadcasting (DMB) applications.
Licensing
In 2002, the MPEG-4 Audio Licensing Committee selected the Via Licensing Corporation as the Licensing Administrator for the MPEG-4 Audio patent pool.[3][51][52]
See also
TwinVQ – one of the object types defined in MPEG-4 Audio version 1
D. Thom, H. Purnhagen, and the MPEG Audio Subgroup (October 1998). "MPEG Audio FAQ – MPEG-4". chiariglione.org. Archived from the original on 2012-02-05. Retrieved 2009-10-06.
Scheirer, Eric D.; Ray, Lee (1998). "Algorithmic and Wavetable Synthesis in the MPEG-4 Multimedia Standard". Audio Engineering Society Convention 105, 1998. CiteSeerX 10.1.1.35.2773. 2.2 Wavetable synthesis with SASBF: The SASBF wavetable-bank format had a somewhat complex history of development. The original specification was contributed by E-Mu Systems and was based on their "SoundFont" format [15]. After integration of this component in the MPEG-4 reference software was complete, the MIDI Manufacturers Association (MMA) approached MPEG requesting that MPEG-4 SASBF be compatible with their "Downloaded Sounds" format [13]. E-Mu agreed that this compatibility was desirable, and so a new format was negotiated and designed collaboratively by all parties.