From Sound Waves to Numbers
When you clap your hands, speak, or play a guitar, you create sound waves — vibrations in the air that travel to your ears. These waves are analogue: they change continuously and smoothly, with an infinite number of values at every moment. If you could zoom in on a sound wave, you would always find more detail, no matter how closely you looked.
Computers, however, can only store digital data — discrete numbers made up of 0s and 1s. A computer cannot store a smooth, continuous wave directly. So how does your phone record your voice? How does Spotify stream a song? How does a game play sound effects?
The answer is sampling. To store sound digitally, a computer takes thousands of “snapshots” of the sound wave every second. Each snapshot captures the amplitude (height) of the wave at that exact moment and stores it as a number. When these snapshots are played back in rapid succession, your ears hear what sounds like a continuous, smooth wave — even though the computer is really just playing a series of numbers, one after another.
Think of it like a flip book. Each page shows a slightly different picture. When you flip through them quickly, you see smooth animation — even though each page is a single, frozen image. Sound sampling works the same way: each “page” is a single measurement of the wave, and playing them rapidly recreates the original sound.
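The "snapshots" idea can be sketched in a few lines of Python. This is a rough illustration only (the function name `sample_wave` is invented for this sketch): sampling a pure 440 Hz tone just means evaluating the wave at regular time steps.

```python
import math

def sample_wave(frequency_hz, sample_rate_hz, duration_s):
    """Take "snapshots" of a sine wave: one amplitude value per sample."""
    num_samples = int(sample_rate_hz * duration_s)
    return [math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
            for n in range(num_samples)]

# 0.01 seconds of a 440 Hz tone at CD quality (44,100 Hz) gives 441 samples
samples = sample_wave(440, 44_100, 0.01)
print(len(samples))  # 441
```

Each number in the list is one "page of the flip book": a single frozen measurement of the wave.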
How Sound Sampling Works
Imagine you are trying to describe a roller coaster’s shape to someone who cannot see it. You could measure the height of the track at regular intervals — every metre, say — and write down each measurement. The more frequently you measure, the more accurately your list of numbers captures the true shape of the track. Sound sampling works on exactly the same principle.
When a computer samples sound, two key settings determine the quality of the recording:
Sample Rate (Frequency)
The sample rate is how many samples (measurements) the computer takes per second. It is measured in Hertz (Hz), where 1 Hz means one sample per second.
- Telephone quality: 8,000 Hz (8 kHz) — 8,000 samples every second
- FM radio quality: 22,050 Hz (22.05 kHz)
- CD quality: 44,100 Hz (44.1 kHz) — 44,100 samples every second
- Professional audio / Blu-ray: 96,000 Hz or 192,000 Hz
A higher sample rate captures the sound wave more frequently, which means it can reproduce higher-pitched sounds more accurately. Think of our roller coaster analogy: measuring every 10 centimetres gives you far more detail than measuring every 5 metres.
Bit Depth (Sample Resolution)
The bit depth (also called sample resolution) is the number of bits used to store each individual sample. It determines how precisely the amplitude of the wave can be recorded.
- 8-bit: 2⁸ = 256 possible amplitude values
- 16-bit (CD quality): 2¹⁶ = 65,536 possible amplitude values
- 24-bit (professional): 2²⁴ = 16,777,216 possible amplitude values (over 16.7 million)
A higher bit depth means each sample can be recorded with greater precision. With only 8 bits, the computer must round the true amplitude to one of 256 levels, which can introduce a slight “graininess” called quantisation error. With 16 bits, there are 65,536 levels to choose from, so the rounding is far less noticeable. With 24 bits, the precision is so fine that the human ear cannot detect any rounding at all.
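The rounding described above can be shown with a short Python sketch (the function name `quantise` is invented for this example). It snaps a true amplitude, here between -1.0 and 1.0, to the nearest of the 2^bit-depth available levels:

```python
def quantise(amplitude, bit_depth):
    """Round an amplitude in [-1.0, 1.0] to the nearest storable level."""
    levels = 2 ** bit_depth       # e.g. 256 levels for 8-bit
    step = 2.0 / (levels - 1)     # spacing between adjacent levels
    return round(amplitude / step) * step

true_value = 0.1234567
error_8bit = abs(quantise(true_value, 8) - true_value)
error_16bit = abs(quantise(true_value, 16) - true_value)
print(error_8bit, error_16bit)  # the 16-bit error is far smaller
```

The gap between the true value and the stored value is the quantisation error; more bits means smaller steps and therefore less rounding.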
To summarise the roles of each setting:
- Higher sample rate = better capture of frequency (high-pitched and rapidly changing sounds)
- Higher bit depth = more precise capture of amplitude (volume levels and subtle dynamics)
The Quality vs File Size Trade-off
There is an unavoidable trade-off: better quality means larger files. Doubling the sample rate doubles the file size. Doubling the bit depth also doubles the file size. A CD-quality stereo recording takes up roughly 22 times more space than the same recording at telephone quality. This is why different applications use different settings: a phone call does not need CD quality, and a streaming service must balance quality against bandwidth and storage.
Calculating Sound File Size
One of the most commonly examined skills in GCSE Computer Science is calculating the file size of an uncompressed sound recording. The formula is straightforward:
File size (bits) = sample rate × bit depth × duration (seconds)
For stereo audio (two channels — left and right), multiply by 2:
File size (bits) = sample rate × bit depth × duration × 2
To convert: bits ÷ 8 = bytes, then bytes ÷ 1,024 = KB, then KB ÷ 1,024 = MB.
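The formula and the unit conversions translate directly into Python (the function name is invented for this sketch):

```python
def sound_file_size_mb(sample_rate, bit_depth, duration_s, channels=1):
    """File size of uncompressed audio: rate x depth x duration x channels."""
    bits = sample_rate * bit_depth * duration_s * channels
    bytes_ = bits / 8          # bits -> bytes
    kb = bytes_ / 1024         # bytes -> KB
    mb = kb / 1024             # KB -> MB
    return mb

# CD-quality stereo, 30 seconds (Worked Example 1 below):
print(round(sound_file_size_mb(44_100, 16, 30, channels=2), 2))  # 5.05
```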
Worked Example 1: CD-Quality Stereo (30 seconds)
Sample rate: 44,100 Hz
Bit depth: 16 bits
Duration: 30 seconds
Channels: Stereo (2)
File size = 44,100 × 16 × 30 × 2
= 42,336,000 bits
= 42,336,000 ÷ 8 = 5,292,000 bytes
= 5,292,000 ÷ 1,024 = 5,168.0 KB
= 5,168.0 ÷ 1,024 ≈ 5.05 MB
Just 30 seconds of uncompressed CD stereo audio takes over 5 MB. That gives you an idea of why music files are compressed!
Worked Example 2: Telephone-Quality Mono (10 seconds)
Sample rate: 8,000 Hz
Bit depth: 8 bits
Duration: 10 seconds
Channels: Mono (1)
File size = 8,000 × 8 × 10
= 640,000 bits
= 640,000 ÷ 8 = 80,000 bytes
= 80,000 ÷ 1,024 ≈ 78.13 KB
The same 10 seconds at telephone quality takes under 80 KB — a tiny fraction of CD quality. The sound quality is much worse, but for a voice call it is perfectly adequate.
Worked Example 3: 3-Minute CD Stereo Song
Sample rate: 44,100 Hz
Bit depth: 16 bits
Duration: 3 minutes = 180 seconds
Channels: Stereo (2)
File size = 44,100 × 16 × 180 × 2
= 254,016,000 bits
= 254,016,000 ÷ 8 = 31,752,000 bytes
= 31,752,000 ÷ 1,024 = 31,007.8 KB
= 31,007.8 ÷ 1,024 ≈ 30.28 MB
A single 3-minute song takes about 30 MB uncompressed. An album of 12 songs could easily exceed 400 MB. This is why compression (like MP3 or AAC) is essential for practical music storage and streaming.
Quality Settings Comparison
| Quality Level | Sample Rate | Bit Depth | Channels | ~Size per Minute |
|---|---|---|---|---|
| Telephone | 8,000 Hz | 8 bits | Mono | ~469 KB |
| FM Radio | 22,050 Hz | 16 bits | Stereo | ~5.05 MB |
| CD | 44,100 Hz | 16 bits | Stereo | ~10.09 MB |
| Professional | 96,000 Hz | 24 bits | Stereo | ~32.96 MB |
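The size-per-minute column can be checked with a short Python loop (the names and layout here are illustrative, not part of any standard):

```python
def bits_per_minute(sample_rate, bit_depth, channels):
    """Uncompressed data produced by one minute of audio, in bits."""
    return sample_rate * bit_depth * 60 * channels

settings = {
    "Telephone":    (8_000,  8,  1),
    "FM Radio":     (22_050, 16, 2),
    "CD":           (44_100, 16, 2),
    "Professional": (96_000, 24, 2),
}
for name, (rate, depth, channels) in settings.items():
    mb = bits_per_minute(rate, depth, channels) / 8 / 1024 / 1024
    print(f"{name}: {mb:.2f} MB")  # Telephone's ~0.46 MB is the ~469 KB above
```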
Spotify streams music to over 600 million users worldwide. Storing and transmitting uncompressed audio at that scale would be impossibly expensive, so Spotify uses lossy compression — specifically Ogg Vorbis (on desktop) and AAC (on mobile) formats.
Consider a typical 3-minute song:
- Uncompressed (WAV): ~30 MB
- Spotify Normal quality (96 kbps): ~2.2 MB
- Spotify High quality (160 kbps): ~3.5 MB
- Spotify Very High quality (320 kbps): ~7 MB
How does lossy compression shrink files so dramatically? It uses psychoacoustic models — algorithms based on how human hearing works — to remove sounds that most people cannot perceive. For example:
- Frequency masking: A loud sound at one frequency makes nearby quieter sounds inaudible, so those quieter sounds are removed.
- Temporal masking: A loud sound makes softer sounds immediately before and after it undetectable, so those are removed too.
- Inaudible frequencies: Sounds above 20 kHz (beyond human hearing) are discarded entirely.
This approach enables streaming without consuming huge amounts of mobile data. At Normal quality, streaming for one hour uses roughly 43 MB of data, compared to about 605 MB if the audio were uncompressed. That is a reduction of about 93%.
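For compressed audio, the size calculation is even simpler: bitrate × duration. A sketch (the function name is invented; note that bitrates use decimal units, so 96 kbps means 96,000 bits per second, and we use decimal megabytes here to match the ~43 MB figure above):

```python
def stream_size_mb(bitrate_kbps, duration_s):
    """Data used by a compressed stream: bitrate x duration."""
    bits = bitrate_kbps * 1000 * duration_s  # kbps uses decimal kilo
    return bits / 8 / 1_000_000              # decimal MB

print(round(stream_size_mb(96, 3600), 1))  # one hour at Normal quality: 43.2
print(round(stream_size_mb(320, 180)))     # a 3-minute song at Very High: 7
```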
Test Yourself
Try to work out each answer yourself before reading it!
What is sampling?
Answer: Sampling is the process of measuring the amplitude (height) of an analogue sound wave at regular intervals and recording each measurement as a digital (binary) number. These discrete measurements are called samples. When played back in rapid succession, the samples recreate an approximation of the original sound wave.
What is the difference between sample rate and bit depth?
Answer:
- Sample rate is the number of samples taken per second, measured in Hertz (Hz). A higher sample rate captures sound more frequently, improving the accuracy of high-frequency sounds. CD quality uses 44,100 Hz.
- Bit depth is the number of bits used to store each sample. A higher bit depth means more possible amplitude values, so each sample is more precise. CD quality uses 16 bits (65,536 possible values).
In short: sample rate determines how often you measure, and bit depth determines how precisely you measure.
Calculate the file size of a 45-second mono recording with a sample rate of 22,050 Hz and a bit depth of 16 bits. Give your answer in MB.
Answer:
File size = 22,050 × 16 × 45 × 1
= 15,876,000 bits
= 15,876,000 ÷ 8 = 1,984,500 bytes
= 1,984,500 ÷ 1,024 = 1,938.0 KB
= 1,938.0 ÷ 1,024 ≈ 1.89 MB
A CD-quality stereo recording (44,100 Hz sample rate, 16-bit depth, 2 channels) has an uncompressed file size of 60.56 MB. How long is the recording?
Answer: Rearrange the formula to find duration:
60.56 MB = 60.56 × 1,024 × 1,024 bytes
≈ 63,501,763 bytes
= 63,501,763 × 8 ≈ 508,014,100 bits
Duration = bits ÷ (sample rate × bit depth × channels)
= 508,014,100 ÷ (44,100 × 16 × 2)
= 508,014,100 ÷ 1,411,200
≈ 360 seconds = 6 minutes
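The rearrangement can be checked with a short Python function (the name is invented for this sketch):

```python
def duration_seconds(size_mb, sample_rate, bit_depth, channels):
    """Rearranged formula: duration = total bits / (rate x depth x channels)."""
    bits = size_mb * 1024 * 1024 * 8
    return bits / (sample_rate * bit_depth * channels)

print(round(duration_seconds(60.56, 44_100, 16, 2)))  # 360
```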
Explain how a higher sample rate affects both the quality and the file size of a digital audio recording.
Answer: A higher sample rate means more samples are taken per second, so the digital recording captures the shape of the analogue wave more accurately. Rapid changes in the wave (high-frequency sounds) that would be missed at a low sample rate are caught at a higher one. However, each additional sample takes up space — more samples per second means more data per second. Since file size = sample rate × bit depth × duration, doubling the sample rate exactly doubles the file size.
What is the Nyquist theorem, and why does CD audio use a sample rate of 44,100 Hz?
Answer: The Nyquist theorem states that to accurately capture and reproduce a sound wave, the sample rate must be at least twice the highest frequency present in the sound. Human hearing ranges up to approximately 20,000 Hz (20 kHz). To capture this full range, the sample rate must be at least 2 × 20,000 = 40,000 Hz. CD audio uses a sample rate of 44,100 Hz, which is slightly above this minimum to provide a safety margin. If the sample rate were less than twice the highest frequency, the recording would suffer from a distortion called aliasing, where high-frequency sounds are incorrectly represented.
A mono recording uses a 22,050 Hz sample rate and 16-bit depth. A stereo recording of the same duration uses a 44,100 Hz sample rate and 24-bit depth. How many times larger is the stereo file?
Answer:
Mono file: 22,050 × 16 × 1 = 352,800 bits per second
Stereo file: 44,100 × 24 × 2 = 2,116,800 bits per second
Ratio = 2,116,800 ÷ 352,800 = 6
The stereo file is 6 times larger than the mono file for the same duration. This is because the sample rate doubled (×2), the bit depth increased from 16 to 24 (×1.5), and it went from mono to stereo (×2). Combined: 2 × 1.5 × 2 = 6.
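The same ×6 ratio drops straight out of a two-line Python check (variable names are ours):

```python
mono_bps = 22_050 * 16 * 1    # bits per second, mono file
stereo_bps = 44_100 * 24 * 2  # bits per second, stereo file
print(stereo_bps / mono_bps)  # 6.0
```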
Key Vocabulary
Make sure you know all of these terms for your exam:
| Term | Definition |
|---|---|
| Sampling | The process of measuring the amplitude of an analogue sound wave at regular intervals and storing each measurement as a digital value. This converts continuous sound into discrete data a computer can store. |
| Sample Rate | The number of samples taken per second, measured in Hertz (Hz). A higher sample rate captures more detail and higher frequencies. CD quality is 44,100 Hz. |
| Bit Depth | The number of bits used to represent each individual sample. A higher bit depth provides more possible amplitude values and greater precision. Also called sample resolution. CD quality is 16-bit. |
| Analogue | A continuous signal with an infinite number of possible values. Real-world sound is analogue — it varies smoothly and continuously over time. |
| Digital | Data represented as discrete values, typically stored as binary numbers (0s and 1s). Computers can only process digital data. |
| Hertz (Hz) | The unit of measurement for frequency. 1 Hz = one cycle (or one sample) per second. 1 kHz = 1,000 Hz. Used to express both sample rate and sound frequency. |
| Mono | A single audio channel. All sound is mixed into one stream. Uses half the storage of stereo. |
| Stereo | Two separate audio channels (left and right), creating a sense of spatial depth. Requires twice the storage of mono because every sample must be recorded for both channels. |
| Nyquist Theorem | A principle stating that the sample rate must be at least twice the highest frequency in the sound to accurately reproduce it. This is why CD audio samples at 44.1 kHz for sounds up to 20 kHz. |
Exam Tips
- Using 1,000 instead of 1,024: In computing, 1 KB = 1,024 bytes and 1 MB = 1,024 KB. Some exam boards accept 1,000, but using 1,024 is more accurate and always accepted.
- Forgetting to convert minutes to seconds: If the duration is given in minutes, you must convert to seconds first (multiply by 60) before using the formula.
- Confusing the formula with image file size: Image size uses width × height × colour depth. Sound size uses sample rate × bit depth × duration. Do not mix them up.
Past Paper Questions
Try these exam-style questions before checking the mark scheme answers.
Explain how sound is stored digitally in a computer. [3 marks]
Mark scheme:
- Sound waves are sampled at regular intervals (1)
- The amplitude/height of each sample is recorded as a binary value (1)
- The more samples taken per second (higher sample rate) and the more bits per sample (higher bit depth), the more accurate the representation (1)
A sound file is recorded at a sample rate of 44,100 Hz with a bit depth of 16 bits. The recording is 5 seconds long. Calculate the file size in bits. [2 marks]
Mark scheme:
- File size = sample rate × bit depth × duration (1)
- = 44,100 × 16 × 5 = 3,528,000 bits (1)
Explain how increasing the sample rate affects the quality and file size of a digital audio recording. [2 marks]
Mark scheme:
- Higher sample rate means more samples taken per second, so the digital recording more closely matches the original sound / better quality (1)
- More samples means more data to store, so the file size increases (1)
Sound Representation in Everyday Life
Sound representation is at work every time you interact with digital audio — which, in the modern world, is almost constantly:
- Every phone call you make involves your voice being sampled thousands of times per second, compressed, transmitted over a network, decompressed, and played through a speaker — all in real time with barely perceptible delay.
- Every song you stream on Spotify, Apple Music, or YouTube Music was originally recorded at high quality, then compressed using lossy algorithms that discard sounds humans cannot easily hear, reducing file sizes by 80–95%.
- Every voice assistant (Siri, Alexa, Google Assistant) samples your speech, converts the samples into data, and sends that data to servers where it is analysed and interpreted — all beginning with the simple act of sampling an analogue wave.
- Every video game uses sound sampling for effects, music, and dialogue. Game developers carefully balance audio quality against file size to fit everything onto a disc or download within reasonable limits.
Understanding how computers represent sound gives you insight into the engineering behind every podcast you listen to, every video call you join, and every voice note you send. The principles are simple — sample rate, bit depth, and duration — but they underpin a vast, multi-billion-pound industry that shapes how we communicate and experience music every day.
Further Reading
- BBC Bitesize — Edexcel GCSE Computer Science — Full specification coverage for data representation including sound
- Isaac Computer Science — Data Representation — In-depth explanations of sound sampling, file sizes, and compression
- GCSE Topic 2: Data Representation — Interactive revision tools and sound representation activities