From Sound Waves to Numbers
When you clap your hands, speak, or play a guitar, you create sound waves — vibrations in the air that travel to your ears. These waves are analogue: they change continuously and smoothly, with an infinite number of values at every moment. If you could zoom in on a sound wave, you would always find more detail, no matter how closely you looked.
Computers, however, can only store digital data — discrete numbers made up of 0s and 1s. A computer cannot store a smooth, continuous wave directly. So how does your phone record your voice? How does Spotify stream a song? How does a game play sound effects?
The answer is sampling. To store sound digitally, a computer takes thousands of “snapshots” of the sound wave every second. Each snapshot captures the amplitude (height) of the wave at that exact moment and stores it as a number. When these snapshots are played back in rapid succession, your ears hear what sounds like a continuous, smooth wave — even though the computer is really just playing a series of numbers, one after another.
Think of it like a flip book. Each page shows a slightly different picture. When you flip through them quickly, you see smooth animation — even though each page is a single, frozen image. Sound sampling works the same way: each “page” is a single measurement of the wave, and playing them rapidly recreates the original sound.
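The "snapshots" idea can be sketched in a few lines of Python. This is a rough illustration only (the function name `sample_wave` is invented for this sketch): sampling a pure 440 Hz tone just means evaluating the wave at regular time steps.

```python
import math

def sample_wave(frequency_hz, sample_rate_hz, duration_s):
    """Take "snapshots" of a sine wave: one amplitude value per sample."""
    num_samples = int(sample_rate_hz * duration_s)
    return [math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
            for n in range(num_samples)]

# 0.01 seconds of a 440 Hz tone at CD quality (44,100 Hz) gives 441 samples
samples = sample_wave(440, 44_100, 0.01)
print(len(samples))  # 441
```

Each number in the list is one "page of the flip book": a single frozen measurement of the wave.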
How Sound Sampling Works
Imagine you are trying to describe a roller coaster’s shape to someone who cannot see it. You could measure the height of the track at regular intervals — every metre, say — and write down each measurement. The more frequently you measure, the more accurately your list of numbers captures the true shape of the track. Sound sampling works on exactly the same principle.
When a computer samples sound, two key settings determine the quality of the recording:
Sample Rate (Frequency)
The sample rate is how many samples (measurements) the computer takes per second. It is measured in Hertz (Hz), where 1 Hz means one sample per second.
- Telephone quality: 8,000 Hz (8 kHz) — 8,000 samples every second
- FM radio quality: 22,050 Hz (22.05 kHz)
- CD quality: 44,100 Hz (44.1 kHz) — 44,100 samples every second
- Professional audio / Blu-ray: 96,000 Hz or 192,000 Hz
A higher sample rate captures the sound wave more frequently, which means it can reproduce higher-pitched sounds more accurately. Think of our roller coaster analogy: measuring every 10 centimetres gives you far more detail than measuring every 5 metres.
Bit Depth (Sample Resolution)
The bit depth (also called sample resolution) is the number of bits used to store each individual sample. It determines how precisely the amplitude of the wave can be recorded.
- 8-bit: 2⁸ = 256 possible amplitude values
- 16-bit (CD quality): 2¹⁶ = 65,536 possible amplitude values
- 24-bit (professional): 2²⁴ = 16,777,216 possible amplitude values (over 16.7 million)
A higher bit depth means each sample can be recorded with greater precision. With only 8 bits, the computer must round the true amplitude to one of 256 levels, which can introduce a slight “graininess” called quantisation error. With 16 bits, there are 65,536 levels to choose from, so the rounding is far less noticeable. With 24 bits, the precision is so fine that the human ear cannot detect any rounding at all.
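The rounding described above can be shown with a short Python sketch (the function name `quantise` is invented for this example). It snaps a true amplitude, here between -1.0 and 1.0, to the nearest of the 2^bit-depth available levels:

```python
def quantise(amplitude, bit_depth):
    """Round an amplitude in [-1.0, 1.0] to the nearest storable level."""
    levels = 2 ** bit_depth       # e.g. 256 levels for 8-bit
    step = 2.0 / (levels - 1)     # spacing between adjacent levels
    return round(amplitude / step) * step

true_value = 0.1234567
error_8bit = abs(quantise(true_value, 8) - true_value)
error_16bit = abs(quantise(true_value, 16) - true_value)
print(error_8bit, error_16bit)  # the 16-bit error is far smaller
```

The gap between the true value and the stored value is the quantisation error; more bits means smaller steps and therefore less rounding.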
To summarise the roles of each setting:
- Higher sample rate = better capture of frequency (high-pitched and rapidly changing sounds)
- Higher bit depth = more precise capture of amplitude (volume levels and subtle dynamics)
The Quality vs File Size Trade-off
There is an unavoidable trade-off: better quality means larger files. Doubling the sample rate doubles the file size. Doubling the bit depth also doubles the file size. A CD-quality stereo recording takes up roughly 22 times more space than the same recording at telephone quality. This is why different applications use different settings: a phone call does not need CD quality, and a streaming service must balance quality against bandwidth and storage.
Calculating Sound File Size
One of the most commonly examined skills in GCSE Computer Science is calculating the file size of an uncompressed sound recording. The formula is straightforward:
File size (bits) = sample rate × bit depth × duration (seconds)
For stereo audio (two channels — left and right), multiply by 2:
File size (bits) = sample rate × bit depth × duration × 2
To convert: bits ÷ 8 = bytes, then bytes ÷ 1,024 = KB, then KB ÷ 1,024 = MB.
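The formula and the unit conversions translate directly into Python (the function name is invented for this sketch):

```python
def sound_file_size_mb(sample_rate, bit_depth, duration_s, channels=1):
    """File size of uncompressed audio: rate x depth x duration x channels."""
    bits = sample_rate * bit_depth * duration_s * channels
    bytes_ = bits / 8          # bits -> bytes
    kb = bytes_ / 1024         # bytes -> KB
    mb = kb / 1024             # KB -> MB
    return mb

# CD-quality stereo, 30 seconds (Worked Example 1 below):
print(round(sound_file_size_mb(44_100, 16, 30, channels=2), 2))  # 5.05
```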
Worked Example 1: CD-Quality Stereo (30 seconds)
Sample rate: 44,100 Hz
Bit depth: 16 bits
Duration: 30 seconds
Channels: Stereo (2)
File size = 44,100 × 16 × 30 × 2
= 42,336,000 bits
= 42,336,000 ÷ 8 = 5,292,000 bytes
= 5,292,000 ÷ 1,024 = 5,168.0 KB
= 5,168.0 ÷ 1,024 ≈ 5.05 MB
Just 30 seconds of uncompressed CD stereo audio takes over 5 MB. That gives you an idea of why music files are compressed!
Worked Example 2: Telephone-Quality Mono (10 seconds)
Sample rate: 8,000 Hz
Bit depth: 8 bits
Duration: 10 seconds
Channels: Mono (1)
File size = 8,000 × 8 × 10
= 640,000 bits
= 640,000 ÷ 8 = 80,000 bytes
= 80,000 ÷ 1,024 ≈ 78.13 KB
The same 10 seconds at telephone quality takes under 80 KB — a tiny fraction of CD quality. The sound quality is much worse, but for a voice call it is perfectly adequate.
Worked Example 3: 3-Minute CD Stereo Song
Sample rate: 44,100 Hz
Bit depth: 16 bits
Duration: 3 minutes = 180 seconds
Channels: Stereo (2)
File size = 44,100 × 16 × 180 × 2
= 254,016,000 bits
= 254,016,000 ÷ 8 = 31,752,000 bytes
= 31,752,000 ÷ 1,024 = 31,007.8 KB
= 31,007.8 ÷ 1,024 ≈ 30.28 MB
A single 3-minute song takes about 30 MB uncompressed. An album of 12 songs could easily exceed 400 MB. This is why compression (like MP3 or AAC) is essential for practical music storage and streaming.
Quality Settings Comparison
| Quality Level | Sample Rate | Bit Depth | Channels | ~Size per Minute |
|---|---|---|---|---|
| Telephone | 8,000 Hz | 8 bits | Mono | ~469 KB |
| FM Radio | 22,050 Hz | 16 bits | Stereo | ~5.05 MB |
| CD | 44,100 Hz | 16 bits | Stereo | ~10.09 MB |
| Professional | 96,000 Hz | 24 bits | Stereo | ~32.96 MB |
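The size-per-minute column can be checked with a short Python loop (the names and layout here are illustrative, not part of any standard):

```python
def bits_per_minute(sample_rate, bit_depth, channels):
    """Uncompressed data produced by one minute of audio, in bits."""
    return sample_rate * bit_depth * 60 * channels

settings = {
    "Telephone":    (8_000,  8,  1),
    "FM Radio":     (22_050, 16, 2),
    "CD":           (44_100, 16, 2),
    "Professional": (96_000, 24, 2),
}
for name, (rate, depth, channels) in settings.items():
    mb = bits_per_minute(rate, depth, channels) / 8 / 1024 / 1024
    print(f"{name}: {mb:.2f} MB")  # Telephone's ~0.46 MB is the ~469 KB above
```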
Spotify streams music to over 600 million users worldwide. Storing and transmitting uncompressed audio at that scale would be impossibly expensive, so Spotify uses lossy compression — specifically Ogg Vorbis (on desktop) and AAC (on mobile) formats.
Consider a typical 3-minute song:
- Uncompressed (WAV): ~30 MB
- Spotify Normal quality (96 kbps): ~2.2 MB
- Spotify High quality (160 kbps): ~3.5 MB
- Spotify Very High quality (320 kbps): ~7 MB
How does lossy compression shrink files so dramatically? It uses psychoacoustic models — algorithms based on how human hearing works — to remove sounds that most people cannot perceive. For example:
- Frequency masking: A loud sound at one frequency makes nearby quieter sounds inaudible, so those quieter sounds are removed.
- Temporal masking: A loud sound makes softer sounds immediately before and after it undetectable, so those are removed too.
- Inaudible frequencies: Sounds above 20 kHz (beyond human hearing) are discarded entirely.
This approach enables streaming without consuming huge amounts of mobile data. At Normal quality, streaming for one hour uses roughly 43 MB of data, compared to about 605 MB if the audio were uncompressed. That is a reduction of about 93%.
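For compressed audio, the size calculation is even simpler: bitrate × duration. A sketch (the function name is invented; note that bitrates use decimal units, so 96 kbps means 96,000 bits per second, and we use decimal megabytes here to match the ~43 MB figure above):

```python
def stream_size_mb(bitrate_kbps, duration_s):
    """Data used by a compressed stream: bitrate x duration."""
    bits = bitrate_kbps * 1000 * duration_s  # kbps uses decimal kilo
    return bits / 8 / 1_000_000              # decimal MB

print(round(stream_size_mb(96, 3600), 1))  # one hour at Normal quality: 43.2
print(round(stream_size_mb(320, 180)))     # a 3-minute song at Very High: 7
```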
Test Yourself
Try to work out each answer yourself before reading it!
What is sampling?
Answer: Sampling is the process of measuring the amplitude (height) of an analogue sound wave at regular intervals and recording each measurement as a digital (binary) number. These discrete measurements are called samples. When played back in rapid succession, the samples recreate an approximation of the original sound wave.
What is the difference between sample rate and bit depth?
Answer:
- Sample rate is the number of samples taken per second, measured in Hertz (Hz). A higher sample rate captures sound more frequently, improving the accuracy of high-frequency sounds. CD quality uses 44,100 Hz.
- Bit depth is the number of bits used to store each sample. A higher bit depth means more possible amplitude values, so each sample is more precise. CD quality uses 16 bits (65,536 possible values).
In short: sample rate determines how often you measure, and bit depth determines how precisely you measure.
Calculate the file size of a 45-second mono recording with a sample rate of 22,050 Hz and a bit depth of 16 bits. Give your answer in MB.
Answer:
File size = 22,050 × 16 × 45 × 1
= 15,876,000 bits
= 15,876,000 ÷ 8 = 1,984,500 bytes
= 1,984,500 ÷ 1,024 = 1,938.0 KB
= 1,938.0 ÷ 1,024 ≈ 1.89 MB
A CD-quality stereo recording (44,100 Hz sample rate, 16-bit depth, 2 channels) has an uncompressed file size of 60.56 MB. How long is the recording?
Answer: Rearrange the formula to find duration:
60.56 MB = 60.56 × 1,024 × 1,024 bytes
≈ 63,501,763 bytes
= 63,501,763 × 8 ≈ 508,014,100 bits
Duration = bits ÷ (sample rate × bit depth × channels)
= 508,014,100 ÷ (44,100 × 16 × 2)
= 508,014,100 ÷ 1,411,200
≈ 360 seconds = 6 minutes
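The rearrangement can be checked with a short Python function (the name is invented for this sketch):

```python
def duration_seconds(size_mb, sample_rate, bit_depth, channels):
    """Rearranged formula: duration = total bits / (rate x depth x channels)."""
    bits = size_mb * 1024 * 1024 * 8
    return bits / (sample_rate * bit_depth * channels)

print(round(duration_seconds(60.56, 44_100, 16, 2)))  # 360
```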
Explain how a higher sample rate affects both the quality and the file size of a digital audio recording.
Answer: A higher sample rate means more samples are taken per second, so the digital recording captures the shape of the analogue wave more accurately. Rapid changes in the wave (high-frequency sounds) that would be missed at a low sample rate are caught at a higher one. However, each additional sample takes up space — more samples per second means more data per second. Since file size = sample rate × bit depth × duration, doubling the sample rate exactly doubles the file size.
What is the Nyquist theorem, and why does CD audio use a sample rate of 44,100 Hz?
Answer: The Nyquist theorem states that to accurately capture and reproduce a sound wave, the sample rate must be at least twice the highest frequency present in the sound. Human hearing ranges up to approximately 20,000 Hz (20 kHz). To capture this full range, the sample rate must be at least 2 × 20,000 = 40,000 Hz. CD audio uses a sample rate of 44,100 Hz, which is slightly above this minimum to provide a safety margin. If the sample rate were less than twice the highest frequency, the recording would suffer from a distortion called aliasing, where high-frequency sounds are incorrectly represented.
A mono recording uses a 22,050 Hz sample rate and 16-bit depth. A stereo recording of the same duration uses a 44,100 Hz sample rate and 24-bit depth. How many times larger is the stereo file?
Answer:
Mono file: 22,050 × 16 × 1 = 352,800 bits per second
Stereo file: 44,100 × 24 × 2 = 2,116,800 bits per second
Ratio = 2,116,800 ÷ 352,800 = 6
The stereo file is 6 times larger than the mono file for the same duration. This is because the sample rate doubled (×2), the bit depth increased from 16 to 24 (×1.5), and it went from mono to stereo (×2). Combined: 2 × 1.5 × 2 = 6.
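The same ×6 ratio drops straight out of a two-line Python check (variable names are ours):

```python
mono_bps = 22_050 * 16 * 1    # bits per second, mono file
stereo_bps = 44_100 * 24 * 2  # bits per second, stereo file
print(stereo_bps / mono_bps)  # 6.0
```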
Key Vocabulary
Make sure you know all of these terms for your exam:
| Term | Definition |
|---|---|
| Sampling | The process of measuring the amplitude of an analogue sound wave at regular intervals and storing each measurement as a digital value. This converts continuous sound into discrete data a computer can store. |
| Sample Rate | The number of samples taken per second, measured in Hertz (Hz). A higher sample rate captures more detail and higher frequencies. CD quality is 44,100 Hz. |
| Bit Depth | The number of bits used to represent each individual sample. A higher bit depth provides more possible amplitude values and greater precision. Also called sample resolution. CD quality is 16-bit. |
| Analogue | A continuous signal with an infinite number of possible values. Real-world sound is analogue — it varies smoothly and continuously over time. |
| Digital | Data represented as discrete values, typically stored as binary numbers (0s and 1s). Computers can only process digital data. |
| Hertz (Hz) | The unit of measurement for frequency. 1 Hz = one cycle (or one sample) per second. 1 kHz = 1,000 Hz. Used to express both sample rate and sound frequency. |
| Mono | A single audio channel. All sound is mixed into one stream. Uses half the storage of stereo. |
| Stereo | Two separate audio channels (left and right), creating a sense of spatial depth. Requires twice the storage of mono because every sample must be recorded for both channels. |
| Nyquist Theorem | A principle stating that the sample rate must be at least twice the highest frequency in the sound to accurately reproduce it. This is why CD audio samples at 44.1 kHz for sounds up to 20 kHz. |
Exam Tips
- Using 1,000 instead of 1,024: In computing, 1 KB = 1,024 bytes and 1 MB = 1,024 KB. Some exam boards accept 1,000, but using 1,024 is more accurate and always accepted.
- Forgetting to convert minutes to seconds: If the duration is given in minutes, you must convert to seconds first (multiply by 60) before using the formula.
- Confusing the formula with image file size: Image size uses width × height × colour depth. Sound size uses sample rate × bit depth × duration. Do not mix them up.
Past Paper Questions
Try these exam-style questions before checking the mark scheme answers.
Explain how sound is stored digitally in a computer. [3 marks]
Mark scheme:
- Sound waves are sampled at regular intervals (1)
- The amplitude/height of each sample is recorded as a binary value (1)
- The more samples taken per second (higher sample rate) and the more bits per sample (higher bit depth), the more accurate the representation (1)
A sound file is recorded at a sample rate of 44,100 Hz with a bit depth of 16 bits. The recording is 5 seconds long. Calculate the file size in bits. [2 marks]
Mark scheme:
- File size = sample rate × bit depth × duration (1)
- = 44,100 × 16 × 5 = 3,528,000 bits (1)
Explain how increasing the sample rate affects the quality and file size of a digital audio recording. [2 marks]
Mark scheme:
- Higher sample rate means more samples taken per second, so the digital recording more closely matches the original sound / better quality (1)
- More samples means more data to store, so the file size increases (1)
Sound Representation in Everyday Life
Sound representation is at work every time you interact with digital audio — which, in the modern world, is almost constantly:
- Every phone call you make involves your voice being sampled thousands of times per second, compressed, transmitted over a network, decompressed, and played through a speaker — all in real time with barely perceptible delay.
- Every song you stream on Spotify, Apple Music, or YouTube Music was originally recorded at high quality, then compressed using lossy algorithms that discard sounds humans cannot easily hear, reducing file sizes by 80–95%.
- Every voice assistant (Siri, Alexa, Google Assistant) samples your speech, converts the samples into data, and sends that data to servers where it is analysed and interpreted — all beginning with the simple act of sampling an analogue wave.
- Every video game uses sound sampling for effects, music, and dialogue. Game developers carefully balance audio quality against file size to fit everything onto a disc or download within reasonable limits.
Understanding how computers represent sound gives you insight into the engineering behind every podcast you listen to, every video call you join, and every voice note you send. The principles are simple — sample rate, bit depth, and duration — but they underpin a vast, multi-billion-pound industry that shapes how we communicate and experience music every day.
Further Reading
- BBC Bitesize — Edexcel GCSE Computer Science — Full specification coverage for data representation including sound
- Isaac Computer Science — Data Representation — In-depth explanations of sound sampling, file sizes, and compression
- GCSE Topic 2: Data Representation — Interactive revision tools and sound representation activities