Why Compress?
Every digital file, whether it is a photograph, a song, a video, or a document, is ultimately stored as a long sequence of binary data. The problem is that uncompressed files are enormous. A single uncompressed photograph from a modern phone camera can exceed 25 MB. A three-minute song stored as raw audio takes up roughly 30 MB. And a two-hour film in uncompressed 4K? That would be around 5 terabytes, more than most hard drives can hold.
Compression is the process of reducing the file size of data so that it takes up less storage space. This is essential for three reasons:
- Faster transmission: Smaller files can be sent over the internet more quickly. Streaming a 5 TB film would be impractical on almost any connection, but a compressed 5 GB version streams smoothly.
- Less storage: Your phone can hold thousands of compressed photos. Without compression, you might fit only a few hundred.
- Lower bandwidth: Compressed data uses less network capacity, which means less cost and less congestion on networks.
There are two fundamental types of compression, and understanding the difference between them is one of the most important concepts in your GCSE:
- Lossy compression — permanently removes some data to achieve much smaller file sizes. The original cannot be perfectly restored.
- Lossless compression — reduces file size without removing any data. The original can be perfectly reconstructed from the compressed version.
Compression Ratio = Original Size ÷ Compressed Size
For example, if a 20 MB file is compressed to 4 MB, the compression ratio is 20 ÷ 4 = 5:1 (read as “five to one”). This means the original is five times larger than the compressed version. You can also express compression as a percentage saved: (20 − 4) ÷ 20 × 100 = 80% reduction.
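The two formulas above can be checked with a few lines of Python. This is a minimal sketch; the function name `compression_stats` is made up for illustration, and both sizes are assumed to be in the same unit.

```python
def compression_stats(original_size: float, compressed_size: float) -> tuple[float, float]:
    """Return (ratio, percent_saved) for two sizes given in the same unit."""
    if original_size <= 0 or compressed_size <= 0:
        raise ValueError("sizes must be positive")
    ratio = original_size / compressed_size
    percent_saved = (original_size - compressed_size) / original_size * 100
    return ratio, percent_saved


# The worked example from the text: 20 MB compressed to 4 MB.
ratio, saved = compression_stats(20, 4)
print(f"{ratio:g}:1, {saved:g}% saved")  # prints "5:1, 80% saved"
```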
Lossy vs Lossless Compression
Lossy Compression
Lossy compression works by permanently removing data that is considered less important or less noticeable to human senses. Once this data is discarded, it is gone forever — the original file cannot be perfectly restored from the compressed version.
The trade-off is that lossy compression achieves dramatically smaller file sizes, often reducing files to just 5–20% of their original size (a reduction of 80–95%). For media files where small reductions in quality are acceptable, this is an excellent deal.
Common lossy formats include:
- JPEG (images) — reduces colour detail and removes fine textures that the human eye barely notices. A quality slider lets you choose how much data to discard.
- MP3 (audio) — removes sounds at frequencies that the human ear cannot easily hear, such as very high-pitched tones or quiet sounds masked by louder ones.
- MP4 / H.265 (video) — compresses video by storing only the differences between frames rather than every complete frame, and by removing visual detail the eye will not notice.
Best for: photographs, music, podcasts, video streaming — any media where a small reduction in quality is acceptable in return for a much smaller file.
Important: Every time you re-save a lossy file, more data is lost. If you open a JPEG, edit it, and save it again as a JPEG, the quality degrades further. This is called generation loss, and it is why professionals keep uncompressed originals and only export to lossy formats at the end.
Lossless Compression
Lossless compression reduces file size without removing any data at all. The original file can be perfectly reconstructed — bit for bit — from the compressed version. It achieves this by finding patterns and redundancy in the data and encoding them more efficiently.
The trade-off is that lossless compression typically achieves a smaller reduction in file size than lossy compression: usually a reduction of around 30–70%, depending on the type of data.
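To see "bit for bit" reconstruction in practice, here is a short sketch using Python's built-in `zlib` module, which implements DEFLATE, the method normally used inside ZIP files. The sample data is made up and deliberately repetitive, so it compresses far better than typical real files would.

```python
import zlib

# 1,200 bytes of highly repetitive sample data (an illustrative assumption).
original = b"AAAAAABBBCCC" * 100

compressed = zlib.compress(original)     # lossless DEFLATE compression
restored = zlib.decompress(compressed)   # rebuild the original bytes

print(len(original), "bytes ->", len(compressed), "bytes")
print("Identical after decompression:", restored == original)
```

Because nothing is discarded, the comparison on the last line is always true for lossless compression, however many times the data is compressed and decompressed.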
Common lossless formats include:
- PNG (images) — perfect quality, commonly used for screenshots, logos, and images with sharp edges or text.
- FLAC (audio) — audiophile-quality music with no data loss, roughly half the size of uncompressed audio.
- ZIP / 7z (any file) — general-purpose lossless compression that can bundle and shrink any type of file.
- GIF (images) — lossless but limited to a palette of 256 colours, best for simple graphics and short animations.
Best for: text files, source code, medical images, legal documents, logos, spreadsheets — any situation where quality must be preserved perfectly and no data can be lost.
Comparison Table
| Feature | Lossy Compression | Lossless Compression |
|---|---|---|
| Data loss | Yes — some data is permanently removed | No — all original data is preserved |
| File size reduction | Very large (often 80–95%) | Moderate (typically 30–70%) |
| Reversibility | Irreversible — cannot recover original | Fully reversible — original perfectly restored |
| Example formats | JPEG, MP3, MP4, AAC, H.265 | PNG, FLAC, ZIP, GIF, 7z |
| Best use cases | Photos, music, video, streaming | Text, code, medical images, logos, archives |
| Re-saving | Each re-save degrades quality further | Can be saved and opened repeatedly with no loss |
How Compression Works
Run-Length Encoding (RLE) — A Simple Lossless Technique
Run-Length Encoding is one of the simplest and most intuitive compression algorithms. Instead of storing every single value individually, RLE replaces consecutive repeated values (called “runs”) with a single value and a count.
For example, instead of storing:
AAAAAABBBCCC
12 characters = 12 bytes
We can store:
6A3B3C
6 characters = 6 bytes (50% smaller!)
The compressed version tells us: “six As, three Bs, three Cs.” We can perfectly reconstruct the original, so this is lossless.
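The encoding step can be sketched in Python as follows. The function name `rle_encode` is chosen for illustration, and the count-then-letter output format assumes the data itself contains no digits.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Replace each run of repeated characters with its count and the character."""
    # groupby yields one group per run of identical consecutive characters.
    return "".join(f"{len(list(run))}{value}" for value, run in groupby(text))


print(rle_encode("AAAAAABBBCCC"))  # prints 6A3B3C
```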
Worked Example: A 1-Bit Image Row
Imagine a row of pixels in a simple black-and-white image, where W = white and B = black:
Original pixel row: W W W W B B B B W W W
Original length: 11 values
RLE encoded: 4W 4B 3W
Encoded length: 6 values (3 pairs of count + colour)
Saving: 11 values reduced to 6 values = 45% smaller
To decode, simply expand each pair:
4W → W W W W
4B → B B B B
3W → W W W
Result: W W W W B B B B W W W (identical to original)
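Decoding can be sketched in the same way. This assumes the count-then-symbol format used above, with the spaces left out; `rle_decode` is an illustrative name, not a standard function.

```python
import re


def rle_decode(encoded: str) -> str:
    """Expand each count-and-symbol pair back into a run of that symbol."""
    # Each pair is one or more digits followed by a single non-digit symbol.
    pairs = re.findall(r"(\d+)(\D)", encoded)
    return "".join(symbol * int(count) for count, symbol in pairs)


row = rle_decode("4W4B3W")
print(row)  # prints WWWWBBBBWWW
```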
RLE works extremely well for data with lots of repetition — simple graphics, icons, fax transmissions, and bitmap images with large areas of uniform colour. It works poorly on data with little repetition (e.g., a photograph with constantly varying colours), where the encoded version might even be larger than the original.
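The sketch below compares the two extremes described above, using a simple count-then-symbol encoder. Both sample strings are made up for illustration.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Replace each run of repeated characters with its count and the character."""
    return "".join(f"{len(list(run))}{value}" for value, run in groupby(text))


# One long run versus no repetition at all.
for sample in ("WWWWWWWWWWWW", "ABCDEFGHIJKL"):
    encoded = rle_encode(sample)
    print(f"{sample} -> {encoded} ({len(sample)} -> {len(encoded)} characters)")
```

The first string shrinks from 12 characters to 3, while the second doubles to 24 because every character needs its own count.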
Huffman Coding — Brief Explanation
Huffman coding is a more sophisticated lossless technique. It works by analysing how frequently each value appears in the data and assigning shorter binary codes to the most common values and longer codes to the rarer values.
Think of Morse code as an analogy: the letter E (the most common letter in English) is encoded as just a single dot ( · ), while the rare letter Q is encoded as a much longer sequence ( – – · – ). Huffman coding applies this same idea automatically to any data.
For example, if the letter ‘e’ appears 1,000 times in a text file and the letter ‘z’ appears only 3 times, Huffman coding would assign a very short binary code (perhaps just 2 bits) to ‘e’ and a longer code (perhaps 10 bits) to ‘z’. Overall, the total number of bits needed to encode the entire file is minimised.
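For the curious, here is a compact sketch of how a Huffman code table can be built. It goes beyond what the exam requires. The letter frequencies are hypothetical and use only five letters, so the code lengths are shorter than the 2-bit and 10-bit figures mentioned above; `huffman_codes` is an illustrative name.

```python
import heapq
from collections import Counter


def huffman_codes(text: str) -> dict[str, str]:
    """Build a Huffman code table mapping each symbol to a string of bits."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:                       # only one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (total frequency, tie-breaker, {symbol: code so far}).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # the two least frequent groups
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]


# Hypothetical frequencies: 'e' is very common, 'z' is very rare.
text = "e" * 1000 + "t" * 400 + "a" * 300 + "q" * 10 + "z" * 3
codes = huffman_codes(text)
encoded_bits = sum(len(codes[ch]) for ch in text)

for symbol in "etaqz":
    print(symbol, codes[symbol])
print(f"{encoded_bits} bits with Huffman, {8 * len(text)} bits at 8 bits per character")
```

Repeatedly merging the two least frequent groups pushes rare symbols deeper into the tree, which is why they end up with the longest codes.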
How JPEG Compression Works (Simplified)
JPEG is the most widely used lossy image format. Understanding the basic steps helps you explain why lossy compression removes data without making images look terrible:
- Divide the image into 8×8 blocks: The image is split into small squares of 8 by 8 pixels (64 pixels each).
- Apply a mathematical transformation: A technique called the Discrete Cosine Transform (DCT) converts each block from pixel values into frequency information — essentially separating the “smooth gradients” from the “fine details.”
- Remove fine detail: The high-frequency data (sharp edges, tiny textures) is reduced or discarded. Human eyes are much more sensitive to gradual colour changes than to fine detail, so this removal is often barely noticeable.
- Quality slider: When you save a JPEG, the “quality” setting controls how aggressively the fine detail is discarded in the previous step. Quality 100 keeps almost everything (large file). Quality 10 discards most of the detail (tiny file, visible degradation).
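The steps above can be sketched for a single 8×8 block in plain Python. This goes well beyond GCSE level and is a simplification: a real JPEG encoder divides the coefficients by a quantisation table rather than simply zeroing them, and the block values and the cut-off `u + v < 4` are assumptions chosen for illustration.

```python
import math

N = 8  # JPEG works on 8x8 blocks


def dct_1d(values):
    """Orthonormal DCT-II of a list of N numbers."""
    out = []
    for k in range(N):
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        total = sum(v * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                    for n, v in enumerate(values))
        out.append(scale * total)
    return out


def idct_1d(coeffs):
    """Inverse of dct_1d: rebuild N values from N frequency coefficients."""
    out = []
    for n in range(N):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
            total += scale * c * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
        out.append(total)
    return out


def transform_2d(block, func):
    """Apply a 1-D transform to every row, then to every column."""
    rows = [func(row) for row in block]
    cols = [func(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


# A smooth gradient block of hypothetical brightness values.
block = [[100 + 5 * x + 3 * y for x in range(N)] for y in range(N)]
coeffs = transform_2d(block, dct_1d)

# Crude stand-in for quantisation: keep only the low-frequency coefficients.
kept = [[c if u + v < 4 else 0.0 for u, c in enumerate(row)]
        for v, row in enumerate(coeffs)]
restored = transform_2d(kept, idct_1d)

max_error = max(abs(a - b) for ra, rb in zip(block, restored) for a, b in zip(ra, rb))
kept_count = sum(1 for v in range(N) for u in range(N) if u + v < 4)
print(f"Kept {kept_count} of {N * N} coefficients; largest pixel error = {max_error:.2f}")
```

For a smooth block like this one, most of the 64 coefficients can be thrown away while the rebuilt pixels stay close to the originals, which is why the loss is hard to see.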
Real-World Case Studies
A typical three-minute song stored as uncompressed audio (CD quality WAV) takes up approximately 30 MB. Spotify uses lossy compression (Ogg Vorbis and AAC codecs) to shrink these files dramatically:
- Normal quality (160 kbps): approximately 3.5 MB — a compression ratio of about 8.5:1
- High quality (256 kbps): approximately 5.5 MB — a compression ratio of about 5.5:1
- Very high quality (320 kbps): approximately 7 MB — a compression ratio of about 4.3:1
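These sizes follow directly from the bit rate: size = bit rate × duration ÷ 8. The sketch below assumes a constant bit rate, 1 MB = 1,000,000 bytes, and standard CD audio (44,100 samples per second, 16 bits per sample, 2 channels). The results differ slightly from the figures above because those start from a rounded 30 MB.

```python
def audio_size_mb(bit_rate_kbps: float, seconds: float) -> float:
    """File size in megabytes (1 MB = 1,000,000 bytes) at a constant bit rate."""
    bits = bit_rate_kbps * 1000 * seconds
    return bits / 8 / 1_000_000


DURATION = 3 * 60  # a three-minute song, in seconds

# Uncompressed CD-quality audio: 44,100 samples/s x 16 bits x 2 channels.
cd_kbps = 44_100 * 16 * 2 / 1000
uncompressed = audio_size_mb(cd_kbps, DURATION)
print(f"CD quality: {uncompressed:.1f} MB")

for kbps in (160, 256, 320):
    size = audio_size_mb(kbps, DURATION)
    print(f"{kbps} kbps: {size:.1f} MB, ratio about {uncompressed / size:.1f}:1")
```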
How does it achieve this? The lossy codec analyses the audio and identifies frequencies that humans can barely hear — very high-pitched sounds above 16 kHz, quiet tones masked by louder simultaneous sounds, and extremely subtle variations. These are removed or simplified. Most listeners cannot tell the difference between 320 kbps and uncompressed audio.
Without compression, Spotify’s library of over 100 million tracks would require vastly more server storage, and streaming over mobile data would be impractical.
Hospitals store X-rays, CT scans, and MRI images in a format called DICOM (Digital Imaging and Communications in Medicine). Images used for diagnosis are normally kept uncompressed or compressed losslessly, and lossy compression such as JPEG is avoided or tightly restricted.
Why? Because even a tiny amount of data loss could mean the difference between spotting a small tumour and missing it, or correctly identifying a hairline fracture and overlooking it. A radiologist needs to see every single detail in the image, exactly as the scanner captured it.
Lossless compression still achieves a useful reduction of 50–70% in file size while preserving perfect quality. For a hospital generating thousands of scans per day, this saving in storage is significant — without compromising patient safety.
Video streaming is where compression truly works miracles. Consider the numbers:
- A single frame of 4K video (3840 × 2160 pixels, 24-bit colour) = approximately 24 MB
- At 30 frames per second, one second of video = approximately 720 MB
- A two-hour film at this rate = approximately 5.2 TB uncompressed
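The figures above are rounded. A quick calculation, assuming 3 bytes (24 bits) per pixel and 1 MB = 1,000,000 bytes, gives slightly higher exact values:

```python
WIDTH, HEIGHT = 3840, 2160   # 4K resolution
BYTES_PER_PIXEL = 3          # 24-bit colour
FPS = 30                     # frames per second
SECONDS = 2 * 60 * 60        # a two-hour film

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
second_bytes = frame_bytes * FPS
film_bytes = second_bytes * SECONDS

print(f"One frame:  {frame_bytes / 1e6:.1f} MB")   # 24.9 MB
print(f"One second: {second_bytes / 1e6:.0f} MB")  # 746 MB
print(f"Two hours:  {film_bytes / 1e12:.2f} TB")   # 5.37 TB
```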
Modern video codecs like H.265 (HEVC) compress this to just 5–15 GB, a compression ratio of roughly 350:1 to 1,000:1 (around 500:1 for a 10 GB file). They achieve this by only storing the differences between consecutive frames (since most of the image stays the same from one frame to the next) and by applying lossy techniques to remove visual detail the eye will not notice during motion.
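The idea of storing only the differences between frames can be shown with a toy example. This is a heavily simplified, lossless sketch with made-up pixel values. Real codecs such as H.265 predict the movement of whole blocks and then compress what is left over, which is far more sophisticated than this.

```python
def frame_delta(previous, current):
    """List the (position, new_value) pairs where current differs from previous."""
    return [(i, new) for i, (old, new) in enumerate(zip(previous, current)) if old != new]


def apply_delta(previous, delta):
    """Rebuild the next frame from the previous frame plus the stored changes."""
    frame = list(previous)
    for i, value in delta:
        frame[i] = value
    return frame


# One row of pixels from two consecutive frames: a bright object moves one pixel right.
frame1 = [10, 10, 10, 10, 200, 200, 10, 10, 10, 10]
frame2 = [10, 10, 10, 10, 10, 200, 200, 10, 10, 10]

delta = frame_delta(frame1, frame2)
print(delta)  # prints [(4, 10), (6, 200)]
assert apply_delta(frame1, delta) == frame2
```

Only two of the ten pixels changed, so only two values need to be stored for the second frame.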
Without compression, services like Netflix and YouTube simply could not exist. Very few internet connections could sustain 720 MB per second (nearly 6 gigabits per second) of uncompressed video, and storing huge libraries of uncompressed films would be prohibitively expensive.
Test Yourself
Try to work out each answer yourself before reading it.
Answer:
Lossy compression permanently removes data from the file that is considered less important or less noticeable to human senses. The original file cannot be perfectly reconstructed from the compressed version. It achieves very large reductions in file size (often 80–95%).
Lossless compression reduces file size by finding and encoding patterns more efficiently, without removing any data. The original file can be perfectly reconstructed from the compressed version. It typically achieves a smaller reduction (around 50–70%).
The key difference is reversibility: lossless is fully reversible; lossy is not.
Answer:
Lossy: JPEG (images) and MP3 (audio). Other acceptable answers include MP4, AAC, or H.265.
Lossless: PNG (images) and ZIP (any file). Other acceptable answers include FLAC, GIF, or 7z.
For full marks, name the format and state what type of file it is used for.
Answer:
Hospitals use lossless compression because medical images such as X-rays, CT scans, and MRIs are used to diagnose illnesses and injuries. Even a tiny amount of data loss from lossy compression could mean that a small tumour, fracture, or other abnormality is no longer visible in the image. This could lead to a missed diagnosis, which could be life-threatening.
Lossless compression still reduces file sizes by 50–70%, which saves significant storage, while preserving every detail exactly as the scanner captured it. Patient safety requires that no data is lost.
Answer:
Run-length encoding (RLE) is a lossless compression technique that replaces consecutive repeated values (“runs”) with a single count-value pair.
Example: The string WWWWWBBBBRRR (12 characters) would be encoded as 5W4B3R (6 characters) — a 50% reduction. To decompress, you simply expand each pair: 5W becomes WWWWW, 4B becomes BBBB, and 3R becomes RRR.
RLE works best on data with long runs of identical values, such as simple images with large areas of uniform colour.
Answer:
Lossy compression is used for music streaming because:
- An uncompressed three-minute song is approximately 30 MB. Lossy compression reduces this to around 3–7 MB, making streaming over mobile data practical.
- The lossy codec removes sounds that humans can barely hear — such as very high frequencies and quiet tones masked by louder sounds. Most listeners cannot perceive the difference between a high-quality lossy stream and the uncompressed original.
- With millions of users streaming simultaneously, the bandwidth and storage savings are enormous. Without lossy compression, the service would not be economically or technically feasible.
The small, largely imperceptible reduction in quality is an acceptable trade-off for dramatically smaller file sizes.
Answer:
Compression ratio: 15 ÷ 3 = 5:1 (the original is 5 times larger than the compressed version).
Percentage saved: (15 − 3) ÷ 15 × 100 = 12 ÷ 15 × 100 = 80%.
Always show your working in the exam. State the formula, substitute the values, and give the final answer with the correct unit (:1 for ratio, % for percentage).
Answer:
Compression is essential for video streaming because uncompressed video files are extraordinarily large. At 30 frames per second, every second of 4K video contains 30 frames, each with over 8 million pixels in 24-bit colour, totalling roughly 720 MB per second. A two-hour film would be approximately 5 TB uncompressed.
No home internet connection could stream 720 MB per second, and no practical server could store billions of uncompressed films. Video codecs like H.265 compress this to just 5–15 GB by:
- Storing only the differences between consecutive frames (since most of the image stays the same from frame to frame)
- Removing visual detail that the human eye will not notice during motion
Without compression, services like Netflix and YouTube simply could not exist.
Key Vocabulary
Make sure you know all of these terms for your exam:
| Term | Definition |
|---|---|
| Compression | The process of reducing the file size of data so it takes up less storage space and can be transmitted more quickly. There are two types: lossy and lossless. |
| Lossy Compression | A type of compression that permanently removes data considered less important. The original file cannot be perfectly reconstructed. Examples: JPEG, MP3, MP4. |
| Lossless Compression | A type of compression that reduces file size without removing any data. The original file can be perfectly reconstructed from the compressed version. Examples: PNG, FLAC, ZIP. |
| Run-Length Encoding (RLE) | A simple lossless compression technique that replaces consecutive repeated values with a count-value pair. For example, AAABBB becomes 3A3B. |
| Huffman Coding | A lossless compression technique that assigns shorter binary codes to frequently occurring values and longer codes to rare values, minimising the total number of bits needed. |
| Compression Ratio | A measure of how much a file has been compressed, calculated as original size divided by compressed size. A ratio of 5:1 means the original is five times larger than the compressed version. |
| Bit Rate | The number of bits processed or transmitted per second, typically measured in kilobits per second (kbps). Higher bit rates generally mean better quality but larger files. For example, Spotify streams at 160–320 kbps. |
| Codec | Short for “coder-decoder.” A codec is a program or algorithm that compresses (encodes) and decompresses (decodes) data. Examples include H.265 for video and MP3 for audio. |
Exam Tips
- Compression ratio: Original ÷ Compressed = ratio. Write it as X:1.
- Percentage saved: (Original − Compressed) ÷ Original × 100 = percentage.
Past Paper Questions
Try these exam-style questions, then check your answers against the mark scheme.
Explain the difference between lossy and lossless compression. Give an example of when each type would be appropriate. (4 marks)
Mark scheme:
Lossy (2 marks):
- Permanently removes data / some quality is lost / cannot be reversed (1)
- Appropriate for: streaming media / web images / where smaller size matters more than perfect quality (e.g. MP3, JPEG) (1)
Lossless (2 marks):
- No data is lost / original can be perfectly reconstructed (1)
- Appropriate for: text files / program code / medical images / where accuracy is essential (e.g. PNG, ZIP) (1)
Apply Run Length Encoding (RLE) to compress the following data: AAABBBBCCCCCCDD (2 marks)
Mark scheme:
Compressed: 3A 4B 6C 2D (1 mark)
Original = 15 characters, compressed = 8 characters — demonstrating the data has been reduced in size (1 mark)
Give two reasons why compression is used when transmitting files over the internet. (2 marks)
Mark scheme:
- Reduces the file size so it takes less time to transmit / download (1)
- Uses less bandwidth / allows more data to be sent in the same time / reduces storage requirements on servers (1)
Compression in Everyday Life
Compression is one of those technologies that is completely invisible yet absolutely essential to modern digital life. Every time you do any of these things, compression is working behind the scenes:
- Take a photo on your phone: Your camera sensor captures raw data, but the image is usually compressed straight away into a lossy format such as JPEG before being saved. Without this, your phone’s storage would fill up roughly 5–10 times faster.
- Stream a song on Spotify or Apple Music: The original studio recording is compressed from approximately 30 MB to 3–7 MB using lossy codecs, allowing it to stream smoothly over mobile data.
- Watch a video on YouTube or Netflix: The video is compressed from potentially terabytes of raw footage into a few gigabytes, then streamed at a bit rate your internet connection can handle.
- Send a ZIP file by email: Lossless compression bundles and shrinks your files so they can be attached to an email without exceeding the size limit.
- Browse the web: Images on websites are compressed (JPEG, PNG, or the modern WebP format) so pages load quickly rather than taking minutes.
The next time you take a photo, stream a song, or watch a video, think about the incredible mathematics happening behind the scenes — shrinking millions or billions of bytes into something that fits in your pocket and streams over the air. Compression is one of computer science’s greatest achievements, and understanding it gives you real insight into how the digital world works.
Further Reading
- BBC Bitesize — Edexcel GCSE Computer Science — Comprehensive coverage of compression including lossy, lossless, and RLE
- Isaac Computer Science — Data Representation — In-depth explanations of compression techniques with worked examples
- GCSE Topic 2: Data Representation — Interactive revision tools covering compression and file formats