Compression

Why Compress?

Every digital file, whether it is a photograph, a song, a video, or a document, is ultimately stored as a long sequence of binary data. The problem is that uncompressed files are enormous. A single uncompressed photograph from a modern phone camera can exceed 25 MB. A three-minute song stored as raw CD-quality audio takes up roughly 30 MB. And a two-hour film in uncompressed 4K? That would be around 5 terabytes, more than most hard drives can hold.

Compression is the process of reducing the file size of data so that it takes up less storage space. This is essential for three reasons:

Faster transmission: Smaller files can be sent over the internet more quickly. Streaming a 5 TB film would be impossible, but a compressed 5 GB version streams smoothly.
Less storage: Your phone can hold thousands of compressed photos. Without compression, you might fit only a few hundred.
Lower bandwidth: Compressed data uses less network capacity, which means less cost and less congestion on networks.

There are two fundamental types of compression, and understanding the difference between them is one of the most important concepts in your GCSE:

Lossy compression — permanently removes some data to achieve much smaller file sizes. The original cannot be perfectly restored.
Lossless compression — reduces file size without removing any data. The original can be perfectly reconstructed from the compressed version.

Key Concept: Compression Ratio. The compression ratio measures how much a file has been shrunk. It is calculated as:

Compression Ratio = Original Size ÷ Compressed Size

For example, if a 20 MB file is compressed to 4 MB, the compression ratio is 20 ÷ 4 = 5:1 (read as “five to one”). This means the original is five times larger than the compressed version. You can also express compression as a percentage saved: (20 − 4) ÷ 20 × 100 = 80% reduction.
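The two formulas above translate directly into code. Below is a minimal Python sketch. The function names `compression_ratio` and `percentage_saved` are our own choice for illustration, not part of any library. Sizes can be in any unit, as long as both use the same one.

```python
def compression_ratio(original_size: float, compressed_size: float) -> float:
    """How many times larger the original is than the compressed file."""
    return original_size / compressed_size


def percentage_saved(original_size: float, compressed_size: float) -> float:
    """Reduction in size as a percentage of the original.

    Multiplying by 100 before dividing keeps simple cases exact.
    """
    return (original_size - compressed_size) * 100 / original_size


# The 20 MB -> 4 MB example from the text
ratio = compression_ratio(20, 4)
saved = percentage_saved(20, 4)
print(f"Compression ratio: {ratio:g}:1")   # Compression ratio: 5:1
print(f"Percentage saved: {saved:g}%")     # Percentage saved: 80%
```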

Explore

Lossy vs Lossless Compression

Lossy Compression

Lossy compression works by permanently removing data that is considered less important or less noticeable to human senses. Once this data is discarded, it is gone forever — the original file cannot be perfectly restored from the compressed version.

The trade-off is that lossy compression achieves dramatically smaller file sizes, often shrinking files to just 5–20% of their original size. For media files where small reductions in quality are acceptable, this is an excellent deal.
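To see what "permanently removing data" means, here is a deliberately simplified Python sketch. It is not how JPEG or MP3 work. It assumes 8-bit values (0–255, like one colour channel of a pixel) and keeps only the top 4 bits of each one, which halves the storage needed. The names `lossy_compress` and `lossy_decompress` are hypothetical.

```python
def lossy_compress(values: list[int], keep_bits: int = 4) -> list[int]:
    """Keep only the top `keep_bits` bits of each 8-bit value; the rest is thrown away."""
    shift = 8 - keep_bits
    return [v >> shift for v in values]


def lossy_decompress(codes: list[int], keep_bits: int = 4) -> list[int]:
    """Rebuild approximate 8-bit values; the discarded bits cannot be recovered."""
    shift = 8 - keep_bits
    # Put each value in the middle of the range it must have come from
    return [(c << shift) + (1 << shift) // 2 for c in codes]


original = [200, 201, 203, 198, 57, 60]
codes = lossy_compress(original)
restored = lossy_decompress(codes)
print(codes)     # [12, 12, 12, 12, 3, 3]
print(restored)  # [200, 200, 200, 200, 56, 56]
```

The restored values are close to the originals but not identical. 201, 203 and 198 all come back as 200, and the information needed to tell them apart is gone for good.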

Common lossy formats include:

JPEG (images) — reduces colour detail and removes fine textures that the human eye barely notices. A quality slider lets you choose how much data to discard.
MP3 (audio) — removes sounds at frequencies that the human ear cannot easily hear, such as very high-pitched tones or quiet sounds masked by louder ones.
MP4 / H.265 (video): H.265 is a video codec often stored inside MP4 files. It compresses video by storing only the differences between frames rather than every complete frame, and by removing visual detail the eye will not notice.

Best for: photographs, music, podcasts, video streaming — any media where a small reduction in quality is acceptable in return for a much smaller file.

Important: Every time you re-save a lossy file, more data is lost. If you open a JPEG, edit it, and save it again as a JPEG, the quality degrades further. This is called generation loss, and it is why professionals keep uncompressed originals and only export to lossy formats at the end.

Lossless Compression

Lossless compression reduces file size without removing any data at all. The original file can be perfectly reconstructed — bit for bit — from the compressed version. It achieves this by finding patterns and redundancy in the data and encoding them more efficiently.

The trade-off is that lossless compression typically achieves a smaller reduction in file size than lossy compression. The reduction is usually around 30–70%, depending on the type of data.
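Python's standard `zlib` module uses DEFLATE, the same lossless method most ZIP files use. This makes it a quick way to check the "perfectly reconstructed" claim. Below is a minimal sketch that uses made-up repetitive text as the input.

```python
import zlib

# Repetitive text compresses well; every single byte must survive the round trip
original = ("the cat sat on the mat. " * 200).encode("utf-8")

compressed = zlib.compress(original, 9)   # level 9 = strongest DEFLATE compression
restored = zlib.decompress(compressed)

print(len(original), "bytes before,", len(compressed), "bytes after")
print("Identical after decompression:", restored == original)
```

The exact compressed size depends on the input, but repetitive text like this shrinks dramatically. The comparison prints `True` because nothing was thrown away.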

Common lossless formats include:

PNG (images) — perfect quality, commonly used for screenshots, logos, and images with sharp edges or text.
FLAC (audio) — audiophile-quality music with no data loss, roughly half the size of uncompressed audio.
ZIP / 7z (any file) — general-purpose lossless compression that can bundle and shrink any type of file.
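GIF (images): the compression itself is lossless, but GIF is limited to a palette of 256 colours. Converting a full-colour photograph to GIF therefore discards colour information. GIF is best for simple graphics and short animations.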

Best for: text files, source code, medical images, legal documents, logos, spreadsheets — any situation where quality must be preserved perfectly and no data can be lost.

Comparison Table

Feature	Lossy Compression	Lossless Compression
Data loss	Yes — some data is permanently removed	No — all original data is preserved
File size reduction	Very large (often 80–95%)	Moderate (typically 30–70%)
Reversibility	Irreversible — cannot recover original	Fully reversible — original perfectly restored
Example formats	JPEG, MP3, MP4, AAC, H.265	PNG, FLAC, ZIP, GIF, 7z
Best use cases	Photos, music, video, streaming	Text, code, medical images, logos, archives
Re-saving	Each re-save degrades quality further	Can be saved and opened repeatedly with no loss

Watch Out: Students often say that lossy compression “loses quality.” Be more precise in your exam answers: lossy compression permanently removes data that cannot be recovered. Saying “removes data” is more accurate and scientifically precise than “loses quality,” and examiners reward precise language.

Understand

How Compression Works

Run-Length Encoding (RLE) — A Simple Lossless Technique

Run-Length Encoding is one of the simplest and most intuitive compression algorithms. Instead of storing every single value individually, RLE replaces consecutive repeated values (called “runs”) with a single value and a count.

For example, instead of storing:

Before RLE Compression

  AAAAAABBBCCC

  12 characters = 12 bytes

We can store:

After RLE Compression

  6A3B3C

  6 characters = 6 bytes    (50% smaller!)

The compressed version tells us: “six As, three Bs, three Cs.” We can perfectly reconstruct the original, so this is lossless.
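Here is a minimal Python sketch of RLE as described above. Each run is written as its count followed by the character. The function names are hypothetical. The sketch assumes the original data contains no digit characters, because otherwise a count could not be told apart from the data.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Replace each run of identical characters with its length followed by the character."""
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))


def rle_decode(encoded: str) -> str:
    """Expand count-character pairs; counts may have several digits (e.g. 12A)."""
    result = []
    count = ""
    for ch in encoded:
        if ch.isdigit():
            count += ch                     # build up the count digit by digit
        else:
            result.append(ch * int(count))  # repeat the character `count` times
            count = ""
    return "".join(result)


encoded = rle_encode("AAAAAABBBCCC")
print(encoded)               # 6A3B3C
print(rle_decode(encoded))   # AAAAAABBBCCC
```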

Worked Example: A 1-Bit Image Row

Imagine a row of pixels in a simple black-and-white image, where W = white and B = black:

RLE Worked Example — Image Row

  Original pixel row:  W W W W B B B B W W W
  Original length:     11 values

  RLE encoded:         4W 4B 3W
  Encoded length:      6 values  (3 pairs of count + colour)

  Saving:  11 values reduced to 6 values = 45% smaller

  To decode, simply expand each pair:
    4W  →  W W W W
    4B  →  B B B B
    3W  →  W W W
  Result:  W W W W B B B B W W W  (identical to original)
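The same idea can be applied to a pixel row. This time each run is stored as a `(count, colour)` pair rather than being squashed into one string. This sketch uses a plain loop instead of a library helper. `rle_encode_row` and `rle_decode_row` are hypothetical names.

```python
def rle_encode_row(pixels: list[str]) -> list[tuple[int, str]]:
    """Turn a row of pixels into (count, colour) pairs."""
    pairs: list[tuple[int, str]] = []
    for colour in pixels:
        if pairs and pairs[-1][1] == colour:
            count, _ = pairs[-1]
            pairs[-1] = (count + 1, colour)   # same colour: extend the current run
        else:
            pairs.append((1, colour))         # new colour: start a new run
    return pairs


def rle_decode_row(pairs: list[tuple[int, str]]) -> list[str]:
    """Expand each (count, colour) pair back into individual pixels."""
    return [colour for count, colour in pairs for _ in range(count)]


row = "W W W W B B B B W W W".split()
pairs = rle_encode_row(row)
print(pairs)                          # [(4, 'W'), (4, 'B'), (3, 'W')]
print(rle_decode_row(pairs) == row)   # True
```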

RLE works extremely well for data with lots of repetition — simple graphics, icons, fax transmissions, and bitmap images with large areas of uniform colour. It works poorly on data with little repetition (e.g., a photograph with constantly varying colours), where the encoded version might even be larger than the original.
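You can check the effect of repetition on RLE directly. This sketch uses the same count-then-character scheme, redefined here so the block runs on its own. It encodes one repetitive row and one row with no repeats.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Count-then-character run-length encoding (same scheme as 6A3B3C)."""
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))


repetitive = "W" * 12 + "B" * 3 + "W" * 4
varied = "ABCDEFGH"

for sample in (repetitive, varied):
    encoded = rle_encode(sample)
    print(f"{sample}: {len(sample)} -> {len(encoded)} characters ({encoded})")
```

The repetitive row shrinks from 19 to 7 characters (12W3B4W). ABCDEFGH doubles from 8 to 16 characters (1A1B1C1D1E1F1G1H), because every run has length 1 but still needs a count.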

Huffman Coding — Brief Explanation

Huffman coding is a more sophisticated lossless technique. It works by analysing how frequently each value appears in the data and assigning shorter binary codes to the most common values and longer codes to the rarer values.

Think of Morse code as an analogy: the letter E (the most common letter in English) is encoded as just a single dot ( · ), while the rare letter Q is encoded as a much longer sequence ( – – · – ). Huffman coding applies this same idea automatically to any data.

For example, if the letter ‘e’ appears 1,000 times in a text file and the letter ‘z’ appears only 3 times, Huffman coding would assign a very short binary code (perhaps just 2 bits) to ‘e’ and a longer code (perhaps 10 bits) to ‘z’. Overall, the total number of bits needed to encode the entire file is minimised.

Try This: Think about the text message “AAAAABBC”. The letter A appears 5 times, B appears 2 times, and C appears once. If we used a fixed-length code of 2 bits per character, we would need 8 × 2 = 16 bits. But with Huffman coding, we could assign: A = 0 (1 bit), B = 10 (2 bits), C = 11 (2 bits). Total: 5(1) + 2(2) + 1(2) = 11 bits — a saving of 31%!
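You can check the Huffman calculation above with a short Python sketch of the standard Huffman procedure. It repeatedly merges the two least frequent groups, and every merge adds one bit to the code of each symbol inside them. To keep it short, it only works out the code length for each symbol, not the actual bit patterns. `huffman_code_lengths` is a hypothetical name.

```python
import heapq
from collections import Counter


def huffman_code_lengths(text: str) -> dict[str, int]:
    """Return the Huffman code length (in bits) for each character in text."""
    freq = Counter(text)
    if len(freq) == 1:                      # a single symbol still needs 1 bit
        return {next(iter(freq)): 1}
    # Heap entries: (combined frequency, tie-breaker, {symbol: code length so far})
    heap = [(count, i, {sym: 0}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)      # the two least frequent groups
        f2, _, b = heapq.heappop(heap)
        merged = {sym: depth + 1 for sym, depth in {**a, **b}.items()}
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]


message = "AAAAABBC"
lengths = huffman_code_lengths(message)
total_bits = sum(lengths[ch] for ch in message)
print(dict(sorted(lengths.items())))   # {'A': 1, 'B': 2, 'C': 2}
print(total_bits)                      # 11
```

Different tie-breaking choices can produce different bit patterns (for example A = 1, B = 00, C = 01). The code lengths, and therefore the 11-bit total, stay the same for this message.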

How JPEG Compression Works (Simplified)

JPEG is the most widely used lossy image format. Understanding the basic steps helps you explain how lossy compression can discard a large amount of data while the image still looks almost the same:

Divide the image into 8×8 blocks: The image is split into small squares of 8 by 8 pixels (64 pixels each).
Apply a mathematical transformation: A technique called the Discrete Cosine Transform (DCT) converts each block from pixel values into frequency information — essentially separating the “smooth gradients” from the “fine details.”
Remove fine detail: Each frequency value is divided by a step size and rounded (this is called quantisation). High-frequency values (sharp edges, tiny textures) are divided by larger steps, so many of them become zero and are effectively discarded. Human eyes are much more sensitive to gradual colour changes than to fine detail, so this removal is often barely noticeable.
Quality slider: When you save a JPEG, the “quality” setting controls how aggressively the “Remove fine detail” step discards data. Quality 100 keeps almost everything (large file). Quality 10 discards most of the detail (tiny file, visible degradation).
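The transform and rounding steps can be sketched in Python for a single 8×8 block. This is a simplified illustration that makes several assumptions:

- It uses one uniform step size for every coefficient instead of JPEG's 64-entry quantisation tables.
- It skips JPEG's colour conversion and final lossless encoding stage.
- It uses a slow but readable nested-loop DCT.

Subtracting 128 first follows JPEG's practice of centring 8-bit values on zero.

```python
import math

N = 8  # JPEG splits images into 8x8 blocks


def dct_2d(block: list[list[float]]) -> list[list[float]]:
    """Orthonormal 2-D DCT-II: turns 8x8 pixel values into 8x8 frequency coefficients.

    Coefficient [0][0] reflects the block's average brightness; higher indices are finer detail.
    """
    def scale(k: int) -> float:
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


def quantise(coeffs: list[list[float]], step: float) -> list[list[int]]:
    """Divide every coefficient by `step` and round; a bigger step discards more detail."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


# A smooth left-to-right gradient (pixel values 40, 50, ..., 110), minus 128 as in JPEG
block = [[(40 + 10 * y) - 128 for y in range(N)] for _ in range(N)]
coeffs = dct_2d(block)

for step in (2, 10, 50):
    q = quantise(coeffs, step)
    zeros = sum(c == 0 for row in q for c in row)
    print(f"step {step}: {zeros} of 64 coefficients rounded to zero")
```

For this smooth gradient, almost all of the information ends up in a handful of low-frequency coefficients. Raising the step size (lower quality) turns even more of them into zeros. JPEG's final lossless stage can then store those long runs of zeros very compactly.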

Explore

Real-World Case Studies

Case Study 1: Spotify Audio Streaming

A typical three-minute song stored as uncompressed audio (CD quality WAV) takes up approximately 30 MB. Spotify uses lossy compression (Ogg Vorbis and AAC codecs) to shrink these files dramatically:

160 kbps: approximately 3.5 MB, a compression ratio of about 8.5:1
256 kbps: approximately 5.5 MB, a compression ratio of about 5.5:1
320 kbps: approximately 7 MB, a compression ratio of about 4.3:1
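The file sizes above follow from the bit rate: bits per second × seconds ÷ 8 gives bytes. Below is a minimal sketch. It assumes a constant bit rate, decimal megabytes (1 MB = 1,000,000 bytes) and CD-quality audio (44.1 kHz, 16-bit, stereo) as the uncompressed baseline. `stream_size_mb` is a hypothetical name.

```python
def stream_size_mb(bit_rate_kbps: float, seconds: float) -> float:
    """File size in megabytes for audio at a constant bit rate (1 MB = 1,000,000 bytes)."""
    bits = bit_rate_kbps * 1000 * seconds
    return bits / 8 / 1_000_000


CD_KBPS = 44_100 * 16 * 2 / 1000   # samples/s x bits per sample x 2 channels = 1411.2 kbps
SONG_SECONDS = 3 * 60              # a three-minute song

print(f"Uncompressed: {stream_size_mb(CD_KBPS, SONG_SECONDS):.1f} MB")
for kbps in (160, 256, 320):
    size = stream_size_mb(kbps, SONG_SECONDS)
    print(f"{kbps} kbps: {size:.1f} MB, ratio about {CD_KBPS / kbps:.1f}:1")
```

This gives about 31.8 MB uncompressed, and 3.6 MB, 5.8 MB and 7.2 MB at the three bit rates. The ratios are roughly 8.8:1, 5.5:1 and 4.4:1. These differ slightly from the figures above, which round the uncompressed size down to 30 MB.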

How does it achieve this? The lossy codec analyses the audio and identifies frequencies that humans can barely hear — very high-pitched sounds above 16 kHz, quiet tones masked by louder simultaneous sounds, and extremely subtle variations. These are removed or simplified. Most listeners cannot tell the difference between 320 kbps and uncompressed audio.

Without compression, Spotify’s library of over 100 million tracks would require vastly more server storage, and streaming over mobile data would be impractical.

Case Study 2: Medical Imaging

Hospitals store X-rays, CT scans, and MRI images in a format called DICOM (Digital Imaging and Communications in Medicine). Images used for diagnosis are normally stored with lossless compression. Lossy compression such as standard JPEG is generally avoided for them, even though the DICOM format itself can hold lossy-compressed images.

Why? Because even a tiny amount of data loss could mean the difference between spotting a small tumour and missing it, or correctly identifying a hairline fracture and overlooking it. A radiologist needs to see every single detail in the image, exactly as the scanner captured it.

Lossless compression still achieves a useful reduction of 50–70% in file size while preserving perfect quality. For a hospital generating thousands of scans per day, this saving in storage is significant — without compromising patient safety.

Case Study 3: 4K Video Streaming

This is where compression truly works miracles. Consider the numbers:

A single frame of 4K video (3840 × 2160 pixels, 24-bit colour) = approximately 24 MB
At 30 frames per second, one second of video = approximately 720 MB
A two-hour film at this rate = approximately 5.2 TB uncompressed
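Here is the arithmetic behind these numbers as a short Python sketch. It assumes 3 bytes per pixel (24-bit colour), 30 frames per second and decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes). The 10 GB compressed size is an illustrative value from the middle of the 5–15 GB range.

```python
WIDTH, HEIGHT = 3840, 2160
BYTES_PER_PIXEL = 3            # 24-bit colour = 3 bytes per pixel
FPS = 30                       # frames per second
SECONDS = 2 * 60 * 60          # a two-hour film

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
second_bytes = frame_bytes * FPS
film_bytes = second_bytes * SECONDS

print(f"One frame:  {frame_bytes / 1e6:.1f} MB")
print(f"One second: {second_bytes / 1e6:.0f} MB")
print(f"Whole film: {film_bytes / 1e12:.2f} TB")
print(f"Ratio if compressed to 10 GB: about {film_bytes / 10e9:.0f}:1")
```

Working it out exactly gives about 24.9 MB per frame, 746 MB per second and 5.37 TB for the whole film. The figures in the list above are a little lower because they round a frame down to 24 MB. For a 10 GB file, the ratio comes out at roughly 537:1.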

Modern video codecs like H.265 (HEVC) compress this to just 5–15 GB, a compression ratio of roughly 350:1 to 1,000:1 (about 500:1 for a 10 GB file). They achieve this by storing only the differences between consecutive frames, since most of the image stays the same from one frame to the next. They also apply lossy techniques to remove visual detail the eye will not notice during motion.
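Storing only the differences between frames can be illustrated with a toy example. This sketch treats a frame as a short list of brightness values and records only the pixels that changed, as (position, new value) pairs. The function names are hypothetical. Real codecs such as H.265 are far more sophisticated: they predict how blocks of pixels move, then compress the leftover error with lossy techniques. This sketch shows only the core idea.

```python
def frame_difference(previous: list[int], current: list[int]) -> list[tuple[int, int]]:
    """List only the pixels that changed, as (position, new value) pairs."""
    return [(i, new) for i, (old, new) in enumerate(zip(previous, current)) if old != new]


def apply_difference(previous: list[int], changes: list[tuple[int, int]]) -> list[int]:
    """Rebuild the next frame from the previous frame plus the stored changes."""
    frame = list(previous)          # copy, so the previous frame is left untouched
    for position, value in changes:
        frame[position] = value
    return frame


frame1 = [10, 10, 10, 10, 50, 50, 10, 10]   # a tiny 8-pixel "frame"
frame2 = [10, 10, 10, 10, 10, 50, 50, 10]   # the bright object has moved one pixel right

changes = frame_difference(frame1, frame2)
print(changes)                                      # [(4, 10), (6, 50)]
print(apply_difference(frame1, changes) == frame2)  # True
```

Only 2 of the 8 pixels need storing for the second frame. In real video, where most of each frame is unchanged, the same idea avoids re-storing most of the millions of pixels in every frame.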

Without compression, services like Netflix and YouTube simply could not exist. No internet connection could stream 720 MB per second of uncompressed video, and no server could store billions of uncompressed videos.

Did You Know? The first digital image was created in 1957 by Russell Kirsch at the US National Bureau of Standards. It was a picture of his infant son and measured just 176 × 176 pixels, a total of 30,976 pixels. By contrast, a modern 12-megapixel phone camera captures 12,000,000 pixels per photograph, nearly 400 times as many pixels. Without compression, a single 12 MP photo in full 24-bit colour would take up 36 MB. Thanks to JPEG compression, that same photo typically takes up only 3–5 MB.

Interactive Exercise 1: Run-Length Encoding

Practice encoding and decoding with RLE.

Encode: Convert the repeated string into RLE notation (e.g., AAABBB → 3A3B)


Interactive Exercise 2: Compression Ratio Calculator

Calculate the compression ratio (e.g., 5:1) and the percentage saved (e.g., 80%) from the given file sizes.


Interactive Exercise 3: Lossy or Lossless?

Read the scenario and decide: would lossy or lossless compression be more appropriate?


Practice

Test Yourself

Try to work out each answer yourself before reading it.

Q1: Explain the difference between lossy and lossless compression.

Answer:

Lossy compression permanently removes data from the file that is considered less important or less noticeable to human senses. The original file cannot be perfectly reconstructed from the compressed version. It achieves very large reductions in file size (often 80–95%).

Lossless compression reduces file size by finding and encoding patterns more efficiently, without removing any data. The original file can be perfectly reconstructed from the compressed version. It typically achieves a smaller reduction (around 30–70%).

The key difference is reversibility: lossless is fully reversible; lossy is not.

Q2: Give two examples of lossy file formats and two examples of lossless file formats.

Answer:

Lossy: JPEG (images) and MP3 (audio). Other acceptable answers include MP4, AAC, or H.265.

Lossless: PNG (images) and ZIP (any file). Other acceptable answers include FLAC, GIF, or 7z.

For full marks, name the format and state what type of file it is used for.

Q3: Explain why hospitals use lossless compression for medical images rather than lossy compression.

Answer:

Hospitals use lossless compression because medical images such as X-rays, CT scans, and MRIs are used to diagnose illnesses and injuries. Even a tiny amount of data loss from lossy compression could mean that a small tumour, fracture, or other abnormality is no longer visible in the image. This could lead to a missed diagnosis, which could be life-threatening.

Lossless compression still reduces file sizes by 50–70%, which saves significant storage, while preserving every detail exactly as the scanner captured it. Patient safety requires that no data is lost.

Q4: What is run-length encoding (RLE)? Give an example.

Answer:

Run-length encoding (RLE) is a lossless compression technique that replaces consecutive repeated values (“runs”) with a single count-value pair.

Example: The string WWWWWBBBBRRR (12 characters) would be encoded as 5W4B3R (6 characters) — a 50% reduction. To decompress, you simply expand each pair: 5W becomes WWWWW, 4B becomes BBBB, and 3R becomes RRR.

RLE works best on data with long runs of identical values, such as simple images with large areas of uniform colour.

Q5: Why is lossy compression used for music streaming services like Spotify?

Answer:

Lossy compression is used for music streaming because:

An uncompressed three-minute song is approximately 30 MB. Lossy compression reduces this to around 3–7 MB, making streaming over mobile data practical.
The lossy codec removes sounds that humans can barely hear — such as very high frequencies and quiet tones masked by louder sounds. Most listeners cannot perceive the difference between a high-quality lossy stream and the uncompressed original.
With millions of users streaming simultaneously, the bandwidth and storage savings are enormous. Without lossy compression, the service would not be economically or technically feasible.

The small, largely imperceptible reduction in quality is an acceptable trade-off for dramatically smaller file sizes.

Q6: A 15 MB file is compressed to 3 MB. What is the compression ratio? What percentage of the file size has been saved?

Answer:

Compression ratio: 15 ÷ 3 = 5:1 (the original is 5 times larger than the compressed version).

Percentage saved: (15 − 3) ÷ 15 × 100 = 12 ÷ 15 × 100 = 80%.

Always show your working in the exam. State the formula, substitute the values, and give the final answer with the correct unit (:1 for ratio, % for percentage).

Q7: Explain why compression is essential for video streaming services.

Answer:

Compression is essential for video streaming because uncompressed video files are extraordinarily large. A single second of 4K video contains approximately 30 frames, each with over 8 million pixels in 24-bit colour — totalling roughly 720 MB per second. A two-hour film would be approximately 5 TB uncompressed.

No home internet connection could stream 720 MB per second, and no practical server could store billions of uncompressed videos. Video codecs like H.265 compress this to just 5–15 GB by:

Storing only the differences between consecutive frames (since most of the image stays the same from frame to frame)
Removing visual detail that the human eye will not notice during motion

Without compression, services like Netflix and YouTube simply could not exist.

Understand

Key Vocabulary

Make sure you know all of these terms for your exam:

Term	Definition
Compression	The process of reducing the file size of data so it takes up less storage space and can be transmitted more quickly. There are two types: lossy and lossless.
Lossy Compression	A type of compression that permanently removes data considered less important. The original file cannot be perfectly reconstructed. Examples: JPEG, MP3, MP4.
Lossless Compression	A type of compression that reduces file size without removing any data. The original file can be perfectly reconstructed from the compressed version. Examples: PNG, FLAC, ZIP.
Run-Length Encoding (RLE)	A simple lossless compression technique that replaces consecutive repeated values with a count-value pair. For example, AAABBB becomes 3A3B.
Huffman Coding	A lossless compression technique that assigns shorter binary codes to frequently occurring values and longer codes to rare values, minimising the total number of bits needed.
Compression Ratio	A measure of how much a file has been compressed, calculated as original size divided by compressed size. A ratio of 5:1 means the original is five times larger than the compressed version.
Bit Rate	The number of bits processed or transmitted per second, typically measured in kilobits per second (kbps). Higher bit rates generally mean better quality but larger files. For example, Spotify offers streams at up to 320 kbps.
Codec	Short for “coder-decoder.” A codec is a program or algorithm that compresses (encodes) and decompresses (decodes) data. Examples include H.265 for video and MP3 for audio.

Understand

Exam Tips

Exam Tip 1: Use Precise Language About Lossy Compression. Never say that lossy compression “loses quality” or “makes the file worse.” Instead, say it “permanently removes data that cannot be recovered.” This is scientifically precise and demonstrates the level of understanding examiners are looking for. For extra marks, specify what data is removed: “JPEG removes fine colour detail that the human eye barely notices” or “MP3 removes sound frequencies that the human ear cannot easily perceive.”

Exam Tip 2: Name Specific File Formats. When asked about lossy or lossless compression, always give named examples of file formats. Do not just say “images can be lossy or lossless”; say “JPEG is a lossy image format; PNG is a lossless image format.” Examiners award marks for specific, concrete knowledge. Remember: JPEG = lossy image, MP3 = lossy audio, PNG = lossless image, FLAC = lossless audio, ZIP = lossless general-purpose.

Exam Tip 3: Explain WHY a Type Is Chosen. A very common exam question gives a scenario and asks which type of compression is appropriate. Your answer must explain why, not just state the type. Structure your answer as: (1) state the type, (2) explain why it is suitable for this specific scenario, and (3) explain why the other type would be inappropriate. For example: “Lossless compression should be used because medical images are used for diagnosis and any data loss could cause a condition to be missed. Lossy compression would be unsuitable because it permanently removes data, which is unacceptable when lives may depend on the image quality.”

Exam Tip 4: Show Your Working for Compression Calculations. When calculating compression ratios or percentages, always show the formula and your working:

Compression ratio: Original ÷ Compressed = ratio. Write it as X:1.
Percentage saved: (Original − Compressed) ÷ Original × 100 = percentage.

Even if you get the arithmetic slightly wrong, showing the correct method can earn you partial marks.

Exam Practice

Past Paper Questions

Try these exam-style questions, then check your answers against the mark scheme.

Explain the difference between lossy and lossless compression. Give an example of when each type would be appropriate. (4 marks)

Mark scheme:

Lossy (2 marks):

Permanently removes data / some quality is lost / cannot be reversed (1)
Appropriate for: streaming media / web images / where smaller size matters more than perfect quality (e.g. MP3, JPEG) (1)

Lossless (2 marks):

No data is lost / original can be perfectly reconstructed (1)
Appropriate for: text files / program code / medical images / where accuracy is essential (e.g. PNG, ZIP) (1)

Apply Run-Length Encoding (RLE) to compress the following data: AAABBBBCCCCCCDD (2 marks)

Mark scheme:

Compressed: 3A 4B 6C 2D (1 mark)

Original = 15 characters, compressed = 8 characters — demonstrating the data has been reduced in size (1 mark)

Give two reasons why compression is used when transmitting files over the internet. (2 marks)

Mark scheme:

Reduces the file size so it takes less time to transmit / download (1)
Uses less bandwidth / allows more data to be sent in the same time / reduces storage requirements on servers (1)

Reflect

Compression in Everyday Life

Compression is one of those technologies that is completely invisible yet absolutely essential to modern digital life. Every time you do any of these things, compression is working behind the scenes:

Take a photo on your phone: Your camera sensor captures raw data, but the image is immediately compressed (usually to JPEG, or to HEIC on many newer phones) before being saved. Without this, your phone’s storage would fill up 5–10 times faster.
Stream a song on Spotify or Apple Music: A CD-quality version of a three-minute song (approximately 30 MB) is compressed to 3–7 MB using lossy codecs, allowing it to stream smoothly over mobile data.
Watch a video on YouTube or Netflix: The video is compressed from potentially terabytes of raw footage into a few gigabytes, then streamed at a bit rate your internet connection can handle.
Send a ZIP file by email: Lossless compression bundles and shrinks your files so they can be attached to an email without exceeding the size limit.
Browse the web: Images on websites are compressed (JPEG, PNG, or the modern WebP format) so pages load quickly rather than taking minutes.

The next time you take a photo, stream a song, or watch a video, think about the incredible mathematics happening behind the scenes — shrinking millions or billions of bytes into something that fits in your pocket and streams over the air. Compression is one of computer science’s greatest achievements, and understanding it gives you real insight into how the digital world works.

Video Resources


Craig 'n' Dave: Compression (lossy vs lossless compression techniques)


Further Reading