How PDF Compression Works in the Browser, and When Each Level Helps
A practical look at why PDFs get bloated, what compression actually changes inside the file, and how to pick a quality level. With real numbers from a browser-based tool.
A PDF that should be five megabytes is often forty. The file is the same on every machine, so the bloat travels with it. It gets stuck in email attachment limits, fills up shared drives, and makes mobile downloads painfully slow. The fix is compression, and the interesting question is what the tool actually changes inside the file when it compresses it.
This post walks through what makes a PDF large in the first place, what each kind of compression touches, the tradeoffs between quality levels, and when compression is the wrong move. Most of the examples come from running real documents through a browser-based PDF compressor that keeps the file on the user's device.
What is actually in a PDF
A PDF file is a container. Inside the container are streams of objects: text, fonts, images, vector paths, form fields, metadata, and the cross-reference tables that tie them together. Each object can be encoded in one of several ways, and the encoding choice usually dominates the final file size.
The biggest contributors to a bloated PDF are almost always:
- Raster images. A photo embedded at its full original resolution can add several megabytes per page, and far more if it is stored without lossy compression. PDFs created by exporting from Word or PowerPoint frequently keep the source image at print resolution even when the document is meant for screen reading.
- Embedded fonts. A document that uses three different fonts will embed all three. If the embedder did not subset, the full font file rides along, and modern font files are not small.
- Unused objects. PDFs created by exporting from design tools or by appending to existing documents often carry deleted-but-not-removed objects, old form fields, and revision history that nobody can see.
- Metadata and XMP packets. Each save through Acrobat or a design tool can append metadata that never gets cleaned up.
A scanned document is a special case. A scan is a PDF where every page is a single image, and the file size is dominated by the resolution and encoding of those images.
What compression actually changes
A compressor walks through the PDF, identifies which objects can be re-encoded more efficiently, and rewrites them. The categories of change are:
Image re-encoding. The compressor decodes each embedded image, optionally downsamples it to a smaller resolution, and re-encodes it as JPEG. The JPEG quality setting is the single biggest knob. A photo at quality 90 is usually indistinguishable from the original at normal viewing sizes. A photo at quality 60 saves substantial space and remains acceptable for screen reading. A photo at quality 40 starts to show artifacts in flat areas, like skies and skin tones.
Image downsampling. A 4000-pixel-wide photo embedded in a PDF that displays at 800 pixels of screen width has five times the linear resolution it can ever show, which is about 25 times the pixel data. Downsampling to 150 or 200 dots per inch keeps the document sharp on screen and on a desktop printer, while cutting the image data by roughly the square of the resolution ratio.
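The squared relationship can be checked with a few lines of arithmetic. This is a minimal sketch with illustrative function names, not code from any particular tool. It assumes stored size tracks pixel count, which is only approximately true once JPEG encoding is involved.

```typescript
/** Ratio of pixel counts when an image is scaled from one linear resolution to another. */
function pixelReductionFactor(sourceRes: number, targetRes: number): number {
  if (sourceRes <= 0 || targetRes <= 0) {
    throw new RangeError("resolutions must be positive");
  }
  const linearRatio = sourceRes / targetRes;
  // Pixel count scales with area, so the linear ratio is squared.
  return linearRatio * linearRatio;
}

// A 4000-pixel-wide photo shown at 800 pixels: 5x linear, 25x the pixels.
const photoFactor = pixelReductionFactor(4000, 800);

// A 300 DPI scan downsampled to 150 DPI keeps one pixel in four.
const scanFactor = pixelReductionFactor(300, 150);

console.log(photoFactor, scanFactor);
```

The same function works with pixel widths or DPI values, because only the ratio matters.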
Font subsetting. A full font file contains thousands of glyphs. A document usually uses a few dozen. Subsetting strips out the glyphs the document does not use and rebuilds the font table to point only to the ones it needs. The savings are large for documents with several fonts and limited text.
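To get a feel for how little of a font a document uses, the sketch below counts the distinct characters in a piece of text. It is a simplification: real subsetting works on glyph IDs inside the font program and has to account for ligatures and composite glyphs. The 3,000-glyph font size is an assumed figure for illustration, and the function names are hypothetical.

```typescript
/** Collects the distinct Unicode code points used in a document's text. */
function usedCodePoints(text: string): Set<number> {
  const seen = new Set<number>();
  // Array.from splits by code point, so characters outside the BMP count once.
  for (const ch of Array.from(text)) {
    const cp = ch.codePointAt(0);
    if (cp !== undefined) seen.add(cp);
  }
  return seen;
}

/** Rough fraction of a font's glyphs that a subset would need to keep. */
function subsetFraction(text: string, glyphsInFont: number): number {
  return usedCodePoints(text).size / glyphsInFont;
}

const sampleText = "Invoice total: 1,250.00 USD";
const assumedFontGlyphs = 3000; // assumption, not a measured value

const distinctChars = usedCodePoints(sampleText).size;
const keptFraction = subsetFraction(sampleText, assumedFontGlyphs);
console.log(distinctChars, keptFraction);
```

Under these assumptions the subset keeps well under one percent of the glyphs, which is why subsetting pays off most in short documents that use several fonts.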
Object removal. Unused objects, old form fields, and revision history are dropped. The cross-reference table is rebuilt.
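Finding unused objects is essentially a reachability problem: anything that cannot be reached by following references from the document root is dead weight. The sketch below shows the idea on a toy object graph. It is not a PDF parser, the object numbers and their meanings are made up for illustration, and a real compressor would also follow the other entries in the file trailer.

```typescript
// Toy model: object number -> object numbers it references.
type ObjectGraph = Map<number, number[]>;

/** Returns the object numbers reachable from the root by following references. */
function reachableObjects(graph: ObjectGraph, root: number): Set<number> {
  const seen = new Set<number>();
  const stack: number[] = [root];
  while (stack.length > 0) {
    const id = stack.pop() as number;
    if (seen.has(id) || !graph.has(id)) continue;
    seen.add(id);
    for (const ref of graph.get(id) ?? []) stack.push(ref);
  }
  return seen;
}

/** Keeps only the reachable objects; everything else is dropped. */
function dropUnreachable(graph: ObjectGraph, root: number): ObjectGraph {
  const live = reachableObjects(graph, root);
  const kept: ObjectGraph = new Map<number, number[]>();
  graph.forEach((refs, id) => {
    if (live.has(id)) kept.set(id, refs);
  });
  return kept;
}

// Hypothetical document: 1 = catalog, 2 = page tree, 3 = page, 4 = image.
// 5 is an image left over from a deleted page; 6 is an old form field that points at 5.
const toyGraph: ObjectGraph = new Map<number, number[]>([
  [1, [2]],
  [2, [3]],
  [3, [4]],
  [4, []],
  [5, []],
  [6, [5]],
]);

const cleanedGraph = dropUnreachable(toyGraph, 1);
console.log(Array.from(cleanedGraph.keys()));
```

Object 6 still references object 5, but since nothing reachable from the root references 6, both are removed. After a pass like this the cross-reference table is rebuilt to list only the surviving objects.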
Stream re-encoding. Text and vector content streams are recompressed using Flate (zlib). Most tools already apply Flate by default, but PDFs exported by older tools sometimes leave these streams uncompressed.
The actual saving from any one of these operations varies wildly across documents. A text-heavy contract gains the most from object removal and font subsetting. A photo-heavy product brochure gains the most from image re-encoding.
The three levels in practice
A compressor that exposes only one button has to make a fixed tradeoff between size and quality. A compressor that exposes three levels lets the user pick where on the curve they want to sit. The three levels in the ReezoAI tool map onto these tradeoffs:
Light keeps the text crisp and selectable, leaves vector content untouched, and uses gentle JPEG quality on images (around 85). It is the safe option for documents you still need to read text from on a high-resolution screen. The typical reduction is 20 to 40 percent on text-heavy files, less on photo-heavy ones.
Recommended balances size against visual quality. Images are downsampled to around 150 DPI and re-encoded at JPEG quality 70 or so. In this tool the pages are rebuilt as images at this level, so text in the output is no longer selectable. This level is the right default for sharing documents over email or uploading to a portal. Typical reduction is 50 to 75 percent on mixed content.
Maximum is built for hitting strict size limits. Images go to 100 DPI or lower and JPEG quality 50 or so. The output is fine for on-screen reading but will look soft when zoomed in or printed. As with Recommended, the pages are rebuilt as images, so text is not selectable. This is the option to reach for when an email gateway insists on a 5 megabyte limit and your document is 15 megabytes.
A real document, run through all three on a typical browser-based compressor, might land at 65 percent, 30 percent, and 12 percent of the original size respectively. The relative ranking is consistent. The absolute numbers depend entirely on what is inside the document.
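Applied to a concrete file size, those example percentages work out as follows. The ratios are the illustrative figures from the paragraph above, not guarantees, and the names in the sketch are hypothetical.

```typescript
type CompressionLevel = "light" | "recommended" | "maximum";

// Illustrative output-to-input size ratios from the example in the text.
// Real ratios depend on what is inside the document.
const exampleRatios: Record<CompressionLevel, number> = {
  light: 0.65,
  recommended: 0.30,
  maximum: 0.12,
};

/** Estimated output size in megabytes for a given input size and level. */
function estimateOutputMb(inputMb: number, level: CompressionLevel): number {
  return inputMb * exampleRatios[level];
}

const exampleInputMb = 20;
const allLevels: CompressionLevel[] = ["light", "recommended", "maximum"];
for (const level of allLevels) {
  console.log(`${level}: about ${estimateOutputMb(exampleInputMb, level).toFixed(1)} MB`);
}
// light: about 13.0 MB
// recommended: about 6.0 MB
// maximum: about 2.4 MB
```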
When compression is the wrong move
Compression is destructive for raster images and removes information that cannot be put back. There are situations where this matters:
Signed or notarized PDFs. A digital signature is computed over the exact bytes of the document. Re-encoding any content invalidates the signature. If the PDF carries a legal signature, it must be transmitted in its original form. Compression has to happen before signing, not after.
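The reason is easy to demonstrate with any digest. The sketch below uses FNV-1a, a simple non-cryptographic hash, purely as a stand-in: real PDF signatures rely on cryptographic hashes and certificates, which are out of scope here. The point is only that a digest computed over bytes changes when any of those bytes change.

```typescript
/** FNV-1a 32-bit hash: a stand-in for the cryptographic digest a real signature would use. */
function fnv1a(bytes: Uint8Array): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (const byte of Array.from(bytes)) {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // interpret as unsigned 32-bit
}

// The ASCII bytes of "%PDF-1.7", standing in for a whole signed file.
const signedBytes = Uint8Array.from([0x25, 0x50, 0x44, 0x46, 0x2d, 0x31, 0x2e, 0x37]);
const digestAtSigning = fnv1a(signedBytes);

// Re-encoding is simulated here by changing a single byte ("%PDF-1.4").
const reencodedBytes = Uint8Array.from(signedBytes);
reencodedBytes[7] = 0x34;

const digestStillMatches = fnv1a(reencodedBytes) === digestAtSigning;
console.log(digestStillMatches); // false
```

A compressor rewrites far more than one byte, so there is no realistic way for a signed file to survive compression with its signature intact.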
Documents intended for OCR. A scanned page that has not been OCR'd yet needs the original image resolution for the OCR engine to work well. Compressing a 300 DPI scan down to 100 DPI before running OCR will produce noticeably worse text extraction.
Archive copies. A document being filed for long-term retention should usually be archived in its original form, with compression applied only to the working copy. The compressed file is a derivative; the original is the source of truth.
Documents where image fidelity is the deliverable. Product spec sheets where the photo is the spec, fine art reproductions, medical imaging, or any document where the image is being inspected at full quality, should not be compressed at the Recommended or Maximum levels. Light is safe; the others are not.
Documents that are already optimized. A PDF exported with "smallest file size" in the original tool may not have much room left to compress. The tool will still run, but the savings will be small, and the output can even come out larger than the input if the compressor adds metadata back.
If any of those apply, send the original. If none apply, compress.
The privacy angle for confidential PDFs
PDFs are the format for documents that matter. Contracts, financial statements, medical records, internal reports, board decks, and customer agreements all flow as PDFs. The category of "documents I should not upload to a stranger's server" is large.
Most online PDF compressors work by uploading the file, running the compression on a server, and sending the result back. That works, but the file is now in someone else's infrastructure. Whatever the privacy policy says, the file passes through their network, sits in their queue, and may leave traces in their logs and backups. For sensitive documents, that is a non-trivial cost.
A browser-based compressor processes the file on the user's device. The file is parsed, the streams are re-encoded, and the new file is assembled, all using JavaScript libraries running in the page. No upload happens. You can verify this in the browser's developer tools: while the compression runs, the network tab shows no traffic.
This matters more for some documents than others. A printable menu compressed for upload to a marketing site does not need browser-only processing. A signed offer letter, a financial statement, or a patient intake form does. The advantage of browser-only is that the same tool covers both cases without forcing the user to decide which is sensitive.
How the ReezoAI Compress PDF tool works
The Compress PDF tool on this site reads the PDF in the browser using pdfjs-dist for parsing and pdf-lib for assembly. At the Light level, only metadata cleanup and stream re-encoding happen. At Recommended and Maximum, each page is also rendered to a hidden canvas, the canvas is re-encoded as a JPEG at the target quality and resolution, and a new PDF is built from the JPEG pages.
The tradeoff at Recommended and Maximum is that the page becomes an image, so any selectable text in the original becomes part of the rendered image instead. The tool keeps a button to download the original alongside the compressed version, so the user can decide which one to share.
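The resolution targets translate directly into a render scale. PDF page dimensions are expressed in points, with 72 points to the inch, so rendering at a target DPI means scaling by that DPI divided by 72. The helper below is a sketch with hypothetical names, not the tool's own code. With pdfjs-dist, a scale like this is what would typically be passed to `page.getViewport({ scale })` before rendering to a canvas.

```typescript
const POINTS_PER_INCH = 72;

interface PixelSize {
  width: number;
  height: number;
}

/** Scale factor that renders a page measured in points at the given DPI. */
function renderScale(targetDpi: number): number {
  return targetDpi / POINTS_PER_INCH;
}

/** Canvas dimensions in pixels for a page of the given size in points. */
function canvasSizeFor(widthPt: number, heightPt: number, targetDpi: number): PixelSize {
  const scale = renderScale(targetDpi);
  return {
    width: Math.round(widthPt * scale),
    height: Math.round(heightPt * scale),
  };
}

// US Letter is 612 x 792 points (8.5 x 11 inches).
const letterAt150 = canvasSizeFor(612, 792, 150); // 1275 x 1650 pixels
const letterAt100 = canvasSizeFor(612, 792, 100); // 850 x 1100 pixels
console.log(letterAt150, letterAt100);
```

Dropping from 150 to 100 DPI shrinks each side by a third, leaving fewer than half the pixels for the JPEG encoder to store.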
The processing happens entirely on device. A typical 20 megabyte document compresses in three to ten seconds on a recent laptop. Mobile devices take longer, often fifteen to thirty seconds for the same file. Memory use scales with page count, so very large documents may need to be split first.
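A rough estimate shows where the memory goes when pages are rendered to a canvas. The sketch assumes an uncompressed RGBA bitmap at 4 bytes per pixel and ignores the parsed PDF, the JPEG output, and browser overhead. How many page bitmaps a given tool keeps alive at once is not something this article specifies, so `pagesInFlight` is a hypothetical parameter.

```typescript
const BYTES_PER_RGBA_PIXEL = 4;

/** Bytes held by one rendered page bitmap (RGBA, 8 bits per channel). */
function canvasBytes(widthPx: number, heightPx: number): number {
  return widthPx * heightPx * BYTES_PER_RGBA_PIXEL;
}

/** Rough bitmap memory in MiB if `pagesInFlight` page bitmaps are alive at the same time. */
function bitmapMemoryMiB(widthPx: number, heightPx: number, pagesInFlight: number): number {
  return (canvasBytes(widthPx, heightPx) * pagesInFlight) / (1024 * 1024);
}

// A Letter page rendered at 150 DPI is 1275 x 1650 pixels.
const bytesPerPage = canvasBytes(1275, 1650); // 8,415,000 bytes
const onePageMiB = bitmapMemoryMiB(1275, 1650, 1);
const twoHundredPagesMiB = bitmapMemoryMiB(1275, 1650, 200);
console.log(bytesPerPage, onePageMiB.toFixed(1), twoHundredPagesMiB.toFixed(0));
```

At roughly 8 MiB per page, holding a few hundred pages at once would run into the gigabyte range, which is why processing pages one at a time or splitting very large documents helps.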
A short workflow for picking a level
When you drop a file into the compressor, the right level usually falls out of two questions:
- Does the recipient need to copy text out of it? If yes, Light. The text stays selectable.
- Is there a hard file-size cap on the other end? If yes, start at Recommended and check the size. If still over, jump to Maximum. If well under, Recommended is the safer choice for visible quality.
The Light level exists for documents where compression is incidental and quality is the priority. Recommended is the default for sharing. Maximum is for the gateway-imposed five-megabyte cap. Most users settle on Recommended for almost everything and reach for the other two when something specific demands it.
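The two questions can be written down as a small decision function. This is a sketch of the workflow described above, with hypothetical names. It assumes the size of a Recommended run is measured first and only then compared against the cap.

```typescript
type Level = "light" | "recommended" | "maximum";

interface Situation {
  /** Does the recipient need to copy text out of the document? */
  recipientNeedsSelectableText: boolean;
  /** Hard size cap at the receiving end, in megabytes, if there is one. */
  sizeCapMb?: number;
  /** Measured output size of a Recommended run, in megabytes, if already known. */
  recommendedOutputMb?: number;
}

function pickLevel(s: Situation): Level {
  // Question 1: selectable text wins over everything else.
  if (s.recipientNeedsSelectableText) return "light";

  // Question 2: with a hard cap, start at Recommended and escalate only if still over.
  const overCap =
    s.sizeCapMb !== undefined &&
    s.recommendedOutputMb !== undefined &&
    s.recommendedOutputMb > s.sizeCapMb;
  return overCap ? "maximum" : "recommended";
}

console.log(pickLevel({ recipientNeedsSelectableText: true }));
console.log(pickLevel({ recipientNeedsSelectableText: false, sizeCapMb: 5, recommendedOutputMb: 4.5 }));
console.log(pickLevel({ recipientNeedsSelectableText: false, sizeCapMb: 5, recommendedOutputMb: 7 }));
```

The function never returns Maximum without a measured Recommended result, which mirrors the advice to try the gentler level first.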
The summary
PDF files get large because of high-resolution images, embedded fonts, and accumulated cruft from successive saves. A compressor reduces size by re-encoding images, subsetting fonts, dropping unused objects, and rebuilding the streams. Each level trades quality for size, and the right level depends on whether text needs to stay selectable, whether there is a hard cap on size, and whether the document is going to be archived or shared.
Browser-based compression keeps the file on the user's device, which matters for any PDF that contains information you would not casually email to a stranger. The cost is a little more wait on mobile and a memory ceiling for very large documents. The benefit is a tool that handles both routine compression and confidential documents without changing tools.
If the document is signed, do not compress it. If it is a scan you still need to OCR, compress it carefully. For everything else, the compressor is a one-click step that saves time on every transfer that follows.