
Why a 5 MB JPG becomes a 5 MB PDF — file size accounting

Figure: PDF size = sum of JPEG bytes + ~10–30 KB framing

Single JPG, 5,242,880 bytes:
  Image scan data: 5,234,560
  JPEG headers (DQT, DHT): 4,200
  EXIF, IPTC, XMP metadata: 3,800
  JFIF marker, padding: 320

Conversion: embed only scan + DQT/DHT (5,238,760 B); drop EXIF/IPTC/XMP (saves 3,800 B).

Resulting PDF, 5,260,140 bytes:
  PDF header, version: 17
  /Catalog, /Pages, /Page: 720
  Image XObject framing: 600
  Embedded JPEG scan + tables: 5,238,760
  Content stream (cm Do): 150
  /Info, ICC, optional XMP: 18,000
  xref, trailer: 1,500
  Itemized sum ≈ 5,259,747 (+0.32%)

A common surprise: upload a 5 MB JPEG, get a 5 MB PDF. Some tools market "fast PDF" or "compressed PDF" output and the natural expectation is that the result should be smaller. The output's size is fundamentally bounded by the input's compressed pixel data — a JPG-to-PDF tool that doesn't recompress the photos has nowhere to go below that floor. A tool that does shrink the result is, somewhere along the way, lossily re-encoding the pixels.

Where the bytes go

In a typical 12 MP JPEG (4032 × 3024) at quality 85, the breakdown is roughly:

The PDF embedding pulls the scan data and tables (everything needed to decode the JPEG) into the page. EXIF Orientation is read and applied as a real rotation, and a post-processing step then copies the source's CreateDate, Author, and Copyright fields into the output PDF's /Info dictionary. IPTC and XMP payloads typically don't survive into the PDF: PDF has no native container for IPTC records, and although it can carry XMP in a /Metadata stream, the JPEG's own XMP packet is generally not transferred there.
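To see where the bytes of a particular JPEG go, a short marker walk is enough. The following is a minimal sketch, assuming a well-formed baseline JPEG held in memory; the function name and the category labels are illustrative, not part of any tool described here. Note that the APPn bucket lumps together JFIF, EXIF, XMP, and any embedded ICC profile.

```python
import struct
from collections import Counter


def tally_jpeg_segments(data: bytes) -> Counter:
    """Count bytes per rough category in a well-formed JPEG byte string."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    sizes = Counter(markers=2)  # the 2-byte SOI marker
    pos = 2
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xD9:  # EOI: end of image
            sizes["markers"] += 2
            break
        # Every other segment we expect here carries a big-endian length
        # that counts itself but not the 2-byte marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        end = pos + 2 + length
        if marker == 0xDA:
            # SOS: the header is followed by entropy-coded data, which runs
            # until the next real marker (FF00 is a stuffed byte, FFD0..FFD7
            # are restart markers that belong to the scan).
            while end + 1 < len(data) and not (
                data[end] == 0xFF
                and data[end + 1] != 0x00
                and not 0xD0 <= data[end + 1] <= 0xD7
            ):
                end += 1
            category = "scan"
        elif marker in (0xDB, 0xC4):  # DQT, DHT
            category = "tables"
        elif 0xE0 <= marker <= 0xEF or marker == 0xFE:  # APPn, COM
            category = "app_and_comments"
        else:  # SOFn, DRI, and anything else
            category = "frame_headers"
        sizes[category] += end - pos
        pos = end
    return sizes
```

Calling `tally_jpeg_segments(open(path, "rb").read())` on a real photo returns a `Counter` whose values should add up to the file size, which makes it easy to check how small the non-scan share really is.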

PDF framing overhead

Around any embedded JPEG, the resulting PDF carries a fixed set of structural objects: a header line, a /Catalog at the root, a /Pages tree, a /Page dictionary per page (with MediaBox, Resources, Contents references), an Image XObject per embedded photo (with width, height, color space, filter, length), a short content stream that places the image on the page, an /Info dictionary for document metadata, and an xref table plus trailer at the end.
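The object list above can be made concrete with a tiny hand-rolled writer. This is a sketch under stated assumptions, not how any particular converter works: it assumes a single 8-bit RGB JPEG, maps one pixel to one PDF point, and omits the /Info dictionary and any ICC profile. The function name `jpeg_to_pdf` is hypothetical.

```python
def jpeg_to_pdf(jpeg: bytes, width: int, height: int) -> bytes:
    """Wrap one RGB JPEG in a single-page PDF without re-encoding it."""
    # Content stream: scale the unit square to the page and paint the image.
    content = f"q {width} 0 0 {height} 0 0 cm /Im0 Do Q".encode("ascii")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        (
            f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {width} {height}] "
            f"/Resources << /XObject << /Im0 4 0 R >> >> /Contents 5 0 R >>"
        ).encode("ascii"),
        # Image XObject: the JPEG bytes are stored verbatim behind DCTDecode.
        (
            f"<< /Type /XObject /Subtype /Image /Width {width} /Height {height} "
            f"/ColorSpace /DeviceRGB /BitsPerComponent 8 /Filter /DCTDecode "
            f"/Length {len(jpeg)} >>\nstream\n"
        ).encode("ascii") + jpeg + b"\nendstream",
        f"<< /Length {len(content)} >>\nstream\n".encode("ascii")
        + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset recorded for the xref table
        out += f"{number} 0 obj\n".encode("ascii") + body + b"\nendobj\n"

    xref_pos = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode("ascii")
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode("ascii")
    out += (
        f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
        f"startxref\n{xref_pos}\n%%EOF\n"
    ).encode("ascii")
    return bytes(out)
```

Comparing `len(jpeg_to_pdf(jpeg, w, h))` with `len(jpeg)` gives the bare framing cost for this stripped-down layout; production writers add metadata, profiles, and sometimes more verbose dictionaries on top.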

The total per-photo framing overhead is on the order of 1.5–2 KB. For a 20-photo PDF, this scales to roughly 30–40 KB total: most of the per-page bytes (page dictionary, image-XObject dictionary, content stream) repeat once per photo, while the catalog and the top of the pages tree are paid only once. Compared to a typical photo size of 1–5 MB, the framing is essentially noise.

The ICC profile, when present

If the input JPEG has an embedded ICC color profile (Display P3 from iPhone is the common case), it survives into the resulting PDF — the page's Image XObject typically declares /ICCBased with the profile bytes attached as a separate stream. ICC profiles range from 500 bytes (sRGB v2 micro) to 10–20 KB (Display P3, Adobe RGB) to occasionally hundreds of kilobytes (custom monitor profiles).

For a single-photo PDF, this can dominate the framing overhead — a 12 KB Display P3 profile attached to a 50 KB photo noticeably inflates the file. For 20 large photos, it's noise. Whether the resulting PDF deduplicates a shared profile across pages depends on the underlying conversion library; some do, some don't, and the savings are at most a few hundred KB on a typical batch.
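Deduplicating a shared profile is conceptually simple. The sketch below shows one way a writer could plan it, assuming the profiles have already been extracted as byte strings (or `None` for untagged photos); the function name and return shape are hypothetical and do not describe any specific library.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


def plan_icc_streams(
    profiles: List[Optional[bytes]],
) -> Tuple[List[Optional[int]], List[bytes]]:
    """Return (per-page index into unique profiles, unique profiles)."""
    unique: List[bytes] = []
    index_by_digest: Dict[str, int] = {}
    page_refs: List[Optional[int]] = []
    for profile in profiles:
        if profile is None:  # untagged photo: no ICCBased color space
            page_refs.append(None)
            continue
        digest = hashlib.sha256(profile).hexdigest()
        if digest not in index_by_digest:
            index_by_digest[digest] = len(unique)
            unique.append(profile)  # written once as a single stream object
        page_refs.append(index_by_digest[digest])
    return page_refs, unique


def bytes_saved(profiles: List[Optional[bytes]]) -> int:
    """Profile bytes avoided by writing each distinct profile only once."""
    _, unique = plan_icc_streams(profiles)
    total = sum(len(p) for p in profiles if p is not None)
    return total - sum(len(p) for p in unique)
```

For 20 photos that all carry the same 12 KB profile, `bytes_saved` reports 19 copies' worth of profile data, which matches the "few hundred KB at most" scale described above.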

When the PDF comes out smaller than the JPG

  1. Re-encode of high-quality sources. JPEGs estimated to be at quality higher than 90 are re-encoded at quality 90, which can shave 30% off the photo's size before it lands in the PDF. On photographic content the difference is barely visible.
  2. Object-stream packing. Modern PDF writers can bundle multiple indirect objects into single Flate-compressed streams, which saves a few KB when the output has many similar pages. Whether this happens depends on the underlying writer's defaults.
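The effect of object-stream packing can be approximated by Flate-compressing a batch of near-identical page dictionaries. This is a simplified sketch: a real object stream also carries an index of object numbers and offsets, and the dictionary layout and object numbering below are made up for illustration. Stream objects such as the image data cannot be placed inside an object stream, so only the small dictionaries benefit.

```python
import zlib
from typing import Tuple


def page_dict(page_index: int, width: int = 4032, height: int = 3024) -> bytes:
    """Serialize one hypothetical /Page dictionary."""
    base = 3 + 3 * page_index  # assumed: page, image, content objects per photo
    return (
        f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {width} {height}] "
        f"/Resources << /XObject << /Im0 {base + 1} 0 R >> >> "
        f"/Contents {base + 2} 0 R >>"
    ).encode("ascii")


def packing_gain(page_count: int) -> Tuple[int, int]:
    """Return (plain_bytes, flate_bytes) for page_count page dictionaries."""
    plain = b"\n".join(page_dict(i) for i in range(page_count))
    return len(plain), len(zlib.compress(plain, 9))
```

Running `packing_gain(20)` shows the dictionaries shrinking to a fraction of their plain size, but the absolute saving stays in the low kilobytes because the dictionaries were small to begin with.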

When the PDF comes out larger

  1. ICC profile attached. +1 KB to +20 KB per page.
  2. Verbose content-stream framing. Some writers emit more content-stream operators than strictly needed (multiple operators where one would do), adding a few hundred bytes per page.
  3. Re-encoded inputs. If the input was a non-JPEG (PNG, HEIC) that we re-encoded as JPEG, the size of the output JPEG is independent of the input's size; it can be larger or smaller depending on the lossy encoding step.

How to actually shrink a PDF of JPEGs

If the goal is a smaller PDF and the photos themselves are at high quality, the path forward is to recompress the JPEGs at lower quality before (or during) PDF assembly. There is no PDF-side trick that compresses already-compressed JPEG data losslessly.
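A quick experiment illustrates why a second lossless pass does not help. The sketch below measures what Flate saves on two inputs: pseudo-random bytes, used here as a stand-in for entropy-coded scan data (an assumption; real scan data is not perfectly random), and a highly repetitive dictionary-like string. The function name `flate_gain` is illustrative.

```python
import random
import zlib


def flate_gain(data: bytes) -> float:
    """Fraction of bytes saved by Flate at maximum effort (may be negative)."""
    return 1 - len(zlib.compress(data, 9)) / len(data)


# Stand-in for already-compressed scan data: seeded pseudo-random bytes.
rng = random.Random(0)
scan_like = bytes(rng.getrandbits(8) for _ in range(200_000))

# Stand-in for PDF structural text: the same dictionary repeated many times.
dictionary_like = b"<< /Type /Page /Parent 2 0 R >>\n" * 1000

print(f"scan-like data:       {flate_gain(scan_like):+.2%}")
print(f"dictionary-like data: {flate_gain(dictionary_like):+.2%}")
```

To test a real photo, pass its bytes to `flate_gain` directly; the expectation is a gain close to zero, which is why the only effective lever is the JPEG quality itself.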

Practical options:

JPG2PDF preserves quality for sources at Q≤90 and does one round of recompression for higher-quality sources (target Q=90). Aggressive size reduction is a separate operation, not something the JPG-to-PDF step itself attempts.
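The "keep at Q≤90, recompress above" rule needs a quality estimate, since JPEG files do not store a quality number. One common heuristic, shown here as a sketch and not necessarily what JPG2PDF does, is to compare the file's luminance quantization table against the tables libjpeg would generate at each quality level. The sketch assumes the table is supplied in natural row-major order (DQT segments store it in zigzag order) and that the encoder used the standard base table; all function names are hypothetical.

```python
from typing import List, Optional

# Standard luminance quantization table from the JPEG specification (Annex K),
# in natural row-major order.
BASE_LUMA = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
]


def ijg_luma_table(quality: int) -> List[int]:
    """Scale the base table the way libjpeg's quality setting does (1..100)."""
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    return [min(255, max(1, (v * scale + 50) // 100)) for v in BASE_LUMA]


def estimate_quality(luma_table: List[int]) -> int:
    """Return the quality whose generated table is closest to luma_table."""
    def distance(q: int) -> int:
        return sum(abs(a - b) for a, b in zip(ijg_luma_table(q), luma_table))
    return min(range(1, 101), key=distance)


def target_quality(luma_table: List[int], ceiling: int = 90) -> Optional[int]:
    """Return ceiling if the source looks higher quality, else None (keep bytes)."""
    return ceiling if estimate_quality(luma_table) > ceiling else None
```

A return value of `None` means the JPEG bytes would be embedded untouched; a return value of 90 means one re-encode at that target. Encoders with custom quantization tables will only get an approximate estimate from this approach.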