
JPG to PDF — what actually happens to the pixels

[Figure: A "JPG → PDF" conversion is mostly a wrapper change; the JPEG stream goes in untouched. Input, photo.jpg (~3 MB iPhone photo, 12 MP, baseline JPEG): SOI marker (FF D8); JFIF / EXIF segments; DQT (quantization tables); DHT (Huffman tables); SOF, SOS, scan data; EOI marker (FF D9). Conversion: extract scan data, wrap with PDF preamble, add page rectangle, declare DPI in matrix. Output, photo.pdf (~3.05 MB: JPEG bytes + ~50 KB framing): PDF header; /Page → /MediaBox A4; /Resources /XObject /Im0 with /Filter /DCTDecode and raw JPEG; content stream: cm, Do.]

The most counterintuitive thing about JPG-to-PDF is how little re-encoding actually happens. The existing JPEG stream is placed in a PDF Image XObject with a /Filter /DCTDecode declaration, positioned on a page-sized rectangle, and saved. The DCT data either passes through unchanged or, if JPG2PDF decides to recompress, is re-encoded once at the new quality and embedded.

Why the pixels do not change

PDF's /DCTDecode filter is, literally, a JPEG decoder. It has been part of PDF since the format's first release in 1993: Adobe (the same company that makes Photoshop and its JPEG implementation) made JPEG a native image filter so that embedding photos didn't require recompression. A PDF reader displaying an Image XObject with /DCTDecode hands the stream bytes to its JPEG decoder, just as an image viewer would with the original JPEG file.

This means that when the stream is embedded without recompression, the output PDF is precisely as good, and precisely as compressed, as the input JPG. No quality loss, no extra blocking, no chroma blurring beyond what was already in the source.
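To make the wrapper idea concrete, here is a minimal sketch of a one-page PDF writer that embeds JPEG bytes unchanged behind /DCTDecode. It is not JPG2PDF's code: the function name `jpeg_to_pdf`, the object numbering, and the choice to fill the page without margins are all assumptions made for illustration, and it assumes an 8-bit, three-component (RGB/YCbCr) JPEG whose pixel dimensions the caller already knows.

```python
def jpeg_to_pdf(jpeg: bytes, px_w: int, px_h: int,
                page_w: float = 595.28, page_h: float = 841.89) -> bytes:
    """Wrap unmodified JPEG bytes in a one-page PDF (hypothetical helper)."""
    # Fit the image inside the page, keep the aspect ratio, centre it.
    scale = min(page_w / px_w, page_h / px_h)
    w, h = px_w * scale, px_h * scale
    x, y = (page_w - w) / 2, (page_h - h) / 2
    # Content stream: set the matrix with cm, paint the image with Do.
    content = f"q {w:.2f} 0 0 {h:.2f} {x:.2f} {y:.2f} cm /Im0 Do Q".encode()

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {page_w} {page_h}] "
         f"/Resources << /XObject << /Im0 4 0 R >> >> "
         f"/Contents 5 0 R >>").encode(),
        # The Image XObject: the JPEG bytes are copied in as-is.
        (f"<< /Type /XObject /Subtype /Image /Width {px_w} /Height {px_h} "
         f"/ColorSpace /DeviceRGB /BitsPerComponent 8 /Filter /DCTDecode "
         f"/Length {len(jpeg)} >>\nstream\n").encode() + jpeg + b"\nendstream",
        f"<< /Length {len(content)} >>\nstream\n".encode()
        + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))          # byte offset for the xref table
        out += f"{num} 0 obj\n".encode() + body + b"\nendobj\n"

    xref_pos = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode()
    out += b"0000000000 65535 f \n"       # each xref entry is exactly 20 bytes
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode()
    out += (f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
            f"startxref\n{xref_pos}\n%%EOF\n").encode()
    return bytes(out)
```

Calling `jpeg_to_pdf(open("photo.jpg", "rb").read(), 4032, 3024)` should yield a file whose size is the JPEG's size plus a few hundred bytes of framing. A grayscale or CMYK source would need a different /ColorSpace, which this sketch does not handle.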

What does get stripped

JPEGs carry metadata in markers before the actual scan data: JFIF density, EXIF (camera model, GPS, timestamp, orientation), IPTC (caption, keywords), XMP, ICC color profile, thumbnail. The PDF Image XObject itself has no place for most of these structures, but JPG2PDF preserves what it can. The in-JPEG EXIF block survives the lossless repack pass. IPTC and XMP payloads typically don't carry across: the Image XObject has no slot for them, and the tool doesn't synthesize a PDF-level equivalent.

Some pieces are not exactly dropped; they are read and used. ICC color profiles ride along with the embedded image so colors render correctly. EXIF Orientation is applied as a real rotation. The source's CreateDate, Author, and Copyright are copied into the output PDF's /Info dictionary.
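To see which of these metadata blocks a given file actually carries, the marker segments before the scan data can be walked directly. The sketch below is illustrative only: `list_segments` and `classify` are hypothetical names, not part of JPG2PDF, and the walker assumes a well-formed file in which every marker before SOS carries a two-byte length.

```python
import struct

# Structural markers that are labelled by marker code alone.
STRUCTURAL = {0xDB: "DQT", 0xC4: "DHT", 0xC0: "SOF0", 0xC2: "SOF2", 0xDA: "SOS"}


def classify(marker: int, payload: bytes) -> str:
    """Label a segment by its marker code and identifying prefix."""
    if marker == 0xE0 and payload.startswith(b"JFIF\x00"):
        return "JFIF"
    if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
        return "EXIF"
    if marker == 0xE1 and payload.startswith(b"http://ns.adobe.com/xap/1.0/\x00"):
        return "XMP"
    if marker == 0xE2 and payload.startswith(b"ICC_PROFILE\x00"):
        return "ICC"
    if marker == 0xED and payload.startswith(b"Photoshop 3.0\x00"):
        return "IPTC"  # IPTC data lives inside the Photoshop resource block
    return STRUCTURAL.get(marker, f"0x{marker:02X}")


def list_segments(jpeg: bytes):
    """Return (label, payload_length) for each segment up to and including SOS."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker (FF D8)")
    segments = []
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg[pos + 1]
        if marker == 0xFF:          # fill byte before a marker, skip it
            pos += 1
            continue
        # The length field counts itself (2 bytes) plus the payload.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        payload = jpeg[pos + 4:pos + 2 + length]
        segments.append((classify(marker, payload), len(payload)))
        if marker == 0xDA:          # SOS: entropy-coded scan data follows
            break
        pos += 2 + length
    return segments
```

Running `list_segments` on a phone photo typically shows an EXIF segment and often an ICC segment ahead of the DQT, DHT, and SOF entries. Those are the blocks the paragraph above says are read or carried along.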

The page around the image

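The Image XObject is one part of a PDF page. The rest is the page itself: a /MediaBox giving the page size (A4 here), a /Resources dictionary that names the image (/Im0), and a content stream that sets a transformation matrix with cm and paints the image with Do.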

The matrix in cm determines where the image lands on the page and at what scale. For a 4032×3024 photo on A4, JPG2PDF computes the largest scale that fits within page margins while preserving aspect ratio.
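The fit computation can be sketched in a few lines. This is an assumption-laden illustration rather than the tool's actual code: the function name `fit_matrix`, the 36-point (half-inch) margin, and the decision to centre the image on a portrait A4 page are all hypothetical choices.

```python
A4_POINTS = (595.28, 841.89)  # A4 in PDF points (1 pt = 1/72 inch)


def fit_matrix(px_w: int, px_h: int, page=A4_POINTS, margin: float = 36.0):
    """Return the six cm operands (a b c d e f) and the effective DPI.

    The image is scaled by the largest factor that keeps it inside the
    margins while preserving its aspect ratio, then centred on the page.
    """
    page_w, page_h = page
    avail_w = page_w - 2 * margin
    avail_h = page_h - 2 * margin
    scale = min(avail_w / px_w, avail_h / px_h)   # points per pixel
    w, h = px_w * scale, px_h * scale
    x, y = (page_w - w) / 2, (page_h - h) / 2
    dpi = 72.0 / scale                             # pixels per inch on paper
    return (w, 0.0, 0.0, h, x, y), dpi


matrix, dpi = fit_matrix(4032, 3024)
print("%.2f 0 0 %.2f %.2f %.2f cm /Im0 Do" % (matrix[0], matrix[3],
                                               matrix[4], matrix[5]))
```

For a landscape 4032×3024 photo on a portrait page, the width is the limiting dimension, so the image spans the full width between the margins and leaves white space above and below. The effective DPI falls out of the same scale factor, which is one way to read "declare DPI in matrix".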

Multi-photo output

When you upload several JPGs, each becomes one page. The PDF's /Pages tree lists them in upload order (or the order you set in the UI by reordering tiles). Each page references its own Image XObject — there is no font sharing or cross-page resource pool to deduplicate.

This is why a 20-photo PDF is roughly the sum of the 20 JPGs plus ~30 KB of framing overhead. There is no compression magic happening between pages; each photo is its own self-contained payload.

When recompression does happen

JPG2PDF inspects the input JPEG's estimated quality (parsed from its quantization tables) and compares against the site's target quality (Q=90). The pipeline:

  1. If the source's estimated quality is greater than the target or undetectable, JPG2PDF does one lossy recompression pass to Q=90. The file shrinks, with one generation of quality loss.
  2. A lossless re-encoding pass then optimizes the Huffman tables. Pixel data unchanged; in-JPEG EXIF preserved.
  3. If the source's quality was already at or below Q=90, only the lossless pass runs — the DCT data is preserved bit-for-bit.
  4. If EXIF Orientation requires rotation, the rotation rearranges and transposes DCT blocks directly without going through a pixel-domain decode. For dimensions that are not a multiple of the block size (8 or 16 pixels depending on chroma subsampling), a few edge pixels are cropped to keep the rotation lossless.
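The quality check and the choice of passes can be sketched as follows. How JPG2PDF actually estimates quality is not documented here, so this assumes one common approach: comparing the luminance quantization table against the standard table from the JPEG specification, scaled the way libjpeg scales it. The names `ijg_table`, `estimate_quality`, and `plan_passes` are hypothetical.

```python
from typing import List, Optional

# Standard luminance quantization table (JPEG spec, Annex K).
BASE_LUMA = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
]


def ijg_table(quality: int) -> List[int]:
    """Scale the base table the way libjpeg does for a given quality."""
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    return [min(255, max(1, (b * scale + 50) // 100)) for b in BASE_LUMA]


def estimate_quality(table: List[int]) -> Optional[float]:
    """Invert the scaling; return None if no plausible quality results."""
    if len(table) != 64:
        return None
    # Summing makes the estimate independent of zigzag vs natural order.
    scale = 100.0 * sum(table) / sum(BASE_LUMA)
    quality = (200.0 - scale) / 2.0 if scale <= 100.0 else 5000.0 / scale
    return quality if 1.0 <= quality <= 100.0 else None


def plan_passes(estimated: Optional[float], target: float = 90.0) -> List[str]:
    """Mirror the pipeline above: lossy only when above target or unknown."""
    passes = []
    if estimated is None or estimated > target:
        passes.append("lossy_recompress")
    passes.append("lossless_optimize")   # Huffman optimization always runs
    return passes


print(plan_passes(estimate_quality(ijg_table(95))))
print(plan_passes(estimate_quality(ijg_table(85))))
```

Cameras and phones often ship custom quantization tables, so an estimate like this is approximate for real files. That is presumably why the pipeline treats an undetectable quality the same as one above the target.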

The net effect: most photos straight from a phone (quality 80–90) pass through losslessly. Re-saved or studio JPEGs above the target, such as those at Q=95+, are recompressed to Q=90 with one generation of quality loss.

The output is a real PDF

Not a "ZIP of JPEGs renamed" or some other shortcut. It is a standard PDF that opens in any reader, prints to any printer, can be edited, can have annotations added, and can be combined with other PDFs. The fact that it happens to use JPEG-compressed images internally is invisible to almost everyone except people who write PDF-handling code.