PDF Knowledge Base
PDF & Compression Glossary
A comprehensive reference of PDF, compression, and document management terms —
optimised for anyone wanting to understand how freepdfconvert works under the hood.
Adaptive Bit-Depth Reduction
A compression technique that analyses the effective colour palette of each embedded image and, where few unique colour values exist, reduces the bits per channel from 8 to 5 (or lower). This removes redundant precision without perceptible quality loss.
Lossy Technique
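As an illustration of the 8-to-5-bit reduction described above, here is a minimal sketch. The function names are hypothetical and this is not freepdfconvert's actual code; it assumes pixels arrive as an RGBA byte buffer (as from a canvas) and leaves the alpha channel untouched.

```javascript
// Hypothetical helper: requantise one 8-bit channel value to `bits` bits,
// then expand it back to the 0-255 range so it can still be stored in a byte.
function reduceBitDepth(value, bits) {
  const levels = (1 << bits) - 1;                    // 31 steps for 5 bits
  const index = Math.round((value / 255) * levels);  // 0..levels
  return Math.round((index / levels) * 255);
}

// Apply the reduction to the R, G and B channels of an RGBA buffer.
function reduceImageBitDepth(rgba, bits) {
  const out = new Uint8ClampedArray(rgba.length);
  for (let i = 0; i < rgba.length; i += 4) {
    out[i] = reduceBitDepth(rgba[i], bits);
    out[i + 1] = reduceBitDepth(rgba[i + 1], bits);
    out[i + 2] = reduceBitDepth(rgba[i + 2], bits);
    out[i + 3] = rgba[i + 3];                        // alpha is copied as-is
  }
  return out;
}

// Count how many distinct 8-bit values a buffer actually uses. An "adaptive"
// scheme could use a figure like this to decide whether reduction is safe.
function countUniqueValues(bytes) {
  return new Set(bytes).size;
}
```

After reduction each channel can take at most 2^bits distinct values, which is what makes the result cheaper to encode.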
Bit Depth
The number of bits used to represent a single colour channel in an image. Standard digital images use 8 bits per channel (256 values). Reducing to 5 bits (32 values) drops 3 of every 8 bits, which is 37.5% of the raw colour data per pixel, and is best suited to sparse-palette images.
Image Property
Blob URL
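A temporary browser-local URL (blob:https://…) that references in-memory binary data. freepdfconvert uses Blob URLs to enable file downloads without any server transfer. Browsers keep a Blob URL alive until the page unloads or it is explicitly revoked (the browser API for this is URL.revokeObjectURL()); freepdfconvert revokes them after download.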
Browser API
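The create-use-revoke lifecycle can be sketched as follows. The helper name withBlobUrl is hypothetical and this is not taken from freepdfconvert's source; it only relies on the standard Blob, URL.createObjectURL and URL.revokeObjectURL APIs.

```javascript
// Hypothetical helper: expose `bytes` through a temporary Blob URL, hand the
// URL to a synchronous callback, then revoke it so the memory can be freed.
function withBlobUrl(bytes, mimeType, use) {
  const blob = new Blob([bytes], { type: mimeType });
  const url = URL.createObjectURL(blob);
  try {
    return use(url);
  } finally {
    URL.revokeObjectURL(url); // always runs, even if `use` throws
  }
}

// Browser usage (assumes a DOM is available):
// withBlobUrl(pdfBytes, "application/pdf", (url) => {
//   const a = document.createElement("a");
//   a.href = url;
//   a.download = "compressed.pdf";
//   a.click();
// });
```

Because the callback is synchronous, the URL is revoked as soon as the click has been dispatched. Some applications prefer to delay revocation slightly to be safe; that is a design choice rather than a requirement.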
Client-Side Processing
Computation that occurs entirely within the user's browser, using JavaScript and WebAssembly, without transmitting data to any remote server. freepdfconvert is entirely client-side for all document operations.
Architecture
Compression Ratio
The ratio of the original file size to the compressed file size. A 10:1 compression ratio means a 10 MB file compresses to 1 MB, a 90% reduction. freepdfconvert achieves ratios from 3:1 (lossless text) to 20:1 (extreme lossy).
Core Metric
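The arithmetic behind the ratio is simple enough to state directly. The function names below are illustrative only.

```javascript
// Compression ratio: original size divided by compressed size (10 means 10:1).
function compressionRatio(originalBytes, compressedBytes) {
  if (!(originalBytes > 0) || !(compressedBytes > 0)) {
    throw new RangeError("sizes must be positive numbers");
  }
  return originalBytes / compressedBytes;
}

// Space saving: the fraction of the original size that was removed.
function spaceSaving(originalBytes, compressedBytes) {
  return 1 - 1 / compressionRatio(originalBytes, compressedBytes);
}
```

For the 10 MB to 1 MB example, compressionRatio returns 10 and spaceSaving returns roughly 0.9.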
DCT (Discrete Cosine Transform)
A mathematical transformation used in JPEG and similar lossy compression standards. In JPEG it converts 8×8 pixel blocks from the spatial domain to the frequency domain. High-frequency coefficients (fine detail) are then quantised most coarsely and so are the first to be discarded, as they are least perceptible to the human eye.
Lossy Algorithm
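To make the spatial-to-frequency conversion concrete, here is a direct (unoptimised) 2D DCT-II for one 8×8 block, using the normalisation found in JPEG. It is a teaching sketch; real encoders use much faster factored versions.

```javascript
// Forward 2D DCT-II of an 8x8 block, given as an array of 8 rows of 8 numbers.
// out[0][0] is the DC (average) term; higher indices are higher frequencies.
function dct8x8(block) {
  const N = 8;
  const out = Array.from({ length: N }, () => new Array(N).fill(0));
  for (let u = 0; u < N; u++) {
    for (let v = 0; v < N; v++) {
      let sum = 0;
      for (let x = 0; x < N; x++) {
        for (let y = 0; y < N; y++) {
          sum += block[x][y] *
            Math.cos(((2 * x + 1) * u * Math.PI) / (2 * N)) *
            Math.cos(((2 * y + 1) * v * Math.PI) / (2 * N));
        }
      }
      const cu = u === 0 ? Math.SQRT1_2 : 1; // scaling for the zero-frequency row
      const cv = v === 0 ? Math.SQRT1_2 : 1; // scaling for the zero-frequency column
      out[u][v] = 0.25 * cu * cv * sum;
    }
  }
  return out;
}
```

A perfectly flat block has all its energy in the DC coefficient and zeros elsewhere, which is why smooth image regions compress so well.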
Deflate / Flate Compression
A lossless compression algorithm (a combination of LZ77 and Huffman coding) used natively inside PDF files for stream compression. The PDF specification exposes it as the /FlateDecode filter, which is the usual choice for content streams.
Lossless Algorithm
Entropy (Shannon Entropy)
A statistical measure of information density in a data stream, measured in bits per symbol (0–8 for byte data). Low entropy (2–4 bits) indicates high compressibility. High entropy (7–8 bits) indicates the data is already dense or encrypted, limiting further compression.
Core Metric
Embedded Fonts
Font files included within a PDF to ensure consistent rendering regardless of the viewer's installed fonts. Full font embedding is one of the largest sources of PDF bloat. Font subsetting retains only the glyphs actually used, significantly reducing file size.
PDF Structure
FileReader API
A browser-native JavaScript interface that reads files from the user's filesystem into memory (as ArrayBuffer, text, or data URL) without any network request. freepdfconvert uses FileReader for all file intake operations.
Browser API
Flate Encoding
See Deflate / Flate Compression. The PDF name for stream-level zlib/deflate compression, referenced in the PDF specification as the /FlateDecode filter. Most modern PDF generators apply Flate automatically to content streams.
PDF Spec
Huffman Coding
A statistical lossless encoding scheme that assigns shorter binary codes to more frequent symbols. Used as the second stage of Deflate compression and as a component of JPEG encoding. Optimal for data with non-uniform symbol distributions.
Lossless Algorithm
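The "shorter codes for more frequent symbols" idea can be sketched with a small code builder. This is a textbook construction for illustration, not the canonical Huffman variant that Deflate or JPEG actually store; the function name is hypothetical.

```javascript
// Build a Huffman code for the characters of `text`.
// Returns a Map from symbol to a string of "0"/"1" characters.
function huffmanCodes(text) {
  const freq = new Map();
  for (const ch of text) freq.set(ch, (freq.get(ch) || 0) + 1);

  const nodes = [...freq].map(([symbol, weight]) => ({ symbol, weight }));
  if (nodes.length === 0) return new Map();
  if (nodes.length === 1) return new Map([[nodes[0].symbol, "0"]]);

  // Repeatedly merge the two lightest nodes until one tree remains.
  while (nodes.length > 1) {
    nodes.sort((a, b) => a.weight - b.weight);
    const [left, right] = nodes.splice(0, 2);
    nodes.push({ weight: left.weight + right.weight, left, right });
  }

  // Walk the tree: a left branch appends "0", a right branch appends "1".
  const codes = new Map();
  const walk = (node, prefix) => {
    if (node.symbol !== undefined) {
      codes.set(node.symbol, prefix);
      return;
    }
    walk(node.left, prefix + "0");
    walk(node.right, prefix + "1");
  };
  walk(nodes[0], "");
  return codes;
}
```

For the input "aaaabbc" the most frequent symbol "a" receives a 1-bit code while "b" and "c" receive 2-bit codes, so the message needs 10 bits instead of 56 at one byte per character.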
JPEG (Joint Photographic Experts Group)
A lossy image compression standard using DCT frequency decomposition. Embedded JPEG images are the primary target of lossy PDF compression. Quality settings from 1–100 control the aggressiveness of high-frequency coefficient quantisation.
Image Format
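One widely used mapping from a 1–100 quality setting to quantisation strength is the convention of the IJG libjpeg encoder, sketched below. Whether freepdfconvert's encoder follows this exact mapping is an assumption (browser canvas encoders, for instance, take a quality between 0 and 1 and do not document their internal mapping). The function name is hypothetical.

```javascript
// Scale a base quantisation table by a 1-100 quality setting, following the
// IJG libjpeg convention. Larger table values mean coarser quantisation.
function scaleQuantTable(baseTable, quality) {
  const q = Math.min(100, Math.max(1, Math.round(quality)));
  // Quality 50 leaves the base table unchanged (scale = 100 percent).
  const scale = q < 50 ? Math.floor(5000 / q) : 200 - 2 * q;
  return baseTable.map((v) => {
    const scaled = Math.floor((v * scale + 50) / 100);
    return Math.min(255, Math.max(1, scaled)); // keep entries in 1..255
  });
}
```

With the first three entries of the standard luminance table, [16, 11, 10], quality 50 returns the table unchanged, quality 25 doubles the step sizes, and quality 100 reduces every entry to 1.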
JSZip
A JavaScript library for creating, reading, and editing ZIP archives entirely in the browser. freepdfconvert uses JSZip to bundle PDF-to-Image output pages and Split PDF results into a single downloadable ZIP file without any server interaction.
Library
LZW (Lempel-Ziv-Welch)
A lossless data compression algorithm that builds a dynamic dictionary of repeated byte sequences during a single pass, replacing repetitions with shorter dictionary codes. Achieves 3:1–5:1 ratios on text-rich PDFs with zero quality loss. Originally used in GIF images and TIFF files.
Core Algorithm
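The dictionary-building pass can be sketched as below. This is a simplified illustration that emits plain integer codes; the LZW filter in real PDF files additionally packs codes into variable-width bit fields and reserves clear and end-of-data codes, which are omitted here. Function names are hypothetical.

```javascript
// Encode a byte sequence as a list of LZW dictionary codes.
function lzwEncode(bytes) {
  const dict = new Map();
  for (let i = 0; i < 256; i++) dict.set(String.fromCharCode(i), i);
  let next = 256;   // next free dictionary code
  let w = "";       // longest sequence matched so far
  const out = [];
  for (const b of bytes) {
    const c = String.fromCharCode(b);
    if (dict.has(w + c)) {
      w += c;                  // keep extending the current match
    } else {
      out.push(dict.get(w));   // emit the code for the match
      dict.set(w + c, next++); // learn the new, longer sequence
      w = c;
    }
  }
  if (w !== "") out.push(dict.get(w));
  return out;
}

// Rebuild the original bytes; the decoder reconstructs the same dictionary.
function lzwDecode(codes) {
  if (codes.length === 0) return new Uint8Array(0);
  const dict = new Map();
  for (let i = 0; i < 256; i++) dict.set(i, String.fromCharCode(i));
  let next = 256;
  let w = dict.get(codes[0]);
  let result = w;
  for (let i = 1; i < codes.length; i++) {
    const k = codes[i];
    let entry;
    if (dict.has(k)) entry = dict.get(k);
    else if (k === next) entry = w + w[0]; // code defined by this very step
    else throw new Error("invalid LZW code " + k);
    result += entry;
    dict.set(next++, w + entry[0]);
    w = entry;
  }
  return Uint8Array.from(result, (ch) => ch.charCodeAt(0));
}
```

Repetitive input such as "TOBEORNOTTOBEORTOBEORNOT" encodes to fewer codes than it has bytes, and decoding restores it exactly.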
Lossless Compression
Any compression technique that allows the original data to be perfectly reconstructed from the compressed version. LZW, Deflate, Huffman, and Run-Length Encoding are all lossless. Ideal for legal, financial, and academic documents requiring exact text preservation.
Algorithm Class
Lossy Compression
Compression that permanently discards some data, achieving higher ratios at the cost of some quality degradation. JPEG, DCT-based video codecs, and perceptual audio compression (MP3) are all lossy. For PDFs, lossy refers specifically to re-encoding embedded images at reduced quality.
Algorithm Class
Metadata (PDF)
Descriptive information embedded within a PDF that is not part of the visible content: author name, creation date, modification date, GPS coordinates from scanned images, creator application, producer, keywords, and custom properties. Stripping metadata can reduce file size and privacy exposure.
PDF Structure
Object Streams (PDF)
A PDF 1.5+ feature that packs multiple PDF objects into a single compressed stream, significantly reducing overhead bytes. pdf-lib's useObjectStreams save option enables this, typically providing 10–25% additional size reduction on top of other compression.
PDF Spec
pdf-lib
An open-source JavaScript/TypeScript library for creating and modifying PDF documents. freepdfconvert uses pdf-lib for PDF loading, page manipulation, merging, splitting, metadata editing, and saving with object-stream compression.
Library
pdf.js
Mozilla's open-source JavaScript PDF renderer. freepdfconvert uses pdf.js to render individual PDF pages to HTML5 Canvas elements for the PDF-to-Images tool, supporting the full range of PDF content including text, images, and vector graphics.
Library
PDF/A
An ISO-standardised subset of PDF (ISO 19005) designed for long-term archiving. PDF/A files must embed all fonts, cannot include encrypted content, and must use specific colour profiles. Any compression applied to a PDF/A file has to preserve these constraints.
PDF Standard
Run-Length Encoding (RLE)
A simple lossless compression method that replaces consecutive identical values (runs) with a count and a single value. Effective for flat colour graphics, solid backgrounds, and bi-level scanned documents. Used inside PDF as the /RunLengthDecode filter.
Lossless Algorithm
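The count-and-value idea can be sketched as follows. This illustration stores plain (count, value) byte pairs; the /RunLengthDecode filter in PDF uses a different byte layout that also allows literal runs, so this is not a drop-in PDF encoder. Function names are hypothetical.

```javascript
// Encode bytes as (count, value) pairs; each run is capped at 255 repeats
// so the count always fits in one byte.
function rleEncode(bytes) {
  const out = [];
  let i = 0;
  while (i < bytes.length) {
    let run = 1;
    while (i + run < bytes.length && bytes[i + run] === bytes[i] && run < 255) {
      run++;
    }
    out.push(run, bytes[i]);
    i += run;
  }
  return Uint8Array.from(out);
}

// Expand (count, value) pairs back into the original byte sequence.
function rleDecode(pairs) {
  const out = [];
  for (let i = 0; i + 1 < pairs.length; i += 2) {
    for (let n = 0; n < pairs[i]; n++) out.push(pairs[i + 1]);
  }
  return Uint8Array.from(out);
}
```

Note the weakness of this naive layout: data without runs doubles in size, because every single byte still costs a count byte.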
Shannon Entropy
A mathematical formula, H = −Σ p(x) log₂ p(x), that measures the average information content of a data stream in bits per symbol. Named after Claude Shannon (1948). freepdfconvert's Web Worker computes Shannon entropy on a 64 KB byte-frequency sample to predict compressibility before processing.
Core Metric
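The formula translates almost directly into code. The sketch below is an assumption about how such a byte-frequency estimate could be written, not freepdfconvert's actual Worker code; in particular, taking the first 64 KB as the sample is a guess, and the function name is hypothetical.

```javascript
// Estimate Shannon entropy in bits per byte (0 to 8) from a sample of the data.
// Assumption: the sample is simply the first `sampleSize` bytes.
function shannonEntropy(bytes, sampleSize = 64 * 1024) {
  const n = Math.min(bytes.length, sampleSize);
  if (n === 0) return 0;

  // Byte-frequency histogram: how often each of the 256 values occurs.
  const counts = new Uint32Array(256);
  for (let i = 0; i < n; i++) counts[bytes[i]]++;

  // H = -sum over x of p(x) * log2(p(x)), skipping values that never occur.
  let h = 0;
  for (const c of counts) {
    if (c === 0) continue;
    const p = c / n;
    h -= p * Math.log2(p);
  }
  return h;
}
```

A buffer of identical bytes scores 0, an even mix of two values scores 1, and data in which all 256 byte values are equally likely scores 8, the point at which generic compression has nothing left to remove.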
Spatial Frequency
In image processing, the rate of change of pixel values across space. Low spatial frequency = smooth gradients. High spatial frequency = sharp edges and fine detail. Lossy compression discards high-frequency components first, as the human visual system is less sensitive to them.
Image Processing
Web Worker
A browser API that runs JavaScript in a separate background thread, isolated from the main UI thread. Enables heavy computation (like byte-frequency analysis) without blocking user interactions. freepdfconvert inlines its Worker as a Blob URL, requiring no separate file download.
Browser API
WebAssembly (WASM)
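A binary instruction format that runs at near-native speed in browsers. pdf-lib is written in TypeScript and runs as plain JavaScript without WASM; pdf.js is also predominantly JavaScript, although recent releases ship WebAssembly builds of some image decoders. WASM is therefore an optional accelerator for performance-critical work rather than a requirement for complex PDF manipulation without server infrastructure.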
Browser Technology
XRef Table (PDF)
The cross-reference table near the end of a PDF file (just before the trailer) that maps object numbers to byte offsets, enabling random access to objects without reading the entire file. Compressed XRef streams (PDF 1.5+) reduce the overhead of this table using Flate compression.
PDF Structure
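To show how the classic (uncompressed) table maps object numbers to byte offsets, here is a small parser for one xref subsection. It is an illustrative sketch with hypothetical function names; it does not handle compressed XRef streams, and the sample offsets in the usage note are made up.

```javascript
// Parse one classic xref entry: a 10-digit field, a 5-digit generation
// number, and a flag ("n" = in use, "f" = free).
function parseXrefEntry(line) {
  const m = /^(\d{10}) (\d{5}) ([nf])\s*$/.exec(line);
  if (!m) throw new SyntaxError("malformed xref entry: " + line);
  return {
    // For in-use entries this is the object's byte offset in the file;
    // for free entries it is the number of the next free object instead.
    offset: Number(m[1]),
    generation: Number(m[2]),
    inUse: m[3] === "n",
  };
}

// Parse a subsection: a "first count" header followed by `count` entries.
// Returns a Map from object number to its parsed entry.
function parseXrefSubsection(text) {
  const lines = text.split(/\r\n|\r|\n/).map((l) => l.trim()).filter((l) => l !== "");
  const header = /^(\d+) (\d+)$/.exec(lines[0] || "");
  if (!header) throw new SyntaxError("malformed xref subsection header");
  const first = Number(header[1]);
  const count = Number(header[2]);
  const table = new Map();
  for (let i = 0; i < count; i++) {
    table.set(first + i, parseXrefEntry(String(lines[1 + i])));
  }
  return table;
}
```

Given a subsection starting at object 0 with three entries, looking up object 1 yields its byte offset directly, which is what lets a viewer jump to an object without scanning the file.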