How to Compress PDFs with Apache PDFBox in Java

PDF files have a tendency to grow far beyond what you'd expect. A report with a handful of images can easily hit 20 or 30 MB, making it impractical to email, upload, or store at scale. If you're working with Java and already using Apache PDFBox, you can compress and optimize these files programmatically using only open-source tools.
This post covers how to use PDFBox to compress PDF files in Java – from image recompression and downsampling to stream compression and resource cleanup.
Why PDF Files Get Large
Before writing any compression code, you should know what actually makes a PDF file big. Most of the time, the culprit is one of these four:
- Embedded images – by far the biggest contributor. A few uncompressed photographs at 300 DPI can add 10-25 MB to a document. Many PDF generators embed images at their original resolution, even when the image displays at a fraction of that size on the page.
- Fonts – add up when the full font file is embedded rather than just the glyphs the document actually uses. A TTF font can be 200-500 KB, and if you're embedding four or five fonts, that's 1-2 MB of font data alone.
- Uncompressed content streams – the drawing instructions for each page (text positioning, line drawing, etc.). Flate compression can significantly reduce their size, but not all PDF generators apply it.
- Metadata and duplicate objects – XMP metadata blocks, thumbnails, and identical images embedded multiple times instead of being referenced once.
Knowing where the bloat comes from tells you which techniques will give you the biggest return. In practice, image compression usually accounts for the largest share of size reduction.
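To see why images dominate, it helps to estimate the raw size of an uncompressed raster from its print dimensions and resolution. The sketch below is plain arithmetic, not PDFBox API: the class and method names are illustrative, and it assumes 8 bits per channel.

```java
/** Illustrative helper (not part of PDFBox): estimates uncompressed raster size. */
final class RawImageSize {

    /**
     * @param widthInches  printed width of the image
     * @param heightInches printed height of the image
     * @param dpi          resolution the image is stored at
     * @param channels     3 for RGB, 1 for grayscale (assumes 8 bits per channel)
     * @return uncompressed size in bytes
     */
    static long rawBytes(double widthInches, double heightInches, int dpi, int channels) {
        long widthPx = Math.round(widthInches * dpi);
        long heightPx = Math.round(heightInches * dpi);
        return widthPx * heightPx * channels;
    }
}
```

For an 8 x 10 inch RGB photo at 300 DPI, `rawBytes(8, 10, 300, 3)` is 2400 x 3000 x 3 = 21,600,000 bytes. The same photo at 150 DPI needs a quarter of that (5,400,000 bytes), which is why resolution matters so much.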
PDFBox Project Setup
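You'll need Apache PDFBox 3.x. The library itself runs on Java 8 or later, but the examples in this guide use Java 17 language and API features, so compile them with JDK 17 or newer. If you're new to PDFBox, our PDFBox tutorial covers PDF generation, custom fonts, and encryption. For compression, add the dependency to your pom.xml: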
<dependency>
<groupId>org.apache.pdfbox</groupId>
<artifactId>pdfbox</artifactId>
<version>3.0.7</version>
</dependency>
Or if you use Gradle:
implementation 'org.apache.pdfbox:pdfbox:3.0.7'
PDFBox 3.0 introduced CompressParameters for object stream compression when saving documents. The examples in this guide use the 3.x API and Java 17 features (pattern matching for instanceof, HexFormat). If you're on PDFBox 2.x, the general approach to image manipulation is similar, though the loading API changed (PDDocument.load() vs Loader.loadPDF()) and you won't have access to CompressParameters. See the PDFBox 3.0 migration guide for all differences.
Compress PDF Images with PDFBox
Image recompression gives you the biggest file size reduction. The idea is simple: iterate through every page, find all embedded images, and replace oversized or poorly compressed images with optimized versions.
Recompressing Images as JPEG
Many PDFs embed images in lossless formats (PNG or raw pixel data) when JPEG would be perfectly acceptable. Converting these to JPEG with a reasonable quality setting can shrink each image by 5-10x.
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.image.JPEGFactory;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
public class PdfImageCompressor {
    public static void compressImages(File inputFile, File outputFile,
                                      float jpegQuality) throws IOException {
        try (PDDocument document = Loader.loadPDF(inputFile)) {
            for (PDPage page : document.getPages()) {
                PDResources resources = page.getResources();
                if (resources == null) continue;
                for (COSName name : resources.getXObjectNames()) {
                    PDXObject xObject = resources.getXObject(name);
                    if (!(xObject instanceof PDImageXObject image)) continue;
                    // Skip images with transparency (JPEG doesn't support alpha)
                    BufferedImage buffered = image.getImage();
                    if (buffered.getColorModel().hasAlpha()) continue;
                    PDImageXObject compressed =
                            JPEGFactory.createFromImage(document, buffered, jpegQuality);
                    resources.put(name, compressed);
                }
            }
            document.save(outputFile);
        }
    }
}
A jpegQuality of 0.75f gives a good balance between file size and visual quality. For documents where image fidelity matters less (internal reports, email attachments), you can go as low as 0.5f.
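Rather than picking a quality value blindly, you can measure what each setting costs on a representative image. The following is a minimal sketch that uses the JDK's ImageIO encoder instead of PDFBox's JPEGFactory, so the byte counts only indicate what would end up in the PDF. The class name `JpegQualityProbe` is made up for this example.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.MemoryCacheImageOutputStream;

/** Illustrative helper (not part of PDFBox): encodes an image as JPEG in memory. */
final class JpegQualityProbe {

    /** Returns the JPEG bytes for an opaque RGB image at the given quality (0.0 to 1.0). */
    static byte[] encode(BufferedImage rgb, float quality) {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpeg").next();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (MemoryCacheImageOutputStream out = new MemoryCacheImageOutputStream(bytes)) {
            ImageWriteParam param = writer.getDefaultWriteParam();
            // Quality can only be set once the compression mode is explicit
            param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
            param.setCompressionQuality(quality);
            writer.setOutput(out);
            writer.write(null, new IIOImage(rgb, null, null), param);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            writer.dispose();
        }
        // The stream is closed (and flushed) by now, so the buffer is complete
        return bytes.toByteArray();
    }
}
```

Comparing `encode(image, 0.5f).length` with `encode(image, 0.75f).length` on a few typical images from your documents gives a quick sense of how much each quality step actually saves.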
JPEG does not support transparency. Converting an image with alpha channels directly to JPEG will flatten the transparency to a black background. The code above skips these images, but if you want to compress them too:
- Split the image into RGB (compress with JPEGFactory) and alpha mask (encode losslessly with LosslessFactory), then attach the mask via jpegImage.getCOSObject().setItem(COSName.SMASK, alphaMask)
- Small images (under 64px) or bitmask transparency: keep lossless with LosslessFactory.createFromImage()
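The split in the first option can be done with plain `java.awt` before anything is handed to PDFBox. Here is a minimal sketch of that step only. The class and method names are illustrative, and it assumes the input is a non-premultiplied ARGB image.

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

/** Illustrative helper (not part of PDFBox): separates color from transparency. */
final class AlphaSplitter {

    /** Copies the color channels into an opaque RGB image; alpha is discarded. */
    static BufferedImage extractRgb(BufferedImage argb) {
        BufferedImage rgb = new BufferedImage(
                argb.getWidth(), argb.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < argb.getHeight(); y++) {
            for (int x = 0; x < argb.getWidth(); x++) {
                rgb.setRGB(x, y, argb.getRGB(x, y) & 0x00FFFFFF);
            }
        }
        return rgb;
    }

    /** Copies the alpha channel into an 8-bit grayscale mask (255 = fully opaque). */
    static BufferedImage extractAlpha(BufferedImage argb) {
        BufferedImage mask = new BufferedImage(
                argb.getWidth(), argb.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        // Write samples through the raster to avoid any color-space conversion
        WritableRaster raster = mask.getRaster();
        for (int y = 0; y < argb.getHeight(); y++) {
            for (int x = 0; x < argb.getWidth(); x++) {
                raster.setSample(x, y, 0, argb.getRGB(x, y) >>> 24);
            }
        }
        return mask;
    }
}
```

The RGB result is what you would pass to JPEGFactory, and the mask is what you would encode with LosslessFactory and attach as the SMask, as described in the list.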
Downsampling High-Resolution Images
A 4000x3000 pixel photograph displayed in a 400x300 point area on the page is wasting most of its pixels. Downsampling these images to match their actual display size (with a safety margin for print quality) can reduce their data by 75% or more, depending on the target DPI.
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
public class ImageDownsampler {
    /**
     * Downscale an image if its effective DPI exceeds the target.
     *
     * @param image the original image
     * @param widthPt display width on the PDF page (in points)
     * @param heightPt display height on the PDF page (in points)
     * @param targetDpi maximum DPI to keep (e.g. 150 for screen, 300 for print)
     * @return downscaled image, or the original if already below target DPI
     */
    public static BufferedImage downsample(BufferedImage image,
                                           float widthPt, float heightPt,
                                           int targetDpi) {
        float currentDpiX = image.getWidth() / (widthPt / 72f);
        float currentDpiY = image.getHeight() / (heightPt / 72f);
        if (currentDpiX <= targetDpi && currentDpiY <= targetDpi) {
            return image; // already within target DPI
        }
        int newWidth = Math.round(widthPt / 72f * targetDpi);
        int newHeight = Math.round(heightPt / 72f * targetDpi);
        // TYPE_CUSTOM is not accepted by this constructor, so fall back to RGB
        int type = image.getType() == BufferedImage.TYPE_CUSTOM
                ? BufferedImage.TYPE_INT_RGB : image.getType();
        BufferedImage scaled = new BufferedImage(
                newWidth, newHeight, type);
        Graphics2D g = scaled.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BICUBIC);
        g.drawImage(image, 0, 0, newWidth, newHeight, null);
        g.dispose();
        return scaled;
    }
}
To find the display dimensions of an image on the page, you need to look at the page's content stream and the current transformation matrix (CTM). For a simpler approach, you can skip the CTM parsing and just cap all images at a fixed DPI (e.g., 150 DPI for screen-quality documents, 300 DPI for print-ready PDFs). The complete compression utility later in this article uses a simpler pixel-count cap instead of calculating display dimensions.
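The DPI arithmetic behind this is easy to check by hand. As a worked sketch (the class and method names are illustrative, and the display size is assumed to be already known in points):

```java
/** Illustrative helper (not part of PDFBox): effective-DPI arithmetic for one axis. */
final class EffectiveDpi {

    /** Effective DPI: pixel count divided by the display size in inches (72 pt = 1 in). */
    static double dpi(int pixels, double displayPoints) {
        return pixels / (displayPoints / 72.0);
    }

    /** Pixel count needed along one axis to reach the target DPI at this display size. */
    static int targetPixels(double displayPoints, int targetDpi) {
        return (int) Math.round(displayPoints / 72.0 * targetDpi);
    }
}
```

For the 4000x3000 photograph shown at 400x300 points, `dpi(4000, 400)` comes out at 720 DPI. At a 150 DPI target, `targetPixels` gives 833x625 pixels, which is about 4% of the original pixel count.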
CCITT Group 4 for Black and White Images
If your PDFs contain scanned documents or line art where the image is purely black and white, CCITT Group 4 compression is far more efficient than JPEG. This lossless compression format is specifically designed for bilevel images and typically achieves 10-20x compression ratios on text-heavy scans.
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.graphics.image.CCITTFactory;
import org.apache.pdfbox.pdmodel.graphics.image.LosslessFactory;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import java.awt.image.BufferedImage;
import java.io.IOException;
public class BitonalCompressor {
    public static PDImageXObject compressAsCCITT(
            PDDocument document, BufferedImage image) throws IOException {
        // Convert to grayscale
        BufferedImage gray = new BufferedImage(
                image.getWidth(), image.getHeight(),
                BufferedImage.TYPE_BYTE_GRAY);
        var g = gray.createGraphics();
        g.drawImage(image, 0, 0, null);
        g.dispose();
        // Find optimal threshold using Otsu's method
        int threshold = calculateOtsuThreshold(gray);
        // Apply threshold to create a binary image
        BufferedImage binary = new BufferedImage(
                gray.getWidth(), gray.getHeight(),
                BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < gray.getHeight(); y++) {
            for (int x = 0; x < gray.getWidth(); x++) {
                int pixel = gray.getRGB(x, y) & 0xFF;
                binary.setRGB(x, y, pixel > threshold ? 0xFFFFFF : 0x000000);
            }
        }
        try {
            return CCITTFactory.createFromImage(document, binary);
        } catch (Exception e) {
            // Fall back to lossless if CCITT encoding fails
            return LosslessFactory.createFromImage(document, binary);
        }
    }
    private static int calculateOtsuThreshold(BufferedImage gray) {
        int[] histogram = new int[256];
        for (int y = 0; y < gray.getHeight(); y++) {
            for (int x = 0; x < gray.getWidth(); x++) {
                histogram[gray.getRGB(x, y) & 0xFF]++;
            }
        }
        int totalPixels = gray.getWidth() * gray.getHeight();
        double sum = 0;
        // Widen to double before multiplying: int math can overflow on large scans
        for (int i = 0; i < 256; i++) sum += (double) i * histogram[i];
        double sumB = 0;
        int wB = 0;
        double maxVariance = 0;
        int bestThreshold = 0;
        for (int t = 0; t < 256; t++) {
            wB += histogram[t];
            if (wB == 0) continue;
            int wF = totalPixels - wB;
            if (wF == 0) break;
            sumB += (double) t * histogram[t];
            double mB = sumB / wB;
            double mF = (sum - sumB) / wF;
            double variance = (double) wB * wF * (mB - mF) * (mB - mF);
            if (variance > maxVariance) {
                maxVariance = variance;
                bestThreshold = t;
            }
        }
        return bestThreshold;
    }
}
A naive approach using Graphics2D.drawImage() with TYPE_BYTE_BINARY relies on Java's default dithering, which produces noisy results on scanned documents. Otsu's method calculates the optimal threshold by analyzing the image histogram, giving much cleaner results on scanned pages.
To decide whether an image is a good candidate for CCITT, sample a subset of its pixels (e.g., every 10th pixel for performance). If 98% or more of the sampled pixels are near-black (brightness < 30) or near-white (brightness > 225), the image is bilevel and will compress well with CCITT.
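The sampling heuristic just described can be sketched as follows. The class name is illustrative, and brightness is taken as the plain average of the red, green, and blue channels, which is an assumption since the text does not define it.

```java
import java.awt.image.BufferedImage;

/** Illustrative helper (not part of PDFBox): checks whether an image is nearly bilevel. */
final class BilevelDetector {

    /** Returns true if at least 98% of sampled pixels are near-black or near-white. */
    static boolean isBilevel(BufferedImage image) {
        int width = image.getWidth();
        long total = (long) width * image.getHeight();
        long sampled = 0;
        long extreme = 0;
        // Visit every 10th pixel in row-major order
        for (long i = 0; i < total; i += 10) {
            int rgb = image.getRGB((int) (i % width), (int) (i / width));
            int r = (rgb >> 16) & 0xFF;
            int g = (rgb >> 8) & 0xFF;
            int b = rgb & 0xFF;
            int brightness = (r + g + b) / 3; // simple average, an assumption
            if (brightness < 30 || brightness > 225) extreme++;
            sampled++;
        }
        return sampled > 0 && extreme * 100 >= sampled * 98;
    }
}
```

A caller would run this check first and only route images that pass it to the CCITT path, leaving photographs and grayscale artwork to JPEG.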
Stream Compression with FlateDecode in PDFBox
Every PDF page has a content stream that contains the drawing instructions for that page (text positioning, line drawing, etc.). These streams can be stored either raw or compressed with FlateDecode (zlib/deflate).
PDFBox 3.x applies FlateDecode compression automatically when you call document.save(), so you don't need to handle this manually. This is one of the key improvements over PDFBox 2.x, where streams were saved uncompressed by default.
On top of content stream compression, PDFBox 3.x also supports object stream compression, which groups small PDF objects (metadata entries, cross-references) into shared compressed streams. To be explicit about it:
import org.apache.pdfbox.pdfwriter.compress.CompressParameters;
// Default behavior in PDFBox 3.x – streams are compressed automatically
document.save(outputFile);
// Explicitly enable object stream compression
document.save(outputFile, CompressParameters.DEFAULT_COMPRESSION);
// Disable compression (useful for debugging PDF internals)
document.save(outputFile, CompressParameters.NO_COMPRESSION);
Stream compression alone gives you 5-15% reduction on most documents – modest compared to image compression, but it's free and lossless.
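To get a feel for what Flate does to drawing instructions, you can run the same zlib/deflate algorithm from the JDK over some content-stream-like text. This is a standalone sketch with an illustrative class name, not how PDFBox applies the filter internally.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Illustrative helper (not part of PDFBox): zlib round trip with the JDK. */
final class FlateDemo {

    /** Compresses bytes into zlib format, the format FlateDecode expects. */
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Restores the original bytes from zlib-compressed data. */
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("Truncated or invalid zlib data");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Not valid zlib data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

Highly repetitive operator text (the same `BT ... Tj ET` sequence repeated many times) shrinks dramatically here. Real pages are less repetitive, and most of a typical file is image data that Flate barely touches, so whole-document savings are far smaller than this demo suggests.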
Optimize PDF by Removing Unused Resources
After handling images and streams, you can squeeze out a few more kilobytes by cleaning up metadata and duplicate objects.
Stripping Document Metadata
PDF documents often carry XMP metadata, author information, keywords, and page thumbnails. If nothing downstream depends on this data (for example, the document title shown in PDF viewers, or author info kept for compliance), removing it saves a few kilobytes:
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;
public class MetadataStripper {
    public static void stripMetadata(PDDocument document) {
        // Clear document information dictionary
        document.setDocumentInformation(new PDDocumentInformation());
        // Remove XMP metadata
        document.getDocumentCatalog().setMetadata(null);
    }
}
Deduplicating Images
Some PDF generators embed the same image multiple times, once for each page it appears on. You can detect duplicates by hashing the raw image bytes and replacing duplicates with references to a single copy:
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import java.io.IOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.HexFormat;
import java.util.Map;
public class ImageDeduplicator {
    public static void deduplicateImages(PDDocument document)
            throws IOException, NoSuchAlgorithmException {
        Map<String, PDImageXObject> seen = new HashMap<>();
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        for (PDPage page : document.getPages()) {
            PDResources resources = page.getResources();
            if (resources == null) continue;
            for (COSName name : resources.getXObjectNames()) {
                PDXObject xObject = resources.getXObject(name);
                if (!(xObject instanceof PDImageXObject image)) continue;
                byte[] rawBytes;
                try (var is = image.getCOSObject().createRawInputStream()) {
                    rawBytes = is.readAllBytes();
                }
                String hash = HexFormat.of().formatHex(md5.digest(rawBytes));
                PDImageXObject existing = seen.get(hash);
                if (existing != null) {
                    // Replace with reference to the first occurrence
                    resources.put(name, existing);
                } else {
                    seen.put(hash, image);
                }
            }
        }
    }
}
Complete Java PDF Compression Utility
This utility combines image recompression, downsampling, metadata removal, and error handling in a single pass. It skips images with alpha channels (since JPEG doesn't support transparency) and catches failures on individual images so a single problematic image won't abort the entire document.
Complete PDF compression utility (CompressPdf.java)
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
import org.apache.pdfbox.pdmodel.graphics.image.JPEGFactory;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
public class CompressPdf {
    private final float jpegQuality;
    private final int targetDpi;
    public CompressPdf(float jpegQuality, int targetDpi) {
        this.jpegQuality = jpegQuality;
        this.targetDpi = targetDpi;
    }
    public void compress(File input, File output) throws IOException {
        try (PDDocument document = Loader.loadPDF(input)) {
            for (PDPage page : document.getPages()) {
                processResources(document, page.getResources());
            }
            // Strip metadata
            document.setDocumentInformation(
                    new PDDocumentInformation());
            document.getDocumentCatalog().setMetadata(null);
            // PDFBox 3.x compresses streams by default
            document.save(output);
        }
    }
    private void processResources(PDDocument document,
                                  PDResources resources)
            throws IOException {
        if (resources == null) return;
        for (COSName name : resources.getXObjectNames()) {
            PDXObject xObject = resources.getXObject(name);
            if (xObject instanceof PDFormXObject form) {
                // Recurse into nested form XObjects
                processResources(document, form.getResources());
            }
            if (!(xObject instanceof PDImageXObject image)) continue;
            try {
                BufferedImage buffered = image.getImage();
                // Skip images with alpha channels
                if (buffered.getColorModel().hasAlpha()) continue;
                int originalPixels =
                        buffered.getWidth() * buffered.getHeight();
                // Downsample if image exceeds target DPI
                // (simplified: cap total pixels based on targetDpi)
                int maxPixels = targetDpi * targetDpi * 20;
                if (originalPixels > maxPixels) {
                    double scale =
                            Math.sqrt((double) maxPixels / originalPixels);
                    int newW = (int) (buffered.getWidth() * scale);
                    int newH = (int) (buffered.getHeight() * scale);
                    BufferedImage scaled = new BufferedImage(
                            newW, newH, BufferedImage.TYPE_INT_RGB);
                    Graphics2D g = scaled.createGraphics();
                    g.setRenderingHint(
                            RenderingHints.KEY_INTERPOLATION,
                            RenderingHints.VALUE_INTERPOLATION_BICUBIC);
                    g.drawImage(buffered, 0, 0, newW, newH, null);
                    g.dispose();
                    buffered = scaled;
                }
                // Recompress as JPEG
                PDImageXObject compressed =
                        JPEGFactory.createFromImage(
                                document, buffered, jpegQuality);
                resources.put(name, compressed);
            } catch (Exception e) {
                // Keep the original image if compression fails
                System.err.println("Skipping image " + name.getName()
                        + ": " + e.getMessage());
            }
        }
    }
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println(
                    "Usage: CompressPdf <input.pdf> <output.pdf>");
            System.exit(1);
        }
        File input = new File(args[0]);
        File output = new File(args[1]);
        long before = input.length();
        // Medium quality: JPEG 0.65, target 200 DPI
        CompressPdf compressor = new CompressPdf(0.65f, 200);
        compressor.compress(input, output);
        long after = output.length();
        double reduction =
                (1.0 - (double) after / before) * 100;
        System.out.printf("Before: %,d bytes%n", before);
        System.out.printf("After: %,d bytes%n", after);
        System.out.printf("Reduction: %.1f%%%n", reduction);
    }
}
If you're using Maven, run it with:
mvn exec:java -Dexec.mainClass="CompressPdf" \
-Dexec.args="report.pdf report-compressed.pdf"
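The pixel cap in CompressPdf (`targetDpi * targetDpi * 20`) is worth a closer look. It amounts to a budget of 20 square inches at the target DPI, regardless of how large the image is actually drawn. The standalone sketch below repeats that arithmetic so you can try different values. The class name is illustrative, and it uses a `long` pixel count where the utility uses `int`.

```java
/** Illustrative helper: reproduces the pixel-cap arithmetic used by CompressPdf. */
final class PixelCap {

    /** Pixel budget: the area of 20 square inches at the target DPI. */
    static int maxPixels(int targetDpi) {
        return targetDpi * targetDpi * 20;
    }

    /** Returns {width, height} after applying the cap, preserving the aspect ratio. */
    static int[] cappedSize(int width, int height, int targetDpi) {
        long originalPixels = (long) width * height;
        int budget = maxPixels(targetDpi);
        if (originalPixels <= budget) {
            return new int[] {width, height}; // already within budget
        }
        // Scale both axes by the same factor so the area shrinks to the budget
        double scale = Math.sqrt((double) budget / originalPixels);
        return new int[] {(int) (width * scale), (int) (height * scale)};
    }
}
```

With the utility's default of 200 DPI the budget is 800,000 pixels, so a 4000x3000 image is scaled to 1032x774. Note that a full-page Letter scan (1700x2200 pixels at 200 DPI) also exceeds this budget and would end up at roughly 92 DPI, so for scanned documents you may want a larger multiplier than 20.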
PDF Size Reduction Results by Technique
The amount of file size reduction depends heavily on the content of your PDF. Here's what you can realistically expect from each technique:
| Technique | Typical reduction | Best for |
|---|---|---|
| JPEG recompression (quality 0.65) | 30-70% | PDFs with embedded PNG or uncompressed images |
| Image downsampling (300 to 150 DPI) | 40-75% | Scanned documents, high-resolution photos |
| CCITT Group 4 | 60-95% | Black and white scanned pages |
| FlateDecode stream compression | 5-15% | PDFs with uncompressed content streams |
| Object stream compression | 3-10% | Any PDF (free optimization when saving) |
| Metadata removal | 1-5% | PDFs with large XMP blocks or thumbnails |
| Image deduplication | 10-90% | Documents that repeat logos or headers |
In our tests on a 35.8 MB product catalog, JPEG recompression alone (quality 0.65) reduced the file to 6.3 MB (83%), and the complete utility (JPEG + downsampling + metadata removal) produced a 5.8 MB file (85%). CCITT conversion brought it down to 0.8 MB (98%), though at the cost of losing all color. On a separate 21.6 MB document with the same image repeated across 10 pages, deduplication alone cut the size to 2.3 MB (90%).
Text-only PDFs won't benefit much from image techniques, but stream and object compression will still help.
Limitations of Manual PDF Compression in Java
Building your own PDF compression pipeline with PDFBox works, but it comes with real tradeoffs:
- Transparency and color spaces – JPEG doesn't support alpha channels. CMYK images need different processing than RGB. ICC profiles can get lost during recompression. Handling all these cases correctly requires significant testing.
- Font subsetting – PDFBox can manipulate font resources, but properly subsetting an embedded font (keeping only the glyphs used in the document) involves parsing font tables, which is error-prone and not well-documented.
- Edge cases – PDF is a complex format with decades of backward compatibility. Images can be nested inside form XObjects (PDFormXObject), and you'll encounter inline images, JBIG2-encoded content, and other structures that need special handling.
- Maintenance – when PDFBox releases a new major version, your compression code needs updating. When you encounter a new PDF variant in production, you need to adapt.
If you're generating PDFs from HTML, optimizing your HTML for PDF output can reduce file sizes before compression even enters the picture.
Easier Alternative: Compress PDFs via API
The PDFBox pipeline in this guide works on existing PDF files – you load a PDF from disk, process it, and save it back. If you need to generate and compress PDFs in one step (e.g., from HTML to PDF or URL to PDF), a PDF API can do both while also handling alpha channels, CMYK color spaces, and font subsetting. With PDFBolt, you add a compression parameter and the API takes care of the rest.
Four compression levels are available:
| Level | JPEG quality | Image DPI | Best for |
|---|---|---|---|
| lossless | 1.0 | 300 | Documents where image quality cannot be compromised |
| low | 0.80 | 300 | Client-facing reports, branded materials |
| medium | 0.55 | 150 | Email attachments, web downloads |
| high | 0.35 | 100 | Archival storage, bandwidth-constrained delivery |
Here's the Java equivalent of everything this guide covers, with far less code:
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Paths;
public class CompressPdfWithApi {
    public static void main(String[] args) throws Exception {
        String jsonBody = """
                {
                  "url": "https://example.com/report",
                  "compression": "medium"
                }
                """;
        var client = HttpClient.newHttpClient();
        var request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.pdfbolt.com/v1/direct"))
                .header("API-KEY", "YOUR-API-KEY")
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
        var response = client.send(request,
                HttpResponse.BodyHandlers.ofByteArray());
        if (response.statusCode() == 200) {
            Files.write(Paths.get("compressed.pdf"), response.body());
            System.out.println("Compressed PDF saved");
        } else {
            System.err.println("Error: " + response.statusCode());
        }
    }
}
No PDFBox dependency, no image analysis, no edge case handling. The API applies CCITT encoding for bilevel images, smart alpha handling, image deduplication, and font optimization automatically.
To put the difference in perspective: our PDFBox CompressPdf utility reduced a 35.8 MB product catalog to 5.8 MB (85%). A similar catalog compressed via the PDFBolt API went from 23.7 MB down to 1.6 MB (93%) on the high level – because the API also handles font subsetting, alpha channel optimization, and grayscale detection that the manual approach skips. See the full results.
You can test compression levels in the PDFBolt playground before writing any code, or check the Java quick start guide for more examples.
Frequently Asked Questions
Does PDFBox compress PDFs by default when saving?
PDFBox 3.x applies FlateDecode and object stream compression by default when you call document.save(). However, it does not automatically recompress images or remove unused resources. For image-heavy PDFs, you need to handle image compression yourself using the techniques described in this guide. To disable default compression (e.g., for debugging), pass CompressParameters.NO_COMPRESSION to the save() method.
What JPEG quality setting should I use for PDF compression?
A value of 0.65 to 0.75 works well for most documents. At 0.75, the quality loss is barely noticeable on screen. At 0.65, you'll see slight artifacts on close inspection but it's fine for reports and internal documents. Going below 0.5 produces visible degradation and is only suitable for thumbnails or previews.
Can I compress a PDF without losing image quality?
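Yes, but the size reduction will be smaller. Lossless techniques like stream compression, deduplication, and metadata removal typically achieve 10-40% reduction compared to 40-70% with lossy JPEG recompression. Downsampling is not one of them: it discards pixels, so treat it as lossy even when the result looks identical at the displayed size.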
How do I handle PDF images with transparency during compression?
Images with alpha channels (transparency) cannot be compressed directly as JPEG. Check image.getImage().getColorModel().hasAlpha() before converting. To compress them without losing transparency, split the image: compress the RGB data with JPEGFactory and encode the alpha mask losslessly with LosslessFactory, then attach it as an SMask. This preserves transparency while still compressing the color data. For simple cases, you can skip transparent images entirely or composite them onto a white background before JPEG conversion.
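For the simple case, compositing onto white takes only a few lines of `java.awt`. A minimal sketch, with an illustrative class name:

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

/** Illustrative helper (not part of PDFBox): flattens transparency onto white. */
final class AlphaFlattener {

    /** Draws the source over a white canvas and returns an opaque RGB image. */
    static BufferedImage flattenOnWhite(BufferedImage source) {
        BufferedImage flat = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = flat.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, flat.getWidth(), flat.getHeight());
            // The default composite (source-over) blends using the source alpha
            g.drawImage(source, 0, 0, null);
        } finally {
            g.dispose();
        }
        return flat;
    }
}
```

The result has no alpha channel, so it can go straight to JPEG. This only looks right when the image sits on a white page area. Over a colored background or other content, the white box will be visible.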
Why does my compressed PDF sometimes get larger than the original?
This happens when the original PDF already uses efficient compression and your re-encoding introduces overhead. Common causes: converting an already-compressed JPEG to a new JPEG (double encoding), adding FlateDecode to streams that were already compressed with a different algorithm, or expanding a JBIG2-encoded image into a larger format. Always compare the output size and skip recompression when the result would be larger.
Is PDFBox or iText better for PDF compression in Java?
Both can handle PDF compression, but they differ in licensing. Apache PDFBox is fully open-source (Apache 2.0) and gives you low-level access to PDF internals. iText (AGPL / commercial license) has higher-level optimization APIs but requires a commercial license for closed-source projects. See our comparison of Java PDF libraries for details.
How much can I reduce a scanned PDF file size?
Scanned PDFs respond very well to compression. Combining downsampling (300 → 150 DPI), JPEG recompression, and CCITT Group 4 for black and white pages typically gives 60-80% reduction. CCITT alone can achieve 10-20x compression on bilevel images.
How do I compress a PDF programmatically in Java?
The most common open-source option is Apache PDFBox. Load the PDF with Loader.loadPDF(), iterate through each page's image resources, recompress them as JPEG with JPEGFactory, and call document.save() – PDFBox 3.x applies stream compression by default. The CompressPdf class in this guide combines all these steps into a single utility. For a managed alternative, PDF compression APIs like PDFBolt handle optimization in a single HTTP call.
Does PDFBox support font subsetting to reduce PDF size?
PDFBox supports font subsetting when embedding fonts during PDF generation (PDType0Font.load(doc, stream, true)), but it does not provide an API to re-subset fonts already embedded in an existing PDF. Reducing font size in a loaded document requires parsing font tables manually, which is complex and error-prone.
Conclusion
To shrink PDF file size with Apache PDFBox: recompress images as JPEG, downsample anything above your target DPI, use CCITT for black and white pages, and keep object stream compression on when saving (the PDFBox 3.x default). Image optimization drives most of the reduction.
The CompressPdf class from this guide is ready to use as-is. Adapt the JPEG quality and target DPI to your needs, and add transparency handling if your documents require it. If maintaining your own compression code isn't worth it, the API approach shown above reduces the entire pipeline to a single HTTP call.
For more on what you can build with PDFBox, check out our Apache PDFBox tutorial covering PDF generation from scratch, or browse the full list of Java PDF libraries to compare alternatives.
Less is more. Especially when "more" won't fit in an email attachment. 📧
