How to Compress PDFs with Apache PDFBox in Java

March 25, 2026 · 15 min read

Michał Szymanowski

PDFBolt Co-Founder

PDF files have a tendency to grow far beyond what you'd expect. A report with a handful of images can easily hit 20 or 30 MB, making it impractical to email, upload, or store at scale. If you're working with Java and already using Apache PDFBox, you can compress and optimize these files programmatically using only open-source tools.

This post covers how to use PDFBox to compress PDF files in Java – from image recompression and downsampling to stream compression and resource cleanup.

Why PDF Files Get Large

Before writing any compression code, you should know what actually makes a PDF file big. Most of the time, the culprit is one of these four:

Embedded images – by far the biggest contributor. A few uncompressed photographs at 300 DPI can add 10-25 MB to a document. Many PDF generators embed images at their original resolution, even when the image displays at a fraction of that size on the page.
Fonts – add up when the full font file is embedded rather than just the glyphs the document actually uses. A TTF font can be 200-500 KB, and if you're embedding four or five fonts, that's 1-2 MB of font data alone.
Uncompressed content streams – the drawing instructions for each page (text positioning, line drawing, etc.). Flate compression can significantly reduce their size, but not all PDF generators apply it.
Metadata and duplicate objects – XMP metadata blocks, thumbnails, and identical images embedded multiple times instead of being referenced once.

Knowing where the bloat comes from tells you which techniques will give you the biggest return. In practice, image compression usually accounts for the largest share of size reduction.

PDFBox Project Setup

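You'll need Apache PDFBox 3.x. PDFBox itself runs on Java 8 or later, but the examples in this guide use Java 17 language and library features, so plan on Java 17+ if you want to copy them as-is. If you're new to PDFBox, our PDFBox tutorial covers PDF generation, custom fonts, and encryption. For compression, add the dependency to your pom.xml: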

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>3.0.7</version>
</dependency>

Or if you use Gradle:

implementation 'org.apache.pdfbox:pdfbox:3.0.7'

PDFBox 3.x vs 2.x

PDFBox 3.0 introduced CompressParameters for object stream compression when saving documents. The examples in this guide use the 3.x API and Java 17 features (pattern matching for instanceof, HexFormat). If you're on PDFBox 2.x, the general approach to image manipulation is similar, though the loading API changed (PDDocument.load() vs Loader.loadPDF()) and you won't have access to CompressParameters. See the PDFBox 3.0 migration guide for all differences.

Compress PDF Images with PDFBox

Image recompression gives you the biggest file size reduction. The idea is simple: iterate through every page, find all embedded images, and replace oversized or poorly compressed images with optimized versions.

Recompressing Images as JPEG

Many PDFs embed images in lossless formats (PNG or raw pixel data) when JPEG would be perfectly acceptable. Converting these to JPEG with a reasonable quality setting can shrink each image by 5-10x.

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.image.JPEGFactory;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

public class PdfImageCompressor {

    public static void compressImages(File inputFile, File outputFile,
                                      float jpegQuality) throws IOException {
        try (PDDocument document = Loader.loadPDF(inputFile)) {
            for (PDPage page : document.getPages()) {
                PDResources resources = page.getResources();
                if (resources == null) continue;

                for (COSName name : resources.getXObjectNames()) {
                    PDXObject xObject = resources.getXObject(name);
                    if (!(xObject instanceof PDImageXObject image)) continue;

                    // Skip images with transparency (JPEG doesn't support alpha)
                    BufferedImage buffered = image.getImage();
                    if (buffered.getColorModel().hasAlpha()) continue;

                    PDImageXObject compressed =
                        JPEGFactory.createFromImage(document, buffered, jpegQuality);
                    resources.put(name, compressed);
                }
            }
            document.save(outputFile);
        }
    }
}

A jpegQuality of 0.75f gives a good balance between file size and visual quality. For documents where image fidelity matters less (internal reports, email attachments), you can go as low as 0.5f.
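
To get a feel for how the quality setting trades size for fidelity before touching a real PDF, the sketch below encodes a synthetic test image with plain javax.imageio at a few quality levels and prints the encoded sizes. It is a standalone illustration, not a description of what JPEGFactory does internally, and the names JpegQualityDemo, encodeJpeg, and sampleImage are made up for this example.

```java
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.MemoryCacheImageOutputStream;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Random;

class JpegQualityDemo {

    /** Encodes an RGB image as JPEG at the given quality (0.0 to 1.0). */
    static byte[] encodeJpeg(BufferedImage image, float quality) {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpeg").next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionQuality(quality);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (MemoryCacheImageOutputStream out = new MemoryCacheImageOutputStream(bytes)) {
            writer.setOutput(out);
            writer.write(null, new IIOImage(image, null, null), param);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            writer.dispose();
        }
        return bytes.toByteArray();
    }

    /** Builds a synthetic, photo-like test image: a color gradient plus a little noise. */
    static BufferedImage sampleImage(int width, int height) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Random random = new Random(42); // fixed seed so runs are repeatable
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int noise = random.nextInt(32);
                int r = Math.min(255, x * 255 / width + noise);
                int g = Math.min(255, y * 255 / height + noise);
                int b = Math.min(255, 128 + noise);
                image.setRGB(x, y, (r << 16) | (g << 8) | b);
            }
        }
        return image;
    }

    public static void main(String[] args) {
        BufferedImage image = sampleImage(800, 600);
        for (float quality : new float[] {0.9f, 0.75f, 0.5f}) {
            System.out.printf("quality %.2f -> %,d bytes%n",
                quality, encodeJpeg(image, quality).length);
        }
    }
}
```

Real photographs will behave differently from this synthetic image, so treat the numbers as a way to compare settings rather than as a prediction for your documents.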

Transparency handling

JPEG does not support transparency. Converting an image with alpha channels directly to JPEG will flatten the transparency to a black background. The code above skips these images, but if you want to compress them too:

Split the image into RGB (compress with JPEGFactory) and alpha mask (encode losslessly with LosslessFactory), then attach the mask via jpegImage.getCOSObject().setItem(COSName.SMASK, alphaMask)
Small images (under 64px) or bitmask transparency: keep lossless with LosslessFactory.createFromImage()
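
The first option starts with separating color from alpha. Here is a minimal sketch of that split using only java.awt.image; AlphaSplitter and its method names are illustrative, not part of PDFBox. The two resulting images are what you would then hand to JPEGFactory and LosslessFactory before attaching the mask as described above.

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

class AlphaSplitter {

    /** Copies the color channels into an opaque RGB image (alpha is dropped, not blended). */
    static BufferedImage extractColor(BufferedImage argb) {
        BufferedImage rgb = new BufferedImage(
            argb.getWidth(), argb.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < argb.getHeight(); y++) {
            for (int x = 0; x < argb.getWidth(); x++) {
                rgb.setRGB(x, y, argb.getRGB(x, y) & 0x00FFFFFF);
            }
        }
        return rgb;
    }

    /** Copies the alpha channel into an 8-bit grayscale mask (255 = fully opaque). */
    static BufferedImage extractAlpha(BufferedImage argb) {
        BufferedImage mask = new BufferedImage(
            argb.getWidth(), argb.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        WritableRaster raster = mask.getRaster();
        for (int y = 0; y < argb.getHeight(); y++) {
            for (int x = 0; x < argb.getWidth(); x++) {
                // getRGB returns non-premultiplied ARGB; the top byte is alpha
                raster.setSample(x, y, 0, argb.getRGB(x, y) >>> 24);
            }
        }
        return mask;
    }
}
```

Writing the mask through the raster avoids any color conversion, so the alpha values end up in the grayscale image unchanged.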

Downsampling High-Resolution Images

A 4000x3000 pixel photograph displayed in a 400x300 point area on the page is wasting most of its pixels. Downsampling these images to match their actual display size (with a safety margin for print quality) can reduce their data by 75% or more, depending on the target DPI.

import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

public class ImageDownsampler {

    /**
     * Downscale an image if its effective DPI exceeds the target.
     *
     * @param image     the original image
     * @param widthPt   display width on the PDF page (in points)
     * @param heightPt  display height on the PDF page (in points)
     * @param targetDpi maximum DPI to keep (e.g. 150 for screen, 300 for print)
     * @return downscaled image, or the original if already below target DPI
     */
    public static BufferedImage downsample(BufferedImage image,
                                           float widthPt, float heightPt,
                                           int targetDpi) {
        float currentDpiX = image.getWidth() / (widthPt / 72f);
        float currentDpiY = image.getHeight() / (heightPt / 72f);

        if (currentDpiX <= targetDpi && currentDpiY <= targetDpi) {
            return image; // already within target DPI
        }

        int newWidth = Math.round(widthPt / 72f * targetDpi);
        int newHeight = Math.round(heightPt / 72f * targetDpi);

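        // TYPE_CUSTOM (0) is not accepted by this constructor, so fall back to RGB
        int type = image.getType() == BufferedImage.TYPE_CUSTOM
            ? BufferedImage.TYPE_INT_RGB : image.getType();
        BufferedImage scaled = new BufferedImage(
            newWidth, newHeight, type);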
        Graphics2D g = scaled.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
            RenderingHints.VALUE_INTERPOLATION_BICUBIC);
        g.drawImage(image, 0, 0, newWidth, newHeight, null);
        g.dispose();

        return scaled;
    }
}

To find the display dimensions of an image on the page, you need to look at the page's content stream and the current transformation matrix (CTM). For a simpler approach, you can skip the CTM parsing and just cap all images at a fixed DPI (e.g., 150 DPI for screen-quality documents, 300 DPI for print-ready PDFs). The complete compression utility later in this article uses a simpler pixel-count cap instead of calculating display dimensions.
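
As a worked example of the arithmetic, the sketch below computes the effective DPI and the target pixel dimensions for the 4000x3000 photo placed in a 400x300 point box. The DpiMath class is hypothetical and exists only for illustration; it mirrors the formulas used in ImageDownsampler.

```java
class DpiMath {

    /** Effective DPI along one axis: pixels divided by the display size in inches. */
    static double effectiveDpi(int pixels, double displayPt) {
        return pixels / (displayPt / 72.0); // 72 points per inch
    }

    /** Pixel count needed to show displayPt points at targetDpi. */
    static int targetPixels(double displayPt, int targetDpi) {
        return (int) Math.round(displayPt / 72.0 * targetDpi);
    }

    public static void main(String[] args) {
        double dpi = effectiveDpi(4000, 400);    // about 720 DPI
        int newWidth = targetPixels(400, 150);   // 833 px
        int newHeight = targetPixels(300, 150);  // 625 px
        double kept = (double) newWidth * newHeight / (4000.0 * 3000.0);

        System.out.printf("effective DPI: %.0f%n", dpi);
        System.out.printf("new size: %dx%d (%.1f%% of the original pixels)%n",
            newWidth, newHeight, kept * 100);
    }
}
```

At 720 DPI the photo carries far more detail than a 150 DPI target needs, which is why downsampling pays off so well on images like this.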

CCITT Group 4 for Black and White Images

If your PDFs contain scanned documents or line art where the image is purely black and white, CCITT Group 4 compression is far more efficient than JPEG. This lossless compression format is specifically designed for bilevel images and typically achieves 10-20x compression ratios on text-heavy scans.

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.graphics.image.CCITTFactory;
import org.apache.pdfbox.pdmodel.graphics.image.LosslessFactory;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

import java.awt.image.BufferedImage;
import java.io.IOException;

public class BitonalCompressor {

    public static PDImageXObject compressAsCCITT(
            PDDocument document, BufferedImage image) throws IOException {
        // Convert to grayscale
        BufferedImage gray = new BufferedImage(
            image.getWidth(), image.getHeight(),
            BufferedImage.TYPE_BYTE_GRAY);
        var g = gray.createGraphics();
        g.drawImage(image, 0, 0, null);
        g.dispose();

        // Find optimal threshold using Otsu's method
        int threshold = calculateOtsuThreshold(gray);

        // Apply threshold to create a binary image
        BufferedImage binary = new BufferedImage(
            gray.getWidth(), gray.getHeight(),
            BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < gray.getHeight(); y++) {
            for (int x = 0; x < gray.getWidth(); x++) {
                int pixel = gray.getRGB(x, y) & 0xFF;
                binary.setRGB(x, y, pixel > threshold ? 0xFFFFFF : 0x000000);
            }
        }

        try {
            return CCITTFactory.createFromImage(document, binary);
        } catch (Exception e) {
            // Fall back to lossless if CCITT encoding fails
            return LosslessFactory.createFromImage(document, binary);
        }
    }

    private static int calculateOtsuThreshold(BufferedImage gray) {
        int[] histogram = new int[256];
        for (int y = 0; y < gray.getHeight(); y++) {
            for (int x = 0; x < gray.getWidth(); x++) {
                histogram[gray.getRGB(x, y) & 0xFF]++;
            }
        }

        int totalPixels = gray.getWidth() * gray.getHeight();
        double sum = 0;
        for (int i = 0; i < 256; i++) sum += i * histogram[i];

        double sumB = 0;
        int wB = 0;
        double maxVariance = 0;
        int bestThreshold = 0;

        for (int t = 0; t < 256; t++) {
            wB += histogram[t];
            if (wB == 0) continue;
            int wF = totalPixels - wB;
            if (wF == 0) break;

            sumB += t * histogram[t];
            double mB = sumB / wB;
            double mF = (sum - sumB) / wF;
            double variance = (double) wB * wF * (mB - mF) * (mB - mF);

            if (variance > maxVariance) {
                maxVariance = variance;
                bestThreshold = t;
            }
        }
        return bestThreshold;
    }
}

A naive approach using Graphics2D.drawImage() with TYPE_BYTE_BINARY relies on Java's default dithering, which produces noisy results on scanned documents. Otsu's method calculates the optimal threshold by analyzing the image histogram, giving much cleaner results on scanned pages.

To decide whether an image is a good candidate for CCITT, sample a subset of its pixels (e.g., every 10th pixel for performance). If 98% or more of the sampled pixels are near-black (brightness < 30) or near-white (brightness > 225), the image is bilevel and will compress well with CCITT.
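
That check could be sketched as follows. The thresholds (98%, 30, 225) come from the paragraph above; the rest is an assumption made for this example: "every 10th pixel" is read as a grid with a step of 10 in both directions, and brightness is taken as the plain average of the red, green, and blue channels. BilevelDetector is a hypothetical helper name.

```java
import java.awt.image.BufferedImage;

class BilevelDetector {

    private static final int STEP = 10;            // sample every 10th pixel per axis
    private static final int DARK_LIMIT = 30;      // below this counts as near-black
    private static final int LIGHT_LIMIT = 225;    // above this counts as near-white
    private static final double MIN_SHARE = 0.98;  // required share of extreme pixels

    /** Returns true if almost all sampled pixels are near-black or near-white. */
    static boolean isBilevelCandidate(BufferedImage image) {
        int sampled = 0;
        int extreme = 0;
        for (int y = 0; y < image.getHeight(); y += STEP) {
            for (int x = 0; x < image.getWidth(); x += STEP) {
                int rgb = image.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                int brightness = (r + g + b) / 3; // simple average, an assumption
                if (brightness < DARK_LIMIT || brightness > LIGHT_LIMIT) {
                    extreme++;
                }
                sampled++;
            }
        }
        return sampled > 0 && extreme >= sampled * MIN_SHARE;
    }
}
```

You would typically run this on the decoded image before deciding between the CCITT path and JPEG recompression.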

Stream Compression with FlateDecode in PDFBox

Every PDF page has a content stream that contains the drawing instructions for that page (text positioning, line drawing, etc.). These streams can be stored either raw or compressed with FlateDecode (zlib/deflate).

PDFBox 3.x applies FlateDecode compression automatically when you call document.save(), so you don't need to handle this manually. This is one of the key improvements over PDFBox 2.x, where streams were saved uncompressed by default.

On top of content stream compression, PDFBox 3.x also supports object stream compression, which groups small PDF objects (metadata entries, cross-references) into shared compressed streams. To be explicit about it:

import org.apache.pdfbox.pdfwriter.compress.CompressParameters;

// Default behavior in PDFBox 3.x – streams are compressed automatically
document.save(outputFile);

// Explicitly enable object stream compression
document.save(outputFile, CompressParameters.DEFAULT_COMPRESSION);

// Disable compression (useful for debugging PDF internals)
document.save(outputFile, CompressParameters.NO_COMPRESSION);

Stream compression alone gives you 5-15% reduction on most documents – modest compared to image compression, but it's free and lossless.
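
If you want to see why Flate works well on content streams, the standalone sketch below runs java.util.zip.Deflater over some repetitive, content-stream-like text. It does not use PDFBox at all: FlateDemo and sampleContentStream are hypothetical names, and the sample operators are only a rough imitation of what a real page contains.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class FlateDemo {

    /** Compresses data in zlib format, the encoding behind FlateDecode. */
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Reverses deflate(); used here only to show the round trip is lossless. */
    static byte[] inflate(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unsupported input
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Not valid zlib data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /** Generates text that loosely resembles a page of positioned text lines. */
    static byte[] sampleContentStream(int lines) {
        StringBuilder sb = new StringBuilder("BT\n/F1 12 Tf\n");
        for (int i = 0; i < lines; i++) {
            sb.append("1 0 0 1 72 ").append(720 - i * 14).append(" Tm\n")
              .append("(Line ").append(i).append(" of the report) Tj\n");
        }
        sb.append("ET\n");
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] raw = sampleContentStream(200);
        byte[] packed = deflate(raw);
        System.out.printf("raw: %,d bytes, deflated: %,d bytes%n",
            raw.length, packed.length);
    }
}
```

Drawing instructions repeat the same operators over and over, which is exactly the kind of input Flate handles well. The overall file-level saving is smaller because content streams are usually a small share of an image-heavy PDF.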

Optimize PDF by Removing Unused Resources

After handling images and streams, you can squeeze out a few more kilobytes by cleaning up metadata and duplicate objects.

Stripping Document Metadata

PDF documents often carry XMP metadata, author information, keywords, and page thumbnails. If nothing downstream depends on this data (for example, the document title shown in PDF viewers, or author info needed for compliance), removing it saves a few kilobytes:

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;

public class MetadataStripper {

    public static void stripMetadata(PDDocument document) {
        // Clear document information dictionary
        document.setDocumentInformation(new PDDocumentInformation());

        // Remove XMP metadata
        document.getDocumentCatalog().setMetadata(null);
    }
}

Deduplicating Images

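Some PDF generators embed the same image multiple times, once for each page it appears on. You can detect duplicates by hashing the raw image bytes and replacing duplicates with references to a single copy. The hash covers only the encoded stream data, not the image dictionary, so this assumes that images with identical bytes also share the same dimensions, color space, and soft mask: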

import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

import java.io.IOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.HexFormat;
import java.util.Map;

public class ImageDeduplicator {

    public static void deduplicateImages(PDDocument document)
            throws IOException, NoSuchAlgorithmException {
        Map<String, PDImageXObject> seen = new HashMap<>();
        MessageDigest md5 = MessageDigest.getInstance("MD5");

        for (PDPage page : document.getPages()) {
            PDResources resources = page.getResources();
            if (resources == null) continue;

            for (COSName name : resources.getXObjectNames()) {
                PDXObject xObject = resources.getXObject(name);
                if (!(xObject instanceof PDImageXObject image)) continue;

                byte[] rawBytes;
                try (var is = image.getCOSObject().createRawInputStream()) {
                    rawBytes = is.readAllBytes();
                }

                String hash = HexFormat.of().formatHex(md5.digest(rawBytes));
                PDImageXObject existing = seen.get(hash);
                if (existing != null) {
                    // Replace with reference to the first occurrence
                    resources.put(name, existing);
                } else {
                    seen.put(hash, image);
                }
            }
        }
    }
}

Complete Java PDF Compression Utility

This utility combines image recompression, downsampling, metadata removal, and error handling in a single pass. It skips images with alpha channels (since JPEG doesn't support transparency) and catches failures on individual images so a single problematic image won't abort the entire document.

Complete PDF compression utility (CompressPdf.java)

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
import org.apache.pdfbox.pdmodel.graphics.image.JPEGFactory;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

public class CompressPdf {

    private final float jpegQuality;
    private final int targetDpi;

    public CompressPdf(float jpegQuality, int targetDpi) {
        this.jpegQuality = jpegQuality;
        this.targetDpi = targetDpi;
    }

    public void compress(File input, File output) throws IOException {
        try (PDDocument document = Loader.loadPDF(input)) {
            for (PDPage page : document.getPages()) {
                processResources(document, page.getResources());
            }

            // Strip metadata
            document.setDocumentInformation(
                new PDDocumentInformation());
            document.getDocumentCatalog().setMetadata(null);

            // PDFBox 3.x compresses streams by default
            document.save(output);
        }
    }

    private void processResources(PDDocument document,
                                  PDResources resources)
            throws IOException {
        if (resources == null) return;

        for (COSName name : resources.getXObjectNames()) {
            PDXObject xObject = resources.getXObject(name);

            if (xObject instanceof PDFormXObject form) {
                // Recurse into nested form XObjects
                processResources(document, form.getResources());
            }

            if (!(xObject instanceof PDImageXObject image)) continue;

            try {
                BufferedImage buffered = image.getImage();

                // Skip images with alpha channels
                if (buffered.getColorModel().hasAlpha()) continue;

                int originalPixels =
                    buffered.getWidth() * buffered.getHeight();

                // Downsample if image exceeds target DPI
                // (simplified: cap total pixels based on targetDpi)
                int maxPixels = targetDpi * targetDpi * 20;
                if (originalPixels > maxPixels) {
                    double scale =
                        Math.sqrt((double) maxPixels / originalPixels);
                    int newW = (int) (buffered.getWidth() * scale);
                    int newH = (int) (buffered.getHeight() * scale);

                    BufferedImage scaled = new BufferedImage(
                        newW, newH, BufferedImage.TYPE_INT_RGB);
                    Graphics2D g = scaled.createGraphics();
                    g.setRenderingHint(
                        RenderingHints.KEY_INTERPOLATION,
                        RenderingHints.VALUE_INTERPOLATION_BICUBIC);
                    g.drawImage(buffered, 0, 0, newW, newH, null);
                    g.dispose();
                    buffered = scaled;
                }

                // Recompress as JPEG
                PDImageXObject compressed =
                    JPEGFactory.createFromImage(
                        document, buffered, jpegQuality);
                resources.put(name, compressed);
            } catch (Exception e) {
                // Keep the original image if compression fails
                System.err.println("Skipping image " + name.getName()
                    + ": " + e.getMessage());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println(
                "Usage: CompressPdf <input.pdf> <output.pdf>");
            System.exit(1);
        }

        File input = new File(args[0]);
        File output = new File(args[1]);

        long before = input.length();

        // Medium quality: JPEG 0.65, target 200 DPI
        CompressPdf compressor = new CompressPdf(0.65f, 200);
        compressor.compress(input, output);

        long after = output.length();
        double reduction =
            (1.0 - (double) after / before) * 100;

        System.out.printf("Before: %,d bytes%n", before);
        System.out.printf("After:  %,d bytes%n", after);
        System.out.printf("Reduction: %.1f%%%n", reduction);
    }
}

If you're using Maven, run it with:

mvn exec:java -Dexec.mainClass="CompressPdf" \
  -Dexec.args="report.pdf report-compressed.pdf"
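
If the pixel cap in processResources is hard to picture, here is the same arithmetic pulled out into a tiny standalone sketch. PixelCap and cappedSize are hypothetical names used only here; the factor of 20 comes straight from the utility above and works out to roughly 20 square inches of image at the target DPI. Unlike the utility, the sketch uses long arithmetic so very large images cannot overflow an int.

```java
class PixelCap {

    /**
     * Returns {width, height} after applying the pixel cap,
     * or the input size unchanged if the image is already under the cap.
     */
    static int[] cappedSize(int width, int height, int targetDpi) {
        long maxPixels = (long) targetDpi * targetDpi * 20;
        long pixels = (long) width * height;
        if (pixels <= maxPixels) {
            return new int[] {width, height};
        }
        // Scale both sides by the same factor so the aspect ratio is preserved
        double scale = Math.sqrt((double) maxPixels / pixels);
        return new int[] {(int) (width * scale), (int) (height * scale)};
    }

    public static void main(String[] args) {
        int[] large = cappedSize(4000, 3000, 200);
        int[] small = cappedSize(1000, 700, 200);
        System.out.printf("4000x3000 -> %dx%d%n", large[0], large[1]);
        System.out.printf("1000x700  -> %dx%d%n", small[0], small[1]);
    }
}
```

At 200 DPI the cap is 800,000 pixels, so a 4000x3000 image is scaled to 1032x774, while a 1000x700 image is left alone.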

PDF Size Reduction Results by Technique

The amount of file size reduction depends heavily on the content of your PDF. Here's what you can realistically expect from each technique:

Technique	Typical reduction	Best for
JPEG recompression (quality 0.65)	30-70%	PDFs with embedded PNG or uncompressed images
Image downsampling (300 to 150 DPI)	40-75%	Scanned documents, high-resolution photos
CCITT Group 4	60-95%	Black and white scanned pages
FlateDecode stream compression	5-15%	PDFs with uncompressed content streams
Object stream compression	3-10%	Any PDF (free optimization when saving)
Metadata removal	1-5%	PDFs with large XMP blocks or thumbnails
Image deduplication	10-90%	Documents that repeat logos or headers

In our tests on a 35.8 MB product catalog, JPEG recompression alone (quality 0.65) reduced the file to 6.3 MB (83%), and the complete utility (JPEG + downsampling + metadata removal) produced a 5.8 MB file (85%). CCITT conversion brought it down to 0.8 MB (98%), though at the cost of losing all color. On a separate 21.6 MB document with the same image repeated across 10 pages, deduplication alone cut the size to 2.3 MB (90%).

Text-only PDFs won't benefit much from image techniques, but stream and object compression will still help.

Limitations of Manual PDF Compression in Java

Building your own PDF compression pipeline with PDFBox works, but it comes with real tradeoffs:

Transparency and color spaces – JPEG doesn't support alpha channels. CMYK images need different processing than RGB. ICC profiles can get lost during recompression. Handling all these cases correctly requires significant testing.
Font subsetting – PDFBox can manipulate font resources, but properly subsetting an embedded font (keeping only the glyphs used in the document) involves parsing font tables, which is error-prone and not well-documented.
Edge cases – PDF is a complex format with decades of backward compatibility. Images can be nested inside form XObjects (PDFormXObject), and you'll encounter inline images, JBIG2-encoded content, and other structures that need special handling.
Maintenance – when PDFBox releases a new major version, your compression code needs updating. When you encounter a new PDF variant in production, you need to adapt.

If you're generating PDFs from HTML, optimizing your HTML for PDF output can reduce file sizes before compression even enters the picture.

Easier Alternative: Compress PDFs via API

The PDFBox pipeline in this guide only compresses existing PDF files: you load a PDF from disk, process it, and save it back. If you need to generate and compress PDFs in one step (e.g., from HTML to PDF or URL to PDF), a PDF API can do both while also handling alpha channels, CMYK color spaces, and font subsetting. With PDFBolt, you add a compression parameter and the API takes care of the rest.

Four compression levels are available:

Level	JPEG quality	Image DPI	Best for
`lossless`	1.0	300	Documents where image quality cannot be compromised
`low`	0.80	300	Client-facing reports, branded materials
`medium`	0.55	150	Email attachments, web downloads
`high`	0.35	100	Archival storage, bandwidth-constrained delivery

Here's a Java example that generates a PDF from a URL and compresses it in the same request, with far less code than the PDFBox pipeline:

import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Paths;

public class CompressPdfWithApi {
    public static void main(String[] args) throws Exception {
        String jsonBody = """
            {
                "url": "https://example.com/report",
                "compression": "medium"
            }
            """;

        var client = HttpClient.newHttpClient();
        var request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.pdfbolt.com/v1/direct"))
                .header("API-KEY", "YOUR-API-KEY")
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();

        var response = client.send(request,
            HttpResponse.BodyHandlers.ofByteArray());

        if (response.statusCode() == 200) {
            Files.write(Paths.get("compressed.pdf"), response.body());
            System.out.println("Compressed PDF saved");
        } else {
            System.err.println("Error: " + response.statusCode());
        }
    }
}

No PDFBox dependency, no image analysis, no edge case handling. The API applies CCITT encoding for bilevel images, smart alpha handling, image deduplication, and font optimization automatically.

To put the difference in perspective: our PDFBox CompressPdf utility reduced a 35.8 MB product catalog to 5.8 MB (85%). A similar catalog compressed via the PDFBolt API went from 23.7 MB down to 1.6 MB (93%) on the high level – because the API also handles font subsetting, alpha channel optimization, and grayscale detection that the manual approach skips. See the full results.

You can test compression levels in the PDFBolt playground before writing any code, or check the Java quick start guide for more examples.

Frequently Asked Questions

Does PDFBox compress PDFs by default when saving?

PDFBox 3.x applies FlateDecode and object stream compression by default when you call document.save(). However, it does not automatically recompress images or remove unused resources. For image-heavy PDFs, you need to handle image compression yourself using the techniques described in this guide. To disable default compression (e.g., for debugging), pass CompressParameters.NO_COMPRESSION to the save() method.

What JPEG quality setting should I use for PDF compression?

A value of 0.65 to 0.75 works well for most documents. At 0.75, the quality loss is barely noticeable on screen. At 0.65, you'll see slight artifacts on close inspection but it's fine for reports and internal documents. Going below 0.5 produces visible degradation and is only suitable for thumbnails or previews.

Can I compress a PDF without losing image quality?

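Yes, but the size reduction will be smaller. Lossless techniques like stream compression, deduplication, and metadata removal typically achieve 10-40% reduction compared to 40-70% with lossy JPEG recompression. Downsampling is not lossless, since it discards pixels, although the loss is often invisible at the size the image is displayed.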

How do I handle PDF images with transparency during compression?

Images with alpha channels (transparency) cannot be compressed directly as JPEG. Check image.getImage().getColorModel().hasAlpha() before converting. To compress them without losing transparency, split the image: compress the RGB data with JPEGFactory and encode the alpha mask losslessly with LosslessFactory, then attach it as an SMask. This preserves transparency while still compressing the color data. For simple cases, you can skip transparent images entirely or composite them onto a white background before JPEG conversion.
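
For the simple case, compositing onto white takes only a few lines of java.awt. The sketch below is one way to do it; WhiteBackgroundFlattener is a hypothetical helper, and the opaque image it returns is what you would then pass to JPEGFactory.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

class WhiteBackgroundFlattener {

    /** Paints the source over a white canvas and returns an opaque RGB image. */
    static BufferedImage flattenOnWhite(BufferedImage source) {
        BufferedImage flat = new BufferedImage(
            source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = flat.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, flat.getWidth(), flat.getHeight());
            // Default compositing (source-over) blends transparent areas with the white fill
            g.drawImage(source, 0, 0, null);
        } finally {
            g.dispose();
        }
        return flat;
    }
}
```

This only looks right when the image sits on a white page background. If it overlaps other content, flattening changes how the page renders, so the SMask approach is the safer choice there.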

Why does my compressed PDF sometimes get larger than the original?

This happens when the original PDF already uses efficient compression and your re-encoding introduces overhead. Common causes: converting an already-compressed JPEG to a new JPEG (double encoding), adding FlateDecode to streams that were already compressed with a different algorithm, or expanding a JBIG2-encoded image into a larger format. Always compare the output size and skip recompression when the result would be larger.
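
The size check itself can be a one-line guard. The sketch below is a hypothetical helper, not a PDFBox API: you pass in the byte counts you want to compare (for instance, the encoded size of the original image and of the recompressed one) and a minimum saving below which the swap isn't worth the quality loss. The 10% figure in the usage note is an arbitrary example value.

```java
class SizeGuard {

    /**
     * Returns true when the candidate saves at least minSaving
     * (e.g. 0.10 for 10%) compared to the original.
     */
    static boolean worthReplacing(long originalBytes, long candidateBytes,
                                  double minSaving) {
        if (originalBytes <= 0) {
            return false; // nothing meaningful to compare against
        }
        double saving = 1.0 - (double) candidateBytes / originalBytes;
        return saving >= minSaving;
    }
}
```

With a 10% minimum, going from 1,000 to 800 bytes passes the check, while going from 1,000 to 950 bytes (or growing to 1,200) keeps the original.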

Is PDFBox or iText better for PDF compression in Java?

Both can handle PDF compression, but they differ in licensing. Apache PDFBox is fully open-source (Apache 2.0) and gives you low-level access to PDF internals. iText (AGPL / commercial license) has higher-level optimization APIs but requires a commercial license for closed-source projects. See our comparison of Java PDF libraries for details.

How much can I reduce a scanned PDF file size?

Scanned PDFs respond very well to compression. Combining downsampling (300 → 150 DPI), JPEG recompression, and CCITT Group 4 for black and white pages typically gives 60-80% reduction. CCITT alone can achieve 10-20x compression on bilevel images.

How do I compress a PDF programmatically in Java?

The most common open-source option is Apache PDFBox. Load the PDF with Loader.loadPDF(), iterate through each page's image resources, recompress them as JPEG with JPEGFactory, and call document.save() – PDFBox 3.x applies stream compression by default. The CompressPdf class in this guide combines all these steps into a single utility. For a managed alternative, PDF compression APIs like PDFBolt handle optimization in a single HTTP call.

Does PDFBox support font subsetting to reduce PDF size?

PDFBox supports font subsetting when embedding fonts during PDF generation (PDType0Font.load(doc, stream, true)), but it does not provide an API to re-subset fonts already embedded in an existing PDF. Reducing font size in a loaded document requires parsing font tables manually, which is complex and error-prone.

Conclusion

To shrink PDF file size with Apache PDFBox: recompress images as JPEG, downsample anything above your target DPI, use CCITT for black and white pages, and keep object stream compression enabled when saving (the default in PDFBox 3.x). Image optimization drives most of the reduction.

The CompressPdf class from this guide is ready to use as-is. Adapt the JPEG quality and target DPI to your needs, and add transparency handling if your documents require it. If maintaining your own compression code isn't worth it, the API approach shown above reduces the entire pipeline to a single HTTP call.

For more on what you can build with PDFBox, check out our Apache PDFBox tutorial covering PDF generation from scratch, or browse the full list of Java PDF libraries to compare alternatives.

Less is more. Especially when "more" won't fit in an email attachment. 📧
