Why Generating PDFs in PHP Is Its Own Skill
Invoices, receipts, contracts, reports: almost every business application eventually needs to produce a PDF document. Doing it well in PHP is less obvious than most other features, because PHP has no built-in PDF support and the available libraries vary widely in approach and capability.
The Two Main Approaches
There are broadly two ways to generate a PDF from PHP: render HTML/CSS and convert it to PDF (libraries like DomPDF or wkhtmltopdf-based tools), or build the PDF programmatically element by element (libraries like TCPDF or FPDF, drawing text and shapes at specific coordinates). The HTML-to-PDF approach is far easier for anyone who already knows HTML/CSS and wants reasonably complex layouts; the programmatic approach gives more precise control but means learning a coordinate-based drawing API.
HTML-to-PDF in Practice
use Dompdf\Dompdf;

// Render the invoice template to an HTML string (view() is Laravel's helper).
$html = view('invoices.pdf-template', ['invoice' => $invoice])->render();

$dompdf = new Dompdf();
$dompdf->loadHtml($html);
$dompdf->setPaper('A4', 'portrait');
$dompdf->render();

// Send the rendered PDF to the browser.
$dompdf->stream("invoice-{$invoice->number}.pdf");

The big advantage here: the same templating skills used for the website's actual pages directly produce the PDF layout, and design changes are CSS changes, not redrawing coordinates.
What Trips People Up: CSS Support Is Limited
HTML-to-PDF libraries do not implement a full browser rendering engine — DomPDF in particular has well-known gaps in CSS support (flexbox and grid are unreliable, some modern CSS simply does not render as expected). Invoice templates need to be built with simpler, table-based or basic block layouts rather than modern CSS layout techniques, which can feel like a step backward for a developer used to writing regular web pages.
Embedding Images and Fonts
A PDF that needs a company logo or a non-default font requires those assets to be embedded directly, since the PDF must be self-contained and renderable without internet access on the viewer's end:
$dompdf->getOptions()->setIsRemoteEnabled(true); // allow loading remote images, with care
$dompdf->getOptions()->setDefaultFont('DejaVu Sans'); // a font with broad Unicode coverage

Setting setIsRemoteEnabled(true) deserves a security note: it allows the PDF renderer to fetch remote URLs found in the HTML, which is a potential server-side request forgery vector if any part of the HTML being rendered comes from user input. Prefer embedding images as local files or base64 data URIs over remote URLs when the content is not fully trusted.
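One way to follow the data-URI advice above is a small helper that reads a local image and returns a string usable in an img tag's src attribute. This is a minimal sketch: the function name imageToDataUri and the short list of supported extensions are assumptions for illustration, not part of DomPDF.

```php
<?php
// Hypothetical helper: turn a local image file into a base64 data URI.
function imageToDataUri(string $path): string
{
    if (!is_file($path) || !is_readable($path)) {
        throw new InvalidArgumentException("Image not readable: {$path}");
    }

    // Map the file extension to a MIME type; extend this list as needed.
    $mimeTypes = [
        'png'  => 'image/png',
        'jpg'  => 'image/jpeg',
        'jpeg' => 'image/jpeg',
        'gif'  => 'image/gif',
    ];
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    if (!isset($mimeTypes[$extension])) {
        throw new InvalidArgumentException("Unsupported image type: {$extension}");
    }

    return 'data:' . $mimeTypes[$extension] . ';base64,' . base64_encode(file_get_contents($path));
}
```

The result can be passed to the template like any other variable (for example as a $logo value placed in the src attribute), so the renderer does not have to fetch the logo over the network.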
Performance: Generating PDFs Asynchronously
PDF generation, especially for multi-page reports with charts or many line items, can take a noticeable amount of time — long enough that generating it synchronously during a web request creates a poor user experience or even times out. The better pattern is queuing PDF generation as a background job and notifying the user (or polling) once it is ready:
GenerateInvoicePdf::dispatch($invoice)->onQueue('pdfs');

Storing Generated PDFs
For documents with legal or financial weight — invoices, signed contracts — store the generated PDF itself rather than regenerating it on demand every time it is viewed. Business data (prices, tax rates, company details) can change after an invoice was issued, and regenerating "the same" invoice later with updated data produces a document that no longer matches what was actually sent to the customer at the time.
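A minimal sketch of "store what was issued", assuming plain filesystem storage: the helper below writes the PDF bytes once, refuses to overwrite an existing file, and returns a checksum that could be saved next to the invoice record. The function name, file naming scheme, and return shape are hypothetical.

```php
<?php
// Hypothetical helper: persist the PDF bytes exactly as issued, never overwriting them.
function storeIssuedPdf(string $pdfBytes, string $directory, string $invoiceNumber): array
{
    // Keep the file name safe even if the invoice number contains unexpected characters.
    $safeNumber = preg_replace('/[^A-Za-z0-9_-]/', '_', $invoiceNumber);
    $path = rtrim($directory, '/') . "/invoice-{$safeNumber}.pdf";

    // Mode 'x' fails if the file already exists, so an issued document is never replaced.
    $handle = @fopen($path, 'x');
    if ($handle === false) {
        throw new RuntimeException("Refusing to overwrite or unable to create: {$path}");
    }
    fwrite($handle, $pdfBytes);
    fclose($handle);

    // The checksum makes later corruption or tampering detectable.
    return ['path' => $path, 'sha256' => hash('sha256', $pdfBytes)];
}
```

Viewing the invoice later then means serving the stored file, not calling the generator again with current data.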
Programmatic Generation for Precise Layouts
For cases where exact positioning matters more than design flexibility — printing onto a pre-printed form, generating a shipping label to a strict template — TCPDF's coordinate-based API is the better fit:
$pdf = new TCPDF();
$pdf->AddPage();
$pdf->SetFont('helvetica', 'B', 16);
$pdf->Text(20, 20, 'INVOICE');
$pdf->SetFont('helvetica', '', 10);
$pdf->Text(20, 35, 'Invoice #: ' . $invoice->number);
$pdf->Output('invoice.pdf', 'D');

Closing Thought
PDF generation in PHP is a solved problem in the sense that mature libraries exist for both major approaches, but it is not a zero-thought problem — the choice between HTML-to-PDF and programmatic drawing, the CSS limitations of HTML renderers, and the decision to generate synchronously versus in a background job all meaningfully affect whether the feature feels solid or fragile once real users depend on it.
Pagination and Page Breaks in Multi-Page PDFs
A report spanning many pages needs explicit thought about where page breaks fall — a table row split awkwardly across two pages looks unprofessional and confuses the reader. HTML-to-PDF libraries support CSS page-break properties for this:
.invoice-table tr { page-break-inside: avoid; }
.section-header { page-break-before: always; }

These rules tell the renderer to keep a table row intact rather than splitting it, and to force certain sections to always start on a fresh page. These are small details, but the kind that separate a PDF that looks machine-generated from one that looks deliberately designed.
Headers, Footers, and Page Numbers
Multi-page reports almost always need a repeated header (company name, document title) and footer (page X of Y, generation date) on every page. Most PDF libraries provide a callback mechanism specifically for this, since header/footer content has to be drawn on every page independently of the main content flow:
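// Call this after $dompdf->render(); page_text() expects a font resolved through FontMetrics.
$font = $dompdf->getFontMetrics()->getFont('DejaVu Sans', 'normal');
$dompdf->getCanvas()->page_text(500, 800, "Page {PAGE_NUM} of {PAGE_COUNT}", $font, 9);

Generating PDFs in Different Languages and Character Sets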
A PDF needs to embed whichever font covers the character set being rendered — the default fonts bundled with most PDF libraries cover basic Latin characters well but often fail silently (rendering boxes or blank space) for Cyrillic, Arabic, or CJK text. Choosing a font with broad Unicode coverage (DejaVu Sans is a common choice specifically because of this) and testing with real non-Latin sample data before launch avoids an embarrassing bug that monolingual testing during development would never surface.
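A pre-flight check can make this kind of problem visible before rendering. The sketch below lists which non-Latin scripts appear in a string, so the application can log a warning or switch fonts when a customer name contains a script the chosen font may not cover. The function name and the three script groups are assumptions for illustration. Which scripts a given font actually covers still has to be verified against the font itself.

```php
<?php
// Hypothetical pre-flight check: list the non-Latin scripts present in a string.
function detectScripts(string $text): array
{
    // Unicode script properties; the /u modifier treats the subject as UTF-8.
    $patterns = [
        'Cyrillic' => '/\p{Cyrillic}/u',
        'Arabic'   => '/\p{Arabic}/u',
        'CJK'      => '/[\p{Han}\p{Hiragana}\p{Katakana}\p{Hangul}]/u',
    ];

    $found = [];
    foreach ($patterns as $script => $pattern) {
        if (preg_match($pattern, $text) === 1) {
            $found[] = $script;
        }
    }
    return $found;
}
```

Running a check like this over real customer names and product descriptions is also a cheap way to build the non-Latin sample data set for testing.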
Securing Generated PDFs
For documents containing sensitive data, two further considerations matter: setting the correct HTTP headers so the PDF downloads rather than displays inline in contexts where that matters, and, for genuinely sensitive documents, applying PDF-level password protection or restricting printing/copying through the library's encryption options, rather than relying solely on access control at the point the download link is served.
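For the header side of this, a minimal sketch follows. It builds the response headers that ask the browser to download the file instead of displaying it inline, and to avoid caching it. The function name and the exact cache policy are assumptions; adjust them to the framework and requirements in use.

```php
<?php
// Hypothetical helper: headers that make a browser download the PDF and not cache it.
function pdfDownloadHeaders(string $filename, int $byteLength): array
{
    // Replace anything that could break out of the quoted filename or inject a header.
    $safeName = preg_replace('/[^A-Za-z0-9._-]/', '_', $filename);

    return [
        'Content-Type'           => 'application/pdf',
        'Content-Disposition'    => 'attachment; filename="' . $safeName . '"',
        'Content-Length'         => (string) $byteLength,
        'Cache-Control'          => 'private, no-store',
        'X-Content-Type-Options' => 'nosniff',
    ];
}
```

In plain PHP each pair would be sent with header("{$name}: {$value}") before the PDF bytes are echoed; most frameworks accept an array like this directly on their response object.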
Generating PDFs from Existing Web Pages Versus Dedicated Templates
It is tempting to reuse an existing web page's template directly for PDF generation, since the content and layout already exist. This usually produces a poor result: web pages are designed for screens with navigation, interactive elements, and responsive layouts that have no meaning on a printed page. A dedicated PDF template — sharing the underlying data and perhaps some base styles, but laid out specifically for a fixed page size with print-appropriate typography — consistently produces better results than trying to force a responsive web layout into a static document format.
Testing PDF Output
PDF generation is one of the easier features to under-test, because a passing test that confirms "a PDF byte stream was returned" tells you almost nothing about whether the actual rendered document looks correct. A more meaningful test confirms the PDF contains expected text content (most PDF libraries expose the data used to generate it, or a parsing library can extract text back out for assertions) and, for visually critical documents, periodic manual review of the actual rendered output remains valuable in a way that is hard to fully automate.
public function testInvoicePdfContainsCustomerNameAndTotal(): void
{
    // Create the invoice under test (assumes an Invoice model factory exists).
    $invoice = Invoice::factory()->create();

    $pdfContent = $this->generateInvoicePdf($invoice);
    $text = (new \Smalot\PdfParser\Parser())->parseContent($pdfContent)->getText();

    $this->assertStringContainsString($invoice->customer_name, $text);
    $this->assertStringContainsString(number_format($invoice->total, 2), $text);
}

Localizing Number and Currency Formats in Generated Documents
An invoice generated for an international customer base needs to format currency and dates according to the recipient's locale, not a single hardcoded format — a thousand separator and decimal point that are swapped between European and US conventions, a date written day-first versus month-first, can genuinely confuse a recipient or look unprofessional. PHP's NumberFormatter and locale-aware date formatting handle this correctly when given the right locale, rather than hand-rolling string formatting that only works for one region.
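A short sketch of locale-aware formatting, assuming PHP's intl extension is installed. NumberFormatter and IntlDateFormatter are the real intl classes; the two wrapper function names are hypothetical.

```php
<?php
// Requires the intl extension. Function names are hypothetical.
function formatInvoiceAmount(float $amount, string $currency, string $locale): string
{
    $formatter = new NumberFormatter($locale, NumberFormatter::CURRENCY);
    $formatted = $formatter->formatCurrency($amount, $currency);
    if ($formatted === false) {
        throw new RuntimeException("Could not format amount for locale {$locale}");
    }
    return $formatted;
}

function formatInvoiceDate(DateTimeInterface $date, string $locale): string
{
    // Medium-length date, no time component, in the date's own timezone.
    $formatter = new IntlDateFormatter(
        $locale,
        IntlDateFormatter::MEDIUM,
        IntlDateFormatter::NONE,
        $date->getTimezone()
    );
    $formatted = $formatter->format($date);
    if ($formatted === false) {
        throw new RuntimeException("Could not format date for locale {$locale}");
    }
    return $formatted;
}
```

Calling formatInvoiceAmount(1234.5, 'EUR', 'en_US') and the same call with 'de_DE' yields differently grouped output (comma versus period as the thousands separator), which is the point: the recipient's locale, stored with the customer record, drives the format. Exact spacing and symbol placement depend on the ICU data bundled with the PHP build.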
Storage and Retention Considerations
Generated PDFs, especially for financial documents, often carry legal retention requirements — tax authorities in many jurisdictions expect invoices to be retrievable for several years. Plan storage (and backup) of generated documents with this in mind from the start, rather than treating them as disposable artifacts that can simply be regenerated later if needed; as covered earlier, regenerating later from current data may not match what was actually issued at the time.
Case Study: An Invoice PDF That Looked Different on Every Customer's Printer
A B2B platform generated PDF invoices using percentage-based CSS widths inherited from the company's main website stylesheet, reused without modification for the PDF template. On screen, this looked fine; printed on different paper sizes and printer drivers across customers in different countries, table columns wrapped unpredictably and occasionally pushed the total amount onto a second page by itself, looking like an error. The underlying issue was treating the PDF template as "the website, but exported" rather than as its own fixed-size document with its own layout rules. The fix, eventually, was a dedicated invoice template using fixed pixel or millimeter widths matched to the target paper size (A4), tested by literally printing sample output on a few different printers before considering the feature done — a manual verification step that automated tests alone would not have caught, since the bug was about visual layout on physical paper, not about the data being correct.
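The arithmetic behind "fixed widths matched to the target paper size" is simple enough to sketch. The helper below divides the printable width of a page into fixed millimeter column widths from relative shares. The function name, the 15 mm margin, and the column names are assumptions for illustration; the 210 mm width of A4 is standard.

```php
<?php
// Hypothetical helper: split the printable width of a page into fixed column widths (mm).
function columnWidthsMm(float $pageWidthMm, float $marginMm, array $shares): array
{
    $printable = $pageWidthMm - 2 * $marginMm;
    $total = array_sum($shares);
    if ($printable <= 0 || $total <= 0) {
        throw new InvalidArgumentException('Page width, margins, and shares must leave room for content.');
    }

    $widths = [];
    foreach ($shares as $name => $share) {
        // Each column gets its proportional slice of the printable width.
        $widths[$name] = round($printable * $share / $total, 1);
    }
    return $widths;
}

// A4 is 210 mm wide; 15 mm margins on each side leave 180 mm of printable width.
$widths = columnWidthsMm(210, 15, [
    'description' => 6,
    'quantity'    => 1,
    'unit_price'  => 2,
    'total'       => 3,
]);
// description: 90 mm, quantity: 15 mm, unit_price: 30 mm, total: 45 mm
```

The resulting values can be written into the template as explicit widths in mm, so the layout no longer depends on how a renderer or printer driver resolves percentages.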
A Glossary for This Topic
Rasterization: converting vector content (text, shapes) into a fixed grid of pixels. PDF libraries normally keep text and shapes as vector content; a page is rasterized only when its content is embedded as an image, which makes the text unselectable and invisible to screen readers.
Embedded fonts: fonts saved directly inside the PDF file so it renders identically on any viewer, regardless of what fonts happen to be installed on that viewer's machine.
Streaming a PDF: sending the generated document directly to the browser as the HTTP response, versus saving it to disk first and serving a link to the saved file.
Frequently Asked Questions
Can I let users edit a PDF after it's generated? Not meaningfully — PDFs are designed as a fixed, print-ready format, not an editable document format. If editing is genuinely needed, keep the underlying structured data editable in your application and regenerate the PDF from updated data, rather than trying to edit the PDF file itself.
Why is my generated PDF so much larger in file size than expected? Usually embedded images at full original resolution, or embedded fonts including glyphs that are never actually used in the document. Compressing images before embedding and subsetting fonts (including only the characters actually used) both meaningfully reduce file size.
Do I need a real browser engine like headless Chrome instead of DomPDF? For genuinely complex modern CSS layouts (real flexbox, real grid, JavaScript-rendered content), yes — tools that drive a real browser to print to PDF handle modern CSS far more faithfully than DomPDF's custom rendering engine, at the cost of needing a heavier runtime environment to run that browser.
Step-by-Step: Building a Branded Invoice PDF From Scratch
Step one: design the invoice layout first as a regular HTML page in the browser, since iterating on layout there is far faster than repeatedly regenerating a PDF to check small CSS tweaks.
Step two: once the layout looks right on screen, restrict the CSS to properties known to render reliably in your chosen PDF library, removing anything relying on modern layout features the renderer does not support well.
Step three: extract the page into a dedicated PDF-only template (not shared with the live website's layout), feeding it the same underlying invoice data.
Step four: add header/footer callbacks for page numbers and repeated branding elements, and add explicit page-break rules around tables and totals so multi-page invoices never split awkwardly.
Step five: test with realistic edge-case data (a customer name in a non-Latin script, a very long product description, an invoice with forty line items spanning several pages) rather than only testing with clean, short sample data that never reveals layout problems.
Step six: move generation to a background queue if invoices are ever generated in bulk (a monthly batch run for hundreds of customers at once), to avoid one slow request blocking the entire process.
A Comparison Table: PDF Libraries at a Glance
DomPDF: pure PHP, no external dependencies to install, good for moderately complex HTML/CSS layouts, limited support for modern CSS layout techniques.
TCPDF: pure PHP, coordinate-based drawing API, excellent for precise, form-like layouts where exact positioning matters more than HTML/CSS convenience.
wkhtmltopdf-based tools: drive an actual rendering engine for far more accurate CSS support, but require installing and maintaining a separate binary dependency on the server.
Headless-browser-based PDF generation: the most accurate CSS and JavaScript support of any approach, at the cost of the heaviest runtime footprint and slowest generation time per document.
Security Considerations for Generated PDFs
Beyond the SSRF risk from remote image loading already discussed, a generated PDF embedding any user-supplied content (a customer-entered note on an invoice, a free-text field on a contract) must have that content properly escaped before being placed into the HTML fed to the PDF renderer, exactly as it would need escaping in a normal web page — an unescaped field is a stored XSS-equivalent risk specific to whatever renders the HTML, even though the end artifact is a PDF rather than a browser-displayed page. Access control on download links deserves equal care: a predictable or sequential invoice URL (/invoices/1042.pdf) without a server-side check that the requester is actually authorized to view that specific invoice is a common, serious information-disclosure bug, allowing one customer to view another's financial documents simply by guessing or incrementing an ID.
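Both points can be sketched in a few lines. The first function escapes user-supplied text before it goes into the HTML handed to the renderer; the second is a per-document authorization check. The function names are hypothetical, and plain arrays stand in for whatever user and invoice models the application really uses.

```php
<?php
// Hypothetical helper: escape user-supplied text before placing it in the PDF's HTML.
function escapeForPdfHtml(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Hypothetical check: arrays stand in for real user and invoice models.
function canDownloadInvoice(array $user, array $invoice): bool
{
    // Authorize against the specific document, not just "is logged in".
    return $user['customer_id'] === $invoice['customer_id'];
}

// A customer note containing markup is rendered as text, not interpreted as HTML.
$note = escapeForPdfHtml('<img src="http://internal-service/">');
```

Template engines that auto-escape output already cover the first case as long as the raw-output syntax is never used for user-entered fields. The second check belongs in the download handler itself and should run on every request, even when the URL was reached through a legitimate link.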
Accessibility in PDF Documents
PDF accessibility is frequently overlooked entirely — a PDF generated purely as a rasterized or unstructured layout is effectively invisible to screen readers, unlike a well-structured PDF with proper tagging that preserves reading order and labels for assistive technology. For documents intended for broad public distribution, particularly in jurisdictions with accessibility compliance requirements, this is worth deliberate attention rather than an afterthought discovered during a compliance audit.
Final Checklist Before Shipping a PDF Generation Feature
Is the template built specifically for print, not reused unmodified from a responsive web layout?
Are page breaks and headers/footers handled explicitly for multi-page documents?
Is any user-supplied content properly escaped before being rendered into the PDF?
Is access to generated documents authorized per-document, not just per-feature?
Is generation moved to a background queue for anything beyond a single, small, fast document?
Are documents with legal or financial weight stored as-generated rather than regenerated on demand from data that may have since changed?
Testing PDF Generation Pipelines End to End
Beyond asserting on extracted text content, a useful end-to-end test confirms the whole pipeline from data to downloadable file works without errors across a range of realistic inputs — a customer with an unusually long name, an invoice with zero line items, an invoice with the maximum number of line items the UI allows — specifically because PDF rendering failures often only appear at these edges, not in the simple middle-of-the-road case a developer tests by hand during development.
public function testInvoiceWithManyLineItemsGeneratesWithoutError()
{
$invoice = Invoice::factory()->has(InvoiceLine::factory()->count(60))->create();
$pdf = (new InvoicePdfGenerator())->generate($invoice);
$this->assertNotEmpty($pdf);
$this->assertStringStartsWith('%PDF', $pdf);
}

Closing Thought
PDF generation sits at an unusual intersection of design, data, and document-format technicalities, which is exactly why it tends to accumulate small, easily-overlooked bugs — a page break in the wrong place, a missing font for one customer's name, a security gap in how download links are authorized. None of these individually requires deep expertise to fix, but catching them before a customer notices requires testing with realistic, varied data rather than the one clean example used during initial development.
How This Plays Out Differently Across Document Volumes
A small consultancy generating a handful of invoices a month can reasonably generate PDFs synchronously, on request, with no queue at all — the added complexity of background processing would be solving a problem that does not yet exist for them. A platform issuing thousands of invoices in a single end-of-month billing run needs queued, rate-limited generation, careful monitoring for partial failures across the batch, and likely a separate worker pool dedicated to document generation so a PDF-heavy billing run does not starve other background work of resources. The right architecture genuinely depends on volume, and copying a pattern designed for one scale onto a problem at a very different scale is a common, avoidable source of either wasted engineering effort or production incidents.
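For the high-volume case, the part most often missing is tracking partial failures. The sketch below runs a batch through any generator callable and records which invoices succeeded and which failed, without letting one failure abort the run. The function name and result shape are assumptions; in a queued setup the same bookkeeping would live in the job's failure handling.

```php
<?php
// Hypothetical batch runner: $generate builds one PDF and throws on failure.
function runBillingBatch(array $invoiceIds, callable $generate): array
{
    $result = ['succeeded' => [], 'failed' => []];

    foreach ($invoiceIds as $id) {
        try {
            $generate($id);
            $result['succeeded'][] = $id;
        } catch (Throwable $e) {
            // Record the failure and keep going so one bad invoice does not abort the run.
            $result['failed'][$id] = $e->getMessage();
        }
    }

    return $result;
}
```

After the run, a non-empty failed list is what gets retried or raised as an alert, so a billing run that only partly completed does not pass as a success.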
What to Do When You Inherit a PDF Feature With No Tests
Untested PDF generation code is common, partly because testing visual output feels harder than testing plain data, as this guide has already addressed. The pragmatic first step when inheriting such code is not necessarily full visual regression testing immediately, but the text-extraction-based assertions shown earlier — cheap to write, and enough to catch the most damaging class of regression, where a code change silently breaks generation entirely or drops expected content from the output without anyone noticing until a customer complains about a broken or incomplete invoice.
A Final Word on Documents as a Trust Surface
An invoice or contract a business sends a customer carries real weight — it represents the business, and a broken or unprofessional-looking document quietly damages trust in a way that is hard to measure but real. The technical details in this guide are not academic; a PDF that lays out correctly, contains accurate data, and was generated through a properly secured and tested pipeline is part of how a business credibly presents itself, not just a backend implementation detail.