PHP Excel Import and Export: Handling Bulk Data Reliably

19 Jun 2026
Darwinbark Team
Web Development

Why "Just Parse the CSV" Undersells the Problem

Letting users bulk-upload data via a spreadsheet, or export records to Excel for offline analysis, sounds like a small feature. In practice it touches file format quirks, data validation at scale, memory limits, and what to do when row 4,000 of 5,000 fails validation but the first 3,999 were fine. This guide covers building both directions — import and export — properly.

CSV Versus XLSX: Not the Same Problem

A CSV file is plain text with no formatting, no multiple sheets, and a deceptively simple-looking structure that still has real edge cases (quoted fields containing commas, embedded newlines, inconsistent encodings). An XLSX file is actually a ZIP archive containing several XML files describing sheets, styles, and shared strings — far more capable, but requiring a real parsing library rather than a simple line-by-line read. PhpSpreadsheet is the standard PHP library that handles both formats through one consistent API.

Exporting Data: The Memory Trap

The most common bug in PHP Excel export code is loading every row into memory before writing anything, which works fine in development with 50 test rows and then exhausts memory or times out in production with 50,000 real rows. The fix is streaming writers that flush rows to disk incrementally rather than building the entire spreadsheet in memory first:

$writer = new Xlsx($spreadsheet);
$writer->setPreCalculateFormulas(false);
// PhpSpreadsheet's default writer still builds in memory;
// for very large exports, query and write in chunks instead:
$sheet = $spreadsheet->getActiveSheet();
$row = 2;
Order::query()->chunk(1000, function ($orders) use ($sheet, &$row) {
    foreach ($orders as $order) {
        $sheet->setCellValue("A{$row}", $order->id);
        $sheet->setCellValue("B{$row}", $order->customer_name);
        $sheet->setCellValue("C{$row}", $order->total);
        $row++;
    }
});

chunk() pulls records from the database in manageable batches instead of one giant query, which matters just as much for memory as the spreadsheet-writing side does.

Importing Data: Validate Before You Insert Anything

The single most important decision in an import feature is what happens when some rows are valid and others are not. Inserting valid rows as you go and only reporting errors at the end leaves the database in a half-imported, ambiguous state if the user needs to fix and re-upload. The safer pattern: validate every row first, collect all errors, and only commit to the database if the whole batch passes (or with an explicit per-row partial-import mode the user opts into, with clear reporting of exactly which rows succeeded and which didn't):

$errors = [];
$validRows = [];
foreach ($rows as $i => $row) {
    $validator = Validator::make($row, [
        'email' => 'required|email',
        'amount' => 'required|numeric|min:0',
    ]);
    if ($validator->fails()) {
        $errors[] = "Row " . ($i + 2) . ": " . $validator->errors()->first();
    } else {
        $validRows[] = $row;
    }
}
if (!empty($errors)) {
    return back()->withErrors($errors); // nothing inserted yet
}
DB::transaction(function () use ($validRows) {
    foreach ($validRows as $row) {
        Order::create($row);
    }
});

Wrapping the actual insert in a transaction means that if something unexpected fails partway through the insert loop itself, the whole batch rolls back rather than leaving a partially-imported mess.

Handling Large Imports Without Timing Out

A web request has a practical time limit, and parsing plus validating tens of thousands of rows synchronously during that request risks hitting it. For large imports, the better pattern is: accept the upload immediately, queue the actual parsing and import as a background job, and let the user check back (or get notified) once it completes — the same pattern used for slow PDF generation, applied here.

Giving Useful Feedback on Failure

"Import failed" with no further detail is the single most common usability failure of bulk-upload features. Always report which specific row and which specific field caused a problem, in terms the user who built the spreadsheet will understand — "Row 47: email is not a valid email address" is actionable; a stack trace or a generic error is not.

Closing Thought

Excel import and export look like a thin wrapper around a parsing library, and the library handles the file-format complexity well. What it does not handle for you is the validation strategy, the memory management for large datasets, and the decision about synchronous versus background processing — all of which determine whether the feature holds up once real users start uploading real, messy spreadsheets instead of clean test data.

Need a bulk import or export feature built to handle real-world messy data? We can build it properly.

Handling Different Header Conventions

Not every spreadsheet a user uploads will have headers matching your exact expected column names — "Email" versus "email address" versus "E-Mail". A naive import that expects exact header matches breaks the moment a user's spreadsheet uses slightly different wording. A more forgiving approach normalizes headers (lowercase, trim whitespace, strip punctuation) before matching them against expected fields, and offers a mapping step in the UI for anything that cannot be matched automatically — let the user confirm "this column means email" rather than silently guessing or silently failing.

Detecting and Handling Duplicate Rows

Bulk imports frequently contain accidental duplicates — the same customer record exported twice, a row pasted in twice by mistake. Decide explicitly what "duplicate" means for your data (same email? same order number?) and check for it before inserting, rather than discovering duplicate records days later when someone notices two identical orders in the system.

$existingEmails = Customer::pluck('email')->flip();
foreach ($validRows as $row) {
    if (isset($existingEmails[$row['email']])) {
        $duplicates[] = $row;
        continue;
    }
    Customer::create($row);
}

Formula Cells: A Trap for the Unwary

A spreadsheet cell can contain a formula rather than a literal value, and reading it incorrectly returns the formula text ("=SUM(A1:A10)") instead of the calculated number a user actually sees on screen. PhpSpreadsheet supports reading either the formula or its calculated value explicitly — for import purposes, you almost always want the calculated value, and forgetting to specify this produces confusing, silently wrong imported data rather than an obvious error.

$value = $cell->getCalculatedValue(); // not getValue(), which returns the raw formula

Exporting with Formatting That Matches User Expectations

An exported spreadsheet that dumps raw values with no formatting — dates as serial numbers, currency as plain floats, no column widths or header styling — technically contains the right data but looks unprofessional and is harder to read than a polished export. A small amount of formatting effort (bold headers, currency number formats, auto-sized columns) meaningfully changes how a client perceives the quality of an exported report, for very little extra code.

Streaming Large File Uploads Instead of Loading Them Whole

A multi-megabyte spreadsheet upload, read entirely into memory before any processing begins, risks the same memory exhaustion problems covered for image processing earlier. PhpSpreadsheet supports a read filter that lets you process a file row by row without loading the entire parsed structure into memory at once — essential once import files routinely exceed a few thousand rows:

class ChunkReadFilter implements IReadFilter {
    private int $startRow; private int $endRow;
    public function setRows(int $start, int $chunkSize): void {
        $this->startRow = $start; $this->endRow = $start + $chunkSize;
    }
    public function readCell($column, $row, $worksheetName = ''): bool {
        return $row >= $this->startRow && $row <= $this->endRow;
    }
}

Import Templates: Giving Users a Correct Starting Point

A significant share of import errors come from users guessing at the expected format rather than being told explicitly. Providing a downloadable template spreadsheet with correct headers, a couple of example rows, and basic data-validation dropdowns built into the spreadsheet itself (Excel supports cell-level dropdown validation) meaningfully reduces the error rate on real imports, since many formatting mistakes are prevented before the file is ever uploaded rather than caught and reported after the fact.

Audit Trails for Bulk Operations

An import that creates or modifies hundreds of records at once is exactly the kind of operation that needs a clear audit trail — which user ran this import, when, how many rows were affected, and ideally a way to identify or even reverse the specific batch if a mistake is discovered afterward. Tagging every record created or updated by a given import with a batch identifier makes "undo this specific import" a realistic operation rather than an impossible one once thousands of records have been mixed in with pre-existing data.

foreach ($validRows as $row) {
    Order::create(array_merge($row, ['import_batch_id' => $batchId]));
}
// later, if needed:
Order::where('import_batch_id', $batchId)->delete();

Exporting Filtered, Permission-Aware Data

An export feature needs to respect exactly the same authorization rules as the screens displaying that data — a common mistake is building an export endpoint that pulls data with fewer restrictions than the corresponding list view, on the assumption that "it's just an export." If a user cannot see another department's records in the UI, an export triggered from the same screen must not include them either; treating export as a separate, less-guarded code path is how sensitive data quietly leaks through a feature nobody thought to apply the same access checks to.

Case Study: A Bulk Import That Silently Overwrote Good Data With Blanks

An HR platform let admins bulk-update employee records via spreadsheet upload, matching rows to existing employees by employee ID. A well-meaning admin exported the current employee list, intended to edit only the salary column for a handful of people, and accidentally also deleted the contents of several other columns while editing in their spreadsheet software before re-uploading. The import dutifully treated every blank cell as "set this field to empty," overwriting phone numbers and emergency contact information for dozens of employees who were never meant to be touched. The platform had no concept of partial updates — an uploaded row replaced every field, blank or not, rather than only updating fields the admin had actually changed. The fix introduced a distinction between "this cell is blank, meaning clear the field" and "this cell was not included in the update at all, meaning leave the existing value alone," surfaced to admins as an explicit checkbox: "treat blank cells as no change" versus "treat blank cells as clearing the field" — giving the admin control over behavior that had previously been an unannounced, irreversible default.

A Glossary for This Topic

Idempotent import: re-running the same import file produces the same end result rather than creating duplicate records each time. Dry run: validating and previewing what an import would do without actually committing any changes, letting a user catch mistakes before they happen. Upsert: update a record if a matching one exists, otherwise insert a new one — a common requirement for imports that should handle both new and existing records in one pass.

Frequently Asked Questions

Should I support both CSV and XLSX upload, or just one? Supporting both costs little extra code with a library like PhpSpreadsheet that already handles both formats, and avoids forcing non-technical users to figure out how to convert between formats themselves.

How large a file should an import feature support before requiring a background job? There is no universal number, but as a rule of thumb, anything likely to take longer than a few seconds to fully validate and import — often in the low thousands of rows, depending on validation complexity — is worth moving to a background job rather than risking a timed-out request.

What is the safest default for handling validation errors in a large import? Reject the entire batch and report every error clearly, rather than silently skipping bad rows and importing the rest — a partial import that the uploader does not realize was partial is a worse outcome than a clearly failed import they can fix and retry.

Step-by-Step: Building a Reliable Bulk Customer Import Feature

Step one: provide a downloadable template with the exact expected headers and a couple of filled-in example rows, so most uploads start from a known-correct format rather than a guess. Step two: on upload, parse using a chunked read filter so memory use stays flat regardless of file size. Step three: normalize headers (case, whitespace, common synonyms) and present any unmatched columns to the user for manual mapping rather than silently ignoring or misreading them. Step four: validate every row fully before inserting anything, collecting every error with its specific row number and field, and stopping short of any database writes if the batch contains failures, unless the user has explicitly opted into a partial-import mode. Step five: check for duplicates against existing records using whatever field actually defines uniqueness for your data, and surface duplicates to the user rather than silently skipping or silently overwriting them. Step six: tag every created or updated record with a batch identifier, enabling a clean undo if the import turns out to have been a mistake. Step seven: for large files, perform the entire import in a queued background job, notifying the user with a summary (rows succeeded, rows failed, with details) once it completes rather than making them wait on the upload request itself.

A Comparison Table: Import Strategies at a Glance

All-or-nothing batch import: simplest to reason about, safest against partial-failure confusion, but a single bad row blocks the entire file until fixed. Partial import with per-row reporting: more forgiving for large files where a handful of bad rows shouldn't block the rest, but requires very clear reporting so users understand exactly what did and did not get imported. Upsert-based import: handles both new and existing records in one pass, convenient for periodic re-imports of the same dataset, but requires a reliable, explicit field to match existing records against, or it risks creating unwanted duplicates.

Security Considerations for Bulk Import Features

An import endpoint accepting arbitrary spreadsheet files is processing untrusted input at scale, and a few specific risks deserve attention beyond ordinary input validation. Formula injection (sometimes called CSV injection) occurs when a cell value beginning with characters like =, +, or @ is later opened in spreadsheet software and interpreted as a formula rather than as text — relevant both for files you accept (if you ever re-display imported cell values inside another spreadsheet) and, just as importantly, for files you export, where data copied verbatim from user-controlled fields (a customer name, a free-text note) could itself be a malicious formula a recipient's spreadsheet software executes when they open your export. Prefixing any cell value starting with a formula-triggering character with a leading apostrophe or single quote before writing it neutralizes this on export.

Resource Exhaustion Through Crafted Spreadsheets

A maliciously crafted spreadsheet can declare an enormous number of rows or columns, or reference an excessive number of styles, in a way that causes a naive parser to allocate far more memory than the file's actual content would suggest — a denial-of-service angle specific to spreadsheet parsing libraries. Enforcing a maximum file size, and where the library supports it, limiting the maximum row/column count processed, protects against this without meaningfully restricting legitimate use.

Final Checklist Before Shipping a Bulk Import or Export Feature

Does the import use a chunked or streaming reader rather than loading the entire file into memory? Is every row validated, with clear per-row error reporting, before any database writes happen? Are duplicates detected against a clearly defined uniqueness rule rather than silently allowed or silently dropped? Is every import batch tagged with an identifier enabling audit and undo? Does export respect the exact same authorization rules as the equivalent list view? Is exported data protected against formula injection for any field containing user-controlled text?

Testing a Bulk Import Pipeline

The most valuable tests for an import feature are the ones exercising malformed or boundary-case input, not the clean-file happy path, since real-world uploaded spreadsheets are reliably messier than whatever clean fixture a developer first tests with.

public function testImportRejectsEntireBatchIfAnyRowInvalid()
{
    $file = $this->spreadsheetWithRows([
        ['email' => 'valid@example.com', 'amount' => 100],
        ['email' => 'not-an-email', 'amount' => 50],
    ]);
    $result = (new BulkImporter())->import($file);
    $this->assertFalse($result->succeeded());
    $this->assertCount(0, Order::all());
    $this->assertStringContainsString('Row 3', $result->errors()[0]);
}

public function testImportDetectsDuplicateEmailsWithinSameFile()
{
    $file = $this->spreadsheetWithRows([
        ['email' => 'same@example.com', 'amount' => 100],
        ['email' => 'same@example.com', 'amount' => 200],
    ]);
    $result = (new BulkImporter())->import($file);
    $this->assertTrue($result->hasDuplicates());
}

Closing Thought

Bulk import and export features look like thin wrappers around a parsing library because, mechanically, they mostly are — the genuine engineering work is in the validation strategy, memory-safe handling of large files, clear and specific error reporting, and the security considerations around formula injection and authorization that are easy to skip when a feature is built quickly under the assumption that "it's just CSV." Building these properly the first time is what separates an import feature that quietly handles years of real, messy business data from one that generates a steady trickle of confusing support tickets.

How This Plays Out Differently Across Data Volumes

An internal tool importing a few dozen rows from an occasional spreadsheet upload does not need chunked reading, background queues, or batch-tagging for undo — straightforward synchronous validation and insertion is appropriately simple for that scale. A platform accepting customer-uploaded catalogs of tens of thousands of products needs every technique covered in this guide: streaming reads, background processing, careful duplicate detection, and an audit trail robust enough to undo a bad import cleanly. The lesson, consistent with the other features covered in this series, is that good engineering judgment is as much about correctly sizing the solution to the actual problem as it is about knowing the individual techniques in the first place.

What to Do When You Inherit an Import Feature With No Validation

An existing bulk-import feature that "just inserts whatever is in the spreadsheet" is a common find in older business applications, and tightening it retroactively without breaking existing user workflows takes care — start by adding validation that logs warnings without rejecting anything, observe what real uploaded files actually contain, and only then turn rejection on once you understand the realistic shape of the data flowing through it, rather than guessing at validation rules and breaking legitimate uploads that do not match your assumptions.

A Final Word on Bulk Data Operations and Trust

Import and export features sit at a point where a single mistake can affect hundreds or thousands of records at once, which is a fundamentally different risk profile from a single-record form submission. The extra care this guide recommends — validation before any writes, audit trails, careful authorization on export — is proportionate to that risk, not excessive caution. A bulk operation that goes wrong quietly, with no audit trail and no way to undo it, is one of the more painful categories of incident a backend team can face, precisely because of how much data it can touch in one pass.

Web Developer

Mobile Apps

SaaS Products

WhatsApp Solutions

Mobile Apps

Web Platforms

PHP Excel Import and Export: Handling Bulk Data Reliably

Why "Just Parse the CSV" Undersells the Problem

CSV Versus XLSX: Not the Same Problem

Exporting Data: The Memory Trap

Importing Data: Validate Before You Insert Anything

Handling Large Imports Without Timing Out

Giving Useful Feedback on Failure

Closing Thought

Handling Different Header Conventions

Detecting and Handling Duplicate Rows

Formula Cells: A Trap for the Unwary

Exporting with Formatting That Matches User Expectations

Streaming Large File Uploads Instead of Loading Them Whole

Import Templates: Giving Users a Correct Starting Point

Audit Trails for Bulk Operations

Exporting Filtered, Permission-Aware Data

Case Study: A Bulk Import That Silently Overwrote Good Data With Blanks

A Glossary for This Topic

Frequently Asked Questions

Step-by-Step: Building a Reliable Bulk Customer Import Feature

A Comparison Table: Import Strategies at a Glance

Security Considerations for Bulk Import Features

Resource Exhaustion Through Crafted Spreadsheets

Final Checklist Before Shipping a Bulk Import or Export Feature

Testing a Bulk Import Pipeline

Closing Thought

How This Plays Out Differently Across Data Volumes

What to Do When You Inherit an Import Feature With No Validation

A Final Word on Bulk Data Operations and Trust

Related Post

Blog Categories

Related Posts