Full-Text Search in PHP: From MySQL to Elasticsearch

19 Jun 2026
Darwinbark Team
Web Development

Why LIKE '%term%' Is Not Search

The most common first attempt at search in a PHP application is a WHERE title LIKE '%term%' query. It works for a demo, and it fails almost every real search expectation: no relevance ranking (an exact match ranks the same as a tangential partial match), no handling of typos or word variations, and devastating performance on large tables since a leading-wildcard LIKE cannot use a standard index at all, forcing a full table scan on every search. Understanding the real options above this baseline is worth doing before search becomes a user-facing pain point.

Step One: MySQL's Built-In Full-Text Search

Before reaching for a separate search engine, MySQL's own full-text indexes solve a meaningful chunk of the problem with no new infrastructure:

ALTER TABLE products ADD FULLTEXT INDEX ft_search (name, description);

SELECT *, MATCH(name, description) AGAINST (:term IN NATURAL LANGUAGE MODE) AS relevance
FROM products
WHERE MATCH(name, description) AGAINST (:term IN NATURAL LANGUAGE MODE)
ORDER BY relevance DESC;

This already gives real relevance ranking and proper index usage, a significant upgrade over LIKE for a small to mid-sized table, with zero additional services to deploy and maintain.

Where MySQL Full-Text Search Stops Being Enough

MySQL's full-text search lacks fuzzy matching for typos, has limited support for language-specific stemming (matching "running" to a search for "run"), cannot easily facet results (count how many matches fall into each category alongside the results), and its performance and ranking quality degrade as both data volume and query complexity grow. These limits are exactly where dedicated search engines earn their additional operational cost.

Step Two: Elasticsearch and the Inverted Index

Elasticsearch (and similar engines like Meilisearch or Typesense, which trade some of Elasticsearch's power for simpler operation) build an inverted index — a structure mapping each word to every document containing it — purpose-built for fast, ranked text search across large datasets, with built-in support for typo tolerance, stemming, synonyms, and faceted filtering.

$response = $client->search([
    'index' => 'products',
    'body' => [
        'query' => ['match' => ['description' => $term]],
        'highlight' => ['fields' => ['description' => new \stdClass()]],
    ],
]);

The Real Cost: Keeping Two Data Stores in Sync

The part of adopting a dedicated search engine that catches teams off guard is not the search queries themselves but keeping the search index synchronized with the actual source of truth, your primary database. Every create, update, and delete of a searchable record needs to propagate to the search index, typically through a queued job triggered by model events, rather than a risky assumption that the index will somehow stay in sync on its own:

class Product extends Model {
    protected static function booted() {
        static::saved(fn($product) => SyncProductToSearchIndex::dispatch($product));
        static::deleted(fn($product) => RemoveProductFromSearchIndex::dispatch($product->id));
    }
}

Without this discipline, search results gradually drift from reality — deleted products that still appear in search, updated prices that show stale values — a quiet, slow-building bug class specific to systems with a separate search index.

Relevance Tuning: Search Quality Is a Continuous Effort, Not a One-Time Setup

Out-of-the-box relevance ranking is a reasonable starting point, not a finished product. Real search quality work involves boosting certain fields (a match in a product name probably matters more than a match buried in a long description), handling synonyms specific to your domain, and reviewing what users actually search for versus what results they click on — data that reveals gaps a generic ranking algorithm cannot anticipate on its own.

When to Make the Jump From MySQL to a Dedicated Engine

A reasonable rule of thumb: stay on MySQL full-text search as long as your dataset is moderate in size, your ranking needs are basic, and search is a secondary feature rather than core to the product. Move to a dedicated engine once search becomes a primary user-facing feature (an e-commerce catalog, a content platform where discovery is central), once typo tolerance or faceted filtering become real user expectations, or once query performance on MySQL full-text search starts noticeably degrading as data grows.

Closing Thought

Search is one of the clearest examples in web development of a feature that looks simple at small scale and reveals real depth as both data volume and user expectations grow. Starting with MySQL's built-in full-text capability and graduating to a dedicated search engine only once its limits are actually felt is a sound, low-risk progression — far better than either settling permanently for LIKE queries or over-engineering a small site with a full search cluster it does not yet need.

Need real search built into your platform, done right from the start? Let's talk.

Autocomplete and Search-as-You-Type

Modern user expectations increasingly include instant suggestions as a user types, not just results after submitting a full query. This requires sub-100-millisecond response times, which rules out a full-text query against a large table on every keystroke. Dedicated search engines support purpose-built autocomplete indexes (prefix matching structures) designed for exactly this latency requirement, while a MySQL-based approach typically needs to fall back to a smaller, pre-filtered dataset (recent searches, popular terms) to stay fast enough to feel instant.

$response = $client->search([
    'index' => 'products',
    'body' => ['suggest' => ['product-suggest' => [
        'prefix' => $partialTerm,
        'completion' => ['field' => 'suggest', 'size' => 5],
    ]]],
]);

Faceted Search: Filtering Alongside Free-Text Search

Real search interfaces rarely offer just a text box — they pair it with filters (price range, category, brand) and show how many results exist within each filter option, updated live as the search term or other filters change. This is meaningfully harder to build well on top of plain SQL than it is with a search engine designed around aggregations specifically for this purpose:

'aggs' => [
    'by_category' => ['terms' => ['field' => 'category_id']],
    'by_brand' => ['terms' => ['field' => 'brand_id']],
],

Handling Typos and Synonyms Without Frustrating Users

A search for "shoos" should reasonably return results for "shoes," and a search for "sneakers" probably should include results tagged "trainers" in markets where that is the common term. Fuzzy matching (edit-distance tolerance) handles the former; an explicit synonym list, maintained for your specific domain, handles the latter — neither comes free with a naive setup, and both meaningfully affect whether users perceive your search as actually working or as frustratingly literal.

Search Analytics: Learning From What Users Actually Search For

Logging every search query, whether it returned results, and what (if anything) the user clicked on afterward turns search from a black box into a source of real product insight — queries that consistently return zero results reveal either missing content or a synonym gap worth addressing; queries with results but no clicks reveal a ranking problem where the right content exists but is not surfacing near the top. Without this logging, search quality problems are invisible until a user happens to complain, which catches only a small fraction of the people who silently gave up on a frustrating search and left.

SearchLog::create([
    'query' => $term, 'result_count' => $results->total(),
    'clicked_result_id' => null, // updated later if a result is clicked
    'user_id' => $user?->id,
]);

Indexing Strategy: What to Include and How Often to Update It

Not every field on a database record belongs in a search index — including too much (internal notes, unrelated metadata) dilutes relevance and slows indexing; including too little misses content users reasonably expect to be searchable. Equally important is deciding how fresh the index needs to be: near-real-time updates for content where staleness is visibly wrong (a product just marked out of stock still showing as available in search), versus a periodic batch reindex for less time-sensitive content, which is simpler to operate and sufficient for many use cases.

Search Performance Under Load

A search feature that performs well in testing with a small dataset can behave very differently once a production index holds millions of documents and faces real concurrent query load. Both MySQL full-text indexes and dedicated search engines need attention to resource allocation (memory for the search engine's caches, appropriate index sharding for very large datasets) as data grows — a search infrastructure sized for launch-day data volume is a common source of degraded search performance well before the rest of the application shows any strain.

Case Study: A Search Box That Quietly Lost the Business Sales

An online retailer noticed that a popular product, heavily promoted in marketing campaigns under the name "trainers," returned zero results when customers searched for "sneakers" — the term used in the product catalog itself. Analytics later revealed this single missing synonym had been silently costing a measurable share of search-driven sales for months, invisible in any dashboard that simply tracked "searches performed" without tracking "searches that returned zero results." Once a basic search-analytics logging system (exactly the kind described in this guide) was added, the zero-result-query report immediately surfaced the gap, and a short synonym list fixed it within a day. The deeper lesson: search quality problems are frequently invisible without deliberate logging, because users experiencing a failed search rarely complain — they simply leave, and the business never learns why.

A Glossary for This Topic

Inverted index: a data structure mapping each word to the documents containing it, the foundational structure behind fast full-text search engines. Stemming: reducing a word to its root form (running → run) so a search matches grammatical variations of the same underlying word. Relevance score: a numeric measure of how well a document matches a query, used to rank results from most to least relevant rather than returning them in an arbitrary order. Faceted search: presenting filterable categories (price range, brand, category) alongside search results, typically with counts showing how many results fall into each facet.

Frequently Asked Questions

Is Elasticsearch overkill for a small website? For a small catalog or content set, MySQL full-text search is usually sufficient and avoids the operational overhead of running and maintaining a separate search cluster; Elasticsearch earns its complexity once search becomes a primary feature at meaningful scale.

How do I keep a separate search index from drifting out of sync with my database? Trigger reindexing through model events on every create/update/delete, ideally via a queued job, and run a periodic full reindex as a safety net to catch anything the event-based sync might have missed.

Can search results be personalized per user? Yes, and increasingly users expect it — boosting results based on a user's past purchase or browsing history is a meaningful improvement over one-size-fits-all ranking, though it adds real complexity and is usually worth tackling only after basic relevance and synonym handling are already solid.

Step-by-Step: Migrating From LIKE Queries to Real Search

Step one: identify the actual pain points with the current search — missing relevance ranking, slow queries, no typo tolerance — rather than migrating speculatively without a clear problem to solve. Step two: add a MySQL full-text index as a low-cost first upgrade, and measure whether it resolves the identified pain points before considering anything more complex. Step three: if gaps remain (typo tolerance, faceting, very large dataset performance), evaluate dedicated search engines, choosing based on operational complexity tolerance — Meilisearch or Typesense for simpler operation, Elasticsearch for maximum capability at higher operational cost. Step four: build the index-synchronization layer (model event listeners dispatching queued sync jobs) before migrating any real search traffic, and verify it stays correctly in sync under realistic write load. Step five: run the new search engine in parallel with the old search, comparing result quality on real historical queries, before fully cutting over user-facing search traffic. Step six: add search analytics logging from day one of the new system, so future relevance problems are visible rather than silently costing conversions.

A Comparison Table: Search Backends at a Glance

MySQL full-text index: zero new infrastructure, solid relevance for small-to-medium datasets, no typo tolerance or advanced faceting. Meilisearch/Typesense: purpose-built for search with simpler operations than Elasticsearch, strong typo tolerance and instant search support, smaller ecosystem and fewer advanced features than Elasticsearch. Elasticsearch: the most capable and most widely adopted dedicated search engine, supports complex aggregations and massive scale, meaningfully higher operational and learning overhead.

Security Considerations Checklist

Sanitize search query input before using it in any query construction, even with parameterized queries, since search-specific features (special syntax for exact phrases, wildcards) can introduce injection-adjacent risks if handled carelessly. Rate limit search endpoints, since unauthenticated search is an easy target for scraping or denial-of-service style abuse against an expensive backend operation. If search results include data with access restrictions (private listings, draft content), ensure the search index itself respects those restrictions rather than indexing and exposing everything indiscriminately regardless of the searching user's actual permissions.

Accessibility Considerations

Search result pages need the same semantic structure as any content listing — proper heading hierarchy, descriptive link text rather than generic "click here" results, and clear indication of result count and any active filters for screen reader users. Autocomplete suggestion dropdowns specifically need careful keyboard navigation support (arrow keys to move through suggestions, Enter to select, Escape to dismiss) since a mouse-only autocomplete implementation excludes keyboard and screen-reader users from a feature sighted mouse users take for granted.

How This Plays Out at Different Scales

A small content site with a few hundred pages can rely entirely on MySQL full-text search with no real performance concerns. A growing e-commerce catalog with tens of thousands of products starts to feel real pressure for faceted filtering and typo tolerance that justify a dedicated search engine. A large marketplace or content platform with millions of searchable items needs to think seriously about index sharding, query caching, and dedicated search infrastructure capacity planning as a first-class part of overall system architecture, not an afterthought bolted onto the main application database.

What to Do When You Inherit a Site With Only LIKE-Based Search

Inheriting a search feature built entirely on LIKE '%term%' queries against a now-large table is a common, fixable situation, not a reason for an immediate full search-engine migration. The first step is almost always adding a MySQL full-text index alongside the existing query path, running both in parallel temporarily, and comparing result quality and query performance directly before removing the old LIKE-based code. This often resolves the most painful symptoms (slow queries, poor ranking) immediately and cheaply, and gives you real evidence about whether the remaining gaps (typo tolerance, faceting) actually justify the added operational cost of a dedicated search engine, rather than assuming a full Elasticsearch migration is automatically the right next step without that evidence.

Final Checklist Before Shipping a Search Feature

Relevance ranking confirmed reasonable on real representative queries, not just contrived test cases. Search analytics logging in place from day one. Synonym handling reviewed for known domain-specific terms. Index synchronization with the source database verified under realistic concurrent write load. Performance tested with production-scale data volume, not just a small development dataset. Faceted filters (if present) verified to update counts correctly as filters are combined.

Closing Thought, Revisited

Search quality is one of the easiest things to under-invest in because a mediocre search feature does not produce visible errors — it simply, quietly loses users and conversions to a frustrating experience that no error log ever captures. The search analytics logging covered in this guide is the single highest-leverage addition for catching that invisible cost, turning a guessing game about search quality into a measurable, improvable product surface backed by real query and click data rather than assumption.

Handling Multi-Word and Phrase Search Correctly

Users frequently expect quoted phrases ("red running shoes") to match as an exact sequence rather than as three independently matched words scattered anywhere across a document. Supporting this distinction — natural language mode for general queries, boolean/phrase mode when a query is quoted — meaningfully improves perceived search quality for the subset of users who already know what specific thing they are looking for and are frustrated by overly loose matching that buries their exact target under loosely related results.

SELECT * FROM products WHERE MATCH(description) AGAINST ('"red running shoes"' IN BOOLEAN MODE);

Web Developer

Mobile Apps

SaaS Products

WhatsApp Solutions