xWiki Search: Solr Configuration and Advanced Features

Search is the gateway to everything stored in your wiki. When users cannot find what they need within seconds, they stop trusting the platform and revert to siloed documents and email chains. xWiki relies on Apache Solr as its search backend, and while the default configuration works out of the box for small installations, scaling search to handle large wikis with hundreds of thousands of pages requires deliberate tuning. This guide from MassiveGRID covers the architecture, configuration, and optimization techniques that make xWiki search fast and relevant at scale.

How Solr Powers xWiki Search

xWiki delegates all full-text search operations to Apache Solr, an open-source search platform built on Apache Lucene. Every time a page is created, modified, or deleted, xWiki sends an indexing request to Solr, which tokenizes the content, builds an inverted index, and makes the document searchable. When a user types a query, xWiki translates it into a Solr query, retrieves ranked results, and renders them in the wiki interface. This decoupled architecture means you can tune search performance independently from the application server.
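To make the indexing step concrete, here is a toy inverted index in Python. It is only a sketch of the data structure: Lucene's real analysis chain and index format are far more elaborate, and the page names and helper functions below are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(pages):
    """Map each token to the set of page names that contain it."""
    index = defaultdict(set)
    for name, content in pages.items():
        for token in tokenize(content):
            index[token].add(name)
    return index


def search(index, query):
    """Return pages containing every token of the query (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical wiki pages, used only for illustration.
pages = {
    "Main.Install": "Install Solr and configure the heap",
    "Main.Search": "Configure search facets in Solr",
    "Main.Backup": "Backup the wiki database",
}
index = build_inverted_index(pages)
print(sorted(search(index, "configure solr")))
```

A lookup touches only the entries for the query's tokens, which is why query time depends far more on the number of matching terms than on the total number of pages.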

Embedded vs. Standalone Solr

By default, xWiki ships with an embedded Solr instance that runs inside the same JVM as the wiki application. This simplifies deployment for small teams, but it introduces a significant tradeoff: Solr and xWiki share the same heap and CPU, so heavy indexing or a long garbage collection pause on one side stalls the other. For any wiki with more than roughly a thousand pages, switching to a standalone Solr server is strongly recommended.

Aspect	Embedded Solr	Standalone Solr
Deployment	Built into xWiki JVM	Separate process or server
Memory	Shares heap with xWiki	Dedicated heap allocation
Scalability	Limited to single instance	Supports SolrCloud clustering
Performance impact	Indexing can slow wiki responses	No impact on wiki application
Recommended for	Development, small teams (<1,000 pages)	Production, large wikis (1,000+ pages)

To switch to standalone mode, deploy a Solr server on the same host or a dedicated machine, then set solr.type=remote in xwiki.properties and point the remote Solr URL property at that server (the exact property name varies between xWiki versions, so check the commented defaults in your own file). On MassiveGRID infrastructure, we typically deploy Solr on a separate VPS instance with dedicated CPU and memory, connected to the xWiki application server over a private network link for low-latency communication.

Configuring Solr for Large Wikis

The default Solr schema and configuration are designed for general use. For wikis with tens of thousands of pages or heavy attachment indexing, several parameters need adjustment. Start with the JVM heap: allocate at least 2 GB for Solr, and increase to 4-8 GB for wikis exceeding 100,000 documents. In solrconfig.xml, raise ramBufferSizeMB above its default of 100 for faster indexing throughput, and tune the merge policy to control how aggressively Solr consolidates index segments. Older Solr releases expose this as mergeFactor; current releases use maxMergeAtOnce and segmentsPerTier on TieredMergePolicy instead.

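# In solr.in.sh: Solr JVM heap allocation (initial 2 GB, maximum 4 GB)
SOLR_JAVA_MEM="-Xms2g -Xmx4g"

<!-- In solrconfig.xml, inside the indexConfig element: larger RAM buffer for indexing -->
<ramBufferSizeMB>256</ramBufferSizeMB>

<!-- In solrconfig.xml, inside the updateHandler element: hard commit every 15 seconds for durability. -->
<!-- With openSearcher=false, new documents are flushed to disk but not yet visible to searches; near-real-time visibility additionally needs autoSoftCommit. -->
<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>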

Custom Field Indexing

xWiki stores structured data in objects attached to pages — things like task status fields, project metadata, or custom application properties. By default, Solr indexes page content and titles, but you can extend the schema to index custom object fields as well. This enables users to search not just for text within pages but for specific metadata values, turning xWiki's search into a structured query engine for your internal applications. Define custom fields in the Solr schema, then configure xWiki's Solr indexer to map object properties to those fields.

Faceted Search Configuration

Faceted search lets users narrow results by category — filtering by wiki space, author, document type, modification date, or custom metadata. Solr supports faceting natively, and xWiki exposes several facet fields out of the box. To add custom facets, define facet fields in your Solr schema and configure the xWiki search UI to display them. Faceted navigation is especially valuable in large enterprise wikis where users need to scope their search to a specific department, project, or content type without crafting complex queries manually.
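As a rough sketch of what a faceted request looks like on the wire, the Python snippet below assembles a Solr select URL using the standard facet, facet.field, facet.mincount, and fq parameters. The core URL and the field names (space_facet, author) are assumptions for illustration; check your own schema for the real names. The build_facet_query helper is hypothetical and is not part of xWiki or Solr.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, text, facet_fields, filters=None):
    """Build a Solr select URL that requests facet counts for the given fields."""
    params = [
        ("q", text),
        ("facet", "true"),
        ("facet.mincount", "1"),  # hide facet values with no matching documents
    ]
    params += [("facet.field", field) for field in facet_fields]
    # Each fq narrows the result set without affecting relevance scores.
    params += [("fq", f'{field}:"{value}"') for field, value in (filters or {}).items()]
    return f"{base_url}/select?{urlencode(params)}"


url = build_facet_query(
    "http://localhost:8983/solr/xwiki",      # assumed core URL
    "deployment checklist",
    facet_fields=["space_facet", "author"],  # assumed field names
    filters={"space_facet": "Engineering"},
)
print(url)
```

The sketch does not escape quotes inside filter values, so it should not be fed untrusted input as written.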

Search Relevance Tuning

Default relevance ranking uses Solr's built-in similarity, which is BM25 in Solr 6 and later and classic TF-IDF in earlier releases. It works reasonably well for general content. However, in a wiki environment, you often want to boost certain signals: page titles should weigh more heavily than body content, recently modified pages should rank higher than stale ones, and pages in frequently accessed spaces may deserve a relevance boost. Configure field boosts in the qf (query fields) parameter of the DisMax or eDisMax query parser and add a recency boost function (bf) to the request handler.

<!-- In solrconfig.xml request handler -->
<str name="qf">title^10 content^1 filename^5 authorDisplay^2</str>
<str name="bf">recip(ms(NOW,date),3.16e-11,1,1)</str>

The configuration above gives page titles ten times the weight of body content and adds a recency boost that shrinks as a document ages: with these constants the boost is 1 for a document modified just now and roughly 0.5 after one year. Adjust these values based on your users' search behavior. If your wiki is a technical knowledge base, title matching matters most; if it is a project journal, recency dominates.
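To see what the recency function does to a score, the short Python sketch below evaluates Solr's recip(x, m, a, b) = a / (m * x + b) for a few document ages, using the constants from the snippet above. It assumes the date field holds the last modification time; the recency_boost helper is a name invented for this illustration.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def recip(x, m, a, b):
    """Solr's recip function: a / (m * x + b)."""
    return a / (m * x + b)


def recency_boost(age_days, m=3.16e-11, a=1.0, b=1.0):
    """Value that bf adds to the score for a document of the given age in days."""
    return recip(age_days * MS_PER_DAY, m, a, b)


for days in (0, 30, 365, 3650):
    print(f"{days:>5} days old: boost {recency_boost(days):.3f}")
```

The constant 3.16e-11 is approximately one divided by the number of milliseconds in a year, which is why the boost halves at the one-year mark. Because bf is added to the score rather than multiplied into it, its effect depends on how large your text-match scores typically are.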

Monitoring Search Performance

Solr exposes a comprehensive metrics API and an administration dashboard that reports query latency, cache hit rates, indexing throughput, and segment counts. Monitor the average query response time — it should stay below 200 milliseconds for a responsive user experience. Watch the filter cache and query result cache hit ratios; if they drop below 80%, consider increasing cache sizes. For production wikis, export Solr metrics to your monitoring stack (Prometheus, Grafana, or similar) so you can set alerts on latency spikes before users notice degraded search quality.
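A check against the 80% guideline can be sketched in a few lines of Python. The function below takes per-cache lookups and hits counters, which Solr reports for each cache, and flags any cache whose hit ratio falls under the threshold. The sample numbers are invented, and fetching the counters from your Solr instance is left out on purpose.

```python
def low_hit_ratio_caches(cache_stats, threshold=0.80):
    """Return {cache name: hit ratio} for caches whose hit ratio is below threshold."""
    flagged = {}
    for name, stats in cache_stats.items():
        lookups = stats.get("lookups", 0)
        if lookups == 0:
            continue  # no traffic yet, so the ratio is meaningless
        ratio = stats.get("hits", 0) / lookups
        if ratio < threshold:
            flagged[name] = round(ratio, 3)
    return flagged


# Hypothetical numbers, shaped like the lookups/hits counters Solr reports per cache.
sample = {
    "filterCache": {"lookups": 12000, "hits": 11400},
    "queryResultCache": {"lookups": 8000, "hits": 5200},
    "documentCache": {"lookups": 0, "hits": 0},
}
print(low_hit_ratio_caches(sample))
```

Skipping caches with zero lookups avoids false alarms right after a restart, when the counters have not yet accumulated meaningful traffic.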

Troubleshooting Common Search Issues

The most frequent complaint is "search returns nothing for content I know exists." This usually means the Solr index is out of sync with the database. Trigger a full reindex from xWiki's Administration panel under Search > Solr. For large wikis, a full reindex can take hours — schedule it during off-peak hours and monitor Solr's indexing queue. Another common issue is high memory consumption during indexing of large attachments (PDFs, office documents). Configure Solr's extractionHandler to limit the maximum file size for content extraction, and consider whether indexing binary attachments is truly necessary for your use case.

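If search queries are slow despite adequate hardware, examine the query logs for expensive wildcard patterns. Leading wildcard queries (e.g., *configuration) cannot seek to a prefix in the sorted term dictionary, so Solr has to test every indexed term in the field against the pattern. Educate users on effective search syntax, or configure an analyzer that makes such lookups cheap: NGramFilterFactory supports partial matching by indexing substrings, and ReversedWildcardFilterFactory indexes reversed terms so that a leading wildcard becomes a trailing one. Both trade a larger index for faster queries.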
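The idea behind n-gram indexing can be shown with a small Python sketch. It imitates what an n-gram filter produces at index time so that a partial query becomes an exact term lookup. The gram sizes of 3 and 6 are assumptions chosen for the example (in Solr they correspond to the minGramSize and maxGramSize attributes), and the ngrams function is an illustration, not Solr's implementation.

```python
def ngrams(term, min_size=3, max_size=6):
    """Return every substring of term whose length is between min_size and max_size."""
    grams = set()
    for size in range(min_size, max_size + 1):
        for start in range(len(term) - size + 1):
            grams.add(term[start:start + size])
    return grams


indexed = ngrams("configuration")
# A partial query becomes an exact term lookup against the indexed grams.
for query in ("config", "ration", "figur", "xyz"):
    print(query, query in indexed)
```

Every indexed word expands into many grams, so the index grows noticeably. Applying the filter only to the fields where partial matching matters keeps that cost contained.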

Scaling Search with MassiveGRID

Deploying Solr on MassiveGRID's xWiki-optimized infrastructure gives you the flexibility to allocate dedicated compute and memory to your search backend. For high-availability requirements, we support SolrCloud deployments with replicated shards across multiple nodes, ensuring that search remains available even during node failures. Our managed cloud servers handle the operational complexity of Solr clustering so your team can focus on building a wiki that people actually use. You may also want to review our backup and disaster recovery guide to ensure your Solr indexes are protected alongside your wiki data.

A fast, accurate search experience is what separates a wiki people rely on from one they abandon. To deploy xWiki with a properly tuned Solr backend on infrastructure built for search-heavy workloads, explore our xWiki Hosting solutions or reach out to our team for architecture guidance.

Published by MassiveGRID — enterprise-grade hosting for xWiki deployments that demand performance and reliability.