Implement sharding with remote federated search

Sharding is the process of splitting an index containing many documents into multiple smaller indexes, often called shards. This horizontal scaling technique is useful when handling large databases. In Meilisearch, the best way to implement a sharding strategy is to use remote federated search.

This guide walks you through activating the /network route, configuring the network object, and performing remote federated searches.

Enterprise Edition

Sharding is an Enterprise Edition feature. You are free to use it for evaluation purposes. Please reach out to us before using it in production.

Configuring multiple instances

To minimize issues and limit unexpected behavior, instance, network, and index configuration should be identical for all shards. This guide describes the individual steps you must take on a single instance and assumes you will replicate them across all instances.

Prerequisites

Multiple Meilisearch projects (instances) running Meilisearch >=v1.19

Activate the `/network` endpoint

Meilisearch Cloud

If you are using Meilisearch Cloud, contact support to enable this feature in your projects.

Self-hosting

Use the /experimental-features route to enable network:

curl \
  -X PATCH 'MEILISEARCH_URL/experimental-features/' \
  -H 'Content-Type: application/json'  \
  --data-binary '{
    "network": true
  }'

Meilisearch should respond immediately, confirming the route is now accessible. Repeat this process for all instances.
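To confirm that every instance has the feature enabled, you can read the flag back with a GET request to the same route. The following is a minimal sketch: INSTANCE_URLS is a hypothetical space-separated list of your instance URLs, the check is a plain-text match rather than a real JSON parser, and the sketch assumes the instances accept unauthenticated requests (add an Authorization header otherwise).

```shell
# Return success (0) if the JSON read from stdin contains "network": true.
# This is a plain-text match, not a JSON parser.
network_enabled() {
  grep -Eq '"network"[[:space:]]*:[[:space:]]*true'
}

# Query GET /experimental-features on every instance and report the flag.
# INSTANCE_URLS is a hypothetical space-separated list of instance URLs.
check_all_instances() {
  for url in $INSTANCE_URLS; do
    if curl -s -X GET "$url/experimental-features" | network_enabled; then
      echo "$url: network enabled"
    else
      echo "$url: network NOT enabled"
    fi
  done
}

# Example (hypothetical URLs):
#   INSTANCE_URLS="http://ms-00.example.com http://ms-01.example.com"
#   check_all_instances
```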

Configuring the network object

Next, you must configure the network object. It consists of the following fields:

remotes: defines a list with the required information to access each remote instance
self: specifies which of the configured remotes corresponds to the current instance
sharding: specifies whether Meilisearch automatically shards documents across the remotes

Setting up the list of remotes

Use the /network route to configure the remotes field of the network object. remotes should be an object with one entry per instance. Each key is the name of an instance, and its value is an object containing that instance's URL and an API key with search permission:

curl \
  -X PATCH 'MEILISEARCH_URL/network' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "remotes": {
      "REMOTE_NAME_1": {
        "url": "INSTANCE_URL_1",
        "searchApiKey": "SEARCH_API_KEY_1"
      },
      "REMOTE_NAME_2": {
        "url": "INSTANCE_URL_2",
        "searchApiKey": "SEARCH_API_KEY_2"
      },
      "REMOTE_NAME_3": {
        "url": "INSTANCE_URL_3",
        "searchApiKey": "SEARCH_API_KEY_3"
      },
      …
    }
  }'

Configure the entire set of remote instances in your sharded database, making sure to send the same remotes to each instance.
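One way to keep the remotes identical everywhere is to store the payload in a file and send that same file to each instance. The sketch below assumes a hypothetical file named remotes.json containing the JSON body shown above, and hypothetical instance URLs passed as arguments. It is an illustration, not an official tool.

```shell
# Send the same JSON payload to the /network route of each instance.
# Usage: patch_network_everywhere PAYLOAD_FILE URL...
patch_network_everywhere() {
  payload_file=$1
  shift
  for url in "$@"; do
    echo "Updating $url"
    # "@file" tells curl to read the request body from that file
    curl -s -X PATCH "$url/network" \
      -H 'Content-Type: application/json' \
      --data-binary "@$payload_file" || return 1
    echo
  done
}

# Example (hypothetical file name and URLs):
#   patch_network_everywhere remotes.json \
#     http://ms-00.example.com http://ms-01.example.com
```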

Specify the name of the current instance

Now that all instances share the same list of remotes, set the self field to specify which of the remotes corresponds to the current instance:

curl \
  -X PATCH 'MEILISEARCH_URL/network' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "self": "REMOTE_NAME_1"
  }'

Meilisearch processes searches on the remote that corresponds to self locally instead of making a remote request.
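Unlike remotes, the value of self differs on every instance. As a rough sketch, the loop below takes hypothetical NAME=URL pairs and sends each instance its own name. It assumes remote names contain no characters that need JSON escaping.

```shell
# Build the JSON body that sets "self" to the given remote name.
# Assumes the name needs no JSON escaping.
self_payload() {
  printf '{"self": "%s"}' "$1"
}

# Tell each instance which remote it is.
# Usage: set_self_everywhere NAME=URL...
set_self_everywhere() {
  for pair in "$@"; do
    name=${pair%%=*}   # text before the first "="
    url=${pair#*=}     # text after the first "="
    curl -s -X PATCH "$url/network" \
      -H 'Content-Type: application/json' \
      --data-binary "$(self_payload "$name")" || return 1
    echo
  done
}

# Example (hypothetical names and URLs):
#   set_self_everywhere ms-00=http://ms-00.example.com \
#                       ms-01=http://ms-01.example.com
```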

Enabling sharding

Finally, enable automatic sharding of documents on all instances:

curl \
  -X PATCH 'MEILISEARCH_URL/network' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "sharding": true
  }'

Adding or removing an instance

Changing the topology of the network involves moving some documents from one instance to another, depending on your hashing scheme. As Meilisearch does not provide atomicity across multiple instances, you will need to either:

accept search downtime while migrating documents
accept some documents will not appear in search results during the migration
accept some duplicate documents may appear in search results during the migration

Reducing downtime

If your disk space allows, you can reduce the downtime by applying the following algorithm:

Create a new temporary index in each remote instance
Compute the new instance for each document
Send the documents to the temporary index of their new instance
Once Meilisearch has copied all documents to their destination instance, swap the new index with the previously used index
Delete the temporary index after the swap
Update network configuration and search queries across all instances
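The steps above can be sketched with a few helper functions. The hashing scheme here (CRC of the document ID modulo the number of shards) is purely an assumption for illustration and is not necessarily how Meilisearch assigns documents to remotes. The index names in the usage comments are hypothetical too. The swap itself uses the /swap-indexes route.

```shell
# Hypothetical hashing scheme: map a document ID to a shard number in
# [0, SHARD_COUNT) using the POSIX cksum CRC. Illustrative only.
# Usage: shard_for_id DOCUMENT_ID SHARD_COUNT
shard_for_id() {
  crc=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
  echo $(( $crc % $2 ))
}

# Succeed if a document changes shard when the shard count changes.
# Usage: document_moves DOCUMENT_ID OLD_COUNT NEW_COUNT
document_moves() {
  [ "$(shard_for_id "$1" "$2")" != "$(shard_for_id "$1" "$3")" ]
}

# Build the request body for POST /swap-indexes.
# Usage: swap_body INDEX_A INDEX_B
swap_body() {
  printf '[{"indexes": ["%s", "%s"]}]' "$1" "$2"
}

# Example (hypothetical document ID and index names):
#   document_moves movie-42 3 4 && echo "movie-42 must be migrated"
#   curl -X POST 'MEILISEARCH_URL/swap-indexes' \
#     -H 'Content-Type: application/json' \
#     --data-binary "$(swap_body movies movies_tmp)"
```

With a plain modulo scheme like this one, most documents change shard whenever the shard count changes. Schemes such as consistent or rendezvous hashing move fewer documents, at the cost of a slightly more involved computation.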

Create indexes

Create the same empty indexes with the same settings on all instances. Keeping the settings and indexes in sync is important to avoid errors and unexpected behavior, though not strictly required.

Add documents

Pick a single instance to send all your documents to. Documents will be replicated to the other instances. Each instance will index the documents it is responsible for and ignore the others. You may send the same document to multiple instances: the task will be replicated to all instances, and only the instance responsible for the document will index it. Similarly, you may send any future versions of any document to the instance you picked, and only the correct instance will process that document.

Updating index settings

Changing settings in a sharded database is not fundamentally different from changing settings on a single Meilisearch instance. If the update enables a feature, such as setting filterable attributes, wait until all changes have been processed before using the filter search parameter in a query. Likewise, if an update disables a feature, first remove it from your search requests, then update your settings.
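Waiting for a settings change to be processed can be scripted by polling the /tasks route with the taskUid returned by the settings update. The sketch below checks a single instance. It assumes you know the relevant task UID on each instance you poll, and it extracts the status with a plain-text match rather than a JSON parser.

```shell
# Extract the "status" value from a task object read on stdin.
# Plain-text extraction; assumes a single "status" key in the response.
task_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-zA-Z]*\)".*/\1/p' | head -n 1
}

# Poll GET /tasks/TASK_UID on one instance until the task finishes.
# Usage: wait_for_task INSTANCE_URL TASK_UID
wait_for_task() {
  while :; do
    status=$(curl -s -X GET "$1/tasks/$2" | task_status)
    case $status in
      succeeded) return 0 ;;
      failed|canceled) echo "task $2 on $1: $status" >&2; return 1 ;;
      '') echo "no task status returned by $1" >&2; return 1 ;;
      *) sleep 1 ;;  # enqueued or processing: try again shortly
    esac
  done
}

# Example (hypothetical URL and task UID):
#   wait_for_task http://ms-00.example.com 12 && echo "safe to use the filter"
```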

Perform a search

Send your federated search request containing one query per instance:

curl \
  -X POST 'MEILISEARCH_URL/multi-search' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "federation": {},
    "queries": [
      {
        "indexUid": "movies",
        "q": "batman",
        "federationOptions": {
          "remote": "ms-00"
        }
      },
      {
        "indexUid": "movies",
        "q": "batman",
        "federationOptions": {
          "remote": "ms-01"
        }
      }
    ]
  }'

If all instances share the same network configuration, you can send the search request to any instance. Including "remote": "ms-00" in a query sent to the instance named ms-00 does not trigger a proxied search, because network.self tells that instance to process the query locally.
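Because the request repeats the same query once per remote, the body can be generated from a list of remote names. The helper below is a minimal sketch. Its name is hypothetical, and it assumes the index name, search terms, and remote names contain no characters that need JSON escaping.

```shell
# Print a /multi-search body with one identical query per remote.
# Usage: federated_body INDEX_UID QUERY REMOTE...
# Assumes no argument needs JSON escaping.
federated_body() {
  index=$1
  q=$2
  shift 2
  sep=''
  printf '{"federation": {}, "queries": ['
  for remote in "$@"; do
    printf '%s{"indexUid": "%s", "q": "%s", "federationOptions": {"remote": "%s"}}' \
      "$sep" "$index" "$q" "$remote"
    sep=', '
  done
  printf ']}\n'
}

# Example: pipe the generated body to curl ("@-" reads the body from stdin)
#   federated_body movies batman ms-00 ms-01 | curl \
#     -X POST 'MEILISEARCH_URL/multi-search' \
#     -H 'Content-Type: application/json' \
#     --data-binary @-
```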
