Large indexing jobs
Algolia can handle a large number of indexing requests. But when you need to import tens or hundreds of millions of records in a short period of time, keep the following caveats in mind.
Algolia always prioritizes search operations, so that indexing operations don’t impact the search performance.
Preparation
If possible, keep Algolia in the loop when you plan to index a large number of records. This lets Algolia monitor and optimize the configuration of your servers and indices, for example, by fine-tuning the sharding of internal indices for your specific data.
Contact your customer success manager or the Algolia support team.
Configure indices before uploading records
Configure an index before uploading records. Setting the `searchableAttributes` parameter beforehand is particularly important to ensure the best indexing speed: by default, Algolia indexes all attributes, but you'll likely want to search only a few of them. For more information, see Searchable attributes.
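For example, here's a minimal sketch using the v4 JavaScript API client and its `initIndex` interface; the credentials, index name, and attribute names are placeholders:

```ts
import algoliasearch from "algoliasearch";

// Placeholder credentials and index name.
const client = algoliasearch("YOUR_APP_ID", "YOUR_ADMIN_API_KEY");
const index = client.initIndex("products");

// Limit searchable attributes before indexing, so Algolia doesn't
// index every attribute of every record.
await index.setSettings({
  searchableAttributes: ["title", "description", "brand"],
});
```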
Ensure the data fits on your servers
For best performance, Algolia stores all indices in memory on your servers. Keep the combined size of your indices below 80% of the total allocated RAM. When the index size exceeds the available RAM, Algolia falls back to the solid-state drive (SSD), which is much slower.
Your index is usually between two and three times larger than the raw size of your data. The exact factor depends on the structure of your data and your index configuration.
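For example, on a server with 128 GB of RAM, aim to keep your indices below roughly 102 GB (80%), which corresponds to about 34-51 GB of raw data at a two-to-threefold expansion.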
Data upload
Batch indexing jobs
Algolia's API clients have `saveObjects` helper methods for uploading records in batches. This is more efficient than uploading records one at a time. Batches between 1,000 and 100,000 records tend to be optimal, depending on the average record size. Each batch should be smaller than 10 MB. The API can handle batches of up to 1 GB, but smaller batches lead to faster indexing.
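As a minimal sketch, assuming the same v4 JavaScript client as above, you could split your records into fixed-size batches and upload each with `saveObjects`:

```ts
import algoliasearch from "algoliasearch";

const client = algoliasearch("YOUR_APP_ID", "YOUR_ADMIN_API_KEY");
const index = client.initIndex("products"); // placeholder index name

declare const records: Array<{ objectID: string }>; // your records

// Tune between 1,000 and 100,000 depending on average record size;
// each batch should stay under 10 MB.
const BATCH_SIZE = 10_000;

for (let i = 0; i < records.length; i += BATCH_SIZE) {
  await index.saveObjects(records.slice(i, i + BATCH_SIZE));
}
```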
Multi-thread your indexing
You can run several workers that send indexing requests in parallel, as shown in the sketch below.
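For example, a sketch that reuses the batching above and caps the number of in-flight requests with `Promise.all`; the concurrency value is a placeholder to tune for your setup:

```ts
import algoliasearch from "algoliasearch";

const client = algoliasearch("YOUR_APP_ID", "YOUR_ADMIN_API_KEY");
const index = client.initIndex("products"); // placeholder index name

declare const records: Array<{ objectID: string }>; // your records
const BATCH_SIZE = 10_000;
const CONCURRENCY = 4; // placeholder; tune for your infrastructure

// Split records into batches.
const batches: Array<Array<{ objectID: string }>> = [];
for (let i = 0; i < records.length; i += BATCH_SIZE) {
  batches.push(records.slice(i, i + BATCH_SIZE));
}

// Upload up to CONCURRENCY batches at a time.
for (let i = 0; i < batches.length; i += CONCURRENCY) {
  await Promise.all(
    batches.slice(i, i + CONCURRENCY).map((batch) => index.saveObjects(batch))
  );
}
```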
Large datasets
If you have large indexing jobs, you might run into limitations that you can avoid by optimizing the API client settings:
- Change the batch size when using the `saveObjects` method. The ideal batch size depends on your average record size and might require a few iterations of trial and error.
- Compress your records.
You should also inspect any HTTP errors returned by the API to determine whether a batch or record was too large to process.
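As a sketch of both points, assuming the v4 JavaScript client, whose `saveObjects` helper accepts a `batchSize` option (treat the exact option name as an assumption for your client version):

```ts
import algoliasearch from "algoliasearch";

const client = algoliasearch("YOUR_APP_ID", "YOUR_ADMIN_API_KEY");
const index = client.initIndex("products"); // placeholder index name

declare const records: Array<{ objectID: string }>; // your records

try {
  // batchSize (assumed option) controls how many records go into
  // each underlying batch request.
  await index.saveObjects(records, { batchSize: 50_000 });
} catch (err) {
  // Inspect the HTTP error: its status and message indicate whether
  // a batch or record was too large to process.
  console.error(err);
}
```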