How to configure Ollama options for AI Engine

Important: this feature is planned for version 1.4.1, with release expected by June 30, 2026.
Some details, screenshots, or options shown on this page may change before the official release.

Overview

This page explains how to configure the available Ollama options in the AI Engine section of the plugin.

Ollama allows you to generate embeddings using a local model instead of relying on an external API provider.

These settings help you adapt the embeddings generation process to different machines, models, and runtime conditions.

You can configure:

  • the Ollama model
  • the Ollama runtime profile
  • the Ollama endpoint mode
  • custom runtime limits, when using the Custom profile

Ollama model

Before using Ollama with the AI Engine, you need to install an embeddings model on the machine where Ollama is running.

You can find compatible models in the official Ollama embeddings models list.

After choosing a model, install it using the command shown on the model page. For example:

ollama pull nomic-embed-text

[Screenshot: installing the nomic-embed-text model with Ollama on Windows, macOS, and Linux]

Once the model has been installed, select or enter the same model name in the plugin settings.

Make sure to use a model designed for embeddings. Regular text generation models are not suitable for this feature.

[Screenshot: available Ollama models]

Ollama runtime profile

The Ollama runtime profile controls the limits used by the plugin during embeddings generation.

These limits affect how the plugin splits long content into chunks, how many chunks can be processed for each post, and how much time the generation process can use.

Profile        Chunk size  Overlap  Max chunks  Request timeout  Runtime budget
Compatibility  2500        150      32          25s              90s
Balanced       4500        250      48          30s              120s
Performance    6000        350      64          40s              180s
Custom         Manual      Manual   Manual      Manual           Manual

The default profile is Balanced.

Use Compatibility if your machine has limited resources or if the selected model is slow or heavy.

Use Balanced in most cases.

Use Performance only when Ollama runs on a more powerful machine.

Use Custom only if you need to manually adjust the generation limits.

When you switch from Custom to one of the predefined profiles, the plugin uses the selected preset values for generation. Your previously saved Custom values are preserved and are restored when you switch back to Custom.
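The relationship between presets and saved Custom values can be pictured with a minimal sketch. The names Limits, PRESETS, and effective_limits are hypothetical and are not taken from the plugin; only the preset numbers come from the profile table on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    chunk_size: int
    overlap: int
    max_chunks: int
    request_timeout_s: int
    runtime_budget_s: int


# Preset values copied from the runtime profile table.
PRESETS = {
    "compatibility": Limits(2500, 150, 32, 25, 90),
    "balanced": Limits(4500, 250, 48, 30, 120),
    "performance": Limits(6000, 350, 64, 40, 180),
}


def effective_limits(profile: str, saved_custom: Limits) -> Limits:
    """Return the limits used for generation.

    The saved Custom values are only read, never overwritten, so they
    are still available when the profile is switched back to "custom".
    """
    if profile == "custom":
        return saved_custom
    return PRESETS[profile]


# Example: hypothetical Custom values saved earlier by the user.
custom = Limits(3000, 100, 20, 20, 60)
print(effective_limits("performance", custom))
print(effective_limits("custom", custom))
```

Keeping the presets as read-only data and the Custom values as a separate object is one simple way to get the "preserved and restored" behavior described above.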

Custom options

The Custom options are available only when Ollama runtime profile is set to Custom.

These fields allow you to manually control the limits used during embeddings generation.

[Screenshot: Ollama runtime profile options]

If you select Compatibility, Balanced, or Performance, the Custom fields are disabled for generation and the preset values are used instead. The saved Custom values are not overwritten by the preset.

Field                        Default  Min  Max    Step  Description
Chunk size                   4500     300  12000  100   Maximum size of the text block sent to the model.
Chunk overlap                250      0    6000   50    Amount of text shared between consecutive chunks. The value 0 is valid and disables overlap.
Max chunks per post          48       1    160    1     Maximum number of chunks that can be processed for each post. If a post generates more chunks than this value, it may be indexed only partially.
Request timeout (s)          30       5    120    1     Maximum waiting time for each request sent to Ollama.
Runtime budget per post (s)  120      10   600    5     Maximum total processing time allowed for generating the embedding of a single post.
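As an illustration of how the Min, Max, and Step columns constrain a value, here is a small sketch. It assumes out-of-range values are clamped and snapped down to the nearest step; the plugin may instead reject them. The names FIELDS and normalize are hypothetical.

```python
# Field ranges copied from the Custom options table:
# name: (default, minimum, maximum, step)
FIELDS = {
    "chunk_size": (4500, 300, 12000, 100),
    "chunk_overlap": (250, 0, 6000, 50),
    "max_chunks_per_post": (48, 1, 160, 1),
    "request_timeout": (30, 5, 120, 1),
    "runtime_budget_per_post": (120, 10, 600, 5),
}


def normalize(field: str, value: int) -> int:
    """Clamp value to [min, max], then snap it down to a step counted from min.

    Assumption: this clamping behavior is illustrative only.
    """
    _default, lo, hi, step = FIELDS[field]
    clamped = max(lo, min(hi, value))
    return lo + ((clamped - lo) // step) * step


print(normalize("chunk_size", 4567))
print(normalize("request_timeout", 500))
```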

Max chunks behavior

The Max chunks per post value defines the maximum number of chunks that the plugin can process for each post.

At runtime, this limit follows the value configured by the user. It is not tied to the initial estimated number of chunks.

If a post generates more chunks than the configured limit, the post may be indexed only partially.

For example, if Max chunks per post is set to 1, partial indexing can be expected even on short texts with some models. In this case, the log may show a message such as processed 1/2 with the cause chunk cap reached.
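The following sketch shows how a chunk cap can lead to partial indexing. It assumes chunk size and overlap are measured in characters, which this page does not state, and it uses hypothetical function names. The status string only imitates the log message mentioned above.

```python
def split_chunks(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters; overlap 0 disables sharing.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


def apply_chunk_cap(text: str, chunk_size: int, overlap: int, max_chunks: int):
    """Keep at most max_chunks chunks and describe the resulting coverage."""
    chunks = split_chunks(text, chunk_size, overlap)
    kept = chunks[:max_chunks]
    status = f"processed {len(kept)}/{len(chunks)}"
    if len(kept) < len(chunks):
        status += " (chunk cap reached)"
    return kept, status


# A 500-character text with 300-character chunks needs two chunks,
# so a cap of 1 leaves the second chunk unprocessed.
_, status = apply_chunk_cap("a" * 500, chunk_size=300, overlap=0, max_chunks=1)
print(status)
```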

Important notes about runtime limits

The limits configured in the interface are not hard limits imposed by Ollama.

They are guardrails applied by the plugin to reduce the risk of long-running or unstable generation processes.
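One way to picture these guardrails is a loop that stops when the per-post budget is used up. This is a sketch under assumptions, not the plugin's code: embed_chunk stands for any function that sends one chunk to Ollama, and all names are hypothetical.

```python
import time


def embed_with_budget(chunks, embed_chunk, request_timeout_s, runtime_budget_s,
                      clock=time.monotonic):
    """Embed chunks until they are all done or the runtime budget is spent.

    Returns the vectors produced and a flag telling whether coverage is complete.
    """
    started = clock()
    vectors = []
    for chunk in chunks:
        remaining = runtime_budget_s - (clock() - started)
        if remaining <= 0:
            break  # budget exhausted: stop instead of running indefinitely
        # A single request never waits longer than the remaining budget.
        timeout = min(request_timeout_s, remaining)
        vectors.append(embed_chunk(chunk, timeout=timeout))
    return vectors, len(vectors) == len(chunks)


# Example with a stand-in embedder that returns a fixed vector.
def fake_embed(chunk, timeout):
    return [0.0, 1.0]


vectors, complete = embed_with_budget(["a", "b"], fake_embed, 30, 120)
print(len(vectors), complete)
```

Because the limits live in the calling loop, Ollama itself is unaware of them, which matches the statement that they are not hard limits imposed by Ollama.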

For models with a smaller context window, the plugin may reduce the effective chunk size even if the slider is set to a higher value.

During foreground generation, the plugin also aligns the PHP execution budget with the selected runtime profile. However, web server or browser requests can still be interrupted if the local machine is under heavy load. In that case, the plugin checks server-side progress before marking the batch as failed.

If you change the Ollama model or tag and the embedding size is different, regenerate the existing embeddings. Vectors of different sizes cannot be compared with each other.
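The sketch below illustrates why a change in embedding size matters. It uses cosine similarity as an example of a vector comparison; this page does not say which measure the plugin uses, so treat the function as an assumption.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Compare two embeddings; both must come from a model with the same output size."""
    if len(a) != len(b):
        raise ValueError(
            f"embedding size mismatch: {len(a)} vs {len(b)}; regenerate embeddings"
        )
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
```

An embedding stored by the previous model and a query vector from the new model would reach the size check and fail, which is the situation regeneration avoids.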

Recommended configuration

The best configuration depends on the machine running Ollama and on the selected model.

Slow machine

Use this configuration if Ollama runs on a machine with limited resources or if you frequently experience timeouts.

Option                  Recommended value
Ollama runtime profile  Compatibility
Ollama endpoint         Auto

This profile uses smaller chunks and more conservative limits.

It is slower, but it can reduce the risk of failed or partial generation.

Average machine

Use this configuration in most cases.

Option                  Recommended value
Ollama runtime profile  Balanced
Ollama endpoint         Auto

The Balanced profile provides a good compromise between stability, speed, and content coverage.

Powerful machine

Use this configuration only if Ollama runs on a machine with adequate resources.

Option                  Recommended value
Ollama runtime profile  Performance
Ollama endpoint         Auto

This profile increases chunk size, max chunks, request timeout, and runtime budget.

It can improve coverage for long content, but it requires more resources.

Ollama endpoint

The Ollama endpoint option controls which Ollama API endpoint is used by the plugin.

Option  Description
Auto    Default option. The plugin tries /api/embed first and falls back to /api/embeddings if needed.
Legacy  Always uses /api/embeddings.
Modern  Always uses /api/embed.

In most cases, you should keep this option set to Auto.

Use Legacy or Modern only if you need to force a specific Ollama endpoint.
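The three endpoint modes can be sketched as follows. The request and response shapes follow the public Ollama API: /api/embed takes "input" and returns "embeddings", while /api/embeddings takes "prompt" and returns "embedding". The fallback condition (an HTTP 404 from /api/embed) is an assumption, since this page does not say what triggers the fallback, and the function names are hypothetical.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:11434"  # Ollama's default local address


def post_json(path, payload, timeout=30):
    """Send a JSON POST request to Ollama and return the decoded response."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def embed(model, text, mode="auto", post=post_json):
    """Return one embedding vector using the auto, modern, or legacy mode."""
    if mode in ("auto", "modern"):
        try:
            data = post("/api/embed", {"model": model, "input": text})
            return data["embeddings"][0]
        except urllib.error.HTTPError as err:
            # Assumption: only a missing endpoint (404) triggers the fallback,
            # and only in auto mode.
            if mode == "modern" or err.code != 404:
                raise
    data = post("/api/embeddings", {"model": model, "prompt": text})
    return data["embedding"]
```

With a running Ollama instance and the model installed, `embed("nomic-embed-text", "hello")` would return a list of floats. The `post` parameter exists only so the function can be exercised without a server.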

Log messages and troubleshooting

During embeddings generation with Ollama, the plugin may display log messages related to batch progress, partial indexing, timeouts, HTTP errors, or invalid embedding responses.

These messages can help you understand whether the selected model, endpoint, or runtime profile is suitable for your current machine.

If the browser request is interrupted but the server continues processing, the log may show a connection recovery message. When recovery succeeds and the embedding metadata reports coverage_complete: true, the saved embedding is complete even if the connection warning was shown.

For the complete list of log messages and their meaning, please refer to the generation log messages documentation.

FAQ

Which runtime profile should I use?

In most cases, use Balanced, which is the default profile.

If you experience timeouts, errors, or partial indexing, try Compatibility.

If Ollama runs on a powerful machine, you can try Performance.

When should I use Custom?

Use Custom only if you need to manually control chunk size, overlap, timeout, max chunks, or runtime budget.

This can be useful when the predefined profiles are not suitable for your machine or selected model.

What happens if I set Chunk overlap to 0?

The value 0 is valid and disables overlap between consecutive chunks.

Why do I see partial indexing when Max chunks per post is set to 1?

If Max chunks per post is set to 1, the plugin can process only one chunk for each post.

With some models, even short texts may be split into more than one chunk. In that case, partial indexing is expected and the log may show processed 1/2 with the cause chunk cap reached.

To avoid this, increase Max chunks per post or use one of the predefined runtime profiles.

Are the slider values Ollama limits?

No.

They are guardrails applied by the plugin.

The plugin may still reduce the effective chunk size if the selected model has a smaller context window.

Should I regenerate embeddings after changing the Ollama model?

Yes, it is recommended to regenerate embeddings if you change the Ollama model or tag and the embedding size is different.

Which Ollama endpoint should I select?

In most cases, use Auto.

The plugin will try /api/embed first and fall back to /api/embeddings if needed.