Using AI Models

In addition to the built-in TensorFlow models, PhotoPrism lets you generate captions and labels with Ollama and the OpenAI API. Our step-by-step guides explain how to set them up and provide tested configuration examples you can use as a starting point.

Model Engines

PhotoPrism currently supports the following runtimes and services:

| Engine | Resolution | Runs | Best For |
|--------|------------|------|----------|
| TensorFlow | 224 px | Built-in | Fast, offline default models for core features (labels, faces, NSFW) |
| Ollama | 720 px | Self-hosted | Good for generating quality captions & labels; a server with a GPU is recommended |
| OpenAI API | 720 px | Cloud | Highest-quality captions & labels, also suitable for users without a GPU; requires an API key and network access |

Performance

  • TensorFlow: Our built-in models generally perform well on all types of hardware.
  • Ollama: Generating labels for an image on an NVIDIA RTX 4060 usually takes 1-4 seconds. The exact time varies depending on the model used and the number of labels generated.
  • OpenAI: Processing one image takes about 3 seconds, though this can vary by model, region, and demand.

Without GPU acceleration, Ollama models are significantly slower, taking anywhere from 10 seconds to over a minute per image. This may be acceptable if you only want to process a few pictures or are willing to wait.

vision.yml Reference

Custom AI engines, models, and run modes can be specified in a vision.yml file located in the storage/config directory. The file defines a list of models and thresholds to be used, e.g.:

```yaml
Models:
- Type: caption
  Model: gemma3:latest
  Engine: ollama
  Run: auto
  Options:
    Temperature: 0.05
  Service:
    Uri: http://ollama:11434/api/generate
- Type: labels
  Model: qwen3-vl:latest
  Engine: ollama
  Service:
    Uri: http://ollama:11434/api/generate
Thresholds:
  Confidence: 10
  Topicality: 0
  NSFW: 75
```

If a model type is omitted, PhotoPrism will use the built-in defaults for labels, nsfw, face, or caption. The optional Thresholds block can be used to filter out labels with a low probability or adjust the probability of flagging content as NSFW.
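For instance, a minimal vision.yml that lists no models at all keeps every built-in default and only tunes the thresholds. The values below are illustrative, not recommendations:

```yaml
# Minimal sketch: no Models are listed, so PhotoPrism keeps its
# built-in defaults for labels, nsfw, face, and caption.
Thresholds:
  Confidence: 25   # drop labels below 25% probability
  NSFW: 90         # flag content as NSFW only at >= 90% probability
```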

| Field | Default | Notes |
|-------|---------|-------|
| Type | (required) | labels, caption, face, nsfw. Drives routing & scheduling. |
| Model | "" | Raw identifier override; precedence: Service.Model > Model > Name. |
| Name | derived from type/version | Display name; lower-cased by helpers. |
| Version | latest (non-OpenAI) | OpenAI payloads omit the version. |
| Engine | inferred from service/alias | Aliases set formats, file scheme, and resolution. Explicit Service values still win. |
| Run | auto | See the Run Modes table below. |
| Default | false | Keep one per type for TensorFlow fallbacks. |
| Disabled | false | Registered but inactive. |
| Resolution | 224 (TensorFlow) / 720 (Ollama/OpenAI) | Thumbnail edge in px; TensorFlow models default to 224 unless you override it. |
| System / Prompt | engine defaults / empty | Override prompts per model. |
| Format | "" | Response hint (json, text, markdown). |
| Schema / SchemaFile | engine defaults / empty | Inline vs. file-based JSON schema (labels). |
| TensorFlow | engine defaults / empty | Local TF model info (paths, tags). |
| Options | engine defaults / empty | Sampling settings merged with engine defaults. |
| Service | engine defaults / empty | Remote endpoint config (see below). |
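As a sketch of how these fields fit together, the following hypothetical definition registers an Ollama labels model but keeps it inactive until you enable it; the model name and endpoint are placeholders, not recommendations:

```yaml
Models:
- Type: labels
  Name: my-labels-model     # display name (lower-cased by helpers)
  Model: llava:latest       # placeholder model identifier
  Engine: ollama
  Resolution: 720           # thumbnail edge in px (Ollama default)
  Default: false
  Disabled: true            # registered but inactive until enabled
  Service:
    Uri: http://ollama:11434/api/generate
```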

Run Modes

| Value | When it runs | Recommended use |
|-------|--------------|-----------------|
| auto | TensorFlow defaults during indexing; external models via metadata worker/schedule | Leave as-is for most setups. |
| manual | Only when explicitly invoked (CLI/API) | Experiments and diagnostics. |
| on-index | During indexing + manual | Fast built-in models only. |
| newly-indexed | Metadata worker after indexing + manual | External Ollama/OpenAI models without slowing imports. |
| on-demand | Manual, metadata worker, and scheduled jobs | Broad coverage without the indexing path. |
| on-schedule | Scheduled jobs + manual | Nightly/cron-style runs. |
| always | Indexing, metadata, scheduled, manual | High-priority models; watch resource use. |
| never | Never executes | Keep a definition without running it. |

For performance reasons, on-index is only supported for the built-in TensorFlow models.
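A common pattern for external models is therefore to run them via the metadata worker after indexing, so imports are not slowed down. The sketch below assumes the Ollama setup from the example above:

```yaml
Models:
- Type: caption
  Model: gemma3:latest
  Engine: ollama
  Run: newly-indexed   # processed by the metadata worker after indexing
  Service:
    Uri: http://ollama:11434/api/generate
```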

Options

The Options block adjusts model parameters, such as temperature and top-p, as well as other sampling constraints, when using Ollama or OpenAI:

| Option | Engines | Default | Description |
|--------|---------|---------|-------------|
| Temperature | Ollama, OpenAI | engine default | Controls randomness with a value between 0.01 and 2.0; not used for OpenAI's GPT-5. |
| TopK | Ollama | engine default | Limits sampling to the top K tokens to reduce rare or noisy outputs. |
| TopP | Ollama, OpenAI | engine default | Nucleus sampling; keeps the smallest token set whose cumulative probability ≥ p. |
| MinP | Ollama | engine default | Drops tokens whose probability mass is below p, trimming the long tail. |
| TypicalP | Ollama | engine default | Keeps tokens with typicality under the threshold; combine with TopP/MinP for flow. |
| TfsZ | Ollama | engine default | Tail-free sampling parameter; lower values reduce repetition. |
| Seed | Ollama | random per run | Fix for reproducible outputs; leave unset for more variety between runs. |
| NumKeep | Ollama | engine default | How many tokens to keep from the prompt before sampling starts. |
| RepeatLastN | Ollama | engine default | Number of recent tokens considered for repetition penalties. |
| RepeatPenalty | Ollama | engine default | A multiplier >1 discourages repeating the same tokens or phrases. |
| PresencePenalty | OpenAI | engine default | Increases the likelihood of introducing new tokens by penalizing existing ones. |
| FrequencyPenalty | OpenAI | engine default | Penalizes tokens in proportion to their frequency so far. |
| PenalizeNewline | Ollama | engine default | Whether to apply repetition penalties to newline tokens. |
| Stop | Ollama, OpenAI | engine default | Array of stop sequences (e.g., ["\n\n"]). |
| Mirostat | Ollama | engine default | Enables Mirostat sampling (0 off, 1/2 modes). |
| MirostatTau | Ollama | engine default | Controls the surprise target for Mirostat sampling. |
| MirostatEta | Ollama | engine default | Learning rate for Mirostat adaptation. |
| NumPredict | Ollama | engine default | Ollama-specific maximum output tokens; same intent as MaxOutputTokens. |
| MaxOutputTokens | Ollama, OpenAI | engine default | Upper bound on generated tokens; adapters raise low values to defaults. |
| ForceJson | Ollama, OpenAI | engine default | Forces structured output when enabled. |
| SchemaVersion | Ollama, OpenAI | derived from schema | Override when coordinating schema migrations. |
| CombineOutputs | OpenAI | engine default | Controls whether multi-output models combine results automatically. |
| Detail | OpenAI | engine default | Controls the OpenAI vision detail level (low, high, auto). |
| NumCtx | Ollama, OpenAI | engine default | Context window length (tokens). |
| NumThread | Ollama | runtime auto | Caps CPU threads for local engines. |
| NumBatch | Ollama | engine default | Batch size for prompt processing. |
| NumGpu | Ollama | engine default | Number of GPUs to distribute work across. |
| MainGpu | Ollama | engine default | Primary GPU index when multiple GPUs are present. |
| LowVram | Ollama | engine default | Enables VRAM-saving mode; may reduce performance. |
| VocabOnly | Ollama | engine default | Loads the vocabulary only, for quick metadata inspection. |
| UseMmap | Ollama | engine default | Memory-maps model weights instead of fully loading them. |
| UseMlock | Ollama | engine default | Locks model weights in RAM to reduce paging. |
| Numa | Ollama | engine default | Enables NUMA-aware allocations when available. |
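For example, a labels model can be made more deterministic with a low temperature and a fixed seed. The values below are illustrative starting points, not tuned recommendations:

```yaml
Models:
- Type: labels
  Model: qwen3-vl:latest
  Engine: ollama
  Options:
    Temperature: 0.1    # low randomness for consistent labels
    TopP: 0.9           # nucleus sampling cutoff
    Seed: 42            # fixed seed for reproducible outputs
    NumCtx: 8192        # context window in tokens
  Service:
    Uri: http://ollama:11434/api/generate
```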

Service

The Service block configures the endpoint URL, method, format, and authentication for Ollama, OpenAI, and other engines that make remote HTTP requests:

| Field | Default | Notes |
|-------|---------|-------|
| Uri | engine default | Service endpoint URL. Empty for local models. |
| Method | POST | Override only if the provider requires it. |
| Key | "" | Bearer token; supports env expansion (OpenAI: OPENAI_API_KEY, Ollama: OLLAMA_API_KEY¹). |
| Username / Password | "" | Injected as basic auth when the Uri lacks userinfo. |
| Model | "" | Endpoint-specific override; wins over Model/Name. |
| Org / Project | "" | Organization / project ID when using OpenAI. |
| RequestFormat / ResponseFormat | engine default | Explicit values win over engine defaults. |
| FileScheme | engine default | Controls image transport, e.g., data or base64. |
| Disabled | false | Disables the endpoint without removing the model. |

Authentication: All credentials and identifiers support ${ENV_VAR} expansion. Service.Key sets Authorization: Bearer <token>; Username/Password injects HTTP basic authentication into the service URI when it is not already present. When Service.Key is empty, PhotoPrism defaults to OPENAI_API_KEY (OpenAI engine) or OLLAMA_API_KEY¹ (Ollama engine), also honoring their _FILE counterparts.
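As a sketch, an OpenAI caption model might reference its credentials via environment variables like this; the model name and the OPENAI_ORG_ID variable are placeholders, and Uri is omitted so the engine default endpoint is used:

```yaml
Models:
- Type: caption
  Model: gpt-4o-mini          # placeholder model identifier
  Engine: openai
  Service:
    Key: ${OPENAI_API_KEY}    # sent as "Authorization: Bearer <token>"
    Org: ${OPENAI_ORG_ID}     # optional; variable name is illustrative
    # Uri is omitted, so the engine default endpoint is used
```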


¹ Can be used with our preview build and in the next stable release.