docs: update search config docs
All checks were successful
ci/woodpecker/pr/test/2 Pipeline was successful
ci/woodpecker/pr/test/1 Pipeline was successful

And fix formatting issues and typos
Oneric 2026-05-26 00:00:00 +00:00
commit 1f214f94f9
2 changed files with 110 additions and 41 deletions


@ -859,25 +859,6 @@ config :logger, :ex_syslogger,
format: "$metadata[$level] $message"
```
## Authentication
### :admin_token


@ -18,10 +18,37 @@ config :pleroma, Pleroma.Search, task_timeout: 51_610
To use the built-in search that has no external dependencies, set the search module to `Pleroma.Search.DatabaseSearch`:
```elixir
config :pleroma, Pleroma.Search, module: Pleroma.Search.DatabaseSearch
```
It has no external dependencies and requires the least amount of disk usage.
However, it is slower than external providers and for performance reasons
is limited to sorting results by recency alone instead of match quality.
Result quality may depend on how well your FTS config (typically a language)
matches the posts on your server.
Also keep in mind that you can make use of PostgreSQL's websearch syntax. Full documentation can be found
[here](https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES).
In short, searching for `apple pie` will match everything containing those two words in any order or place of the text.
`"apple pie"` only matches if both words appear and `pie` immediately follows `apple`.
`apple -orange` matches everything referencing apple(s) but not orange(s).
Finally, `or` acts as an operator for alternatives (e.g. `apple or pear`); to search for the literal word, quote it as `"or"`.
#### Change FTS config
By default Akkoma uses the `simple` config, which performs almost no normalisation (apart from lowercasing)
and removes no stop words, making it rather strict but language-independent.
You can change your config with the `database set_text_search_config` task.
See [docs on CJK search](./howto_search_cjk.md) for advanced examples.
For many languages, PostgreSQL already has a built-in preset you can use as is, e.g.:
```sh
...database set_text_search_config english
```
#### Fuzzy search
You may choose to limit the set of posts considered during an FTS search for better performance
in exchange for potentially non-deterministic and less relevant results. See the documentation of
`gin_fuzzy_search_limit` in [PostgreSQL's docs](https://www.postgresql.org/docs/17/gin.html#GIN-TIPS).
@ -31,36 +58,93 @@ By default FTS search is exact and considers everything. To enforce a limit only
config :pleroma, Pleroma.Search.DatabaseSearch, gin_fuzzy_search_limit: 10_000
```
#### RUM
Instead of a GIN index, the built-in database search can also use a RUM index.
The hope was to improve performance at the cost of more disk storage (around 3× more)
by leveraging RUM's capability to include extra data (here timestamps for sorting)
in the index, thus avoiding additional heap lookups.
However, since search queries still need to filter results (e.g. to respect visibility),
heap lookups are needed anyway; in practice RUM probably doesn't help much, if at all,
while taking up more disk space and requiring extra setup work.
You probably don't want to use this, and if you already do, consider migrating back to GIN.
##### Enable
!!! warning
    It is recommended to use PostgreSQL v11 or newer. We have seen some minor issues with lower PostgreSQL versions.
RUM indexes are provided by a third-party PostgreSQL extension.
First install it via your distro's package manager if available, or manually from source: https://github.com/postgrespro/rum.
Then change your Akkoma config to set:
```elixir
config :pleroma, :database, rum_enabled: true
```
Finally run the special RUM migrations:
```sh
mix ecto.migrate --migrations-path priv/repo/optional_migrations/rum_indexing/
```
This will probably take a long time.
##### Disable
Delete the config setting again and revert the RUM-specific migrations with:
```sh
mix ecto.rollback --all --migrations-path priv/repo/optional_migrations/rum_indexing/
```
Then reapply the regular GIN versions of any reverted migrations:
```sh
mix ecto.migrate
```
You can now remove the extension again.
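If you would rather keep the option visible in your config than delete the line, setting it back to `false` should presumably have the same effect, since `false` is the default value of `rum_enabled`. A minimal sketch under that assumption:

```elixir
# Same effect as removing the setting, assuming false remains the default
config :pleroma, :database, rum_enabled: false
```

Either way, rolling back the RUM migrations is still required.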
### Meilisearch
Note that it's quite a bit more memory-hungry than PostgreSQL (around 4-5GB for ~1.2 million
posts while idle and up to 7GB while indexing initially). It also requires significantly more
disk space (around 4GB for the previous example setup).
This, however, allows it to process individual queries faster while also sorting results
by match quality ("relevancy") rather than just by recency.
Additionally, search benefits from Meilisearch's typo tolerance and similar features;
see [Meilisearch's documentation](https://www.meilisearch.com/docs/capabilities/full_text_search/overview).
If Akkoma runs on a low-resource computer, it may be best to set up Meilisearch on a different machine due to its high memory usage,
and to use private key authentication to secure the remote search instance.
To use [Meilisearch](https://www.meilisearch.com/), set the search module to `Pleroma.Search.Meilisearch`:
```elixir
config :pleroma, Pleroma.Search, module: Pleroma.Search.Meilisearch
```
You then need to set the address of the Meilisearch instance and, optionally, the private key for authentication. You might
also want to make `initial_indexing_chunk_size` smaller if your server is not very powerful, but never set it higher than `100_000`,
because Meilisearch will refuse to process batches that are too big. In general, however, you want it to be as big as possible, because Meilisearch
indexes faster when it can process many posts in a single batch.
```elixir
config :pleroma, Pleroma.Search.Meilisearch,
  url: "http://127.0.0.1:7700/",
  private_key: "private key",
  search_key: "search key",
  initial_indexing_chunk_size: 100_000
```
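For the remote setup recommended above for low-resource hosts, the same options simply point at the other machine. The sketch below assumes a hypothetical hostname `search.example.org`, assumes Meilisearch there is reachable over HTTPS (e.g. behind a reverse proxy, which is not covered by this guide), and uses an illustrative, smaller chunk size for a less powerful search machine:

```elixir
# Hypothetical remote Meilisearch host; replace with your own address
config :pleroma, Pleroma.Search.Meilisearch,
  url: "https://search.example.org/",
  # Placeholders for the keys configured on the remote instance
  private_key: "private key",
  search_key: "search key",
  # Illustrative smaller batch size; must not exceed 100_000
  initial_indexing_chunk_size: 20_000
```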
Information about setting up Meilisearch can be found in the
[official documentation](https://docs.meilisearch.com/learn/getting_started/installation.html).
You probably want to start it with the `MEILI_NO_ANALYTICS=true` environment variable to disable analytics.
At least version 0.25.0 is required, but you are strongly advised to use at least 0.26.0, as it introduces
the `--enable-auto-batching` option, which drastically improves performance. Without this option, search
is hardly usable on a moderately large instance.
@ -87,7 +171,7 @@ just leave `search_key` completely unset in Akkoma's config.
#### Initial indexing
After setting up the configuration, you'll want to index all of your already existing posts. Only public posts are indexed. You'll only
have to do it once, but it might take a while, depending on the number of posts your instance has seen. This is also a fairly RAM-intensive
process for `meilisearch`, and it will use a lot of RAM while running if you have a lot of posts (this seems to be around 5GB for ~1.2
million posts while idle and up to 7GB while indexing initially, but your experience may differ).
@ -156,18 +240,22 @@ As with Meilisearch, this can be rather memory-hungry, but it is very good at wh
To use [Elasticsearch](https://www.elastic.co/), set the search module to `Pleroma.Search.Elasticsearch`:
```elixir
config :pleroma, Pleroma.Search, module: Pleroma.Search.Elasticsearch
```
You then need to set the URL and authentication credentials if relevant.
```elixir
config :pleroma, Pleroma.Search.Elasticsearch.Cluster,
  url: "http://127.0.0.1:9200/",
  username: "elastic",
  password: "changeme"
```
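If your Elasticsearch instance does not require authentication (for example a local test setup with security disabled), the credentials are presumably not relevant and can be left out. A minimal sketch, assuming `url` is the only option that must be set in that case:

```elixir
# Assumes an Elasticsearch instance without authentication enabled
config :pleroma, Pleroma.Search.Elasticsearch.Cluster,
  url: "http://127.0.0.1:9200/"
```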
#### Initial indexing
After setting up the configuration, you'll want to index all of your already existing posts. You'll only have to do it once, but it might take a while, depending on the number of posts your instance has seen.
The sequence of actions is as follows: