Websites are increasingly getting more bloated with tricks like inlining content (e.g., CNN.com) which puts pages at or above 5MB. This value may still be too low.
Rich Media parsing was previously handled on-demand with a 2 second HTTP request timeout and retained only in Cachex. Every time a Pleroma instance is restarted it will have to request and parse the data for each status with a URL detected. When fetching a batch of statuses they were processed in parallel to attempt to keep the maximum latency at 2 seconds, but often resulted in a timeline appearing to hang during loading due to a URL that could not be successfully reached. URLs which had images links that expire (Amazon AWS) were parsed and inserted with a TTL to ensure the image link would not break.
Rich Media data is now cached in the database and fetched asynchronously. Cachex is used as a read-through cache. When the data becomes available we stream an update to the clients. If the result is returned quickly the experience is almost seamless. Activities were already processed for their Rich Media data during ingestion to warm the cache, so users should not normally encounter the asynchronous loading of the Rich Media data.
Implementation notes:
- The async worker is a Task with a globally unique process name to prevent duplicate processing of the same URL
- The Task will attempt to fetch the data 3 times with increasing sleep time between attempts
- The HTTP request obeys the default HTTP request timeout value instead of 2 seconds
- URLs that cannot be successfully parsed due to an unexpected error receives a negative cache entry for 15 minutes
- URLs that fail with an expected error will receive a negative cache with no TTL
- Activities that have no detected URLs insert a nil value in the Cachex :scrubber_cache so we do not repeat parsing the object content with Floki every time the activity is rendered
- Expiring image URLs are handled with an Oban job
- There is no automatic cleanup of the Rich Media data in the database, but it is safe to delete at any time
- The post draft/preview feature makes the URL processing synchronous so the rendered post preview will have an accurate rendering
Overall performance of timelines and creating new posts which contain URLs is greatly improved.
This lets us:
- avoid issues with broken hash indices for PostgreSQL <10
- drop runtime checks and legacy codepaths for <11 in db search
- always enable custom query plans for performance optimisation
PostgreSQL 11 is already EOL since 2023-11-09, so
in theory everyone should already have moved on to 12 anyway.
From experience, setting DB type to "Online transaction processing
system" seems to give the most optimal configuration in terms of
performance.
I also increased the recomended max connections to 25-30 as that leaves
some room for maintenance tasks to run without running out of
connections.
Finally, I removed the example configs since they're probably out of
date and I think it's better to direct people to use PGTune instead.
Using only the admin key works as well currently
and Akkoma needs to know the admin key to be able
to add new entries etc. However the Meilisearch
key descriptions suggest the admin key is not
supposed to be used for searches, so let’s not.
For compatibility with existings configs, search_key remains optional.
This makes show-key’s output match our documentation as of Meilisearch
1.8.0-8-g4d5971f343c00d45c11ef0cfb6f61e83a8508208. Since I’m not sure
if older versions maybe only provided description, it will fallback to
the latter if no name parameter exists.
Meilisearch is already configured to return results sorted by a
particular ranking configured in the meilisearch CLI task.
Resorting the returned top results by date partially negates this and
runs counter to what someone with tweaked settings expects.
Issue and fix identified by AdamK2003 in
AkkomaGang/akkoma#579
But instead of using a O(n^2) resorting, this commit directly
retrieves results in the correct order from the database.
Closes: AkkomaGang/akkoma#579
And while add it point to this via a top-level
FEDERATION.md document as standardised by FEP-67ff.
Also add a few missing descriptions to the config cheatsheet
and move the recently removed C2S extension into an appropiate
subsection.
Trying to display non-media as media crashed the renderer,
but when posting a status with a valid, non-media object id
the post was still created, but then crashed e.g. timeline rendering.
It also crashed C2S inbox reads, so this could not be used to leak
private posts.
Afaict this was never used, but keeping this (in theory) possible
hinders detecting which objects are actually media uploads and
which proper ActivityPub objects.
It was originally added as part of upload support itself in
02d3dc6869 without being used
and `git log -S:activity_type` and `git log -Sactivity_type:`
don't find any other commits using this.