akkoma

Author	SHA1	Message	Date
Floatingghost	ad52135bf5	Convert rich media backfill to oban task	2024-06-11 18:06:51 +01:00
Floatingghost	28d357f52c	add diagnostic script	2024-06-10 15:10:47 +01:00
Floatingghost	9c5feb81aa	fix tests	2024-06-09 21:26:29 +01:00
Floatingghost	a360836ce3	fix oembed test	2024-06-09 21:17:12 +01:00
Floatingghost	840c70c4fa	remove prints	2024-06-09 18:52:09 +01:00
Floatingghost	c65379afea	attempt to fix some tests	2024-06-09 18:45:38 +01:00
Floatingghost	16bed0562d	Fix tests	2024-06-09 18:28:00 +01:00
Mark Felder	a801dd7b07	Fix module struct matching	2024-06-09 17:38:28 +01:00
Mark Felder	1e86da43f5	Credo	2024-06-09 17:38:24 +01:00
Mark Felder	411831458c	Credo	2024-06-09 17:38:18 +01:00
Mark Felder	56463b2121	Fix compile warning warning: "else" clauses will never match because all patterns in "with" will always match lib/pleroma/web/rich_media/parser/ttl/opengraph.ex:10	2024-06-09 17:38:12 +01:00
Mark Felder	2f5eb79473	Mastodon API: Remove deprecated GET /api/v1/statuses/:id/card endpoint Removed back in 2019 https://github.com/mastodon/mastodon/pull/11213	2024-06-09 17:38:06 +01:00
Mark Felder	f4daa90bd8	Remove test validating missing descriptions are returned as an empty string	2024-06-09 17:37:59 +01:00
Mark Felder	688748b531	Improve test description	2024-06-09 17:37:32 +01:00
Mark Felder	2e5aa71176	Rich Media Cards are fetched asynchronously and not guaranteed to be available on first post render	2024-06-09 17:37:22 +01:00
Mark Felder	7ca655a999	Rich Media Cards are cached by URL not per status	2024-06-09 17:36:57 +01:00
Mark Felder	4746f98851	Fix broken Rich Media parsing when the image URL is a relative path	2024-06-09 17:36:28 +01:00
Mark Felder	765c7e98d2	Respect the TTL returned in OpenGraph tags	2024-06-09 17:36:15 +01:00
Mark Felder	ddbe989461	Fix broken tests	2024-06-09 17:35:47 +01:00
Floatingghost	4a3dd5f65e	lost in cherry-pick	2024-06-09 17:34:41 +01:00
Mark Felder	bfe4152385	Increase the :max_body for Rich Media to 5MB Websites are increasingly getting more bloated with tricks like inlining content (e.g., CNN.com) which puts pages at or above 5MB. This value may still be too low.	2024-06-09 17:34:29 +01:00
Mark Felder	5da9cbd8a5	RichMedia refactor

Rich Media parsing was previously handled on-demand with a 2 second HTTP request timeout and retained only in Cachex. Every time a Pleroma instance was restarted it had to request and parse the data for each status with a URL detected. When fetching a batch of statuses they were processed in parallel to attempt to keep the maximum latency at 2 seconds, but this often resulted in a timeline appearing to hang during loading due to a URL that could not be successfully reached. URLs which had image links that expire (Amazon AWS) were parsed and inserted with a TTL to ensure the image link would not break.

Rich Media data is now cached in the database and fetched asynchronously. Cachex is used as a read-through cache. When the data becomes available we stream an update to the clients. If the result is returned quickly the experience is almost seamless. Activities were already processed for their Rich Media data during ingestion to warm the cache, so users should not normally encounter the asynchronous loading of the Rich Media data.

Implementation notes:
- The async worker is a Task with a globally unique process name to prevent duplicate processing of the same URL
- The Task will attempt to fetch the data 3 times with increasing sleep time between attempts
- The HTTP request obeys the default HTTP request timeout value instead of 2 seconds
- URLs that cannot be successfully parsed due to an unexpected error receive a negative cache entry for 15 minutes
- URLs that fail with an expected error will receive a negative cache with no TTL
- Activities that have no detected URLs insert a nil value in the Cachex :scrubber_cache so we do not repeat parsing the object content with Floki every time the activity is rendered
- Expiring image URLs are handled with an Oban job
- There is no automatic cleanup of the Rich Media data in the database, but it is safe to delete at any time
- The post draft/preview feature makes the URL processing synchronous so the rendered post preview will have an accurate rendering

Overall performance of timelines and creating new posts which contain URLs is greatly improved.	2024-06-09 17:33:48 +01:00
Floatingghost	a924e117fd	Add pool timeouts	2024-06-09 17:20:29 +01:00
floatingghost	d1c4b97613	Merge pull request 'Raise minimum PostgreSQL version to 12' (#786) from Oneric/akkoma:psql-min-ver into develop Reviewed-on: AkkomaGang/akkoma#786	2024-06-07 16:53:22 +00:00
Oneric	2180d068ae	Raise log level for start failures	2024-06-07 16:21:21 +02:00
Oneric	a3840e7d1f	Raise minimum PostgreSQL version to 12 This lets us: - avoid issues with broken hash indices for PostgreSQL <10 - drop runtime checks and legacy codepaths for <11 in db search - always enable custom query plans for performance optimisation PostgreSQL 11 is already EOL since 2023-11-09, so in theory everyone should already have moved on to 12 anyway.	2024-06-07 16:21:09 +02:00
Oneric	b17d3dc6d8	Fix changelog Apparently got jumbled during some rebase(s)	2024-06-07 16:20:34 +02:00
floatingghost	f8f364d36d	Merge pull request 'Handle errors from HTTP requests gracefully' (#791) from wp-embeds into develop Reviewed-on: AkkomaGang/akkoma#791	2024-06-07 12:58:58 +00:00
floatingghost	329d8fcba8	Merge pull request 'Update PGTune recommendations' (#795) from norm/akkoma:pgtune into develop Reviewed-on: AkkomaGang/akkoma#795	2024-06-07 12:57:00 +00:00
Norm	e2860e5292	Update PGTune recommendations From experience, setting DB type to "Online transaction processing system" seems to give the most optimal configuration in terms of performance. I also increased the recommended max connections to 25-30 as that leaves some room for maintenance tasks to run without running out of connections. Finally, I removed the example configs since they're probably out of date and I think it's better to direct people to use PGTune instead.	2024-06-06 12:18:51 -04:00
Oneric	df27567d99	mrf/steal_emoji: display download_unknown_size in admin-fe Fixes omission in `d6d838cbe8`	2024-06-05 20:14:10 +02:00
Oneric	be5440c5e8	mrf/steal_emoji: fix size limit check Headers are strings, but this check expected to already get an int, so the comparison always failed if the header was set. Fixes mistake in `d6d838cbe8`	2024-06-05 20:11:53 +02:00
Oneric	68fe0a9633	test: fix content-length value type All headers are strings, always. In this case it didn't matter atm, but let’s not provide confusing examples.	2024-06-05 19:59:59 +02:00
Floatingghost	0f65dd3ebe	remove pointless logger	2024-06-04 14:34:59 +01:00
Floatingghost	38d09cb0ce	remove now-pointless clause	2024-06-04 14:34:18 +01:00
Floatingghost	c9a03af7c1	Move rescue to the HTTP request itself	2024-06-04 14:30:16 +01:00
Floatingghost	0f7ae0fa21	am i baka	2024-06-04 14:26:33 +01:00
Floatingghost	30e13a8785	Don't error on rich media fail	2024-06-04 14:21:40 +01:00
Floatingghost	778b213945	enqueue pin fetches after changeset validation	2024-06-01 08:25:35 +01:00
Oneric	bed7ff8e89	mix: consistently use shell_info and shell_error Logger output being visible depends on user configuration, but most of the prints in mix tasks should always be shown. When running inside a mix shell, it’s probably preferable to send output directly to it rather than using raw IO.puts and we already have shell_* functions for this, let’s use them everywhere.	2024-05-31 17:17:42 +02:00
Oneric	70cd5f91d8	dbprune/activities: prune array activities first This query is less costly; if something goes wrong or gets aborted later at least this part will already be done.	2024-05-31 17:16:40 +02:00
Oneric	aeaebb566c	dbprune: allow splitting array and single activity prunes The former is typically just a few reports; it doesn't make sense to rerun it over and over again in batched prunes or if a full prune OOMed.	2024-05-31 17:16:40 +02:00
Oneric	5751637926	dbprune: use query!	2024-05-31 17:16:40 +02:00
Oneric	24bab63cd8	dbprune: add more logs Pruning can go on for a long time; give admins some insight that something is happening, to make it less frustrating and to make it easier to tell which part of the process is stalled should this happen. Again most of the changes are merely reindents; review with whitespace changes hidden recommended.	2024-05-31 17:16:40 +02:00
Oneric	1d4c212441	dbprune: shortcut array activity search This brought down query costs from 7,953,740.90 to 47,600.97	2024-05-31 17:16:40 +02:00
Oneric	6e7cbf1885	Test both standalone and flag mode for pruning orphaned activities	2024-05-31 17:16:40 +02:00
Oneric	225f87ad62	Also allow limiting the initial prune_object May sometimes be helpful to get more predictable runtime than just with an age-based limit. The subquery for the non-keep-threads path is required since delete_all does not directly accept limit(). Again most of the diff is just adjusting indentation, best hide whitespace-only changes with git diff -w or similar.	2024-05-31 17:16:40 +02:00
Oneric	e64f031167	Log number of deleted rows in prune_orphaned_activities This gives feedback when to stop rerunning limited batches. Most of the diff is just adjusting indentation; best reviewed with whitespace-only changes hidden, e.g. `git diff -w`.	2024-05-31 17:16:40 +02:00
Oneric	fa52093bac	Add standalone prune_orphaned_activities CLI task This part of pruning can be very expensive and bog down the whole instance to an unusable state for a long time. It can thus be desirable to split it from prune_objects and run it on its own in smaller limited batches. If the batches are small enough and spaced out a bit, it may even be possible to avoid any downtime. If not, the limit can still help to at least make the downtime duration somewhat more predictable.	2024-05-31 17:16:40 +02:00
Oneric	3126d15ffc	refactor: move prune_orphaned_activities into own function No logic changes. Preparation for standalone orphan pruning.	2024-05-31 17:16:39 +02:00
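
The retry and negative-cache behaviour described in the implementation notes of the RichMedia refactor commit (5da9cbd8a5) can be illustrated with a small sketch. This is not Akkoma's code: it is written in Python rather than Elixir, and `CardCache`, `ExpectedError`, `fetch_card`, and the one-second base delay are all hypothetical names and values chosen for illustration. Only the three attempts, the increasing sleep, the 15 minute negative entry for unexpected errors, and the no-TTL negative entry for expected errors come from the commit message. Applying the retries only to unexpected errors is an assumption.

```python
import time

NEGATIVE_TTL_SECONDS = 15 * 60  # 15 minutes, as stated in the commit message


class ExpectedError(Exception):
    """Hypothetical: a failure that retrying will not fix (e.g. no metadata)."""


class CardCache:
    """Hypothetical in-memory cache; entries may carry an expiry time."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # url -> (value, expires_at or None)

    def put(self, url, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._entries[url] = (value, expires_at)

    def get(self, url):
        """Return (hit, value); a hit whose value is None is a negative entry."""
        entry = self._entries.get(url)
        if entry is None:
            return False, None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._entries[url]  # expired, treat as a miss
            return False, None
        return True, value


def fetch_card(url, fetch, cache, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Fetch a card via `fetch(url)`, retrying and caching failures."""
    hit, value = cache.get(url)
    if hit:
        return value  # positive or negative entry, no new request
    for attempt in range(1, attempts + 1):
        try:
            card = fetch(url)
        except ExpectedError:
            cache.put(url, None)  # negative entry with no TTL
            return None
        except Exception:
            if attempt < attempts:
                sleep(base_delay * attempt)  # sleep grows with each attempt
                continue
            # Out of attempts: negative entry that expires after 15 minutes.
            cache.put(url, None, ttl=NEGATIVE_TTL_SECONDS)
            return None
        cache.put(url, card)
        return card
    return None
```

In this sketch, a URL that fails unexpectedly becomes fetchable again once the 15 minute entry expires, while a URL that fails with an expected error is not requested again until its entry is removed.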

15788 commits