Allow the prune_objects task to delete orphaned activities #427

Closed
ilja wants to merge 0 commits from ilja/akkoma:delete_orphaned_activities into develop
Contributor

This builds on #350, but I made it a separate PR to keep the changes smaller, in the hope that this makes reviewing easier. Do let me know if it's better to keep the whole thing as one PR.

After pruning objects, we are still left with many activities that no longer link to anything that exists. Here we find and delete those activities.

I made this optional because it adds load, takes extra time, and carries some risk.

Note that I also check which tables we have in the DB. As explained in the comments of the test, this is because the objects that the activities point to can be stored in different tables. Currently I see three of them, but it's possible we add more tables in the future. If those tables also hold objects and we don't check there, we may delete activities that shouldn't be deleted. For that reason I make a test fail when a new table is added.

* [x] Make a first working thing
* [x] Make it work for object arrays as well
* [x] Dogfood (woof woof)
    * On the first run, it had to prune 2+ years' worth of activities and took over 24h to complete
    * I then ran it on a daily basis and it takes 20-30min each time
    * After vacuum full, my DB size went from ~10G to ~3G 🎉
* [x] Docs
* [x] While we're at it, improve the docs to explain the vacuum full and mention autovacuum (could also be a separate MR). See https://akkoma.dev/AkkomaGang/akkoma/issues/436#issuecomment-6948
    * https://docs.akkoma.dev/develop/administration/CLI_tasks/database/#prune-old-remote-posts-from-the-database
    * https://docs.akkoma.dev/develop/configuration/postgresql/
        * Autovacuum should be on by default according to the PostgreSQL docs, so it doesn't seem like something I should add here.
    * Maybe other places where it makes sense...
        * Also added it to the vacuum task. Otherwise the only PostgreSQL references I see in the docs are in the install guides.
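A minimal sketch of the orphan check described above, written against an in-memory SQLite database so it can be run as-is. The schema is an assumption made for illustration: the table names, the `ap_id` column, and the JSON `object` column are simplified stand-ins, not Akkoma's real layout, and the actual task uses its own queries. The sketch also assumes that an activity pointing at an array of objects only counts as orphaned when none of the array elements exist.

```python
import json
import sqlite3

# Assumed, simplified schema: every table exposes an ActivityPub id as `ap_id`.
# Table and column names are illustrative, not Akkoma's real layout.
OBJECT_TABLES = ("objects", "activities", "users")


def make_demo_db():
    """Build a small in-memory database with live and orphaned activities."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE objects (ap_id TEXT PRIMARY KEY);
        CREATE TABLE users (ap_id TEXT PRIMARY KEY);
        CREATE TABLE activities (
            id INTEGER PRIMARY KEY,
            ap_id TEXT NOT NULL,
            object TEXT NOT NULL
        );
        """
    )
    conn.execute("INSERT INTO objects VALUES ('note/1')")
    conn.execute("INSERT INTO users VALUES ('user/1')")
    rows = [
        (1, "act/1", json.dumps("note/1")),              # object still exists
        (2, "act/2", json.dumps("note/gone")),           # object was pruned
        (3, "act/3", json.dumps("act/1")),               # points at an activity
        (4, "act/4", json.dumps(["note/x", "user/1"])),  # one element exists
        (5, "act/5", json.dumps(["note/x", "note/y"])),  # no element exists
    ]
    conn.executemany("INSERT INTO activities VALUES (?, ?, ?)", rows)
    return conn


def find_orphaned_activities(conn):
    """Return ids of activities whose object(s) exist in none of the tables."""
    known = set()
    for table in OBJECT_TABLES:
        known.update(row[0] for row in conn.execute(f"SELECT ap_id FROM {table}"))

    orphaned = []
    query = "SELECT id, object FROM activities ORDER BY id"
    for activity_id, raw in conn.execute(query).fetchall():
        ref = json.loads(raw)
        # `object` holds either a single id or an array of ids.
        refs = ref if isinstance(ref, list) else [ref]
        if not any(r in known for r in refs):
            orphaned.append(activity_id)
    return orphaned


def delete_activities(conn, ids):
    """Delete the given activities and return how many rows were removed."""
    cur = conn.executemany(
        "DELETE FROM activities WHERE id = ?", [(i,) for i in ids]
    )
    return cur.rowcount


def unreviewed_tables(conn, reviewed):
    """List tables that exist in the DB but are missing from the reviewed list."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return sorted({name for (name,) in rows} - set(reviewed))
```

The `unreviewed_tables` helper mirrors the idea behind the failing test: as soon as a table appears that nobody has reviewed, the check returns a non-empty list, which forces a decision on whether that table can hold objects that activities point to.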
ilja added 4 commits 2023-01-07 20:04:23 +00:00
This adds an option to the prune_objects mix task.
The original way deleted all non-local public posts older than a certain time frame.
Here we add a different query which you can call using the option --keep-threads.

We query the activities table for all context IDs where
    1. the newest activity with this context is still old
    2. none of the activities with this context is local
    3. none of the activities with this context is bookmarked
and delete all objects with these contexts.
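
The three conditions above can be sketched as a single grouped query. This is an illustration only, run against SQLite with a simplified schema: the column names (`context`, `local`, `inserted_at`), the `bookmarks` table, and the `prune_threads` helper are assumptions for the example, not the query the task actually runs.

```python
import sqlite3

# Assumed, simplified schema; names are illustrative only.
SCHEMA = """
CREATE TABLE activities (
    id INTEGER PRIMARY KEY,
    context TEXT NOT NULL,
    local INTEGER NOT NULL,
    inserted_at TEXT NOT NULL
);
CREATE TABLE bookmarks (activity_id INTEGER NOT NULL);
CREATE TABLE objects (id INTEGER PRIMARY KEY, context TEXT NOT NULL);
"""

# One row per context that satisfies all three conditions.
# Dates are ISO 8601 strings, so string comparison orders them correctly.
PRUNABLE_CONTEXTS = """
SELECT a.context
FROM activities AS a
LEFT JOIN bookmarks AS b ON b.activity_id = a.id
GROUP BY a.context
HAVING
    -- 1. the newest activity with this context is still old
    MAX(a.inserted_at) < :cutoff
    -- 2. none of the activities with this context is local
    AND MAX(a.local) = 0
    -- 3. none of the activities with this context is bookmarked
    AND COUNT(b.activity_id) = 0
"""


def prune_threads(conn, cutoff):
    """Delete all objects in prunable contexts; return the number deleted."""
    cur = conn.execute(
        f"DELETE FROM objects WHERE context IN ({PRUNABLE_CONTEXTS})",
        {"cutoff": cutoff},
    )
    return cur.rowcount
```

Because the conditions are evaluated per context rather than per post, a single recent, local, or bookmarked activity is enough to keep every object in that thread.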

The idea is that posts with local activities (posts, replies, likes, repeats...) may be interesting to keep.
Besides that, a post lives in a certain context (the thread), so we keep the whole thread as well.

Caveats:
* Quotes have a different context. Therefore, when someone quotes a post, it's possible the quoted post will still be deleted.
* Although undocumented (in docs/docs/administration/CLI_tasks/database.md/#prune-old-remote-posts-from-the-database), the 'normal' delete action still keeps old remote non-public posts. With this option we don't care about scope.
* I ran this on my instance, but directly on the DB. I still need to test to be sure that we don't get a time-out error or something.

Some statistics from explain analyse:
(cost=1402845.92..1933782.00 rows=3810907 width=62) (actual time=2562455.486..2562455.495 rows=0 loops=1)
 Planning Time: 505.327 ms
 Trigger for constraint chat_message_references_object_id_fkey: time=651939.797 calls=921740
 Trigger for constraint deliveries_object_id_fkey: time=52036.009 calls=921740
 Trigger for constraint hashtags_objects_object_id_fkey: time=20665.778 calls=921740
 Execution Time: 3287933.902 ms
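
To put the figures above in perspective, here is a small arithmetic sketch that converts them to minutes and works out how much of the execution time went to the foreign-key triggers. The numbers are copied from the output above; the helper names are made up for this example. It comes out at roughly 55 minutes in total, of which the three triggers account for about 22%.

```python
# Figures copied from the statistics above, in milliseconds.
EXECUTION_MS = 3287933.902
TRIGGERS_MS = {
    "chat_message_references_object_id_fkey": 651939.797,
    "deliveries_object_id_fkey": 52036.009,
    "hashtags_objects_object_id_fkey": 20665.778,
}


def to_minutes(ms):
    """Convert milliseconds to minutes."""
    return ms / 60_000


def trigger_share():
    """Fraction of the total execution time spent in the listed triggers."""
    return sum(TRIGGERS_MS.values()) / EXECUTION_MS


if __name__ == "__main__":
    print(f"total: {to_minutes(EXECUTION_MS):.1f} min")
    for name, ms in TRIGGERS_MS.items():
        print(f"{name}: {to_minutes(ms):.1f} min ({ms / EXECUTION_MS:.1%})")
    print(f"all triggers: {trigger_share():.1%}")
```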
The query is now done using Ecto.
I also ran it on a local DB.
It went from 4000834 records to 1734648 in about an hour, without a timeout.
The prune_objects task already did this by default, but that was undocumented.
Now we require an explicit parameter for it.
The parameter also works in combination with --keep-threads

Docs still need to happen
Adapt docs for prune_objects
f1739ac17b
ilja force-pushed delete_orphaned_activities from b1ab4792af to 7286763011 2023-01-08 20:26:11 +00:00 Compare
ilja force-pushed delete_orphaned_activities from 7286763011 to 5e40707baa 2023-01-09 15:54:41 +00:00 Compare
ilja force-pushed delete_orphaned_activities from 5e40707baa to f2506a1ed2 2023-01-09 15:59:13 +00:00 Compare
ilja force-pushed delete_orphaned_activities from f2506a1ed2 to a56bab0cb5 2023-01-21 05:46:57 +00:00 Compare
ilja force-pushed delete_orphaned_activities from a56bab0cb5 to 0d0c540a59 2023-01-21 08:02:40 +00:00 Compare
ilja force-pushed delete_orphaned_activities from 4aea168f05 to 2f7bd7acb9 2023-01-23 08:22:58 +00:00 Compare
ilja changed title from WIP: Allow the prune_objects task to delete orphaned activities to Allow the prune_objects task to delete orphaned activities 2023-01-23 08:41:10 +00:00
ilja force-pushed delete_orphaned_activities from 700d248c74 to 910fbf1747 2023-01-23 09:23:20 +00:00 Compare
ilja force-pushed delete_orphaned_activities from 910fbf1747 to b4fbe9b517 2023-02-12 09:18:37 +00:00 Compare
ilja force-pushed delete_orphaned_activities from b4fbe9b517 to 328b4d93b7 2023-02-26 13:43:34 +00:00 Compare

dw about the conflict that was caused by the other merge, i'll handle it

thanks a lot! this looks like it was a tonne of work, but very useful

merged via f56e3098ef

thankiessss

floatingghost closed this pull request 2023-02-26 22:11:55 +00:00
