Prune Objects --keep-threads option #350

Merged
floatingghost merged 4 commits from ilja/akkoma:prune_objects_whithout_breaking_threads into develop 2023-01-09 22:15:42 +00:00

4 commits

Author SHA1 Message Date
ilja
f1739ac17b Adapt docs for prune_objects
2023-01-04 19:19:07 +01:00
ilja
92d2f8b401 Add --keep-non-public option
The prune_objects task already did this by default, but the behaviour was undocumented.
Now we require an explicit parameter for it.
The parameter also works in combination with --keep-threads.

Docs still need to be written.
2023-01-04 19:17:03 +01:00
ilja
04cc1d41ce Build prune_objects --keep-threads query with Ecto
The query is now built using Ecto.
I also ran it on a local DB.
It went from 4000834 records to 1734648 in about an hour without a timeout.
2023-01-04 19:17:03 +01:00
ilja
eb503f093c Prune Objects --keep-threads
This adds an option to the prune_objects mix task.
The original behaviour deletes all non-local public posts older than a given time frame.
Here we add a different query which you can call using the option --keep-threads.

We query the activities table for all context IDs where
    1. the newest activity with this context is still older than the cutoff
    2. none of the activities with this context is local
    3. none of the activities with this context is bookmarked
and delete all objects with these contexts.
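
The three conditions above can be sketched in plain Python. This is only an illustration of the selection logic, not the SQL or Ecto query used by the task. The function name, the dict fields (`id`, `context`, `local`, `inserted_at`) and the sample data are all hypothetical.

```python
from datetime import datetime, timezone


def prunable_contexts(activities, bookmarked_activity_ids, cutoff):
    """Return the contexts whose objects may be deleted.

    A context qualifies only if its newest activity is older than `cutoff`
    and none of its activities is local or bookmarked.
    """
    by_context = {}
    for activity in activities:
        by_context.setdefault(activity["context"], []).append(activity)

    prunable = set()
    for context, acts in by_context.items():
        newest = max(a["inserted_at"] for a in acts)
        if newest >= cutoff:
            continue  # 1. the thread still has recent activity
        if any(a["local"] for a in acts):
            continue  # 2. a local user interacted with the thread
        if any(a["id"] in bookmarked_activity_ids for a in acts):
            continue  # 3. someone bookmarked part of the thread
        prunable.add(context)
    return prunable


def day(year, month, dom):
    return datetime(year, month, dom, tzinfo=timezone.utc)


# Hypothetical sample data: four threads, only the first one is prunable.
activities = [
    {"id": 1, "context": "ctx-old-remote", "local": False, "inserted_at": day(2022, 1, 1)},
    {"id": 2, "context": "ctx-old-remote", "local": False, "inserted_at": day(2022, 2, 1)},
    {"id": 3, "context": "ctx-local-reply", "local": False, "inserted_at": day(2022, 1, 1)},
    {"id": 4, "context": "ctx-local-reply", "local": True, "inserted_at": day(2022, 1, 2)},
    {"id": 5, "context": "ctx-bookmarked", "local": False, "inserted_at": day(2022, 1, 1)},
    {"id": 6, "context": "ctx-recent", "local": False, "inserted_at": day(2022, 1, 1)},
    {"id": 7, "context": "ctx-recent", "local": False, "inserted_at": day(2023, 1, 3)},
]
bookmarked_activity_ids = {5}
cutoff = day(2022, 10, 1)

contexts = prunable_contexts(activities, bookmarked_activity_ids, cutoff)

# Objects are deleted by context, so whole threads go or stay together.
objects = [
    {"id": "a", "context": "ctx-old-remote"},
    {"id": "b", "context": "ctx-local-reply"},
    {"id": "c", "context": "ctx-bookmarked"},
    {"id": "d", "context": "ctx-recent"},
]
kept = [o for o in objects if o["context"] not in contexts]
```

Note that a single local or bookmarked activity is enough to keep every object in the thread, including old remote posts in it.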

The idea is that posts with local activities (posts, replies, likes, repeats...) may be interesting to keep.
Besides that, a post lives in a certain context (the thread), so we keep the whole thread as well.

Caveats:
* Quotes have a different context. Therefore, when someone quotes a post, the quoted post may still be deleted.
* Although this is undocumented (in docs/docs/administration/CLI_tasks/database.md/#prune-old-remote-posts-from-the-database), the 'normal' delete action keeps old remote non-public posts. With this option, scope is ignored.
* I ran this on my instance, but directly on the DB. I still need to test it to make sure we don't hit a timeout error or similar.

Some statistics from EXPLAIN ANALYZE:
(cost=1402845.92..1933782.00 rows=3810907 width=62) (actual time=2562455.486..2562455.495 rows=0 loops=1)
 Planning Time: 505.327 ms
 Trigger for constraint chat_message_references_object_id_fkey: time=651939.797 calls=921740
 Trigger for constraint deliveries_object_id_fkey: time=52036.009 calls=921740
 Trigger for constraint hashtags_objects_object_id_fkey: time=20665.778 calls=921740
 Execution Time: 3287933.902 ms
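
To put these numbers in more readable units, here is a small Python sketch that converts the timings above to minutes and works out how much of the run was spent in the foreign-key triggers. It assumes the trigger times are included in the reported execution time. The variable names are only illustrative.

```python
# Timings copied from the EXPLAIN ANALYZE output above, in milliseconds.
execution_ms = 3287933.902
statement_ms = 2562455.495  # upper "actual time" bound of the top plan node
trigger_ms = {
    "chat_message_references_object_id_fkey": 651939.797,
    "deliveries_object_id_fkey": 52036.009,
    "hashtags_objects_object_id_fkey": 20665.778,
}

MS_PER_MINUTE = 60_000

execution_minutes = execution_ms / MS_PER_MINUTE
statement_minutes = statement_ms / MS_PER_MINUTE
trigger_total_ms = sum(trigger_ms.values())
trigger_share = trigger_total_ms / execution_ms
slowest_trigger = max(trigger_ms, key=trigger_ms.get)

print(f"total: {execution_minutes:.1f} min, statement: {statement_minutes:.1f} min")
print(f"triggers: {trigger_share:.1%} of the run, slowest: {slowest_trigger}")
```

The whole run took just under 55 minutes, of which roughly 22% went to the three foreign-key triggers, mostly the chat_message_references one.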
2023-01-04 19:17:03 +01:00