[feat] Prune batch size #653

Closed
opened 2023-10-21 00:56:59 +00:00 by norm · 1 comment
Contributor

The idea

Have a way of configuring how many posts are deleted when running mix pleroma.database prune_objects instead of trying to query and delete every post.

Possibly by adding a LIMIT to one of the queries, like this (https://akko.wtf/notice/AayYoe72qMJoQrd1o8):

diff --git a/lib/mix/tasks/pleroma/database.ex b/lib/mix/tasks/pleroma/database.ex
index 09d2a4072..71fd2ef58 100644
--- a/lib/mix/tasks/pleroma/database.ex
+++ b/lib/mix/tasks/pleroma/database.ex
@@ -143,6 +143,7 @@ def run(["prune_objects" | args]) do
         |> having([a], max(a.updated_at) < ^time_deadline)
         |> having([a], not fragment("bool_or(?)", a.local))
         |> having([_, b], fragment("max(?::text) is null", b.id))
+        |> limit(500)
         |> select([a], fragment("? ->> 'context'::text", a.data))
 
       Pleroma.Object

This could likely be made configurable via a command-line option or in prod.secret.exs, like the post retention limit.
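
For the command-line side, something along these lines could work. This is only a rough sketch: the --limit switch and the :prune_batch_size config key are hypothetical names used for illustration, not existing options.

```elixir
# Sketch only: --limit and :prune_batch_size are made-up names for illustration.
defp prune_batch_size(args) do
  {options, _rest, _invalid} = OptionParser.parse(args, strict: [limit: :integer])

  Keyword.get(
    options,
    :limit,
    # Fall back to a (hypothetical) config entry, then to a hard default.
    Pleroma.Config.get([:instance, :prune_batch_size], 50_000)
  )
end
```

The Ecto query would then pick the value up through a pin, e.g. `|> limit(^prune_batch_size(args))`, and the raw SQL queries could take it as a parameter instead of a hard-coded number.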

The reasoning

When running with --prune-orphaned-activities, it can take a long time to prune old remote activities from the database. It also increases the load on the server, causing the instance to run slower and experience delays in receiving remote posts. See: https://meta.akkoma.dev/t/i-am-having-trouble-pruning-my-database/613 for an example.

Have you searched for this feature request?

  • I have double-checked and have not found this feature request mentioned anywhere.
  • This feature is related to the Akkoma backend specifically, and not pleroma-fe.
norm added the feature request label 2023-10-21 00:56:59 +00:00

norm (Author, Contributor) commented:

Found that there are two other queries that could have LIMIT added to them as well. Here's a revised patch that uses a 50k batch size:

diff --git a/lib/mix/tasks/pleroma/database.ex b/lib/mix/tasks/pleroma/database.ex
index 09d2a4072..03a9db851 100644
--- a/lib/mix/tasks/pleroma/database.ex
+++ b/lib/mix/tasks/pleroma/database.ex
@@ -143,6 +143,7 @@ def run(["prune_objects" | args]) do
         |> having([a], max(a.updated_at) < ^time_deadline)
         |> having([a], not fragment("bool_or(?)", a.local))
         |> having([_, b], fragment("max(?::text) is null", b.id))
+        |> limit(50000)
         |> select([a], fragment("? ->> 'context'::text", a.data))
 
       Pleroma.Object
@@ -200,6 +201,7 @@ def run(["prune_objects" | args]) do
         and o.id is null
         and a2.id is null
         and u.id is null
+        limit 50000
       )
       """
       |> Repo.query([], timeout: :infinity)
@@ -217,6 +219,7 @@ def run(["prune_objects" | args]) do
         having max(o.data ->> 'id') is null
         and max(a2.data ->> 'id') is null
         and max(u.ap_id) is null
+        limit 50000
       )
       """
       |> Repo.query([], timeout: :infinity)
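
Since a fixed LIMIT only bounds a single pass, a full prune would mean re-running the batched delete until it stops affecting a full batch. A rough sketch of such a loop, reusing the Repo.query/3 call from the patch; the helper name and the batch_size argument are made up for illustration:

```elixir
# Sketch only: re-run the batched DELETE until it affects fewer rows than the
# batch size, at which point there is nothing (or almost nothing) left to prune.
defp delete_in_batches(sql, batch_size) do
  case Repo.query(sql, [], timeout: :infinity) do
    {:ok, %{num_rows: num_rows}} when num_rows >= batch_size ->
      # A full batch was deleted, so more rows are likely waiting; go again.
      delete_in_batches(sql, batch_size)

    {:ok, %{num_rows: num_rows}} ->
      # The last batch came back short; we are done.
      num_rows

    {:error, error} ->
      raise error
  end
end
```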