Add limit CLI flags to prune jobs #655

Open
Oneric wants to merge 5 commits from Oneric/akkoma:prune-batch into develop
Member

The prune tasks can incur heavy database load and take a long time, grinding the instance to a halt for the entire duration.

To still free up some space while lessening or, ideally, avoiding downtime, limits on the prune queries (especially on the orphaned-activities prune) have proven useful in recent tests. With these patches, after the initial prune_objects (without --prune-orphaned-activities), a script similar to the following could be run to free up space while keeping the instance reasonably responsive (parameters are just examples and need adjusting for the specific instance):

#!/bin/sh

YIELD=120
BATCH_SIZE=50000
BATCH_MAX_TIME=70

while : ; do
    start="$(date +%s)"
    out="$( \
        mix pleroma.database prune_orphaned_activities --limit "$BATCH_SIZE" \
        | grep -E ' \[info\] Deleted ' \
        | tail -n 1 \
    )"
    end="$(date +%s)"
    duration="$((end - start))"
    echo "$out"

    if echo "$out" | grep -qE '\[info\] Deleted 0 rows$' ; then
        echo "Nothing more to delete."
        break
    fi
    if [ "$duration" -gt "$BATCH_MAX_TIME" ] ; then
        echo "Completion of single batch takes too long ($duration > $BATCH_MAX_TIME)" >&2
        echo "Abort further batches to not bog down the instance!" >&2
        exit 1
    fi
    sleep "$YIELD"
done

Resolves #653; cc @norm


Best reviewed commit by commit. As noted in the commit messages, many of the diff lines are just indentation adjustments, so for review it's probably a good idea to hide whitespace-only changes.

smitten reviewed 2023-12-23 22:07:23 +00:00
@ -56,0 +79,4 @@
### Options
- `--limit n` - Only delete up to `n` activities in each query. Running this task in limited batches can help maintain the instances responsiveness while still freeing up some space.
First-time contributor

I'm a little confused about if there's a difference in the behavior between this and prune_objects.

"in each query" I would understand as limiting the database lock by having smaller limit delete operations.

For prune_objects it says "limits how many remote objects get pruned initially". What does initially mean here?

Author
Member

"in each query" I would understand as limiting the database lock by having smaller limit delete operations.

The task executes multiple DELETE queries on the database, and each of these queries has the given limit applied. Currently it executes two queries, so running the task once with --limit 100 will delete at most 200 rows.
It would be possible to limit the overall deleted rows to at most exactly the given amount, but this gives preferential treatment to the first queries, and since the purpose is just to limit the load and allow breaks in between, I figured this is not needed. But if there’s a reason to, this could be changed.

For prune_objects it says "limits how many remote objects get pruned initially". What does initially mean here?

prune_objects first deletes remote posts, then (optionally, if such flags were passed) it runs more cleanup jobs. Only the initial prune is affected by the limit; the cleanup jobs are not. The reason is that, except for prune_orphaned_activities, those cleanup jobs are comparatively cheap anyway.
And prune_orphaned_activities now has its own task. So if you want to clean up some space while not continuously hogging the db, you can first (repeatedly) run prune_objects --limit n without --prune-orphaned-activities, passing all other desired cleanups only in the last run. Then afterwards, repeatedly run the standalone prune_orphaned_activities --limit n as long as a single run finishes fast enough.
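The two-phase workflow described above can be sketched as a small script. Flags and batch counts are illustrative only, and PRUNE is a dry-run stub here so the sketch can run anywhere; on a real instance it would be the actual mix invocation:

```sh
#!/bin/sh
# Dry-run stub so the sketch runs anywhere; on a real instance use:
#   PRUNE="mix pleroma.database"
PRUNE="echo mix pleroma.database"

# Phase 1: prune remote objects in limited batches, *without*
# --prune-orphaned-activities; pass any other desired cleanup flags
# only on the last run.
$PRUNE prune_objects --limit 50000
$PRUNE prune_objects --limit 50000
$PRUNE prune_objects --limit 50000 --vacuum

# Phase 2: afterwards, repeatedly run the standalone orphan pruning
# in limited batches, as long as each batch finishes fast enough.
$PRUNE prune_orphaned_activities --limit 50000
```

In practice the phase-2 invocation would be wrapped in a loop with timing checks, as in the batch script earlier in this description.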

I pushed a new rebased version with tweaked documentation (and a typo in a commit message was fixed). Can you take a look if it’s clearer now?

First-time contributor

I see what you mean, and the docs updates are clearer, thanks! The steps you describe are how I was running it: I did a few prune_objects runs and then did a few prune_orphaned_activities runs.

Oneric marked this conversation as resolved
First-time contributor

This seems to be working for me! Usually pruning makes the RAM fill up on my small VPS and the instance crashes, but this is running well.

Oneric force-pushed prune-batch from 80ba73839c to 3bc63afbe0 2023-12-24 23:18:28 +00:00 Compare
Oneric force-pushed prune-batch from 3bc63afbe0 to 732bc96493 2024-01-31 16:45:44 +00:00 Compare
Oneric force-pushed prune-batch from 732bc96493 to 800acfa81d 2024-02-10 01:54:09 +00:00 Compare
Author
Member

Rebased this with two updates:

  1. The logger output now shows up in stdout for me, so duplicating it with IO.puts is no longer needed and has been dropped.
    This change also slightly broke the script from the comments; I updated it to work with the new output and made it a bit more robust with regard to ordering.
  2. Standalone prune_orphaned_activities is now used in one of the orphan-pruning tests. Since both modes use the same function and the only difference is the argument parser, I figured it wasn’t worth duplicating the test setup and instead switched one of the two orphan tests to the standalone task.

Also, just because, here’s an alternative version of the script which tries to scale the batch size down between a maximum and minimum value instead of immediately ceasing the prune. It may be more convenient in some cases, though very low minimum values probably don’t make much sense (and, as before, time limits and batch sizes need tweaking for real instances).

#!/bin/sh

YIELD=120
BATCH_SIZE_MAX=64000
BATCH_SIZE_MIN=8000
BATCH_MAX_TIME=70

set -eu

# params: cur_batch_time cur_batch_size
# returns: new_batch_size (0 if constraints cannot be met; otherwise valid)
lower_batch_size() {
    # Intentional rounding imprecision to facilitate going _below_ max time
    div="$(( ($1 + BATCH_MAX_TIME - 1) / BATCH_MAX_TIME ))"
    newbatch="$(($2 / div))"
    if [ "$newbatch" -lt "$BATCH_SIZE_MIN" ] ; then
        newbatch=0
    fi
    echo "$newbatch"
}

BATCH_SIZE="$BATCH_SIZE_MAX"
echo "Starting with batch size $BATCH_SIZE"
while : ; do
    start="$(date +%s)"
    out="$( \
        mix pleroma.database prune_orphaned_activities --limit "$BATCH_SIZE" \
        | grep -E ' \[info\] Deleted ' \
    )"
    end="$(date +%s)"
    duration="$((end - start))"
    echo "$out"

    if echo "$out" | tail -n 1 | grep -qE 'Deleted 0 rows$' ; then
        echo "Nothing more to delete."
        break
    fi
    if [ "$duration" -gt "$BATCH_MAX_TIME" ] ; then
        echo "Completion of single batch takes too long ($duration > $BATCH_MAX_TIME)" >&2
        BATCH_SIZE="$(lower_batch_size "$duration" "$BATCH_SIZE")"
        if [ "$BATCH_SIZE" -gt 0 ] ; then
            echo "Try lowering batch size to $BATCH_SIZE..."
        else
            echo "Cannot lower batch size further. Abort to not bog down instance!" >&2
            exit 1
        fi
    fi
    sleep "$YIELD"
done
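For illustration, the scaling rule can be checked in isolation: the new batch size is the old one divided by ceil(duration / BATCH_MAX_TIME), dropping to 0 once it would fall below the minimum. The durations and sizes below are just example values:

```sh
#!/bin/sh
BATCH_MAX_TIME=70
BATCH_SIZE_MIN=8000

# Same scaling rule as in the script above: divide the batch size by
# ceil(duration / BATCH_MAX_TIME); report 0 once the result would drop
# below the configured minimum.
lower_batch_size() {
    div="$(( ($1 + BATCH_MAX_TIME - 1) / BATCH_MAX_TIME ))"
    newbatch="$(($2 / div))"
    if [ "$newbatch" -lt "$BATCH_SIZE_MIN" ] ; then
        newbatch=0
    fi
    echo "$newbatch"
}

lower_batch_size 140 64000   # 140s batch: divide by 2 -> 32000
lower_batch_size 100 64000   # ceil(100/70) = 2        -> 32000
lower_batch_size  75 10000   # 10000/2 = 5000 < min    -> 0
```

A batch that finishes within BATCH_MAX_TIME (e.g. 70s) leaves the size unchanged, since the divisor comes out as 1.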
Oneric force-pushed prune-batch from afa01cb8dd to 790b552030 2024-02-19 18:36:12 +00:00 Compare