akkoma/docs/docs/administration/CLI_tasks/database.md

185 lines
5.0 KiB
Markdown
Raw Normal View History

# Database maintenance tasks
{! administration/CLI_tasks/general_cli_task_info.include !}
!!! danger
These mix tasks can take a long time to complete. Many of them were written to address specific database issues that happened because of bugs in migrations or other specific scenarios. Do not run these tasks "just in case" if everything is fine your instance.
## Replace embedded objects with their references
Replaces embedded objects with references to them in the `objects` table. Only needs to be ran once if the instance was created before Pleroma 1.0.5. The reason why this is not a migration is because it could significantly increase the database size after being ran, however after this `VACUUM FULL` will be able to reclaim about 20% (really depends on what is in the database, your mileage may vary) of the db size before the migration.
=== "OTP"
```sh
./bin/pleroma_ctl database remove_embedded_objects [option ...]
```
=== "From Source"
```sh
mix pleroma.database remove_embedded_objects [option ...]
```
### Options
- `--vacuum` - run `VACUUM FULL` after the embedded objects are replaced with their references
## Prune old remote posts from the database
Prune Objects --keep-threads option (#350) This adds an option to the prune_objects mix task. The original way deleted all non-local public posts older than a certain time frame. Here we add a different query which you can call using the option --keep-threads. We query from the activities table all context id's where 1. the newest activity with this context is still old 2. none of the activities with this context is is local 3. none of the activities with this context is bookmarked and delete all objects with these contexts. The idea is that posts with local activities (posts, replies, likes, repeats...) may be interesting to keep. Besides that, a post lives in a certain context (the thread), so we keep the whole thread as well. Caveats: * ~~Quotes have a different context. Therefore, when someone quotes a post, it's possible the quoted post will still be deleted.~~ fixed in https://akkoma.dev/AkkomaGang/akkoma/pulls/379 * Although undocumented (in docs/docs/administration/CLI_tasks/database.md/#prune-old-remote-posts-from-the-database), the 'normal' delete action still kept old remote non-public posts. I added an option to keep this behaviour, but this also means that you now have to explicitly provide that option. **This could be considered a breaking change!** * ~~Note that this removes from the objects table, but not from the activities.~~ See https://akkoma.dev/AkkomaGang/akkoma/pulls/427 for that. Some statistics from explain analyse: (cost=1402845.92..1933782.00 rows=3810907 width=62) (actual time=2562455.486..2562455.495 rows=0 loops=1) Planning Time: 505.327 ms Trigger for constraint chat_message_references_object_id_fkey: time=651939.797 calls=921740 Trigger for constraint deliveries_object_id_fkey: time=52036.009 calls=921740 Trigger for constraint hashtags_objects_object_id_fkey: time=20665.778 calls=921740 Execution Time: 3287933.902 ms *** **TODO** 1. [x] **Question:** Is it OK to keep it like this in regard to quote posts? If not (ie post quoted by local users should also be kept), should we give quotes the same context as the post they are quoting? (If we don't want to give them the same context, I'll have to see how/if I can do it without being too costly) * See https://akkoma.dev/AkkomaGang/akkoma/pulls/379 2. [x] **Question:** the "original" query only deletes public posts (this is undocumented, but you can check the code). This new one doesn't care for scope. From the docs I get that the idea is that posts can be refetched when needed. But I have from a trusted source that Pleroma can't refetch non-public posts. I assume that's the reason why they are kept here. I see different options to deal with this 1. ~~We keep it as currently implemented and just don't care about scope with this option~~ 2. ~~We add logic to not delete non-public posts either (I'll have to see how costly that becomes)~~ 3. We add an extra --keep-non-public parameter. This is technically speaking breakage (you didn't have to provide a param before for this, now you do), but I'm inclined to not care much because it wasn't documented nor tested in the first place. 3. [x] See if we can do the query using Elixir 4. [x] Test on a bigger DB to see that we don't run into a timeout 5. [x] Add docs Co-authored-by: ilja <git@ilja.space> Reviewed-on: https://akkoma.dev/AkkomaGang/akkoma/pulls/350 Co-authored-by: ilja <akkoma.dev@ilja.space> Co-committed-by: ilja <akkoma.dev@ilja.space>
2023-01-09 22:15:41 +00:00
This will prune remote posts older than 90 days (configurable with [`config :pleroma, :instance, remote_post_retention_days`](../../configuration/cheatsheet.md#instance)) from the database. Pruned posts may be refetched in some cases.
!!! danger
The disk space will only be reclaimed after `VACUUM FULL`. You may run out of disk space during the execution of the task or vacuuming if you don't have about 1/3rds of the database size free.
=== "OTP"
```sh
./bin/pleroma_ctl database prune_objects [option ...]
```
=== "From Source"
```sh
mix pleroma.database prune_objects [option ...]
```
### Options
Prune Objects --keep-threads option (#350) This adds an option to the prune_objects mix task. The original way deleted all non-local public posts older than a certain time frame. Here we add a different query which you can call using the option --keep-threads. We query from the activities table all context id's where 1. the newest activity with this context is still old 2. none of the activities with this context is is local 3. none of the activities with this context is bookmarked and delete all objects with these contexts. The idea is that posts with local activities (posts, replies, likes, repeats...) may be interesting to keep. Besides that, a post lives in a certain context (the thread), so we keep the whole thread as well. Caveats: * ~~Quotes have a different context. Therefore, when someone quotes a post, it's possible the quoted post will still be deleted.~~ fixed in https://akkoma.dev/AkkomaGang/akkoma/pulls/379 * Although undocumented (in docs/docs/administration/CLI_tasks/database.md/#prune-old-remote-posts-from-the-database), the 'normal' delete action still kept old remote non-public posts. I added an option to keep this behaviour, but this also means that you now have to explicitly provide that option. **This could be considered a breaking change!** * ~~Note that this removes from the objects table, but not from the activities.~~ See https://akkoma.dev/AkkomaGang/akkoma/pulls/427 for that. Some statistics from explain analyse: (cost=1402845.92..1933782.00 rows=3810907 width=62) (actual time=2562455.486..2562455.495 rows=0 loops=1) Planning Time: 505.327 ms Trigger for constraint chat_message_references_object_id_fkey: time=651939.797 calls=921740 Trigger for constraint deliveries_object_id_fkey: time=52036.009 calls=921740 Trigger for constraint hashtags_objects_object_id_fkey: time=20665.778 calls=921740 Execution Time: 3287933.902 ms *** **TODO** 1. [x] **Question:** Is it OK to keep it like this in regard to quote posts? If not (ie post quoted by local users should also be kept), should we give quotes the same context as the post they are quoting? (If we don't want to give them the same context, I'll have to see how/if I can do it without being too costly) * See https://akkoma.dev/AkkomaGang/akkoma/pulls/379 2. [x] **Question:** the "original" query only deletes public posts (this is undocumented, but you can check the code). This new one doesn't care for scope. From the docs I get that the idea is that posts can be refetched when needed. But I have from a trusted source that Pleroma can't refetch non-public posts. I assume that's the reason why they are kept here. I see different options to deal with this 1. ~~We keep it as currently implemented and just don't care about scope with this option~~ 2. ~~We add logic to not delete non-public posts either (I'll have to see how costly that becomes)~~ 3. We add an extra --keep-non-public parameter. This is technically speaking breakage (you didn't have to provide a param before for this, now you do), but I'm inclined to not care much because it wasn't documented nor tested in the first place. 3. [x] See if we can do the query using Elixir 4. [x] Test on a bigger DB to see that we don't run into a timeout 5. [x] Add docs Co-authored-by: ilja <git@ilja.space> Reviewed-on: https://akkoma.dev/AkkomaGang/akkoma/pulls/350 Co-authored-by: ilja <akkoma.dev@ilja.space> Co-committed-by: ilja <akkoma.dev@ilja.space>
2023-01-09 22:15:41 +00:00
- `--keep-threads` - don't prune posts when they are part of a thread where at least one post has seen local interaction (e.g. one of the posts is a local post, or is favourited by a local user, or has been repeated by a local user...)
- `--keep-non-public` - keep non-public posts like DM's and followers-only, even if they are remote
- `--vacuum` - run `VACUUM FULL` after the objects are pruned
## Create a conversation for all existing DMs
Can be safely re-run
=== "OTP"
```sh
./bin/pleroma_ctl database bump_all_conversations
```
=== "From Source"
```sh
mix pleroma.database bump_all_conversations
```
## Remove duplicated items from following and update followers count for all users
=== "OTP"
```sh
./bin/pleroma_ctl database update_users_following_followers_counts
```
=== "From Source"
```sh
mix pleroma.database update_users_following_followers_counts
```
## Fix the pre-existing "likes" collections for all objects
=== "OTP"
```sh
./bin/pleroma_ctl database fix_likes_collections
```
=== "From Source"
```sh
mix pleroma.database fix_likes_collections
```
2020-05-29 15:17:06 +00:00
## Vacuum the database
### Analyze
Running an `analyze` vacuum job can improve performance by updating statistics used by the query planner. **It is safe to cancel this.**
=== "OTP"
```sh
./bin/pleroma_ctl database vacuum analyze
```
=== "From Source"
2020-05-29 15:17:06 +00:00
```sh
mix pleroma.database vacuum analyze
```
2020-05-29 15:17:06 +00:00
### Full
Running a `full` vacuum job rebuilds your entire database by reading all of the data and rewriting it into smaller
and more compact files with an optimized layout. This process will take a long time and use additional disk space as
it builds the files side-by-side the existing database files. It can make your database faster and use less disk space,
but should only be run if necessary. **It is safe to cancel this.**
=== "OTP"
2020-05-29 15:17:06 +00:00
```sh
./bin/pleroma_ctl database vacuum full
```
=== "From Source"
```sh
mix pleroma.database vacuum full
```
## Add expiration to all local statuses
=== "OTP"
```sh
./bin/pleroma_ctl database ensure_expiration
```
=== "From Source"
```sh
mix pleroma.database ensure_expiration
```
## Change Text Search Configuration
Change `default_text_search_config` for database and (if necessary) text_search_config used in index, then rebuild index (it may take time).
=== "OTP"
```sh
./bin/pleroma_ctl database set_text_search_config english
```
=== "From Source"
```sh
mix pleroma.database set_text_search_config english
```
See [PostgreSQL documentation](https://www.postgresql.org/docs/current/textsearch-configuration.html) and `docs/configuration/howto_search_cjk.md` for more detail.
## Pruning old activities
Over time, transient `Delete` activities and `Tombstone` objects
can accumulate in your database, inflating its size. This is not ideal.
There is a periodic task to prune these transient objects,
but on first run this may take a while on older instances to catch up
to the current day.
=== "OTP"
```sh
./bin/pleroma_ctl database prune_task
```
=== "From Source"
```sh
mix pleroma.database prune_task
Prune Objects --keep-threads option (#350) This adds an option to the prune_objects mix task. The original way deleted all non-local public posts older than a certain time frame. Here we add a different query which you can call using the option --keep-threads. We query from the activities table all context id's where 1. the newest activity with this context is still old 2. none of the activities with this context is is local 3. none of the activities with this context is bookmarked and delete all objects with these contexts. The idea is that posts with local activities (posts, replies, likes, repeats...) may be interesting to keep. Besides that, a post lives in a certain context (the thread), so we keep the whole thread as well. Caveats: * ~~Quotes have a different context. Therefore, when someone quotes a post, it's possible the quoted post will still be deleted.~~ fixed in https://akkoma.dev/AkkomaGang/akkoma/pulls/379 * Although undocumented (in docs/docs/administration/CLI_tasks/database.md/#prune-old-remote-posts-from-the-database), the 'normal' delete action still kept old remote non-public posts. I added an option to keep this behaviour, but this also means that you now have to explicitly provide that option. **This could be considered a breaking change!** * ~~Note that this removes from the objects table, but not from the activities.~~ See https://akkoma.dev/AkkomaGang/akkoma/pulls/427 for that. Some statistics from explain analyse: (cost=1402845.92..1933782.00 rows=3810907 width=62) (actual time=2562455.486..2562455.495 rows=0 loops=1) Planning Time: 505.327 ms Trigger for constraint chat_message_references_object_id_fkey: time=651939.797 calls=921740 Trigger for constraint deliveries_object_id_fkey: time=52036.009 calls=921740 Trigger for constraint hashtags_objects_object_id_fkey: time=20665.778 calls=921740 Execution Time: 3287933.902 ms *** **TODO** 1. [x] **Question:** Is it OK to keep it like this in regard to quote posts? If not (ie post quoted by local users should also be kept), should we give quotes the same context as the post they are quoting? (If we don't want to give them the same context, I'll have to see how/if I can do it without being too costly) * See https://akkoma.dev/AkkomaGang/akkoma/pulls/379 2. [x] **Question:** the "original" query only deletes public posts (this is undocumented, but you can check the code). This new one doesn't care for scope. From the docs I get that the idea is that posts can be refetched when needed. But I have from a trusted source that Pleroma can't refetch non-public posts. I assume that's the reason why they are kept here. I see different options to deal with this 1. ~~We keep it as currently implemented and just don't care about scope with this option~~ 2. ~~We add logic to not delete non-public posts either (I'll have to see how costly that becomes)~~ 3. We add an extra --keep-non-public parameter. This is technically speaking breakage (you didn't have to provide a param before for this, now you do), but I'm inclined to not care much because it wasn't documented nor tested in the first place. 3. [x] See if we can do the query using Elixir 4. [x] Test on a bigger DB to see that we don't run into a timeout 5. [x] Add docs Co-authored-by: ilja <git@ilja.space> Reviewed-on: https://akkoma.dev/AkkomaGang/akkoma/pulls/350 Co-authored-by: ilja <akkoma.dev@ilja.space> Co-committed-by: ilja <akkoma.dev@ilja.space>
2023-01-09 22:15:41 +00:00
```