Commit graph

70 commits

Author SHA1 Message Date
70cd5f91d8 dbprune/activites: prune array activities first
This query is less costly; if something goes wrong or gets aborted later
at least this part will arelady be done.
2024-05-31 17:16:40 +02:00
aeaebb566c dbprune: allow splitting array and single activity prunes
The former is typically just a few reports; it doesn't make sense to
rerun it over and over again in batched prunes or if a full prune OOMed.
2024-05-31 17:16:40 +02:00
5751637926 dbprune: use query! 2024-05-31 17:16:40 +02:00
24bab63cd8 dbprune: add more logs
Pruning can go on for a long time; give admins some insight into that
something is happening to make it less frustrating and to make it easier
which part of the process is stalled should this happen.

Again most of the changes are merely reindents;
review with whitespace changes hidden recommended.
2024-05-31 17:16:40 +02:00
1d4c212441 dbprune: shortcut array activity search
This brought down query costs from 7,953,740.90 to 47,600.97
2024-05-31 17:16:40 +02:00
225f87ad62 Also allow limiting the initial prune_object
May sometimes be helpful to get more predictable runtime
than just with an age-based limit.

The subquery for the non-keep-threads path is required
since delte_all does not directly accept limit().

Again most of the diff is just adjusting indentation, best
hide whitespace-only changes with git diff -w or similar.
2024-05-31 17:16:40 +02:00
e64f031167 Log number of deleted rows in prune_orphaned_activities
This gives feedback when to stop rerunning limited batches.

Most of the diff is just adjusting indentation; best reviewed
with whitespace-only changes hidden, e.g. `git diff -w`.
2024-05-31 17:16:40 +02:00
fa52093bac Add standalone prune_orphaned_activities CLI task
This part of pruning can be very expensive and bog down the whole
instance to an unusable sate for a long time. It can thus be desireable
to split it from prune_objects and run it on its own in smaller limited batches.

If the batches are smaller enough and spaced out a bit, it may even be possible
to avoid any downtime. If not, the limit can still help to at least make the
downtime duration somewhat more predictable.
2024-05-31 17:16:40 +02:00
3126d15ffc refactor: move prune_orphaned_activities into own function
No logic changes. Preparation for standalone orphan pruning.
2024-05-31 17:16:39 +02:00
98cb255d12 Support elixir1.15
Some checks failed
ci/woodpecker/push/build-amd64 Pipeline is pending
ci/woodpecker/push/build-arm64 Pipeline is pending
ci/woodpecker/push/docs Pipeline is pending
ci/woodpecker/push/test Pipeline is pending
ci/woodpecker/pr/test Pipeline failed
ci/woodpecker/pr/build-amd64 unknown status
ci/woodpecker/pr/build-arm64 unknown status
ci/woodpecker/pr/docs unknown status
OTP builds to 1.15

Changelog entry

Ensure policies are fully loaded

Fix :warn

use main branch for linkify

Fix warn in tests

Migrations for phoenix 1.17

Revert "Migrations for phoenix 1.17"

This reverts commit 6a3b2f15b7.

Oban upgrade

Add default empty whitelist

mix format

limit test to amd64

OTP 26 tests for 1.15

use OTP_VERSION tag

baka

just 1.15

Massive deps update

Update locale, deps

Mix format

shell????

multiline???

?

max cases 1

use assert_recieve

don't put_env in async tests

don't async conn/fs tests

mix format

FIx some uploader issues

Fix tests
2023-08-03 17:44:09 +01:00
ilja
f49e9e6d4c Clean up bookmarks after prune_objects
Some checks are pending
ci/woodpecker/pr/woodpecker Pipeline is pending
When doing prune_objects, it's possible that bookmarked objects are deleted.
This gave problems when fetching the bookmark TL.
Here we clean up the bookmarks during pruning in the case were it's possible that bookmarked objects are deleted.
2023-05-21 13:02:28 +02:00
ilja
57eef6d764 prune_objects can prune orphaned activities who reference an array of objects
E.g. Flag activities have an array of objects

We prune the activity when NONE of the objects can be found

Note that the cost of finding and deleting these is ~4x higher than finding and deleting the non-array ones

Only string:
Delete on activities  (cost=506573.48..506580.38 rows=0 width=0)

Only Array:
Delete on activities  (cost=3570359.68..4276365.34 rows=0 width=0)

(They are still executed separately, so the total cost is the sum of the two)
2023-02-26 14:41:50 +01:00
ilja
a7ec6e039c prune_objects can prune orphaned activities
We add an option to also prune remote activities who don't have existing objects any more they reference.
Rn, we only check for activities who only reference one object, not an array or embeded object.
2023-02-26 14:41:50 +01:00
7695010268 Prune Objects --keep-threads option (#350)
Some checks are pending
ci/woodpecker/push/woodpecker Pipeline is pending
This adds an option to the prune_objects mix task.
The original way deleted all non-local public posts older than a certain time frame.
Here we add a different query which you can call using the option --keep-threads.

We query from the activities table all context id's where
    1. the newest activity with this context is still old
    2. none of the activities with this context is is local
    3. none of the activities with this context is bookmarked
and delete all objects with these contexts.

The idea is that posts with local activities (posts, replies, likes, repeats...) may be interesting to keep.
Besides that, a post lives in a certain context (the thread), so we keep the whole thread as well.

Caveats:
* ~~Quotes have a different context. Therefore, when someone quotes a post, it's possible the quoted post will still be deleted.~~ fixed in #379
* Although undocumented (in docs/docs/administration/CLI_tasks/database.md/#prune-old-remote-posts-from-the-database), the 'normal' delete action still kept old remote non-public posts. I added an option to keep this behaviour, but this also means that you now have to explicitly provide that option. **This could be considered a breaking change!**
* ~~Note that this removes from the objects table, but not from the activities.~~ See #427 for that.

Some statistics from explain analyse:
(cost=1402845.92..1933782.00 rows=3810907 width=62) (actual time=2562455.486..2562455.495 rows=0 loops=1)
 Planning Time: 505.327 ms
 Trigger for constraint chat_message_references_object_id_fkey: time=651939.797 calls=921740
 Trigger for constraint deliveries_object_id_fkey: time=52036.009 calls=921740
 Trigger for constraint hashtags_objects_object_id_fkey: time=20665.778 calls=921740
 Execution Time: 3287933.902 ms

***
**TODO**
1. [x] **Question:** Is it OK to keep it like this in regard to quote posts? If not (ie post quoted by local users should also be kept), should we give quotes the same context as the post they are quoting? (If we don't want to give them the same context, I'll have to see how/if I can do it without being too costly)
    * See #379
2. [x] **Question:** the "original" query only deletes public posts (this is undocumented, but you can check the code). This new one doesn't care for scope. From the docs I get that the idea is that posts can be refetched when needed. But I have from a trusted source that Pleroma can't refetch non-public posts. I assume that's the reason why they are kept here. I see different options to deal with this
    1. ~~We keep it as currently implemented and just don't care about scope with this option~~
    2. ~~We add logic to not delete non-public posts either (I'll have to see how costly that becomes)~~
    3. We add an extra --keep-non-public parameter. This is technically speaking breakage (you didn't have to provide a param before for this, now you do), but I'm inclined to not care much because it wasn't documented nor tested in the first place.
3. [x] See if we can do the query using Elixir
4. [x] Test on a bigger DB to see that we don't run into a timeout
5. [x] Add docs

Co-authored-by: ilja <git@ilja.space>
Reviewed-on: #350
Co-authored-by: ilja <akkoma.dev@ilja.space>
Co-committed-by: ilja <akkoma.dev@ilja.space>
2023-01-09 22:15:41 +00:00
07a48b9293 giant massive dep upgrade and dialyxir-found error emporium (#371)
Some checks are pending
ci/woodpecker/push/woodpecker Pipeline is pending
Co-authored-by: FloatingGhost <hannah@coffee-and-dreams.uk>
Reviewed-on: #371
2022-12-14 12:38:48 +00:00
db60640c5b Fixing up deletes a bit (#327)
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Co-authored-by: FloatingGhost <hannah@coffee-and-dreams.uk>
Reviewed-on: #327
2022-12-01 15:00:53 +00:00
d2a185c013 Documentation updates for stable release (#73)
Some checks are pending
ci/woodpecker/push/docs Pipeline is pending
ci/woodpecker/push/release Pipeline is pending
ci/woodpecker/push/test Pipeline is pending
Reviewed-on: #73
2022-07-15 12:27:16 +00:00
someone
4b940e441a mix pleroma.database set_text_search_config now runs concurrently and infinitely 2021-08-15 13:49:12 -04:00
bc51dea425 Update lib/mix/tasks/pleroma/database.ex 2021-06-07 20:02:28 +00:00
faried nawaz
5be9d13981
a better query to delete from hashtags
old query:

Delete on hashtags  (cost=5089.81..5521.63 rows=6160 width=18)
   ->  Hash Semi Join  (cost=5089.81..5521.63 rows=6160 width=18)
         Hash Cond: (hashtags.id = ht.id)
         ->  Seq Scan on hashtags  (cost=0.00..317.28 rows=17528 width=14)
         ->  Hash  (cost=5012.81..5012.81 rows=6160 width=20)
               ->  Merge Anti Join  (cost=0.70..5012.81 rows=6160 width=20)
                     Merge Cond: (ht.id = hto.hashtag_id)
                     ->  Index Scan using hashtags_pkey on hashtags ht  (cost=0.29..610.53 rows=17528 width=14)
                     ->  Index Scan using hashtags_objects_pkey on hashtags_objects hto  (cost=0.42..3506.48 rows=68158 width=14)

new query:

Delete on hashtags ht  (cost=0.70..5012.81 rows=6160 width=12)
   ->  Merge Anti Join  (cost=0.70..5012.81 rows=6160 width=12)
         Merge Cond: (ht.id = hto.hashtag_id)
         ->  Index Scan using hashtags_pkey on hashtags ht  (cost=0.29..610.53 rows=17528 width=14)
         ->  Index Scan using hashtags_objects_pkey on hashtags_objects hto  (cost=0.42..3506.48 rows=68158 width=14)
2021-05-08 02:00:43 +05:00
faried nawaz
a0c9a2b4cc
mix prune_objects: remove unused hashtags after pruning remote objects 2021-05-08 02:00:42 +05:00
Ivan Tashkinov
40d4362261 [#3213] mix pleroma.database rollback tweaks. 2021-02-23 18:11:25 +03:00
Ivan Tashkinov
5992382cf8 Merge remote-tracking branch 'remotes/origin/develop' into feature/object-hashtags-rework
# Conflicts:
#	CHANGELOG.md
#	lib/mix/tasks/pleroma/database.ex
#	lib/pleroma/web/templates/feed/feed/_activity.rss.eex
2021-02-11 19:31:57 +03:00
Ivan Tashkinov
d1c6dd97aa [#3213] Partially addressed code review points.
migration rollback task changes, hashtags-related config handling tweaks, `hashtags.data` deletion (unused).
2021-02-07 22:24:12 +03:00
hyperion
8d4e0342e1 Update priv/repo/migrations/20190501125843_add_fts_index_to_objects.exs, priv/repo/optional_migrations/rum_indexing/20190510135645_add_fts_index_to_objects_two.exs files 2021-02-06 09:42:17 +00:00
Ivan Tashkinov
108e90b18e [#3213] Explicitly defined PKs in hashtags_objects and data_migration_failed_ids. Added "pleroma.database rollback" task to revert a single migration. 2021-01-31 22:03:59 +03:00
Ivan Tashkinov
e350898828 Merge remote-tracking branch 'remotes/origin/develop' into feature/object-hashtags-rework 2021-01-13 22:11:16 +03:00
Ivan Tashkinov
3e4d84729a [#3213] Prototype of data migrations functionality / HashtagsTableMigrator. 2021-01-13 22:07:38 +03:00
Haelwenn (lanodan) Monnier
c4439c630f
Bump Copyright to 2021
grep -rl '# Copyright © .* Pleroma' * | xargs sed -i 's;Copyright © .* Pleroma .*;Copyright © 2017-2021 Pleroma Authors <https://pleroma.social/>;'
2021-01-13 07:49:50 +01:00
Ivan Tashkinov
8c972de045 [#3213] transfer_hashtags mix task refactoring. 2021-01-10 11:44:39 +03:00
Ivan Tashkinov
0d521022fe [#3213] Removed PK from hashtags_objects table. Improved hashtags_transfer mix task (logging of failed ids). 2021-01-07 12:20:29 +03:00
Ivan Tashkinov
367f0c31c3 [#3213] Added query options support for Repo.chunk_stream/4.
Used infinite timeout in transfer_hashtags select query.
2020-12-31 09:36:26 +03:00
Ivan Tashkinov
a25c1e8ec0 [#3213] Improved database.transfer_hashtags mix task: proper rollback, speedup. 2020-12-30 14:35:19 +03:00
Haelwenn
3966add048 Revert "Merge branch 'features/hashtag-column' into 'develop'"
This reverts merge request !2824
2020-12-28 12:02:16 +00:00
Haelwenn (lanodan) Monnier
d0c2479710
pleroma.database fill_old_hashtags: Add month_limit argument 2020-12-28 11:05:25 +01:00
Ivan Tashkinov
cbb19d0e18 [#3213] Hashtag-filtering functions in ActivityPub. Mix task for migrating hashtags to hashtags table. 2020-12-26 22:20:55 +03:00
Haelwenn (lanodan) Monnier
acb03d591b
Insert text representation of hashtags into object["hashtags"]
Includes a new mix task: pleroma.database fill_old_hashtags
2020-12-22 05:15:34 +01:00
Alexander Strizhakov
cebe3c7def Fix for dropping posts/notifs in WS when mix task is executed
- start oban in mix tasks with empty queues, plugins and crontab
- fix for update_users_following_followers_counts
- fix for removed logo.png
- typo in resend confirmation emails mix task docs
- fix for uploads mix task (start Majic.Pool)
- fix for creating user mix task (start :fast_html app)
2020-12-14 11:02:32 -06:00
Maksim Pechnikov
599f8bb152 RepoStreamer.chunk_stream -> Repo.chunk_stream 2020-09-16 09:47:18 +03:00
Alexander Strizhakov
15aece7238 remove validate_expires_at from enqueue method 2020-09-10 21:52:31 +03:00
Alexander Strizhakov
9bf1065a06 schedule activity expiration in Oban 2020-09-10 21:50:40 +03:00
Alexander Strizhakov
29a7bcd5bb
reverting pinned posts in filtering 2020-08-12 20:01:21 +03:00
Alexander Strizhakov
3ab83f837e
don't load pinned activities in due_expirations 2020-08-12 19:46:47 +03:00
Alexander Strizhakov
eec1ba232c
don't expire pinned posts 2020-08-12 15:15:17 +03:00
Mark Felder
724ed354f2 Ensure only Note objects are set to expire 2020-08-11 11:28:22 -05:00
Mark Felder
cf4c97242b Ensure we only expire Create activities with the Mix task 2020-08-08 12:40:52 -05:00
Egor Kislitsyn
e5557bf8ba
Add mix task to add expiration to all local statuses 2020-08-08 16:29:40 +04:00
Mark Felder
92fba24c74 Alpha sort 2020-05-27 17:17:06 -05:00
Mark Felder
30f96b19c1 Abstract out the database maintenance. I'd like to use this from AdminFE too. 2020-05-27 16:40:51 -05:00
Mark Felder
0d57e06626 Make clearer that this is time and resource consuming 2020-05-27 16:31:37 -05:00