681 commits

Author SHA1 Message Date
522221f7fb Mix format 2023-04-14 17:56:34 +01:00
2a8c1f4192 Add extra diagnostic tasks in 2023-03-29 14:11:00 +01:00
ilja
57eef6d764 prune_objects can prune orphaned activities that reference an array of objects
E.g. Flag activities have an array of objects

We prune the activity when NONE of the objects can be found

Note that the cost of finding and deleting these is ~4x higher than finding and deleting the non-array ones

Only string:
Delete on activities  (cost=506573.48..506580.38 rows=0 width=0)

Only Array:
Delete on activities  (cost=3570359.68..4276365.34 rows=0 width=0)

(They are still executed separately, so the total cost is the sum of the two)
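
As a rough illustration of the note above, the two planner estimates can be read out of the EXPLAIN lines and added up. This is only a sketch: the `total_cost` helper is a hypothetical name, and the numbers are PostgreSQL planner cost estimates in arbitrary units, not measured run times.

```python
import re

# The two plan lines quoted above, keyed by which kind of reference they cover.
explain_lines = {
    "string": "Delete on activities  (cost=506573.48..506580.38 rows=0 width=0)",
    "array": "Delete on activities  (cost=3570359.68..4276365.34 rows=0 width=0)",
}


def total_cost(line):
    """Return the estimated total cost, i.e. the number after '..' in cost=a..b."""
    match = re.search(r"cost=(\d+\.\d+)\.\.(\d+\.\d+)", line)
    if match is None:
        raise ValueError(f"no cost estimate found in: {line!r}")
    return float(match.group(2))


costs = {kind: total_cost(line) for kind, line in explain_lines.items()}

# The two deletes run separately, so the overall estimate is their sum.
combined = round(sum(costs.values()), 2)
print(costs, combined)
```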
2023-02-26 14:41:50 +01:00
ilja
a7ec6e039c prune_objects can prune orphaned activities
We add an option to also prune remote activities whose referenced objects no longer exist.
Right now, we only check activities that reference a single object, not an array or embedded object.
2023-02-26 14:41:50 +01:00
7695010268 Prune Objects --keep-threads option ()
This adds an option to the prune_objects mix task.
The original way deleted all non-local public posts older than a certain time frame.
Here we add a different query which you can call using the option --keep-threads.

We query from the activities table all context IDs where
    1. the newest activity with this context is still old
    2. none of the activities with this context is local
    3. none of the activities with this context is bookmarked
and delete all objects with these contexts.
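
The three conditions above can be sketched against a toy database. This is a minimal illustration, not the query used by the mix task: the flat `activities`, `bookmarks` and `objects` tables, their columns, the sample rows and the cutoff date are all simplifying assumptions, and SQLite stands in for PostgreSQL.

```python
import sqlite3

# Hypothetical, simplified schema; the real tables look different.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activities (id INTEGER PRIMARY KEY, context TEXT, local INTEGER, inserted_at TEXT);
CREATE TABLE bookmarks (activity_id INTEGER);
CREATE TABLE objects (id INTEGER PRIMARY KEY, context TEXT);
""")
conn.executemany("INSERT INTO activities VALUES (?, ?, ?, ?)", [
    (1, "ctx-old-remote", 0, "2022-01-01"),
    (2, "ctx-old-remote", 0, "2022-01-05"),
    (3, "ctx-has-local", 0, "2022-01-01"),
    (4, "ctx-has-local", 1, "2022-01-02"),   # a local reply keeps the thread
    (5, "ctx-bookmarked", 0, "2022-01-01"),  # bookmarked below
    (6, "ctx-recent", 0, "2022-01-01"),
    (7, "ctx-recent", 0, "2023-01-01"),      # newest activity is not old yet
])
conn.execute("INSERT INTO bookmarks VALUES (5)")
conn.executemany("INSERT INTO objects VALUES (?, ?)", [
    (1, "ctx-old-remote"), (2, "ctx-has-local"),
    (3, "ctx-bookmarked"), (4, "ctx-recent"),
])

CUTOFF = "2022-06-01"  # assumed age limit; ISO dates compare correctly as text

PRUNABLE_CONTEXTS = """
SELECT a.context
FROM activities a
LEFT JOIN bookmarks b ON b.activity_id = a.id
GROUP BY a.context
HAVING MAX(a.inserted_at) < :cutoff   -- 1. newest activity is still old
   AND MAX(a.local) = 0               -- 2. no activity is local
   AND COUNT(b.activity_id) = 0       -- 3. no activity is bookmarked
"""

params = {"cutoff": CUTOFF}
prunable = [row[0] for row in conn.execute(PRUNABLE_CONTEXTS, params)]

# Delete every object that lives in one of the prunable contexts.
conn.execute(f"DELETE FROM objects WHERE context IN ({PRUNABLE_CONTEXTS})", params)
remaining = sorted(row[0] for row in conn.execute("SELECT context FROM objects"))
print(prunable, remaining)
```

Grouping by context and filtering in `HAVING` lets all three conditions be evaluated per thread in a single pass, which is the property the description relies on: one local or bookmarked activity is enough to keep the whole thread.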

The idea is that posts with local activities (posts, replies, likes, repeats...) may be interesting to keep.
Besides that, a post lives in a certain context (the thread), so we keep the whole thread as well.

Caveats:
* ~~Quotes have a different context. Therefore, when someone quotes a post, it's possible the quoted post will still be deleted.~~ fixed in 
* Although undocumented (in docs/docs/administration/CLI_tasks/database.md/#prune-old-remote-posts-from-the-database), the 'normal' delete action still kept old remote non-public posts. I added an option to keep this behaviour, but this also means that you now have to explicitly provide that option. **This could be considered a breaking change!**
* ~~Note that this removes from the objects table, but not from the activities.~~ See  for that.

Some statistics from explain analyse:
(cost=1402845.92..1933782.00 rows=3810907 width=62) (actual time=2562455.486..2562455.495 rows=0 loops=1)
 Planning Time: 505.327 ms
 Trigger for constraint chat_message_references_object_id_fkey: time=651939.797 calls=921740
 Trigger for constraint deliveries_object_id_fkey: time=52036.009 calls=921740
 Trigger for constraint hashtags_objects_object_id_fkey: time=20665.778 calls=921740
 Execution Time: 3287933.902 ms

***
**TODO**
1. [x] **Question:** Is it OK to keep it like this with regard to quote posts? If not (i.e. posts quoted by local users should also be kept), should we give quotes the same context as the post they are quoting? (If we don't want to give them the same context, I'll have to see how/if I can do it without being too costly.)
    * See 
2. [x] **Question:** The "original" query only deletes public posts (this is undocumented, but you can check the code). This new one doesn't care about scope. From the docs I gather that the idea is that posts can be refetched when needed. But I have it from a trusted source that Pleroma can't refetch non-public posts. I assume that's the reason why they are kept here. I see different options to deal with this:
    1. ~~We keep it as currently implemented and just don't care about scope with this option~~
    2. ~~We add logic to not delete non-public posts either (I'll have to see how costly that becomes)~~
    3. We add an extra --keep-non-public parameter. This is technically speaking breakage (you didn't have to provide a param before for this, now you do), but I'm inclined to not care much because it wasn't documented nor tested in the first place.
3. [x] See if we can do the query using Elixir
4. [x] Test on a bigger DB to see that we don't run into a timeout
5. [x] Add docs

Co-authored-by: ilja <git@ilja.space>
Reviewed-on: 
Co-authored-by: ilja <akkoma.dev@ilja.space>
Co-committed-by: ilja <akkoma.dev@ilja.space>
2023-01-09 22:15:41 +00:00
9be6caf125 argon2 password hashing ()
Co-authored-by: FloatingGhost <hannah@coffee-and-dreams.uk>
Reviewed-on: 
2022-12-30 02:46:58 +00:00
5a405bdadf document dump_to_file and load_from_file 2022-12-29 20:00:04 +00:00
d1bf8aa9ed Add dump_to_file and load_from_file tasks 2022-12-29 19:56:35 +00:00
07a48b9293 giant massive dep upgrade and dialyxir-found error emporium ()
Co-authored-by: FloatingGhost <hannah@coffee-and-dreams.uk>
Reviewed-on: 
2022-12-14 12:38:48 +00:00
e6da301296 Add diagnostics http 2022-12-11 22:57:18 +00:00
09326ffa56 Diagnostics tasks ()
a bunch of ways to get query plans to help with debugging

Co-authored-by: FloatingGhost <hannah@coffee-and-dreams.uk>
Reviewed-on: 
2022-12-07 11:12:34 +00:00
d55de5debf Remerge of hashtag following ()
this time with less idiot

Co-authored-by: FloatingGhost <hannah@coffee-and-dreams.uk>
Reviewed-on: 
2022-12-05 12:58:48 +00:00
ec6bf8c3f7 revert 4a94c9a31e
revert Add ability to follow hashtags ()

Co-authored-by: FloatingGhost <hannah@coffee-and-dreams.uk>
Reviewed-on: 
2022-12-04 20:04:09 +00:00
4a94c9a31e Add ability to follow hashtags ()
Co-authored-by: FloatingGhost <hannah@coffee-and-dreams.uk>
Reviewed-on: 
2022-12-04 17:36:59 +00:00
6b882a2c0b Purge Rejected Follow requests in daily task ()
Co-authored-by: FloatingGhost <hannah@coffee-and-dreams.uk>
Reviewed-on: 
2022-12-03 23:17:43 +00:00
db60640c5b Fixing up deletes a bit ()
Co-authored-by: FloatingGhost <hannah@coffee-and-dreams.uk>
Reviewed-on: 
2022-12-01 15:00:53 +00:00
e3085c495c fix tests broken by relay defaults changing ()
Co-authored-by: FloatingGhost <hannah@coffee-and-dreams.uk>
Reviewed-on: 
2022-11-26 20:45:47 +00:00
856c57208b Ensure deletes are handled after everything else 2022-10-11 14:30:08 +01:00
92ba2802fb generate-keys-at-registration-time ()
Reviewed-on: 
2022-08-24 14:36:33 +00:00
61641957cb fix compatibility with meilisearch ()
Reviewed-on: 
2022-08-16 22:56:49 +00:00
2033d7d4fc ensure extra info in fix_follow_state prints 2022-07-29 19:50:26 +01:00
4c47992686 bugfix/follow-state ()
Reviewed-on: 
2022-07-23 18:58:45 +00:00
d2a185c013 Documentation updates for stable release ()
Reviewed-on: 
2022-07-15 12:27:16 +00:00
7dfc3f3d0e Change default Postgres user/DB to akkoma 2022-07-12 12:41:30 -04:00
a9c82b62f2 Fixes for elasticsearch 8 ()
Reviewed-on: 
2022-07-06 18:57:00 +00:00
a036a01a1e mix format 2022-07-04 17:38:16 +01:00
364b6969eb Use finch everywhere ()
Reviewed-on: 
2022-07-04 16:30:38 +00:00
2937495712 fix ES import from live ()
Reviewed-on: 
2022-06-30 18:44:31 +00:00
bc9e76cce7 Add documentation for ES search 2022-06-30 17:36:57 +01:00
1ecdb19de5 Refactor ES on top of search behaviour 2022-06-30 16:28:31 +01:00
Ekaterina Vaartis
563b964690 Change updateId to uid because apparently that's the new name 2022-06-29 20:49:45 +01:00
Ekaterina Vaartis
b7462040cc Change the meilisearch key auth to conform to 0.25.0 2022-06-29 20:49:45 +01:00
Ekaterina Vaartis
a4914add8c Don't support meilisearch < 0.24.0, since it breaks things 2022-06-29 20:49:45 +01:00
Ekaterina Vaartis
cc3319ac1d Make chunk size configurable 2022-06-29 20:49:45 +01:00
Ekaterina Vaartis
bac70a2bc1 Implement suggestions from the Meilisearch MR
- Index unlisted posts
- Move version check outside of the streaming and only do it once
- Use a PUT request instead of checking manually if there is need to insert
- Add error handling, sort of
2022-06-29 20:49:45 +01:00
Ekaterina Vaartis
0769f06bd1 Style fixes 2022-06-29 20:49:00 +01:00
Ekaterina Vaartis
86971fceaa Support reindexing meilisearch >=0.24.0
It has a different error code key
2022-06-29 20:48:44 +01:00
Ekaterina Vaartis
9e7d7ebd48 Add a reindex option
Signed-off-by: Ekaterina Vaartis <vaartis@kotobank.ch>
2022-06-29 20:48:44 +01:00
Ekaterina Vaartis
5ed1759091 Reorder ranking rules for (maybe) better results 2022-06-29 20:48:44 +01:00
Ekaterina Vaartis
d1079f1aa3 Add the meilisearch.stats command 2022-06-29 20:48:29 +01:00
Ekaterina Vaartis
d5cc272a91 Add a message with a count of posts to index 2022-06-29 20:48:29 +01:00
Ekaterina Vaartis
dbf556cdcf Implement meilisearch auth 2022-06-29 20:48:29 +01:00
Ekaterina Vaartis
5360cc1097 Make indexing logs rewrite themselves 2022-06-29 20:48:29 +01:00
Ekaterina Vaartis
0cf3654907 Rework task indexing to share code with the main module
The code in the main module now scrubs new posts too
2022-06-29 20:48:29 +01:00
Ekaterina Vaartis
117f525fd6 Adjust content indexing to skip more unneeded stuff 2022-06-29 20:48:29 +01:00
Ekaterina Vaartis
14ef6ce80f Mark only content as searchable for meilisearch 2022-06-29 20:48:29 +01:00
Ekaterina Vaartis
52a872432d Make the chunk size smaller 2022-06-29 20:48:29 +01:00
Ekaterina Vaartis
a586ce0ddd Use content instead of source and scrub it 2022-06-29 20:48:29 +01:00
Ekaterina Vaartis
c3a04166a0 Tweak search ordering to hopefully return newer results 2022-06-29 20:48:29 +01:00
Ekaterina Vaartis
7b3701e6b9 Make meilisearch sort on publish date converted to unix time 2022-06-29 20:48:29 +01:00