Update hashtag prune to account for followed hashtags #844
No reviewers
Labels
No labels
approved, awaiting change
bug
configuration
documentation
duplicate
enhancement
extremely low priority
feature request
Fix it yourself
help wanted
invalid
mastodon_api
needs docs
needs tests
not a bug
planned
pleroma_api
privacy
question
static_fe
triage
wontfix
No milestone
No project
No assignees
2 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference: AkkomaGang/akkoma#844
Loading…
Reference in a new issue
No description provided.
Delete branch "norm/akkoma:hashtag-prune"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Currently pruning hashtags with the
prune_objects
task only accounts for whether that hashtag is associated with an object, but this may lead to a foreign key constraint violation if that hashtag has no objects but is followed by a local user.This adds an additional check to see if that hashtag has any followers before proceeding to delete it.
893d9ce93a
to40da4e88ea
f7fe187c70
to88a8086ad3
Seems fine, though hearing it both variants were verified to produce the same result would be reassuring
Notably yet another, more standard and simpler variant to write this is
or alternatively with
UNION ALL
instead ofUNION
.On my db (small, young instance; PostgreSQL 16) the initial
SELECT 1 .. UNION
variant performs atrociously (821 012.04) according toEXPLAIN
and whileUNION ALL
is somewhat better (69 485.28),USING ... LEFT JOIN
is estimated to take an order of magnitude less still (6 020.80). The above withUNION
is a tad better (5 810.82) andUNION ALL
supposedly better still (4 081.40) outperforming even the pre-patch version.For reference the previous query without followed hashtags was estimated at 4 478.42 and if rewritten with an
NOT IN
instead ofNOT EXISTS
3 362.64.However, these numbers may be skewed by my
user_follow_hashtag_table
being empty and the cost estimate forUNION
vsUNION ALL
may potentially be quite a bit off depending on how hashtag frequency is distributed.Ideally real execution times (
EXPLAIN ANALYZE
) for each would be tested for the same action via aborted transactions on multiple, bigger instances. (When doing this keep caching in mind, and measure the execution time of each query a couple of times in a row to avoid skewing results). Practically though,USING .. LEFT JOIN
may very well be good enough since other queries during prune take orders of magnitude more time so while i welcome further experimenting and optimising on this, it’s not a reason to hold this bugfix back as far as I’m concernedRunning the simpler query resulted in a foreign key violation.I was however able to confirm that both the
SELECT 1 ... UNION
variant and theLEFT JOIN
variant both deleted the same amount of rows.EXPLAIN ANALYZE
on theUNION
version took an average of 1486.024ms, whileLEFT JOIN
took 286.3125 ms in my testing so theLEFT JOIN
query is definitely more optimal in my case.oops, crucial omission: try again with
NOT IN
instead ofIN
(i’ll edited the previous post to avoid further confusion and fix the slightly different costs)
Seems like that query is slower than the
LEFT JOIN
but not as slow as theUNION
. I got an average of 426.755 ms over 3 runs of that query on my database.This is the query plan I got when running that query:
The query plan matches mine; notably here your cost estimate for the simpler version (
123683.54
) is already worse than the42254.88
you got forLEFT JOIN
yesterday.So yep,
USING ... LEFT JOIN
appears to be the clear winner for akko’s dbView command line instructions
Checkout
From your project repository, check out a new branch and test the changes.