Update hashtag prune to account for followed hashtags #844
No reviewers
Labels
No labels
approved, awaiting change
bug
cannot reproduce
configuration
documentation
duplicate
enhancement
extremely low priority
feature request
Fix it yourself
help wanted
invalid
mastodon_api
needs change/feedback
needs docs
needs tests
not a bug
not our bug
planned
pleroma_api
privacy
question
static_fe
triage
wontfix
No milestone
No project
No assignees
2 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
AkkomaGang/akkoma!844
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "norm/akkoma:hashtag-prune"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Currently pruning hashtags with the
prune_objectstask only accounts for whether that hashtag is associated with an object, but this may lead to a foreign key constraint violation if that hashtag has no objects but is followed by a local user.This adds an additional check to see if that hashtag has any followers before proceeding to delete it.
893d9ce93a40da4e88eaf7fe187c7088a8086ad3Seems fine, though hearing it both variants were verified to produce the same result would be reassuring
Notably yet another, more standard and simpler variant to write this is
or alternatively with
UNION ALLinstead ofUNION.On my db (small, young instance; PostgreSQL 16) the initial
SELECT 1 .. UNIONvariant performs atrociously (821 012.04) according toEXPLAINand whileUNION ALLis somewhat better (69 485.28),USING ... LEFT JOINis estimated to take an order of magnitude less still (6 020.80). The above withUNIONis a tad better (5 810.82) andUNION ALLsupposedly better still (4 081.40) outperforming even the pre-patch version.For reference the previous query without followed hashtags was estimated at 4 478.42 and if rewritten with an
NOT INinstead ofNOT EXISTS3 362.64.However, these numbers may be skewed by my
user_follow_hashtag_tablebeing empty and the cost estimate forUNIONvsUNION ALLmay potentially be quite a bit off depending on how hashtag frequency is distributed.Ideally real execution times (
EXPLAIN ANALYZE) for each would be tested for the same action via aborted transactions on multiple, bigger instances. (When doing this keep caching in mind, and measure the execution time of each query a couple of times in a row to avoid skewing results). Practically though,USING .. LEFT JOINmay very well be good enough since other queries during prune take orders of magnitude more time so while i welcome further experimenting and optimising on this, it’s not a reason to hold this bugfix back as far as I’m concernedRunning the simpler query resulted in a foreign key violation.I was however able to confirm that both the
SELECT 1 ... UNIONvariant and theLEFT JOINvariant both deleted the same amount of rows.EXPLAIN ANALYZEon theUNIONversion took an average of 1486.024ms, whileLEFT JOINtook 286.3125 ms in my testing so theLEFT JOINquery is definitely more optimal in my case.oops, crucial omission: try again with
NOT INinstead ofIN(i’ll edited the previous post to avoid further confusion and fix the slightly different costs)
Seems like that query is slower than the
LEFT JOINbut not as slow as theUNION. I got an average of 426.755 ms over 3 runs of that query on my database.This is the query plan I got when running that query:
The query plan matches mine; notably here your cost estimate for the simpler version (
123683.54) is already worse than the42254.88you got forLEFT JOINyesterday.So yep,
USING ... LEFT JOINappears to be the clear winner for akko’s db