SQL: Outlier query information gathering #784
If you see this issue, go to /phoenix/live_dashboard/ecto_stats?nav=outliers on your instance and copy the outlier data here for query analysis, so we can see whether query design can be improved in any common places.

Template table:
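If the live dashboard isn't reachable for some reason, roughly the same data can be pulled straight from pg_stat_statements via psql; this is only a sketch of the kind of query involved, not what the dashboard itself runs:

```sql
-- Top outliers by total execution time. Column names are for PostgreSQL 13+;
-- older versions use total_time/mean_time instead of total_exec_time/mean_exec_time.
SELECT calls,
       round(total_exec_time::numeric, 1) AS total_ms,
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       rows,
       left(query, 120)                   AS query
  FROM pg_stat_statements
 ORDER BY total_exec_time DESC
 LIMIT 20;
```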
Mine:
Worth noting that the pg_stat_statements extension required for this data isn't enabled by default in Postgres. To enable it, modify postgresql.conf accordingly and restart Postgres; afterwards connect to the akkoma database with psql and create the extension.
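For reference, a minimal sketch of the standard pg_stat_statements setup (not necessarily the exact steps from the linked guide):

```sql
-- ALTER SYSTEM is equivalent to adding
--   shared_preload_libraries = 'pg_stat_statements'
-- to postgresql.conf by hand; if that setting already lists other libraries,
-- append to it instead of replacing. Either way a restart is required.
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';

-- After restarting PostgreSQL, connect to the akkoma database with psql
-- (as a superuser) and create the extension there:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
```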
Steps copied from here.
Hi, here are mine. I activated the PG extension 9 hours ago to let it collect some stats.
I can post an update in a few days again, if you want.
Here's mine after about 12 hours (deleted the original since it wasn't run for long enough):
Worth noting the top offender by far for ihba and akko is the query of the attachment cleanup worker; also see #765 for this one.
@YokaiRick since it doesn’t show up at all on your instance, can you confirm whether :instance, :cleanup_attachments is set to false (the default)?

@Oneric nope, I didn't deactivate it. Here is my :instance config part:

And I grepped for the default stuff also:

I have a single-user instance and tbh I can't remember when I last deleted a post... maybe that's why?
Yeah, if you didn’t set it at all it defaults to false; but never deleting posts would amount to the same effect. Thanks!

Here are reports for a young, private instance with very few users. Outlier data gathering was turned on a bit more than a month after creation and ran for about two months (one month for the older snapshot below). Prometheus is enabled (the default setting) and actively used. Attachment cleanup is turned on, but the still-not-merged optimisation patches are also applied locally, explaining the absence of this query.
After ~2 months:
A month prior:
pleroma/stats.ex, run twice for local and remote; combine into a single query?

Top offender here is the query used to get the list and count of remote instances, used in Prometheus stat reporting, the peers Mastodon API and possibly some other Pleroma API. This was also a high offender in previous reports, but got overshadowed by the awful attachment cleanup perf. It can be replaced with a query of the instances table, which EXPLAIN suggests is much cheaper. Results don’t match exactly, but that's nothing to worry about imho.

Third-highest offender is again a stat query counting local and remote users. Note how it’s run exactly twice as often as the instance query; this is because remote and local users are counted individually. Combining this into one query should halve the perf cost; both do a full table scan.
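As a rough sketch of both suggestions (column names such as nickname, local and host follow the usual Pleroma/Akkoma schema but are assumptions here, and the real stat queries carry additional conditions that are omitted):

```sql
-- Peer list/count today: derived from remote users, i.e. a scan over the
-- whole (large) users table:
SELECT DISTINCT split_part(nickname, '@', 2)
  FROM users
 WHERE NOT local;

-- Suggested replacement: read the much smaller instances table instead:
SELECT host FROM instances;

-- User counts: instead of two separate full scans (one for local, one for
-- remote users), a single scan with FILTER returns both counts at once:
SELECT count(*) FILTER (WHERE local)     AS local_users,
       count(*) FILTER (WHERE NOT local) AS remote_users
  FROM users;
```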
Insertion of activities also shows up; there’s nothing we can do about that directly, though cutting down on unnecessary indices (see #764) might help. In particular, the counter_cache table (which is populated by an insertion trigger) and its related indexes could be dropped; it’s only used to report non-critical stats in some Pleroma Admin API, but I suspect it of being relatively costly on activity operations.

Everything else is Oban related; we currently create way too many jobs known to be noops, and cutting down on this might help here too (I have a half-finished patch, but no ETA). This also helps with PostgreSQL bloat from deleted rows.
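For anyone who wants to check locally which indices actually pull their weight before dropping anything, a generic inspection query (just a sketch, not tied to any Akkoma tooling) could be:

```sql
-- List the largest indexes together with how often they are actually used;
-- large indexes with few scans are candidates for #764-style pruning.
-- Pure inspection, nothing is dropped.
SELECT relname                                      AS table_name,
       indexrelname                                 AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS size,
       idx_scan                                     AS scans
  FROM pg_stat_user_indexes
 ORDER BY pg_relation_size(indexrelid) DESC
 LIMIT 20;
```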
With the attachment cleanup alleviation and perf changes which made it into 3.15.x, the profile changed quite a bit and it’d be good to take another look at this for various instance sizes and settings.
First up, upgrade to the newest release if you haven’t already and then reset your statement statistics as the db superuser (postgres). Then wait at least a week and report your results.
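For reference, resetting the collected statistics is done with:

```sql
-- Run as the postgres superuser, connected to the akkoma database:
SELECT pg_stat_statements_reset();
```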
Additionally it’d be nice if you could take a look at your individually most costly queries. Adapt the execution count threshold to a sufficient value to exclude e.g. db prune queries and other stuff you ran manually:
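Something along these lines should do, assuming PostgreSQL 13+ column names and with the call-count threshold as the knob to tune:

```sql
-- Individually most costly queries by mean execution time. Raise the calls
-- threshold until manual one-offs (db pruning and the like) drop out.
SELECT calls,
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       round(total_exec_time::numeric, 1) AS total_ms,
       left(query, 120)                   AS query
  FROM pg_stat_statements
 WHERE calls > 100          -- execution count threshold, tune as needed
 ORDER BY mean_exec_time DESC
 LIMIT 20;
```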
My results, from the dashboard, which evidently includes the cost of db pruning:

Almost everything is now just Oban job handling, except:
- the counter_cache I complained about before
- counter_cache likely contributes to this (the rest are other indices)

As for noteworthy things from the individually most costly queries: the frontend could be improved to be more selective here to avoid unnecessary queries being made.
Here's my updated outliers list:
(note: Sync IO time is omitted because the columns for those stats were changed in PostgreSQL 17 and Phoenix LiveDashboard doesn't support the new names yet)
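For what it's worth, the figures can still be pulled by hand on PostgreSQL 17, where (as far as I understand the rename) blk_read_time/blk_write_time became shared_blk_read_time/shared_blk_write_time; a sketch, not the dashboard's exact query:

```sql
-- I/O time per statement on PostgreSQL 17; values stay zero unless
-- track_io_timing is enabled.
SELECT left(query, 120)                             AS query,
       shared_blk_read_time + shared_blk_write_time AS io_time_ms
  FROM pg_stat_statements
 ORDER BY io_time_ms DESC
 LIMIT 10;
```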
After upgrading to PostgreSQL 17 and recollecting stats for some weeks (during which users were more active than on average and no db pruning took place), here are my renewed outlier stats:
Activity.(get_|get_all_|)create_by_object_id/1 or ActivityPub.Utils.get_existing_announce/2? Kinda a lot of calls then though...

Observations from norm’s and my refreshed stats:
min_id; the version without a min_id is individually more costly but rarer)

There’s no obvious big common outlier to tackle, but good steps (including observations from my previous results) appear to be:
- cutting down on activities indexes and triggers which don't bring much benefit; imho counter_cache and its associated admin-fe feature should be purged
- … objects table