[bug] performance regression in 3.15 #877

Open
opened 2025-03-03 15:53:12 +00:00 by Oneric · 0 comments
Member

Your setup

From source

Extra details

Alpine 3.18

Version

develop fc2c740008 (a couple commits after 3.15.1)

PostgreSQL version

16

What’s the issue?

Starting with the 3.15.0 release there was a stark rise in DB queries and total DB time on my tiny instance: the query rate went from 1-3 queries per second to 10-20 per second, 95%-quantile queue times for individual queries increased about sevenfold, and total DB time rose to 3-5 times its previous level, with a lot of noise.
@norm reported not seeing any meaningful change for a big instance, suggesting the added cost is mostly constant and thus doesn't matter much for big instances even if it’s significant for small ones.

Most of this could be tracked down to the new oban web dashboard: when setting `config :oban_met, autostart: false`, the query count drops back to about normal and the total query time per second is "only" increased by something like 10-20%, most of which can probably be attributed to the rise in queue times, which didn’t change meaningfully with this config option.
Fully reverting the PR (which also upgraded some deps) additionally seemed to cut the increase in 95%-quantile queue times by about half, though they still remained higher than before.

Pulling in the further dep upgrades and the Oban V12 migration pushed to develop after 3.15.1 did not meaningfully change this.

After running with just the config option for a while now, the 48h average for total DB time is up by ~20% and, noticeably, the gap between total and exec-time-only has widened. The per-second queue times increased from hovering around 0.110-0.136ms before to 0.220-0.290ms. Per-second exec times appear to have more frequent noise spikes now.

db-48h-avg.webp

Comparing what I had applied before moving to regular 3.15.1, this must be related to either the oban_web or the MFM PR (the latter doesn't seem to do any DB stuff, though).

There were barely any queries for the endpoints requiring signature checks since 3.15.1, so it seems extremely unlikely to be due to this specific non-PRed change. Afaict there were no other non-PRed changes.

Notably, the oban web dashboard is mostly redundant with the preexisting prometheus metrics if the latter are actually being scraped and stored. We might want to consider disabling it by default or, if nothing else, document the impact and how to disable it.
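
For documentation purposes, a minimal sketch of how the option discussed above could be set. The file name is an assumption (from-source installs commonly keep local overrides in something like `config/prod.secret.exs`); only the `config :oban_met` line itself comes from this report, and the instance presumably needs a restart for it to take effect:

```elixir
# Assumed location: the instance's local config file, e.g. config/prod.secret.exs.
# `import Config` is normally already present at the top of an existing config file.
import Config

# Keep oban_met (the metrics backend used by the oban web dashboard) from
# starting automatically. In the measurements above this brought the query
# count back to roughly pre-3.15 levels, but did not undo the rise in queue times.
config :oban_met, autostart: false
```

With this set, the dashboard will likely lack live data, which should be acceptable where the prometheus metrics are already being scraped and stored.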

Note: admins upgrading from the previous release rather than develop may also not notice this, since it might cancel out with the job- and stat-related perf improvements merged a bit earlier.

Severity

I cannot use it as easily as I'd like

Have you searched for this issue?

  • I have double-checked and have not found this issue mentioned anywhere.
Oneric added the bug label 2025-03-03 15:53:12 +00:00
Oneric changed title from [bug] performance regression in 3.15.0 to [bug] performance regression in 3.15 2025-03-03 15:58:54 +00:00
Reference: AkkomaGang/akkoma#877