[bug] High CPU usage in PostgreSQL #569

Closed
opened 2023-06-17 03:51:57 +00:00 by sevichecc · 17 comments

Your setup

OTP

Extra details

My VPS:

  • CPU4:vCPU Cores
  • RAM: 8 GB RAM

PostgreSQL Setting:
shared_buffers = 512MB
effective_cache_size = 1536MB
maintenance_work_mem = 128MB
work_mem = 26214kB
max_worker_processes = 2
max_parallel_workers_per_gather = 1
max_parallel_workers = 2

Version

3.9.2-0-g39a878f

PostgreSQL version

14

What were you trying to do?

Ever since migrating Pleroma to Akkoma, postgreSQL's CPU usage has been excessively high, severely impacting the performance of other programs, what should I do to solve this issue?

image

What did you expect to happen?

No response

What actually happened?

No response

Logs

No response

Severity

I cannot use it as easily as I'd like

Have you searched for this issue?

  • I have double-checked and have not found this issue mentioned anywhere.
### Your setup OTP ### Extra details My VPS: - CPU4:vCPU Cores - RAM: 8 GB RAM PostgreSQL Setting: shared_buffers = 512MB effective_cache_size = 1536MB maintenance_work_mem = 128MB work_mem = 26214kB max_worker_processes = 2 max_parallel_workers_per_gather = 1 max_parallel_workers = 2 ### Version 3.9.2-0-g39a878f ### PostgreSQL version 14 ### What were you trying to do? Ever since migrating Pleroma to Akkoma, postgreSQL's CPU usage has been excessively high, severely impacting the performance of other programs, what should I do to solve this issue? ![image](/attachments/51a198bb-be90-43fb-921a-fd6db61cb388) ### What did you expect to happen? _No response_ ### What actually happened? _No response_ ### Logs _No response_ ### Severity I cannot use it as easily as I'd like ### Have you searched for this issue? - [x] I have double-checked and have not found this issue mentioned anywhere.
sevichecc added the
bug
label 2023-06-17 03:51:57 +00:00

you've given us precious nothing to go on here

  • have you followed the instructions provided to you in the irc when you asked there?
  • what endpoints are timing out?
  • what queries are running?
  • are queries hanging?
  • do the logs give you any insight

we cannot telepathically communicate with your database, we cannot answer "how should you solve this", you're going to need to do some digging here

you've given us precious nothing to go on here - have you followed the instructions provided to you in the irc when you asked there? - what endpoints are timing out? - what queries are running? - are queries hanging? - do the logs give you any insight we cannot telepathically communicate with your database, we cannot answer "how should you solve this", you're going to need to do some digging here
Author

I apologize for not joining the IRC channel before posting this issue. I thought that raising the issue in the repository would be helpful for others facing the same problem, as it allows for better tracking and resolution.

And I must acknowledge that my knowledge of PostgreSQL is currently limited. Thank you for reminding me of this important aspect! I will make an effort to learn more about PostgreSQL and gather the necessary information to provide a more detailed analysis of the issue.

I apologize for not joining the IRC channel before posting this issue. I thought that raising the issue in the repository would be helpful for others facing the same problem, as it allows for better tracking and resolution. And I must acknowledge that my knowledge of PostgreSQL is currently limited. Thank you for reminding me of this important aspect! I will make an effort to learn more about PostgreSQL and gather the necessary information to provide a more detailed analysis of the issue.

I'm having a similar problem and went through pg_top to debug the issue. So far, I've found that this specific query is spamming the server:

SELECT a0."id", a0."data", a0."local", a0."actor", a0."recipients", a0."inserted_at", a0."updated_at" FROM "activities" AS a0 INNER JOIN LATERAL (SELECT is_active from users WHERE ap_id = a0."actor" AND is_active = TRUE) AS f1 ON TRUE WHERE (coalesce((a0."data")->'object'->>'id', (a0."data")->>'object') = $1) AND ((a0."data")->>'type' = $2)
I'm having a similar problem and went through [pg_top](https://pg_top.gitlab.io) to debug the issue. So far, I've found that this specific query is spamming the server: ``` SELECT a0."id", a0."data", a0."local", a0."actor", a0."recipients", a0."inserted_at", a0."updated_at" FROM "activities" AS a0 INNER JOIN LATERAL (SELECT is_active from users WHERE ap_id = a0."actor" AND is_active = TRUE) AS f1 ON TRUE WHERE (coalesce((a0."data")->'object'->>'id', (a0."data")->>'object') = $1) AND ((a0."data")->>'type' = $2) ```

Trying to find similar issues, I found two in upstream, one from three years ago and another from one year ago. One of the issues recommended doing a VACUUM ANALYZE, which I already did with no apparent success.

Trying to find similar issues, I found two in upstream, [one from three years ago](https://git.pleroma.social/pleroma/pleroma/-/issues/1780) and [another from one year ago](https://git.pleroma.social/pleroma/pleroma/-/issues/2793). One of the issues recommended doing a `VACUUM ANALYZE`, which I already did with no apparent success.
Author

I'm having a similar problem and went through pg_top to debug the issue. So far, I've found that this specific query is spamming the server:

SELECT a0."id", a0."data", a0."local", a0."actor", a0."recipients", a0."inserted_at", a0."updated_at" FROM "activities" AS a0 INNER JOIN LATERAL (SELECT is_active from users WHERE ap_id = a0."actor" AND is_active = TRUE) AS f1 ON TRUE WHERE (coalesce((a0."data")->'object'->>'id', (a0."data")->>'object') = $1) AND ((a0."data")->>'type' = $2)

After using pg_top and examining the PostgreSQL logs, I discovered that the same query was causing errors.

image

In addition, I have tried vacuuming and analyzing the database, but it did not make any changes.

> I'm having a similar problem and went through [pg_top](https://pg_top.gitlab.io) to debug the issue. So far, I've found that this specific query is spamming the server: > > ``` > SELECT a0."id", a0."data", a0."local", a0."actor", a0."recipients", a0."inserted_at", a0."updated_at" FROM "activities" AS a0 INNER JOIN LATERAL (SELECT is_active from users WHERE ap_id = a0."actor" AND is_active = TRUE) AS f1 ON TRUE WHERE (coalesce((a0."data")->'object'->>'id', (a0."data")->>'object') = $1) AND ((a0."data")->>'type' = $2) > ``` > After using pg_top and examining the PostgreSQL logs, I discovered that the same query was causing errors. ![image](/attachments/edb3d63f-f665-4412-8d45-1e19cc25f572) In addition, I have tried vacuuming and analyzing the database, but it did not make any changes.
820 KiB

try running

explain analyze SELECT a0."id", a0."data", a0."local", a0."actor", a0."recipients", a0."inserted_at", a0."updated_at" FROM "activities" AS a0 INNER JOIN LATERAL (SELECT is_active from users WHERE ap_id = a0."actor" AND is_active = TRUE) AS f1 ON TRUE WHERE (coalesce((a0."data")->'object'->>'id', (a0."data")->>'object') = 'dontcare') AND ((a0."data")->>'type' = 'Create');

and grab the output

try running ```bash explain analyze SELECT a0."id", a0."data", a0."local", a0."actor", a0."recipients", a0."inserted_at", a0."updated_at" FROM "activities" AS a0 INNER JOIN LATERAL (SELECT is_active from users WHERE ap_id = a0."actor" AND is_active = TRUE) AS f1 ON TRUE WHERE (coalesce((a0."data")->'object'->>'id', (a0."data")->>'object') = 'dontcare') AND ((a0."data")->>'type' = 'Create'); ``` and grab the output
Author

try running

explain analyze SELECT a0."id", a0."data", a0."local", a0."actor", a0."recipients", a0."inserted_at", a0."updated_at" FROM "activities" AS a0 INNER JOIN LATERAL (SELECT is_active from users WHERE ap_id = a0."actor" AND is_active = TRUE) AS f1 ON TRUE WHERE (coalesce((a0."data")->'object'->>'id', (a0."data")->>'object') = 'dontcare') AND ((a0."data")->>'type' = 'Create');

and grab the output

The output shows:

ERROR:  relation "activities" does not exist
LINE 1: ...cipients", a0."inserted_at", a0."updated_at" FROM "activitie...

image

> try running > > ```bash > explain analyze SELECT a0."id", a0."data", a0."local", a0."actor", a0."recipients", a0."inserted_at", a0."updated_at" FROM "activities" AS a0 INNER JOIN LATERAL (SELECT is_active from users WHERE ap_id = a0."actor" AND is_active = TRUE) AS f1 ON TRUE WHERE (coalesce((a0."data")->'object'->>'id', (a0."data")->>'object') = 'dontcare') AND ((a0."data")->>'type' = 'Create'); > ``` > > and grab the output The output shows: ``` ERROR: relation "activities" does not exist LINE 1: ...cipients", a0."inserted_at", a0."updated_at" FROM "activitie... ``` ![image](/attachments/722c22ca-7591-40b5-95dd-ed0e4e6dc556)

try actually running it on the correct database and not the default postgres one

try actually running it on the correct database and not the default postgres one
Author

try actually running it on the correct database and not the default postgres one

Running in the akkoma database, the output shows:

                                                                                 
                                                                                 QUERY PLAN                                                                                 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.42..131365.04 rows=1901 width=696) (actual time=2514.461..2514.622 rows=0 loops=1)
   Workers Planned: 1
   Workers Launched: 0
   ->  Nested Loop  (cost=0.42..130174.94 rows=1118 width=696) (actual time=2513.503..2513.505 rows=0 loops=1)
         ->  Parallel Seq Scan on activities a0  (cost=0.00..116004.28 rows=2422 width=696) (actual time=2513.501..2513.502 rows=0 loops=1)
               Filter: (((data ->> 'type'::text) = 'Create'::text) AND (COALESCE(((data -> 'object'::text) ->> 'id'::text), (data ->> 'object'::text)) = 'dontcare'::text))
               Rows Removed by Filter: 1058265
         ->  Index Scan using users_ap_id_index on users  (cost=0.42..5.85 rows=1 width=38) (never executed)
               Index Cond: ((ap_id)::text = (a0.actor)::text)
               Filter: is_active
 Planning Time: 1.753 ms
 JIT:
   Functions: 9
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 2.781 ms, Inlining 0.000 ms, Optimization 1.302 ms, Emission 12.603 ms, Total 16.685 ms
 Execution Time: 2517.689 ms
(16 rows)
> try actually running it on the correct database and not the default postgres one Running in the akkoma database, the output shows: ``` QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Gather (cost=1000.42..131365.04 rows=1901 width=696) (actual time=2514.461..2514.622 rows=0 loops=1) Workers Planned: 1 Workers Launched: 0 -> Nested Loop (cost=0.42..130174.94 rows=1118 width=696) (actual time=2513.503..2513.505 rows=0 loops=1) -> Parallel Seq Scan on activities a0 (cost=0.00..116004.28 rows=2422 width=696) (actual time=2513.501..2513.502 rows=0 loops=1) Filter: (((data ->> 'type'::text) = 'Create'::text) AND (COALESCE(((data -> 'object'::text) ->> 'id'::text), (data ->> 'object'::text)) = 'dontcare'::text)) Rows Removed by Filter: 1058265 -> Index Scan using users_ap_id_index on users (cost=0.42..5.85 rows=1 width=38) (never executed) Index Cond: ((ap_id)::text = (a0.actor)::text) Filter: is_active Planning Time: 1.753 ms JIT: Functions: 9 Options: Inlining false, Optimization false, Expressions true, Deforming true Timing: Generation 2.781 ms, Inlining 0.000 ms, Optimization 1.302 ms, Emission 12.603 ms, Total 16.685 ms Execution Time: 2517.689 ms (16 rows) ```

ok well that explains a lot, this should take 2ms tops

you seem to... not have activities_create_objects_index

can you give me the output of \d activities

that should confirm my suspicion

ok well that explains a lot, this should take 2ms tops you seem to... not have `activities_create_objects_index` can you give me the output of `\d activities` that should confirm my suspicion
Author

the output of \d activities:

                                     Table "public.activities"
   Column    |              Type              | Collation | Nullable |           Default            
-------------+--------------------------------+-----------+----------+------------------------------
 id          | uuid                           |           | not null | 
 data        | jsonb                          |           | not null | 
 inserted_at | timestamp(0) without time zone |           | not null | 
 updated_at  | timestamp(0) without time zone |           | not null | 
 local       | boolean                        |           | not null | true
 actor       | character varying(255)         |           |          | 
 recipients  | character varying(255)[]       |           |          | ARRAY[]::character varying[]
Indexes:
    "activities_pkey" PRIMARY KEY, btree (id)
    "activities_actor_id_DESC_NULLS_LAST_index" btree (actor, id DESC NULLS LAST)
    "activities_cc_index" gin ((data -> 'cc'::text))
    "activities_context_index" btree ((data ->> 'type'::text), (data ->> 'context'::text))
    "activities_create_objects_index" btree (associated_object_id(data))
    "activities_hosts" btree (split_part(actor::text, '/'::text, 3))
    "activities_id_desc_nulls_last_local_index" btree (id DESC NULLS LAST, local)
    "activities_in_reply_to" btree (((data -> 'object'::text) ->> 'inReplyTo'::text))
    "activities_likes" gin ((data #> '{object,likes}'::text[]))
    "activities_local_index" btree (local)
    "activities_recipients_index" gin (recipients)
    "activities_to_index" gin ((data -> 'to'::text))
    "activities_unique_apid_index" UNIQUE, btree ((data ->> 'id'::text))
    "activities_visibility_index" btree (activity_visibility(actor, recipients, data), id DESC NULLS LAST) WHERE (data ->> 'type'::text) = 'Create'::text
Referenced by:
    TABLE "bookmarks" CONSTRAINT "bookmarks_activity_id_fkey" FOREIGN KEY (activity_id) REFERENCES activities(id) ON DELETE CASCADE
    TABLE "notifications" CONSTRAINT "notifications_activity_id_fkey" FOREIGN KEY (activity_id) REFERENCES activities(id) ON DELETE CASCADE
    TABLE "report_notes" CONSTRAINT "report_notes_activity_id_fkey" FOREIGN KEY (activity_id) REFERENCES activities(id) ON DELETE CASCADE
Triggers:
    status_visibility_counter_cache_trigger BEFORE INSERT OR DELETE OR UPDATE OF recipients, data ON activities FOR EACH ROW EXECUTE FUNCTION update_status_visibility_counter_cache()
the output of `\d activities`: ``` Table "public.activities" Column | Type | Collation | Nullable | Default -------------+--------------------------------+-----------+----------+------------------------------ id | uuid | | not null | data | jsonb | | not null | inserted_at | timestamp(0) without time zone | | not null | updated_at | timestamp(0) without time zone | | not null | local | boolean | | not null | true actor | character varying(255) | | | recipients | character varying(255)[] | | | ARRAY[]::character varying[] Indexes: "activities_pkey" PRIMARY KEY, btree (id) "activities_actor_id_DESC_NULLS_LAST_index" btree (actor, id DESC NULLS LAST) "activities_cc_index" gin ((data -> 'cc'::text)) "activities_context_index" btree ((data ->> 'type'::text), (data ->> 'context'::text)) "activities_create_objects_index" btree (associated_object_id(data)) "activities_hosts" btree (split_part(actor::text, '/'::text, 3)) "activities_id_desc_nulls_last_local_index" btree (id DESC NULLS LAST, local) "activities_in_reply_to" btree (((data -> 'object'::text) ->> 'inReplyTo'::text)) "activities_likes" gin ((data #> '{object,likes}'::text[])) "activities_local_index" btree (local) "activities_recipients_index" gin (recipients) "activities_to_index" gin ((data -> 'to'::text)) "activities_unique_apid_index" UNIQUE, btree ((data ->> 'id'::text)) "activities_visibility_index" btree (activity_visibility(actor, recipients, data), id DESC NULLS LAST) WHERE (data ->> 'type'::text) = 'Create'::text Referenced by: TABLE "bookmarks" CONSTRAINT "bookmarks_activity_id_fkey" FOREIGN KEY (activity_id) REFERENCES activities(id) ON DELETE CASCADE TABLE "notifications" CONSTRAINT "notifications_activity_id_fkey" FOREIGN KEY (activity_id) REFERENCES activities(id) ON DELETE CASCADE TABLE "report_notes" CONSTRAINT "report_notes_activity_id_fkey" FOREIGN KEY (activity_id) REFERENCES activities(id) ON DELETE CASCADE Triggers: status_visibility_counter_cache_trigger BEFORE INSERT OR DELETE OR UPDATE OF recipients, data ON activities FOR EACH ROW EXECUTE FUNCTION update_status_visibility_counter_cache() ```

ok

ok

pleroma did something WEIRD and changed their activities_create_objects_index

great

GREAT

love it

ok ok pleroma did something WEIRD and changed their `activities_create_objects_index` great GREAT love it

a quick fix will be to run

drop index activities_create_objects_index;
create index activities_create_objects_index on activities (COALESCE((data -> 'object'::text) ->> 'id'::text, data ->> 'object'::text));

that should bring you into line with the schema we use

this may take some time so be prepared to wait a little

a quick fix will be to run ``` drop index activities_create_objects_index; create index activities_create_objects_index on activities (COALESCE((data -> 'object'::text) ->> 'id'::text, data ->> 'object'::text)); ``` that should bring you into line with the schema we use this _may_ take some time so be prepared to wait a little
Author

Thank you so much! I have tried the quick fix and the query error is gone, the CPU usage has come down. Everything has gone back to normal now.

Thank you so much! I have tried the quick fix and the query error is gone, the CPU usage has come down. Everything has gone back to normal now.

thanks for bringing it to my attention, i'll see why pleroma made the change that they did and add extra notes to migration docs

tell me if it does anything else weird, closing this off for now

thanks for bringing it to my attention, i'll see why pleroma made the change that they did and add extra notes to migration docs tell me if it does anything else weird, closing this off for now

Just as a post-mortem: this happened to me after manually migrating from Pleroma to Akkoma - could it be we're missing a migration?

Just as a post-mortem: this happened to me after manually migrating from Pleroma to Akkoma - could it be we're missing a migration?

quite possibly

this was apparently a known thing a while back, there is a note in the migration guide that should cover it - https://docs.akkoma.dev/stable/installation/migrating_to_akkoma/#warning-migrating-from-pleroma-develop

though I suppose it may have been missed if this migration has made it into stable

quite possibly this was apparently a known thing a while back, there is a note in the migration guide that should cover it - https://docs.akkoma.dev/stable/installation/migrating_to_akkoma/#warning-migrating-from-pleroma-develop though I suppose it may have been missed if this migration has made it into stable
Sign in to join this conversation.
No milestone
No project
No assignees
3 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: AkkomaGang/akkoma#569
No description provided.