A lot of errors when migrating from Pleroma #215

Closed
opened 2022-09-16 11:29:25 +00:00 by bubblineyuri · 55 comments

Hi,

I get the following PostgreSQL error when trying to migrate from Pleroma to Akkoma

```
column u1.mastofe_settings does not exist
```

I hope you can help me with that.

~Leonie


Ah, this can occur when migrating from develop Pleroma, where you'd already applied the mastofe-removal patch

Should be fixed via #216 (https://akkoma.dev/AkkomaGang/akkoma/pulls/216)

if you're running from source, `git pull` and try again

if you're OTP, wait for https://ci.akkoma.dev/AkkomaGang/akkoma/build/930 to complete then update and try again

Author

This didn't fix anything; now I get these fun messages: http://content.koyu.space/1MPAN


ok so it actually *did* fix your error, since these are totally different

sounds like you've got your pool target set really low - it should never be that low by default

you'll want to increase it by changing `queue_target` like so:

```elixir
config :pleroma, Pleroma.Repo,
  adapter: Ecto.Adapters.Postgres,
  # ... creds are probably here ...
  timeout: 90_000,
  queue_target: 20_000, # THIS ONE
  queue_interval: 2_000,
  pool_size: 10
```
Author

Getting this beauty here

```
Oct 05 11:35:12 koyu.space mix[2454618]: 11:35:12.038 [info] Postgrex.Protocol (#PID<0.5170.0>) disconnected: ** (DBConnection.ConnectionError) client #PID<0.6331.0> exited
```


that alone doesn't give an awful lot to go on, there should be more logs above it that indicate what caused the exit

Author

Is this a little better? http://content.koyu.space/A6tGK


hm, that doesn't look fatal - does the instance die or does it recover after doing that?

Author

It loads the UI and responds to requests very slowly


hm, that sounds like you might have too big of a database for your system

have you done a vacuum/pg_repack to remove stuff?

additionally, how long is your remote object retention? that may be causing object table inflation
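if you want to check whether that's actually happening, you can ask PostgreSQL how big those tables have grown (this assumes the standard Akkoma/Pleroma table names):

```sql
-- total on-disk size (including indexes and TOAST) of the two big tables
SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relname IN ('objects', 'activities');
```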

Author

I have no idea what those two are. I also store configuration in the database, so how do I check that if I've blown off half of the server?

vacuum - https://docs.akkoma.dev/stable/administration/CLI_tasks/database/#prune-old-remote-posts-from-the-database

pg_repack - https://github.com/reorg/pg_repack

show options in the database - https://docs.akkoma.dev/stable/administration/CLI_tasks/config/#dump-all-of-the-config-settings-defined-in-the-database
Author

There is no object retention configuration. Should I try configuring it or assume it has some sort of default?

Doing a pg_repack gives `ERROR: pg_repack failed with error: pg_repack 1.4.7 is not installed in the database`


it'll have a default then, you should just be able to run the prune command from the docs.akkoma.dev page above and it'll remove stuff

Author

This is getting more interesting now after pruning

```
Oct 05 14:10:26 koyu.space mix[2497179]: 14:10:26.757 [error] Error while fetching https://fedi.absturztau.be/objects/b51ef47d-46bf-420d-b3d2-4ac22a1126d1: {:error, {:transmogrifier, {:error, {:validate, {:error, #Ecto.Changeset<action: :insert, changes: %{actor: "https://fedi.absturztau.be/users/khaosgrille", cc: ["https://www.w3.org/ns/activitystreams#Public"], context: "https://brotka.st/contexts/bc241591-d030-4d42-8868-4948f437cc86", object: "https://fedi.absturztau.be/objects/b51ef47d-46bf-420d-b3d2-4ac22a1126d1", to: ["https://brotka.st/users/kaia", "https://fedi.absturztau.be/users/khaosgrille/followers", "https://freespeechextremist.com/users/xue"], type: "Create"}, errors: [object: {"The object to create already exists", []}], data: #Pleroma.Web.ActivityPub.ObjectValidators.CreateGenericValidator<>, valid?: false>}}}}}
Oct 05 14:10:26 koyu.space mix[2497179]: 14:10:26.757 [warning] Couldn't fetch "https://fedi.absturztau.be/objects/b51ef47d-46bf-420d-b3d2-4ac22a1126d1", error: nil
```

ok, that's fine, that error isn't fatal at all, just a refetch
you can safely ignore it

Author

How about this? http://content.koyu.space/XIXb4

The server is still super slow


that's an interesting one, seems it doesn't like some of your config

that sounds like you may have an outdated schema in your config

can you add

```elixir
config :pleroma, :instance,
  staff_transparency: []
```

in your config?

if you run from source, does your `config/config.exs` match the one in our version control?

Author

Now it tries to eat itself http://content.koyu.space/Mj6z8

Copied config from version control and added the config flag you mentioned


nice! that means we're past the worst of it, we've got standard inbound requests and cron activating

looks like you might have some long-running requests

when the server is up and the timeouts are occurring, run `SELECT * FROM pg_stat_activity;` on your db and see if it throws anything interesting
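if that's too noisy, a filtered variant that only surfaces long-running, non-idle queries (these are all standard `pg_stat_activity` columns; the 5-second cutoff is arbitrary) would be:

```sql
-- show active queries ordered by how long they've been running
SELECT pid,
       now() - query_start AS runtime,
       state,
       left(query, 120) AS query_head
FROM pg_stat_activity
WHERE state <> 'idle'
  AND now() - query_start > interval '5 seconds'
ORDER BY runtime DESC;
```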

Author

These are the first 20 seconds or so, and it's clogging up with queries. This looks awful, I guess.

![grafik](/attachments/af8d47c4-8ac7-41a1-9180-1a5d67e813ee)


yeah ok that's what i'd expect to see in this case

what sort of size box are you running this on? does the IO max out?

Author

It's getting better, but it's still very slow. Some requests don't even get through now. I'm running koyu.space on a KVM VPS with the following specs:

10GB RAM
4 Cores (AMD EPYC)
200GB SSD


you may just have a very large backlog of tasks that is slowly processing

try leaving it online for an hour or so and see if it improves

Author

What I find ironic is that the timelines load super slow, but the rest, like config etc., loads as it should

Author

Yes, it seems to have synced up, but timelines still load slow

Author

I'll let it sink in a little longer. Will report back.


other things that can cause slow timelines include having thread containment turned on, so check if that's on

it's off by default

Author

Removing masto-fe related settings from the database made it kinda faster 🤔

Author

I'm also getting these from time to time

```
Oct 05 18:23:32 koyu.space mix[20932]: 18:23:32.108 [notice] Application ex_aws exited: :stopped
Oct 05 18:23:32 koyu.space mix[20932]: 18:23:32.108 [notice] Application web_push_encryption exited: :stopped
Oct 05 18:23:34 koyu.space mix[20932]: 18:23:34.453 [info] Postgrex.Protocol (#PID<0.10863.1>) disconnected: ** (DBConnection.ConnectionError) client #PID<0.13157.1> exited
```

And the thread containment setting does nothing


hm

well keep it off anyhow

https://pgtune.leopard.in.ua/ might be of use, maybe your DB isn't using as much of your hardware as it could
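you can see what the database is currently working with before plugging numbers into pgtune - these are all standard PostgreSQL settings:

```sql
-- current memory-related settings, for comparison with pgtune's suggestions
SHOW shared_buffers;
SHOW effective_cache_size;
SHOW work_mem;
SHOW maintenance_work_mem;
```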

Author

This also did nothing; it's a real hard one


check your running queries again - is there a specific one that's taking a long time?

Author

> check your running queries again - is there a specific one that's taking a long time?

This one has been trying to do something since start:

```sql
SELECT a0."id", a0."data", a0."local", a0."actor", a0."recipients", a0."inserted_at", a0."updated_at", o1."id", o1."data", o1."inserted_at", o1."updated_at" FROM "activities" AS a0 LEFT OUTER JOIN "objects" AS o1 ON (o1."data"->>'id') = COALESCE(a0."data"->'object'->>'id', a0."data"->>'object') WHERE (coalesce((a0."data")->'object'->>'id', (a0."data")->>'object') = ANY($1)) AND ((a0."data")->>'type' = $2)
```

none of that is particularly unusual

you could try turning on debug logging and seeing if you get anything?

you might also consider running `iotop` to check if your IO is doing anything untoward
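you could also run that slow query through EXPLAIN with real values substituted for the bind parameters - something like this (the URL is a placeholder and the column list is trimmed down):

```sql
-- ANALYZE executes the query for real; BUFFERS shows cache hit/miss counts
EXPLAIN (ANALYZE, BUFFERS)
SELECT a0."id", o1."id"
FROM "activities" AS a0
LEFT OUTER JOIN "objects" AS o1
  ON (o1."data"->>'id') = COALESCE(a0."data"->'object'->>'id', a0."data"->>'object')
WHERE COALESCE(a0."data"->'object'->>'id', a0."data"->>'object') = ANY(ARRAY['https://example.com/objects/1'])
  AND (a0."data"->>'type') = 'Create';
```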

Author

iotop looks fine, so I don't think it's disk IO. How do I enable debug logging?


there's a bunch of log-level stuff in the config


also, in case you didn't run it earlier

https://www.postgresql.org/docs/current/sql-vacuum.html

this may help
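for instance, a plain (non-blocking) vacuum from psql - unlike `VACUUM FULL`, this doesn't take an exclusive lock:

```sql
-- VERBOSE prints per-table progress; ANALYZE refreshes planner statistics
VACUUM (VERBOSE, ANALYZE);
```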

Author

I did an SQL vacuum and tried debug logging, but there's nothing of value. That's tough.


you can also try the pg_repack thing

you'll need to run `CREATE EXTENSION pg_repack` on your database before you run it (that's why you ran into the "not installed" error)
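from psql that's something like this - note the server-side pg_repack package has to be installed first, and its version must match the client:

```sql
-- confirm the extension is available to this server, then enable it
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name = 'pg_repack';

CREATE EXTENSION IF NOT EXISTS pg_repack;
```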

Author

pg_repack returned `ERROR: query failed: ERROR: could not create unique index "index_371024"` and the server is still slow

bubblineyuri changed title from "masto-fe errors when migrating from Pleroma" to "A lot of errors when migrating from Pleroma" 2022-10-06 19:13:57 +00:00
Author

I might have found the issue. Running a vacuum using mix takes a long time to finish and has a very high IO load.

Author

Running a vacuum didn't improve performance though. I had high hopes.

Author

Now I'm getting these funny things with a few requests:

```
Oct 06 19:24:47 koyu.space mix[622149]: 19:24:47.607 request_id=FxuRVH9dchMu0pwAAG9D [error] Internal server error: {:timeout, {GenServer, :call, [Pleroma.Stats, :get_state, 5000]}}
Oct 06 19:24:47 koyu.space mix[622149]: 19:24:47.617 request_id=FxuRVIHE4U-zvBcAAIKi [error] Internal server error: {:timeout, {GenServer, :call, [Pleroma.Stats, :get_state, 5000]}}
Oct 06 19:24:47 koyu.space mix[622149]: 19:24:47.627 [error] #PID<0.4003.0> running Pleroma.Web.Endpoint (connection #PID<0.4001.0>, stream id 1) terminated
Oct 06 19:24:47 koyu.space mix[622149]: Server: fedi.koyu.space:80 (http)
Oct 06 19:24:47 koyu.space mix[622149]: Request: GET /static/font/css/animation.css
Oct 06 19:24:47 koyu.space mix[622149]: ** (exit) exited in: GenServer.call(Pleroma.Stats, :get_state, 5000)
Oct 06 19:24:47 koyu.space mix[622149]:     ** (EXIT) time out
Oct 06 19:24:53 koyu.space mix[622149]: 19:24:53.580 request_id=FxuRVqXJCWEvLSkAAIcC [error] Internal server error: {:timeout, {GenServer, :call, [Pleroma.Stats, :get_state, 5000]}}
Oct 06 19:24:53 koyu.space mix[622149]: 19:24:53.580 [error] #PID<0.4164.0> running Pleroma.Web.Endpoint (connection #PID<0.4153.0>, stream id 1) terminated
Oct 06 19:24:53 koyu.space mix[622149]: Server: fedi.koyu.space:80 (http)
Oct 06 19:24:53 koyu.space mix[622149]: Request: GET /static/themes/mammal.json
Oct 06 19:24:53 koyu.space mix[622149]: ** (exit) exited in: GenServer.call(Pleroma.Stats, :get_state, 5000)
Oct 06 19:24:53 koyu.space mix[622149]:     ** (EXIT) time out
```
Author

This is very interesting. I'm occasionally getting `Oct 06 19:28:24 koyu.space mix[622149]: 19:28:24.173 [notice] :alarm_handler: {:clear, :system_memory_high_watermark}` even though I still have 6.9 GB (nice) available. Why is it not using my entire RAM to work with? The timeout on static assets went away once I restarted the PostgreSQL server.


hold up, your *static* assets were timing out?

that would heavily indicate that your server does not have the disk throughput to run a database


please benchmark your disk to ensure it has the read and write speeds necessary to comfortably run a database

Author

I also tried regenerating the entire config. That didn't help.

Author

> please benchmark your disk to ensure it has the read and write speeds necessary to comfortably run a database

How do I do that? I mean Pleroma and bloaty Mastodon worked before.


use `hdparm` and `dd`

Author

```
koyu@koyu:~$ dd if=/dev/urandom of=test.img bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 5.6276 s, 191 MB/s
```

```
koyu@koyu:~$ sudo hdparm -T /dev/sda

/dev/sda:
 Timing cached reads:   13338 MB in  2.00 seconds = 6674.36 MB/sec
```

that really should be sufficient

but I've given you all the resources I can, there's very little else I can do to diagnose this remotely

Author

So I figured out that the whole database got corrupted. Rebuilding the index resulted in the whole database being yeeted.


if you've got a backup somewhere it should still be ok
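if you want to pin down which indexes are actually corrupt before restoring, the amcheck contrib extension can walk every btree index - roughly like this (assumes amcheck is available for your PostgreSQL version):

```sql
-- requires the amcheck contrib package for your PostgreSQL version
CREATE EXTENSION IF NOT EXISTS amcheck;

-- run a structural check on every btree index in the public schema;
-- this raises an error naming the first corrupt index it finds
SELECT c.relname AS index_name,
       bt_index_check(index => c.oid)
FROM pg_index i
JOIN pg_class c  ON c.oid  = i.indexrelid
JOIN pg_am   am ON am.oid  = c.relam
WHERE am.amname = 'btree'
  AND c.relnamespace = 'public'::regnamespace;
```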

Author

Well, I have a backup, but I can't restore it when the database index is corrupt

Contributor

Had the same issue, it was indeed related to database corruption.
I was able to restore a working state by running (in psql, connected to your akkoma database):

```sql
VACUUM FULL;
-- note: an unquoted database name can't contain hyphens, so either use
-- underscores or double-quote the name
REINDEX DATABASE your_akkoma_database;
```

Took some time, but now the random timeouts are fixed!

PS: your database will be busy during those operations, so warn your users / close the service for some time.
