refactor deliver to extract actor from activity #377

Johann150 · 2023-04-18T20:09:07Z

Johann150 commented

2023-04-18 20:09:07 +00:00

Instead of explicitly stating the actor user ID in the deliver queue which can lead to desynchronization the actor to be used for signing should be extracted directly from the activity to be delivered.

We can be sure that there will always be an actor URI since anything we deliver will have to be an Activity (in the ActivityPub sense).

Instead of explicitly stating the actor user ID in the deliver queue which can lead to desynchronization the actor to be used for signing should be extracted directly from the activity to be delivered. We can be sure that there will always be an actor URI since anything we deliver will have to be an Activity (in the ActivityPub sense).

Johann150 added the

upkeep

label 2023-04-18 20:09:07 +00:00

Johann150 force-pushed refactor-deliver-actor from b5cf4540c3 to c66420c125

2023-04-18 20:35:24 +00:00

Compare

helene approved these changes 2023-04-19 20:53:59 +00:00

packages/backend/src/queue/processors/deliver.ts Outdated

					
				@ -21,0 +22,4 @@

					// get user/actor for signing

					const userUri = job.data.content.actor;

					if (userUri == null) return 'error: missing actor';

					const user = await uriPersonCache.fetch(userUri);

helene commented

2023-04-19 18:10:00 +00:00

Could it be possible that user might be wrong or null?
Also, do all activity classes/structs/objects have an actor field?

Additionally, there are no checks to make sure the user is local; though it shouldn't be a problem, but would fail later in case of a bug somewhere else.

Could it be possible that `user` might be wrong or null? Also, do all activity classes/structs/objects have an actor field? Additionally, there are no checks to make sure the user is local; though it shouldn't be a problem, but would fail later in case of a bug somewhere else.

Johann150 commented

2023-04-20 20:36:02 +00:00

If a user cannot be found by the URI (and thus user == null), the activity must be malformed and it would be proper to fail and not deliver the malformed activity.

Since only local users have private keys stored in the database, a request in the name of a remote actor would fail at the point of trying to sign the activity.

I think it is a good idea to make these two things explicit.

As I noted above, anything that we can deliver must be an (ActivityPub) Activity. I think Activities only make sense if they have an Actor, at least Foundkey can only understand Activities that have an Actor. So I think it is reasonable to also only send activities with an actor, and fail on activities missing an actor.

If a user cannot be found by the URI (and thus `user == null`), the activity must be malformed and it would be proper to fail and not deliver the malformed activity. Since only local users have private keys stored in the database, a request in the name of a remote actor would fail at the point of trying to sign the activity. I think it is a good idea to make these two things explicit. As I noted above, anything that we can deliver must be an (ActivityPub) Activity. I think Activities only make sense if they have an Actor, at least Foundkey can only understand Activities that have an Actor. So I think it is reasonable to also only send activities with an actor, and fail on activities missing an actor.

Johann150 marked this conversation as resolved

packages/backend/src/queue/types.ts Outdated

					
				@ -9,3 +9,1 @@

					/** Actor */

					user: ThinUser;

					/** Activity */

					/** Activity, containing the actor URI */

helene commented

2023-04-19 20:48:06 +00:00

Sounds like a TS typing would be preferable here, considering this field is required?

Johann150 marked this conversation as resolved

Johann150 force-pushed refactor-deliver-actor from 3b994f14e5 to 4683e1a93e

2023-04-20 20:47:34 +00:00

Compare

Johann150 force-pushed refactor-deliver-actor from 4683e1a93e to 37e76326f7

2023-04-24 20:11:11 +00:00

Compare

Johann150 commented

2023-04-24 20:22:28 +00:00

Refactored from resolving the actor URI on every delivery attempt to instead resolving the actor URI once when creating the queue job. While this does not achieve what I set out above, it still achieves that it is not necessary to explicitly pass the actor to the deliver function, which was the ultimate goal of the PR.

This change is because some field testing has shown that resolving the actor URI does have a noticeable performance penalty if done on every delivery attempt, and the amount of additional memory that is used in the queue job is negligible, since only the user ID is passed.

The big advantage that this still has is reducing the complexity of calling the deliver function.

Refactored from resolving the actor URI on every delivery attempt to instead resolving the actor URI once when creating the queue job. While this does not achieve what I set out above, it still achieves that it is not necessary to explicitly pass the actor to the `deliver` function, which was the ultimate goal of the PR. This change is because some field testing has shown that resolving the actor URI does have a noticeable performance penalty if done on every delivery attempt, and the amount of additional memory that is used in the queue job is negligible, since only the user ID is passed. The big advantage that this still has is reducing the complexity of calling the `deliver` function.

Johann150 force-pushed refactor-deliver-actor from 37e76326f7 to b374a79eb1

2023-04-24 20:56:00 +00:00

Compare

puniko commented

2023-05-06 18:19:50 +00:00

Pulled in this PR on my instance for testing. will give feedback after a couple of days running it.

puniko commented

2023-05-08 06:20:32 +00:00

I have to revert back this PR on my instance. Impact on the queues are too severe.

Inbox and Deliver queue is lagging behind by almost 1h
Queues have a tendency to get stuck. Can't identify the jobs thats causing this, but it has to be something with this PR, since it didn't happen without it.
Had to raise inboxJobConcurency (or whatever its called) just for the inbox queue to be able to keep up somewhat
~~Disk Space gets eaten way more aggressively for some reason~~ was something else unrelated
Since i merged this PR for testing on saturday, deliver queue has accumilated 46k failed jobs
Failed jobs count on inbox as at around 40k

Most of the failed jobs in deliver seem to be timeouts, haven't checked all failed jobs tho

TimeoutError: Promise timed out after 60000 milliseconds
    at Timeout._onTimeout (/home/misskey/misskey/node_modules/bull/lib/p-timeout.js:56:50)
    at listOnTimeout (node:internal/timers:564:17)
    at process.processTimers (node:internal/timers:507:7)
TimeoutError: Promise timed out after 60000 milliseconds
    at Timeout._onTimeout (/home/misskey/misskey/node_modules/bull/lib/p-timeout.js:56:50)
    at listOnTimeout (node:internal/timers:564:17)
    at process.processTimers (node:internal/timers:507:7)

In inbox, i have tons of jobs job stalled more than allowable limit

and also a couple with accessing a property on undefined

TypeError: Cannot read properties of undefined (reading 'keyId')
    at Queue.default (file:///home/misskey/misskey/packages/backend/built/queue/processors/inbox.js:26:34)
    at handlers.<computed> (/home/misskey/misskey/node_modules/bull/lib/queue.js:701:42)
    at Queue.processJob (/home/misskey/misskey/node_modules/bull/lib/queue.js:1162:22)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)

i've also seen some other properties (send some yesterday to @Johann150 )

thats all the infos i can give rn. i wish i had more, but its extremly hard to track down certain jobs and stuff, let alone search for it

I have to revert back this PR on my instance. Impact on the queues are too severe. - Inbox and Deliver queue is lagging behind by almost 1h - Queues have a tendency to get stuck. Can't identify the jobs thats causing this, but it has to be something with this PR, since it didn't happen without it. - Had to raise `inboxJobConcurency` (or whatever its called) just for the inbox queue to be able to keep up somewhat - ~~Disk Space gets eaten way more aggressively for some reason~~ was something else unrelated - Since i merged this PR for testing on saturday, deliver queue has accumilated 46k failed jobs - Failed jobs count on inbox as at around 40k Most of the failed jobs in deliver seem to be timeouts, haven't checked all failed jobs tho ``` TimeoutError: Promise timed out after 60000 milliseconds at Timeout._onTimeout (/home/misskey/misskey/node_modules/bull/lib/p-timeout.js:56:50) at listOnTimeout (node:internal/timers:564:17) at process.processTimers (node:internal/timers:507:7) TimeoutError: Promise timed out after 60000 milliseconds at Timeout._onTimeout (/home/misskey/misskey/node_modules/bull/lib/p-timeout.js:56:50) at listOnTimeout (node:internal/timers:564:17) at process.processTimers (node:internal/timers:507:7) ``` In inbox, i have tons of jobs `job stalled more than allowable limit` and also a couple with accessing a property on undefined ``` TypeError: Cannot read properties of undefined (reading 'keyId') at Queue.default (file:///home/misskey/misskey/packages/backend/built/queue/processors/inbox.js:26:34) at handlers.<computed> (/home/misskey/misskey/node_modules/bull/lib/queue.js:701:42) at Queue.processJob (/home/misskey/misskey/node_modules/bull/lib/queue.js:1162:22) at process.processTicksAndRejections (node:internal/process/task_queues:95:5) ``` i've also seen some other properties (send some yesterday to @Johann150 ) thats all the infos i can give rn. i wish i had more, but its extremly hard to track down certain jobs and stuff, let alone search for it

Johann150 force-pushed refactor-deliver-actor from ba77a81cc0 to 588a5502e9

2023-05-23 22:11:16 +00:00

Compare

Johann150 force-pushed refactor-deliver-actor from 588a5502e9 to e574d9cef1

2023-07-02 08:15:52 +00:00

Compare

ci/woodpecker/push/lint-foundkey-js Pipeline was successful

Details

ci/woodpecker/pr/build Pipeline was successful

Details

ci/woodpecker/push/lint-sw Pipeline was successful

Details

ci/woodpecker/pr/lint-sw Pipeline was successful

Details

ci/woodpecker/pr/test Pipeline was successful

Details

ci/woodpecker/push/test Pipeline was successful

Details

ci/woodpecker/push/lint-client Pipeline was successful

Details

ci/woodpecker/push/build Pipeline was successful

Details

ci/woodpecker/push/lint-backend Pipeline was successful

Details

ci/woodpecker/pr/lint-backend Pipeline was successful

Details

ci/woodpecker/pr/lint-foundkey-js Pipeline was successful

Details

ci/woodpecker/pr/lint-client Pipeline was successful

Details

This pull request has changes conflicting with the target branch.

packages/backend/src/server/api/common/read-messaging-message.ts
packages/backend/src/services/following/create.ts

View command line instructions

Checkout

From your project repository, check out a new branch and test the changes.

git fetch -u origin refactor-deliver-actor:refactor-deliver-actor

git checkout refactor-deliver-actor

Merge

Merge the changes and update on Forgejo.

git checkout main

git merge --no-ff refactor-deliver-actor

git checkout refactor-deliver-actor

git rebase main

git checkout main

git merge --ff-only refactor-deliver-actor

git checkout refactor-deliver-actor

git rebase main

git checkout main

git merge --no-ff refactor-deliver-actor

git checkout main

git merge --squash refactor-deliver-actor

git checkout main

git merge --ff-only refactor-deliver-actor

git checkout main

git merge refactor-deliver-actor

git push origin main

Sign in to join this conversation.

No reviewers

No labels

feature

fix

upkeep

No milestone

No project

No assignees

3 participants

Notifications

Due date

The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: FoundKeyGang/FoundKey#377

refactor deliver to extract actor from activity #377

Checkout

Merge

refactor `deliver` to extract actor from activity #377