refactor deliver to extract actor from activity #377

Open
Johann150 wants to merge 8 commits from refactor-deliver-actor into main
Owner

Instead of explicitly stating the actor user ID in the deliver queue which can lead to desynchronization the actor to be used for signing should be extracted directly from the activity to be delivered.

We can be sure that there will always be an actor URI since anything we deliver will have to be an Activity (in the ActivityPub sense).

Instead of explicitly stating the actor user ID in the deliver queue which can lead to desynchronization the actor to be used for signing should be extracted directly from the activity to be delivered. We can be sure that there will always be an actor URI since anything we deliver will have to be an Activity (in the ActivityPub sense).
Johann150 added the
upkeep
label 2023-04-18 20:09:07 +00:00
Johann150 force-pushed refactor-deliver-actor from b5cf4540c3 to c66420c125 2023-04-18 20:35:24 +00:00 Compare
helene approved these changes 2023-04-19 20:53:59 +00:00
@ -21,0 +22,4 @@
// get user/actor for signing
const userUri = job.data.content.actor;
if (userUri == null) return 'error: missing actor';
const user = await uriPersonCache.fetch(userUri);
Contributor

Could it be possible that user might be wrong or null?
Also, do all activity classes/structs/objects have an actor field?

Additionally, there are no checks to make sure the user is local; though it shouldn't be a problem, but would fail later in case of a bug somewhere else.

Could it be possible that `user` might be wrong or null? Also, do all activity classes/structs/objects have an actor field? Additionally, there are no checks to make sure the user is local; though it shouldn't be a problem, but would fail later in case of a bug somewhere else.
Author
Owner

If a user cannot be found by the URI (and thus user == null), the activity must be malformed and it would be proper to fail and not deliver the malformed activity.

Since only local users have private keys stored in the database, a request in the name of a remote actor would fail at the point of trying to sign the activity.

I think it is a good idea to make these two things explicit.

As I noted above, anything that we can deliver must be an (ActivityPub) Activity. I think Activities only make sense if they have an Actor, at least Foundkey can only understand Activities that have an Actor. So I think it is reasonable to also only send activities with an actor, and fail on activities missing an actor.

If a user cannot be found by the URI (and thus `user == null`), the activity must be malformed and it would be proper to fail and not deliver the malformed activity. Since only local users have private keys stored in the database, a request in the name of a remote actor would fail at the point of trying to sign the activity. I think it is a good idea to make these two things explicit. As I noted above, anything that we can deliver must be an (ActivityPub) Activity. I think Activities only make sense if they have an Actor, at least Foundkey can only understand Activities that have an Actor. So I think it is reasonable to also only send activities with an actor, and fail on activities missing an actor.
Johann150 marked this conversation as resolved
@ -9,3 +9,1 @@
/** Actor */
user: ThinUser;
/** Activity */
/** Activity, containing the actor URI */
Contributor

Sounds like a TS typing would be preferable here, considering this field is required?

Sounds like a TS typing would be preferable here, considering this field is required?
Johann150 marked this conversation as resolved
Johann150 force-pushed refactor-deliver-actor from 3b994f14e5 to 4683e1a93e 2023-04-20 20:47:34 +00:00 Compare
Johann150 force-pushed refactor-deliver-actor from 4683e1a93e to 37e76326f7 2023-04-24 20:11:11 +00:00 Compare
Author
Owner

Refactored from resolving the actor URI on every delivery attempt to instead resolving the actor URI once when creating the queue job. While this does not achieve what I set out above, it still achieves that it is not necessary to explicitly pass the actor to the deliver function, which was the ultimate goal of the PR.

This change is because some field testing has shown that resolving the actor URI does have a noticeable performance penalty if done on every delivery attempt, and the amount of additional memory that is used in the queue job is negligible, since only the user ID is passed.

The big advantage that this still has is reducing the complexity of calling the deliver function.

Refactored from resolving the actor URI on every delivery attempt to instead resolving the actor URI once when creating the queue job. While this does not achieve what I set out above, it still achieves that it is not necessary to explicitly pass the actor to the `deliver` function, which was the ultimate goal of the PR. This change is because some field testing has shown that resolving the actor URI does have a noticeable performance penalty if done on every delivery attempt, and the amount of additional memory that is used in the queue job is negligible, since only the user ID is passed. The big advantage that this still has is reducing the complexity of calling the `deliver` function.
Johann150 force-pushed refactor-deliver-actor from 37e76326f7 to b374a79eb1 2023-04-24 20:56:00 +00:00 Compare
Contributor

Pulled in this PR on my instance for testing. will give feedback after a couple of days running it.

Pulled in this PR on my instance for testing. will give feedback after a couple of days running it.
Contributor

I have to revert back this PR on my instance. Impact on the queues are too severe.

  • Inbox and Deliver queue is lagging behind by almost 1h
  • Queues have a tendency to get stuck. Can't identify the jobs thats causing this, but it has to be something with this PR, since it didn't happen without it.
  • Had to raise inboxJobConcurency (or whatever its called) just for the inbox queue to be able to keep up somewhat
  • Disk Space gets eaten way more aggressively for some reason was something else unrelated
  • Since i merged this PR for testing on saturday, deliver queue has accumilated 46k failed jobs
  • Failed jobs count on inbox as at around 40k

Most of the failed jobs in deliver seem to be timeouts, haven't checked all failed jobs tho

TimeoutError: Promise timed out after 60000 milliseconds
    at Timeout._onTimeout (/home/misskey/misskey/node_modules/bull/lib/p-timeout.js:56:50)
    at listOnTimeout (node:internal/timers:564:17)
    at process.processTimers (node:internal/timers:507:7)
TimeoutError: Promise timed out after 60000 milliseconds
    at Timeout._onTimeout (/home/misskey/misskey/node_modules/bull/lib/p-timeout.js:56:50)
    at listOnTimeout (node:internal/timers:564:17)
    at process.processTimers (node:internal/timers:507:7)

In inbox, i have tons of jobs job stalled more than allowable limit

and also a couple with accessing a property on undefined

TypeError: Cannot read properties of undefined (reading 'keyId')
    at Queue.default (file:///home/misskey/misskey/packages/backend/built/queue/processors/inbox.js:26:34)
    at handlers.<computed> (/home/misskey/misskey/node_modules/bull/lib/queue.js:701:42)
    at Queue.processJob (/home/misskey/misskey/node_modules/bull/lib/queue.js:1162:22)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)

i've also seen some other properties (send some yesterday to @Johann150 )

thats all the infos i can give rn. i wish i had more, but its extremly hard to track down certain jobs and stuff, let alone search for it

I have to revert back this PR on my instance. Impact on the queues are too severe. - Inbox and Deliver queue is lagging behind by almost 1h - Queues have a tendency to get stuck. Can't identify the jobs thats causing this, but it has to be something with this PR, since it didn't happen without it. - Had to raise `inboxJobConcurency` (or whatever its called) just for the inbox queue to be able to keep up somewhat - ~~Disk Space gets eaten way more aggressively for some reason~~ was something else unrelated - Since i merged this PR for testing on saturday, deliver queue has accumilated 46k failed jobs - Failed jobs count on inbox as at around 40k Most of the failed jobs in deliver seem to be timeouts, haven't checked all failed jobs tho ``` TimeoutError: Promise timed out after 60000 milliseconds at Timeout._onTimeout (/home/misskey/misskey/node_modules/bull/lib/p-timeout.js:56:50) at listOnTimeout (node:internal/timers:564:17) at process.processTimers (node:internal/timers:507:7) TimeoutError: Promise timed out after 60000 milliseconds at Timeout._onTimeout (/home/misskey/misskey/node_modules/bull/lib/p-timeout.js:56:50) at listOnTimeout (node:internal/timers:564:17) at process.processTimers (node:internal/timers:507:7) ``` In inbox, i have tons of jobs `job stalled more than allowable limit` and also a couple with accessing a property on undefined ``` TypeError: Cannot read properties of undefined (reading 'keyId') at Queue.default (file:///home/misskey/misskey/packages/backend/built/queue/processors/inbox.js:26:34) at handlers.<computed> (/home/misskey/misskey/node_modules/bull/lib/queue.js:701:42) at Queue.processJob (/home/misskey/misskey/node_modules/bull/lib/queue.js:1162:22) at process.processTicksAndRejections (node:internal/process/task_queues:95:5) ``` i've also seen some other properties (send some yesterday to @Johann150 ) thats all the infos i can give rn. i wish i had more, but its extremly hard to track down certain jobs and stuff, let alone search for it
Johann150 force-pushed refactor-deliver-actor from ba77a81cc0 to 588a5502e9 2023-05-23 22:11:16 +00:00 Compare
Johann150 force-pushed refactor-deliver-actor from 588a5502e9 to e574d9cef1 2023-07-02 08:15:52 +00:00 Compare
All checks were successful
ci/woodpecker/push/lint-client Pipeline was successful
ci/woodpecker/push/build Pipeline was successful
ci/woodpecker/push/lint-backend Pipeline was successful
ci/woodpecker/push/lint-foundkey-js Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
ci/woodpecker/push/lint-sw Pipeline was successful
ci/woodpecker/pr/lint-backend Pipeline was successful
ci/woodpecker/pr/lint-foundkey-js Pipeline was successful
ci/woodpecker/pr/lint-client Pipeline was successful
ci/woodpecker/pr/lint-sw Pipeline was successful
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/push/test Pipeline was successful
This pull request has changes conflicting with the target branch.
  • packages/backend/src/server/api/common/read-messaging-message.ts
You can also view command line instructions.

Step 1:

From your project repository, check out a new branch and test the changes.
git checkout -b refactor-deliver-actor main
git pull origin refactor-deliver-actor

Step 2:

Merge the changes and update on Forgejo.
git checkout main
git merge --no-ff refactor-deliver-actor
git push origin main
Sign in to join this conversation.
No reviewers
No Label
feature
fix
upkeep
No Milestone
No Assignees
3 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: FoundKeyGang/FoundKey#377
No description provided.