fix user deletion race condition #320

Manually merged
Johann150 merged 3 commits from user-deletion into main 2023-01-29 11:55:49 +00:00
Owner

Fixes the race condition described in https://github.com/misskey-dev/misskey/issues/7506.

The problem was that delivery attempts fetch the private key of the user being deleted from the database, however the record from the user table may be deleted before all delivery attempts are completed, cascade deleting the private key as well.

There are a few different strategies for solving this:

  1. Don't fetch the private key from the database every time. This approach was attempted by syuilo by storing the users private key along with the job. However the pull request implementing this was closed unmerged with unknown reason. Perhaps the additional memory usage by Redis was a problem, especially when the job numbers get above a certain number.
  2. As another version of (1), use LD-Signatures. This means you do not have to create new signatures for each request, since only the content of the request and not the request itself is signed. LD-Signatures in the "self destruct" operation, where user delete activities are sent for each user. However, even Mastodon's own documentation advises against implementing them, not least because the implementation Mastodon uses is outdated and was based on a superseeded draft standard.
  3. Calculate a theoretical time delay based on the number of retries, time delay between retries and timeout of each job run. Delete the user record with this time delay. However, this approach can re-develop this same race condition under some circumstances, e.g. pausing the queue or otherwise delaying the scheduled execution of the delivery jobs, leading to the deletion job running before the delivery jobs.
  4. Make deletion depend on delivery jobs. Don't delete the user record as long as delivery jobs are still running. Don't compute it theoretically like in (3) but actually track the completion. This is roughly implemented here. Of course there could be better ways to implement it (for example, bullmq, a modernized version of bull seems to have "flows", a kind of job dependance system).

The exact approach here is basically (atomic) reference counting & garbage collecting the user record: Before delivery is started, the number of delivery attempts that will be started are stored in the isDeleted field. Each time a delivery job completes, terminally fails or is cleared from the queue, the value is decremented. Records that have reached 0 outstanding delivery attempts are picked up by the database cleanup job that deletes expired data.

Fixes the race condition described in https://github.com/misskey-dev/misskey/issues/7506. The problem was that delivery attempts fetch the private key of the user being deleted from the database, however the record from the `user` table may be deleted before all delivery attempts are completed, cascade deleting the private key as well. There are a few different strategies for solving this: 1) **Don't fetch the private key from the database every time.** This approach was attempted by syuilo by storing the users private key along with the job. However the pull request implementing this was closed unmerged with unknown reason. Perhaps the additional memory usage by Redis was a problem, especially when the job numbers get above a certain number. 2) As another version of (1), **use LD-Signatures**. This means you do not have to create new signatures for each request, since only the content of the request and not the request itself is signed. LD-Signatures in the "self destruct" operation, where user delete activities are sent for each user. However, even [Mastodon's own documentation](https://docs.joinmastodon.org/spec/security/#ld) advises against implementing them, not least because the implementation Mastodon uses is outdated and was based on a superseeded draft standard. 3) Calculate a theoretical time delay based on the number of retries, time delay between retries and timeout of each job run. **Delete the `user` record with this time delay.** However, this approach can re-develop this same race condition under some circumstances, e.g. pausing the queue or otherwise delaying the scheduled execution of the delivery jobs, leading to the deletion job running before the delivery jobs. 4) **Make deletion depend on delivery jobs.** Don't delete the user record as long as delivery jobs are still running. Don't compute it theoretically like in (3) but actually track the completion. This is roughly implemented here. Of course there could be better ways to implement it (for example, `bullmq`, a modernized version of `bull` seems to have "flows", a kind of job dependance system). The exact approach here is basically (atomic) reference counting & garbage collecting the user record: Before delivery is started, the number of delivery attempts that will be started are stored in the `isDeleted` field. Each time a delivery job completes, terminally fails or is cleared from the queue, the value is decremented. Records that have reached 0 outstanding delivery attempts are picked up by the database cleanup job that deletes expired data.
Johann150 added 3 commits 2023-01-08 21:31:09 +00:00
server: delete records of fully deleted users
Some checks failed
ci/woodpecker/push/lint-backend Pipeline was successful
ci/woodpecker/push/lint-foundkey-js Pipeline was successful
ci/woodpecker/push/build Pipeline was successful
ci/woodpecker/push/lint-client Pipeline was successful
ci/woodpecker/push/lint-sw Pipeline was successful
ci/woodpecker/push/test Pipeline was successful
ci/woodpecker/pr/lint-foundkey-js Pipeline was successful
ci/woodpecker/pr/lint-client Pipeline failed
ci/woodpecker/pr/lint-backend Pipeline failed
ci/woodpecker/pr/build Pipeline was successful
ci/woodpecker/pr/lint-sw Pipeline failed
ci/woodpecker/pr/test Pipeline failed
b245d39b6e
Johann150 added the
fix
label 2023-01-10 16:52:25 +00:00
Johann150 manually merged commit cc83cbe523 into main 2023-01-29 11:55:49 +00:00
Johann150 deleted branch user-deletion 2023-01-29 11:58:39 +00:00
Sign in to join this conversation.
No reviewers
No labels
feature
fix
upkeep
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: FoundKeyGang/FoundKey#320
No description provided.