More federation backoff tweaks #934

Merged
Oneric merged 2 commits from Oneric/akkoma:federation_backoff_tweaks into develop 2025-06-07 18:24:27 +00:00

Follow-up to #884

  • the default retry count for outgoing federation is raised by one, so as to tolerate 24h downtime by default
  • receiver_worker attempts are spaced out more to better handle resolving objects from temporarily overwhelmed servers (the last attempt now happens after about 20 minutes, compared to about 1.5 minutes before)
Oneric added 2 commits 2025-05-18 12:31:07 +00:00
We now tolerate a whole day of downtime by default
instead of only about three hours.
federation/in: space out receiver retries more
All checks were successful
258841c310
The most common permanent receiver error arises for likes/boosts
when we don't yet know the relevant object and can't fetch it
due to the remote being overwhelmed or otherwise down.

Before this change, all retries happened in rapid succession,
which did not give the remote enough time to recover,
so usually all of them failed. Now the remote has about 20
minutes to recover before we give up.

Transient errors from race conditions and (presumably)
weird database-cache interactions also occur regularly.
However, they resolve within the first one or two retries
and those initial retries still happen relatively quickly.
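
To illustrate the effect of spacing out retries, here is a minimal sketch that computes when each attempt happens relative to the first one. The helper `retry_offsets`, the attempt count of 5, and both delay functions are hypothetical; they are chosen only so the totals land near the figures quoted above and are not Akkoma's actual backoff formulas or settings.

```python
from itertools import accumulate


def retry_offsets(max_attempts, delay_after):
    """Return the seconds elapsed since the first attempt, per attempt.

    delay_after(n) is the wait in seconds after failed attempt n (1-based).
    """
    delays = [delay_after(n) for n in range(1, max_attempts)]
    return [0] + list(accumulate(delays))


# Hypothetical schedules, NOT the values used by Akkoma.
def rapid(n):
    # Linear growth: every retry follows shortly after the previous one.
    return 9 * n


def spaced(n):
    # Exponential growth: early retries stay quick, later ones wait longer.
    return 10 * 3 ** n


rapid_offsets = retry_offsets(5, rapid)    # last attempt after 90 s
spaced_offsets = retry_offsets(5, spaced)  # last attempt after 1200 s (20 min)

if __name__ == "__main__":
    print("rapid: ", rapid_offsets)
    print("spaced:", spaced_offsets)
```

With the spaced schedule the first retry still happens after 30 seconds, which would cover short-lived transient errors, while the final attempt only happens 20 minutes in, leaving an overwhelmed remote time to recover.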
Oneric changed title from More federation backoff twekas to More federation backoff tweaks 2025-05-18 12:39:29 +00:00
Oneric merged commit 1b7d9a0f76 into develop 2025-06-07 18:24:27 +00:00
Oneric deleted branch federation_backoff_tweaks 2025-06-07 18:24:27 +00:00