Handle failed fetches a bit better #743

Merged
floatingghost merged 34 commits from failed-fetch-processing into develop 2024-04-19 11:25:14 +00:00

Pulls most of https://git.pleroma.social/pleroma/pleroma/-/merge_requests/4015 and adapts it to the various little changes we've made to this stuff.

floatingghost added 26 commits 2024-04-13 22:56:27 +00:00
It was only being called once and can be replaced with a case statement.
These tests relied on the removed Fetcher.fetch_object_from_id!/2 function injecting the error tuple into a log message with the exact words "Object containment failed."

We will keep this behavior by generating a similar log message, but perhaps this should do a better job of matching on the error tuple returned by Transmogrifier.handle_incoming/1
This is a definite sign the instance is blocked and they are enforcing authorized_fetch
This reverts commit d472bafec19cee269e7c943bafae7c805785acd7.
We were overzealous with matching on a raw error from the object fetch that should have never been relied on like this. If we can't fetch successfully we should assume that the collection is private.

Building a more expressive and universal error struct to match on may be something to consider.
Object fetch errors are logged in the fetcher module
Author
Owner

gonna document issues as i see them

user fetch validation can cause oban :error, so can ID collisions

```
{:transmogrifier, {:error, {:validate, ... }}}

Apr 14 00:19:57 mix[1271365]: - `:ok`
Apr 14 00:19:57 mix[1271365]: - `:discard`
Apr 14 00:19:57 mix[1271365]: - `{:ok, value}`
Apr 14 00:19:57 mix[1271365]: - `{:error, reason}`,
Apr 14 00:19:57 mix[1271365]: - `{:cancel, reason}`
Apr 14 00:19:57 mix[1271365]: - `{:discard, reason}`
Apr 14 00:19:57 mix[1271365]: - `{:snooze, seconds}`
Apr 14 00:19:57 mix[1271365]: Instead received:
Apr 14 00:19:57 mix[1271365]: :error
```
Author
Owner

ah right it probably doesn't hit the pattern match in remotefetcher

Oneric commented 2024-04-14 01:08:13 +00:00
Member

> ah right it probably doesn't hit the pattern match in remotefetcher

yep, Oban wants `{:error, _info}`, but for everything not explicitly matched the last catch-all just returns `:error`. That oversight was fixed up on Pleroma’s side with https://git.pleroma.social/pleroma/pleroma/-/merge_requests/4077
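A minimal sketch of the catch-all shape being described; the module name and `fetch_object/1` helper below are illustrative stand-ins, not the actual Akkoma worker code:

```
defmodule Sketch.RemoteFetcherWorker do
  # Illustrative only: anything not explicitly matched is wrapped into a
  # tuple Oban accepts instead of falling through to a bare :error.
  use Oban.Worker, queue: :remote_fetcher

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"op" => "fetch_remote", "id" => id}}) do
    case fetch_object(id) do
      {:ok, _object} ->
        :ok

      # explicitly handled permanent failures can be discarded
      {:error, :forbidden} ->
        {:discard, :forbidden}

      # the old catch-all returned a bare :error here, which Oban rejects;
      # returning {:error, e} keeps the job result well-formed
      e ->
        {:error, e}
    end
  end

  # stand-in for the real fetcher call (Pleroma.Object.Fetcher)
  defp fetch_object(_id), do: {:ok, %{}}
end
```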

Member

Also a question: the ported changes make an effort to sort the job into `:discard` instead of `:error` to avoid retries, but as far as I can tell all `remote_fetcher` jobs currently (by default) only have a single attempt anyway, so no retries should occur in the first place?
This is based on `RemoteFetcherWorker` using `WorkerHelper`, which sets the queue’s default `max_attempts` to `1` and overrides this for enqueued jobs based on a config value if present, but if I'm not mistaken there’s no default config value for `remote_fetcher`.

[Oban docs](https://hexdocs.pm/oban/2.15.2/Oban.html#module-unique-jobs) say by default job uniqueness is checked across all states except `:discarded` and `:cancelled`. If jobs were retryable and we now discard jobs rather than exhausting all attempts with a backoff, won’t this in theory allow bad jobs to be reattempted via insert _faster_ than they previously could with the retry backoff?

*(the changes are still good to have, but just checking I didn’t misunderstand something here)*
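For reference, a quick sketch comparing Oban's default uniqueness states with a check across every state; the five default states are taken from the Oban docs linked above, and `Oban.Job.states/0` is the helper used later in this PR:

```
# Oban's documented default: uniqueness is checked across all states
# except :discarded and :cancelled.
default_states = [:available, :scheduled, :executing, :retryable, :completed]

# Checking across every state also covers the two terminal ones, so a
# discarded job cannot be immediately re-inserted within the unique period.
all_states = Oban.Job.states()

IO.inspect(all_states -- default_states)
# expected to contain :discarded and :cancelled
```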

floatingghost added 2 commits 2024-04-16 01:36:10 +00:00
Author
Owner

> Oban docs say by default job uniqueness is checked across all states except `:discarded` and `:cancelled`

oh this may actually be an issue - we don't actually enable the `unique` check for any worker 🥴

maybe that should also get enabled as part of this, i'll make sure it works

floatingghost added 1 commit 2024-04-16 01:54:25 +00:00
by default just prevent job floods with a 1-second
uniqueness check, but override in RemoteFetcherWorker
for a 5-minute uniqueness check over all states

:infinity is an option we could maybe go for at some point,
but that would prevent any refetches, so maybe not, idk.
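A hedged illustration of the flood-prevention effect: with a unique period in force, inserting the same job twice within the window returns the existing job instead of creating a second one. `SomeWorker` and its args are made up for the example; the real workers go through `WorkerHelper`.

```
# Assumes a configured Oban instance and a worker module built with
# `use Oban.Worker` (which WorkerHelper wraps); names here are illustrative.
{:ok, job1} = Oban.insert(SomeWorker.new(%{"id" => "https://example.com/objects/1"}))
{:ok, job2} = Oban.insert(SomeWorker.new(%{"id" => "https://example.com/objects/1"}))

job1.id == job2.id  # => true, the second insert hit the unique check
job2.conflict?      # => true, Oban flags the returned job as a conflict
```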
floatingghost added 1 commit 2024-04-16 01:59:02 +00:00
floatingghost added 1 commit 2024-04-16 02:07:36 +00:00
Author
Owner

marinating on ihba to see if this breaks, prayge it does not

floatingghost added 1 commit 2024-04-16 09:19:45 +00:00
floatingghost added 1 commit 2024-04-16 11:59:29 +00:00
Oneric reviewed 2024-04-17 14:13:57 +00:00
```
@@ -8,1 +8,3 @@
-use Pleroma.Workers.WorkerHelper, queue: "remote_fetcher"
+use Pleroma.Workers.WorkerHelper,
+  queue: "remote_fetcher",
+  unique: [period: 300, states: Oban.Job.states()]
```
Oneric commented 2024-04-17 14:13:57 +00:00
Member

Multiple fetches of the same AP id can still occur if the `depth` arg differs; setting `keys` to only consider `op` and `id` should avoid this
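A sketch of the suggested tweak on top of the diff above; Oban's `unique` option accepts a `keys` list that restricts which job args are compared:

```
use Pleroma.Workers.WorkerHelper,
  queue: "remote_fetcher",
  # compare only the operation and AP id, so a differing depth arg no
  # longer lets a duplicate fetch through the uniqueness check
  unique: [period: 300, states: Oban.Job.states(), keys: [:op, :id]]
```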

floatingghost added 1 commit 2024-04-19 10:39:41 +00:00
Author
Owner

has been running on IHBA with no noticeable negative effects

floatingghost merged commit 0fee71f58f into develop 2024-04-19 11:25:14 +00:00
floatingghost deleted branch failed-fetch-processing 2024-04-19 11:25:15 +00:00