Handle failed fetches a bit better #743

Merged
floatingghost merged 34 commits from failed-fetch-processing into develop 2024-04-19 11:25:14 +00:00

Pulls most of https://git.pleroma.social/pleroma/pleroma/-/merge_requests/4015 and adapts it to the various little changes we've made to this stuff.

floatingghost added 26 commits 2024-04-13 22:56:27 +00:00
It was only being called once and can be replaced with a case statement.
These tests relied on the removed Fetcher.fetch_object_from_id!/2 function injecting the error tuple into a log message with the exact words "Object containment failed."

We will keep this behavior by generating a similar log message, but perhaps this should do a better job of matching on the error tuple returned by Transmogrifier.handle_incoming/1
This is a definite sign the instance is blocked and they are enforcing authorized_fetch
This reverts commit d472bafec19cee269e7c943bafae7c805785acd7.
We were overzealous with matching on a raw error from the object fetch that should have never been relied on like this. If we can't fetch successfully we should assume that the collection is private.

Building a more expressive and universal error struct to match on may be something to consider.
Object fetch errors are logged in the fetcher module
Author
Owner

gonna document issues as i see them

user fetch validation can cause oban :error, so can ID collisions

```
{:transmogrifier, {:error, {:validate, ... }}}

Apr 14 00:19:57 mix[1271365]: - `:ok`
Apr 14 00:19:57 mix[1271365]: - `:discard`
Apr 14 00:19:57 mix[1271365]: - `{:ok, value}`
Apr 14 00:19:57 mix[1271365]: - `{:error, reason}`,
Apr 14 00:19:57 mix[1271365]: - `{:cancel, reason}`
Apr 14 00:19:57 mix[1271365]: - `{:discard, reason}`
Apr 14 00:19:57 mix[1271365]: - `{:snooze, seconds}`
Apr 14 00:19:57 mix[1271365]: Instead received:
Apr 14 00:19:57 mix[1271365]: :error
```
Author
Owner

ah right it probably doesn't hit the pattern match in remotefetcher

Oneric commented 2024-04-14 01:08:13 +00:00
Member

> ah right it probably doesn't hit the pattern match in remotefetcher

yep, Oban wants `{:error, _info}`, but for everything not explicitly matched the last catch-all just returns `:error`. That oversight was fixed up on Pleroma’s side with https://git.pleroma.social/pleroma/pleroma/-/merge_requests/4077
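A minimal sketch of the catch-all shape being described; the module name and `fetch_object/1` helper below are illustrative stand-ins, not the actual Akkoma worker code:

```
defmodule Sketch.RemoteFetcherWorker do
  # Illustrative only: anything not explicitly matched is wrapped into a
  # tuple Oban accepts instead of falling through to a bare :error.
  use Oban.Worker, queue: :remote_fetcher

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"op" => "fetch_remote", "id" => id}}) do
    case fetch_object(id) do
      {:ok, _object} ->
        :ok

      # explicitly handled permanent failures can be discarded
      {:error, :forbidden} ->
        {:discard, :forbidden}

      # the old catch-all returned a bare :error here, which Oban rejects;
      # returning {:error, e} keeps the job result well-formed
      e ->
        {:error, e}
    end
  end

  # stand-in for the real fetcher call (Pleroma.Object.Fetcher)
  defp fetch_object(_id), do: {:ok, %{}}
end
```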

Member

Also a question: the ported changes make an effort to sort the job into `:discard` instead of `:error` to avoid retries, but as far as I can tell all `remote_fetcher` jobs currently (by default) only have a single attempt anyway, so no retries should occur in the first place?
This is based on `RemoteFetcherWorker` using `WorkerHelper`, which sets the queue’s default `max_attempts` to `1` and overrides this for enqueued jobs based on a config value if present, but if I'm not mistaken there’s no default config value for `remote_fetcher`.

[Oban docs](https://hexdocs.pm/oban/2.15.2/Oban.html#module-unique-jobs) say by default job uniqueness is checked across all states except `:discarded` and `:cancelled`. If jobs were retryable and we now discard jobs rather than exhausting all attempts with a backoff, won’t this in theory allow bad jobs to be reattempted via insert _faster_ than they previously could with the retry backoff?

*(the changes are still good to have, but just checking I didn’t misunderstand something here)*
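For reference, a quick sketch comparing Oban's default uniqueness states with a check across every state; the five default states are taken from the Oban docs linked above, and `Oban.Job.states/0` is the helper used later in this PR:

```
# Oban's documented default: uniqueness is checked across all states
# except :discarded and :cancelled.
default_states = [:available, :scheduled, :executing, :retryable, :completed]

# Checking across every state also covers the two terminal ones, so a
# discarded job cannot be immediately re-inserted within the unique period.
all_states = Oban.Job.states()

IO.inspect(all_states -- default_states)
# expected to contain :discarded and :cancelled
```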

floatingghost added 2 commits 2024-04-16 01:36:10 +00:00
Author
Owner

> Oban docs say by default job uniqueness is checked across all states except `:discarded` and `:cancelled`

oh this may actually be an issue - we don't actually enable the `unique` check for any worker 🥴

maybe that should also get enabled as part of this, i'll make sure it works

floatingghost added 1 commit 2024-04-16 01:54:25 +00:00
by default just prevent job floods with a 1-second
uniqueness check, but override in RemoteFetcherWorker
for a 5-minute uniqueness check over all states

:infinity is an option we could maybe go for at some point,
but that would prevent any refetches, so maybe not, idk.
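A hedged illustration of the flood-prevention effect: with a unique period in force, inserting the same job twice within the window returns the existing job instead of creating a second one. `SomeWorker` and its args are made up for the example; the real workers go through `WorkerHelper`.

```
# Assumes a configured Oban instance and a worker module built with
# `use Oban.Worker` (which WorkerHelper wraps); names here are illustrative.
{:ok, job1} = Oban.insert(SomeWorker.new(%{"id" => "https://example.com/objects/1"}))
{:ok, job2} = Oban.insert(SomeWorker.new(%{"id" => "https://example.com/objects/1"}))

job1.id == job2.id  # => true, the second insert hit the unique check
job2.conflict?      # => true, Oban flags the returned job as a conflict
```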
floatingghost added 1 commit 2024-04-16 01:59:02 +00:00
floatingghost added 1 commit 2024-04-16 02:07:36 +00:00
Author
Owner

marinating on ihba to see if this breaks, prayge it does not

floatingghost added 1 commit 2024-04-16 09:19:45 +00:00
floatingghost added 1 commit 2024-04-16 11:59:29 +00:00
Oneric reviewed 2024-04-17 14:13:57 +00:00
```
@@ -8,1 +8,3 @@
-use Pleroma.Workers.WorkerHelper, queue: "remote_fetcher"
+use Pleroma.Workers.WorkerHelper,
+  queue: "remote_fetcher",
+  unique: [period: 300, states: Oban.Job.states()]
```
Oneric commented 2024-04-17 14:13:57 +00:00
Member

Multiple fetches of the same AP id can still occur if the `depth` arg differs; setting `keys` to only consider `op` and `id` should avoid this
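A sketch of the suggested tweak on top of the diff above; Oban's `unique` option accepts a `keys` list that restricts which job args are compared:

```
use Pleroma.Workers.WorkerHelper,
  queue: "remote_fetcher",
  # compare only the operation and AP id, so a differing depth arg no
  # longer lets a duplicate fetch through the uniqueness check
  unique: [period: 300, states: Oban.Job.states(), keys: [:op, :id]]
```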

floatingghost added 1 commit 2024-04-19 10:39:41 +00:00
Author
Owner

has been running on IHBA with no noticeable negative effects

floatingghost merged commit 0fee71f58f into develop 2024-04-19 11:25:14 +00:00
floatingghost deleted branch failed-fetch-processing 2024-04-19 11:25:15 +00:00