http: workaround finch bug by selectively forcing HTTP1 #980

Closed
Oneric wants to merge 1 commit from Oneric:workaround-finch-http2-window-size into develop
Owner

Should avoid #978

@floatingghost can you test to confirm it works?

Oneric force-pushed workaround-finch-http2-window-size from 155b77bd23
All checks were successful
ci/woodpecker/pr/test/1 Pipeline was successful
ci/woodpecker/pr/test/2 Pipeline was successful
to 444ce437d7
All checks were successful
ci/woodpecker/pr/test/1 Pipeline was successful
ci/woodpecker/pr/test/2 Pipeline was successful
2025-09-21 10:06:26 +00:00
Compare

unfortunately no, it still uses the http2 pool

log

Sep 22 14:04:38 ihba elixir[3419978]:   adapter: [
Sep 22 14:04:38 ihba elixir[3419978]:     name: MyFinch,
Sep 22 14:04:38 ihba elixir[3419978]:     pools: %{
Sep 22 14:04:38 ihba elixir[3419978]:       default: [
Sep 22 14:04:38 ihba elixir[3419978]:         size: 10,
Sep 22 14:04:38 ihba elixir[3419978]:         pool_max_idle_time: 60000,
Sep 22 14:04:38 ihba elixir[3419978]:         conn_max_idle_time: 5000,
Sep 22 14:04:38 ihba elixir[3419978]:         protocols: [:http1],
Sep 22 14:04:38 ihba elixir[3419978]:         conn_opts: [
Sep 22 14:04:38 ihba elixir[3419978]:           transport_opts: [inet6: true],
Sep 22 14:04:38 ihba elixir[3419978]:           client_settings: [enable_push: false]
Sep 22 14:04:38 ihba elixir[3419978]:         ]
Sep 22 14:04:38 ihba elixir[3419978]:       ]
Sep 22 14:04:38 ihba elixir[3419978]:     }
Sep 22 14:04:38 ihba elixir[3419978]:   ]
Sep 22 14:04:38 ihba elixir[3419978]: ]
Sep 22 14:04:38 ihba elixir[3419978]: 14:04:38.458 request_id=GGefyrtlxoDd_9UAAAQJ [warning] ExAws: HTTP ERROR: %Mint.HTTPError{reason: {:exceeds_window_size, :request, 65535}, module: Mint.HTTP2} for URL: "https://s3.fr-par.scw.cloud/i>
Sep 22 14:04:38 ihba elixir[3419978]: 14:04:38.458 request_id=GGefyrtlxoDd_9UAAAQJ [error] Elixir.Pleroma.Uploaders.S3: {:error, %Mint.HTTPError{reason: {:exceeds_window_size, :request, 65535}, module: Mint.HTTP2}}
Sep 22 14:04:38 ihba elixir[3419978]: 14:04:38.458 request_id=GGefyrtlxoDd_9UAAAQJ [error] Elixir.Pleroma.Upload store (using Pleroma.Uploaders.S3) failed: "S3 Upload failed"

the protocols are correctly set, but they seem to get ignored

Author
Owner

Soooo, the thing is Finch.request cannot ever override pool options and only takes three request-specific options as documented:

  • :receive_timeout
  • :pool_timeout
  • :request_timeout

With the latter two being limited to HTTP1. Checking Finch’s code confirms this.
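For reference, a minimal sketch of what a per-request call can at most carry (MyFinch standing in for whichever instance name is configured):

```elixir
# Only these three options are honoured per request; anything else
# (protocols, conn_opts, cacerts, ...) must be set on the pool at startup.
req = Finch.build(:get, "https://example.com/object")

Finch.request(req, MyFinch,
  receive_timeout: 10_000,  # applies to all protocols
  pool_timeout: 5_000,      # HTTP1 pools only
  request_timeout: 15_000   # HTTP1 pools only
)
```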

This means we should actually NEVER add AdapterHelper.options to individual requests, and the per-request cacerts addition is also actually doing nothing at all and we rely on Mint’s default of querying the CAStore package. It is true though that setting cacerts ourselves on pool creation breaks plain http:// connections.

This means we can (best as part of or follow-up to #973 since it already modifies this part quite a bit) remove AdapterHelper usage from requests and don’t need to set cacerts (or set it on pool creation and use two different pools for https and http).

It unfortunately also means there’s no way to work around the body-size Finch bug short of indiscriminately forcing HTTP1 for every connection as we used to before, or spinning up two different pools for body-less and body-having requests with only the latter being limited to :http1 (or :http2 exclusively, though this might break federation with ill-configured instances).
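For illustration, the two-pool variant could look something like this in the supervision tree (MyFinchNoBody and MyFinchBody are made-up names; real pool options omitted):

```elixir
# Sketch: body-less requests get to negotiate HTTP2, while requests
# carrying a body stay on HTTP1 to dodge the window-size bug.
children = [
  {Finch, name: MyFinchNoBody, pools: %{default: [protocols: [:http1, :http2]]}},
  {Finch, name: MyFinchBody, pools: %{default: [protocols: [:http1]]}}
]
```

The caller would then have to pick the instance based on whether the request has a body, which is part of the overhead mentioned below.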

A somewhat shoddy approach could also be to always use the stream API and limit the size of chunks we send out to the minimum HTTP2 must support, or if there is no such minimum size a sufficiently small size to be unlikely to cause issues in practice. However, this will degrade performance and kind of defeats the point of allowing HTTP2 in the first place.
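As a sketch of that chunk-capping idea: 65_535 bytes is the initial HTTP2 flow-control window every peer must support (RFC 9113), so splitting the outgoing body into chunks no larger than that and streaming them could plausibly stay under the limit (untested whether this actually avoids the bug):

```elixir
# Split a binary body into <= 65_535-byte chunks and stream them out.
max_chunk = 65_535

chunked =
  Stream.unfold(body, fn
    <<>> -> nil
    bin ->
      size = min(byte_size(bin), max_chunk)
      <<chunk::binary-size(size), rest::binary>> = bin
      {chunk, rest}
  end)

req = Finch.build(:put, url, headers, {:stream, chunked})
```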

Given we have a number of issues related to Finch, it might also be worthwhile to consider whether we want to continue using Finch going forward. Finch is designed with performance as a primary focus and apparently a preference for connections only to modern and/or ahead-of-time well-known peers, and it seems this focus might distract it a bit from what is needed to be used in general-purpose connections with untrusted remotes.
Namely our pain points with Finch atm are, with more severe issues near the end:

  • it doesn't properly handle server push responses for any pool type (but we can work around it on our side; this should be relatively easy to fix on Finch’s side)
  • race condition in cleaning up pools sometimes kills active connections; see #880 (less frequent now that we adjusted the pool timeout and unclear whether other backends are fully race-free; WIP but seemingly stale patches to improve this in Finch (although no full fix seems possible): https://github.com/sneako/finch/pull/292)
  • no way to set custom cacerts defaults in an http-vs-https-agnostic way (needs two separate pools; this is presumably relatively easy to fix in Finch since it internally already creates different pools per scheme anyway. :http pools implicitly created from the default config would just need to remove cacerts if set)
  • it breaks body requests in a non-predictable way (since it depends on configuration of the remote server) which is hard to work around (might only be solvable by fundamentally changing how Finch handles pooling for [:http1, :http2])
  • there’s no way to reliably limit the size of remote responses making us vulnerable to being OOMed by malicious remotes. This might eventually be addressed but it isn’t possible rn and doesn’t appear to be a priority on Finch’s part: https://github.com/sneako/finch/issues/224
    • the recommended mitigation from Finch’s side is to just use the stream API and stop accepting once we’ve got too much, BUT Finch is already prefetching more chunks before we process them, so if the connection is sufficiently fast there might still be a lot more being loaded into memory than we want; see e.g. https://github.com/sneako/finch/issues/282
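The last point’s (imperfect) mitigation could be sketched roughly like this; note that due to Finch’s internal prefetching, more data may already sit in memory by the time we abort:

```elixir
# Abort the stream once the accumulated body exceeds a limit.
max_body = 10 * 1024 * 1024

collect = fn
  {:status, status}, {_, headers, body} -> {status, headers, body}
  {:headers, h}, {status, headers, body} -> {status, headers ++ h, body}
  {:data, data}, {status, headers, body} ->
    if IO.iodata_length([body, data]) > max_body do
      throw(:body_too_large)
    else
      {status, headers, [body, data]}
    end
end

try do
  Finch.stream(req, MyFinch, {nil, [], []}, collect)
catch
  :body_too_large -> {:error, :body_too_large}
end
```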

Checking alternatives supported as Tesla backends, only hackney appears to already have pooling built-in, though there are various complaints about hackney’s pool having poor performance, but idk if this is still up-to-date. (I also have no idea how its pooling even works and whether it utilises HTTP2 multiplexing). It has a built-in response body-size limit, though it notes it will be rounded up to the connection’s chunk size. But that's much better than Finch in this regard at least.
For everything else we’d need to implement our own pool (at which point it might make more sense to either use hackney or invest effort into fixing Finch)
Idk what motivated the selection of Finch when Akkoma was consolidating to a single Tesla backend and I have no actual experience with the other backends. What do you think @floatingghost?


so hackney is... not good. it frequently has upstream issues that need specific workarounds, which was half of the reason alternate backends were supported in pleroma, i'd probably advise against using it for most things

the reason why i selected finch was that it seemed like it'd be the best supported going forwards, though i'm not wed to it or anything

Author
Owner

hmm.. welp hackney was the only other Tesla backend with built-in pooling.
All other options for a proper resolution (and allowing HTTP2) will be harder *(either implementing custom pooling for another backend, or fixing Finch)*.

Any preference whether to just revert to HTTP1-only or set up two Finch instances until we have a better solution?


hmmm
probably the easiest is just to force http1 i'd think?
setting up two instances is fine and all but do we gain that much from baseline http2 support for only some requests?

Author
Owner

In theory better interoperability (some (few) websites don't accept HTTP1). Other advantages of HTTP2 are admittedly invalidated either by Finch limitations (using HTTP1 pooling logic for HTTP2-supporting remotes if ALPN is used) or by the overhead of having two instances.

But yeah, given HTTP1-only worked fine enough before, just reverting to it seems good.
I’ll follow up with a new PR setting default protocols to [:http1] only and cleaning up the misplaced pool options in requests.
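(For context, the protocol part of that follow-up would amount to roughly the following, assuming the usual Pleroma-style :http adapter config key; all other pool options as before:)

```elixir
# Force HTTP1 again for all connections until the Finch issues are resolved.
config :pleroma, :http,
  adapter: [
    pools: %{
      default: [
        protocols: [:http1]
      ]
    }
  ]
```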

Oneric closed this pull request 2025-10-10 16:26:26 +00:00
All checks were successful
ci/woodpecker/pr/test/1 Pipeline was successful
ci/woodpecker/pr/test/2 Pipeline was successful

Pull request closed


Reference
AkkomaGang/akkoma!980