From 4b5a398f224abfa409c3f9fd1fa80c2dd6e41263 Mon Sep 17 00:00:00 2001
From: Oneric
Date: Thu, 4 Apr 2024 17:19:58 +0200
Subject: [PATCH] Avoid accumulation of stale data in websockets
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

We’ve received reports of some specific instances slowly accumulating
more and more binary data over time until they run out of memory
(OOM). Globally setting ERL_FULLSWEEP_AFTER=0 has proven to be an
effective countermeasure. However, this incurs increased CPU costs
everywhere and is thus not suitable to apply out of the box.

There are many reports, unrelated to Akkoma, of long-lived Phoenix
websockets getting into a state unfavourable for the garbage
collector, depending on the usage pattern, resulting in exactly the
observed behaviour. Therefore it seems likely that affected instances
use timeline streaming and do so in just the right way to trigger this.

We can tune the garbage collector just for websocket processes and use
a more lenient value of 20 to keep the added performance cost in check.
With 0, every collection is a full sweep; with 20, at most 20
generational (minor) collections run before a full sweep is forced,
compared with the VM default of 65535.

Unfortunately, none of the affected instances responded to inquiries
to test this more selective GC tuning, so this is not fully verified.
However, the general reports regarding websockets, and the fact that
Pleroma, as it turns out, also applied and properly tested a very
similar tweak, seem to support this theory.
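The per-process tuning can be checked in isolation with a minimal
sketch in plain Elixir, independent of Akkoma. The module name
FullsweepDemo is hypothetical. It spawns a process, applies the same
:erlang.process_flag(:fullsweep_after, 20) call used in this patch,
and reads the effective value back via Process.info/2:

```elixir
defmodule FullsweepDemo do
  # Hypothetical demo module; the value mirrors @fullsweep_after in the patch.
  @fullsweep_after 20

  # Spawns a short-lived process that lowers its own fullsweep_after
  # setting and reports both the previous and the now-effective value.
  def run do
    parent = self()

    spawn(fn ->
      # Only affects the calling process; all other processes keep theirs.
      old = :erlang.process_flag(:fullsweep_after, @fullsweep_after)

      # GC info is a list of {atom, value} tuples, one of which is :fullsweep_after.
      {:garbage_collection, gc_info} = Process.info(self(), :garbage_collection)
      send(parent, {:fullsweep, old, Keyword.fetch!(gc_info, :fullsweep_after)})
    end)

    receive do
      {:fullsweep, old, new} -> {old, new}
    after
      1_000 -> raise "no reply from spawned process"
    end
  end
end

{old, new} = FullsweepDemo.run()
IO.puts("fullsweep_after: #{old} -> #{new}")
```

The previous value printed is the system-wide default, so it differs
when ERL_FULLSWEEP_AFTER is set for the VM; the new value applies only
to the spawned process, which is what keeps the extra GC cost limited
to websocket handlers in the patch.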
Ref.:
  https://www.erlang.org/doc/man/erlang#ghlink-process_flag-2-idp226
  https://blog.guzman.codes/using-phoenix-channels-high-memory-usage-save-money-with-erlfullsweepafter
  https://git.pleroma.social/pleroma/pleroma/-/merge_requests/4060
---
 lib/pleroma/web/mastodon_api/websocket_handler.ex | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/lib/pleroma/web/mastodon_api/websocket_handler.ex b/lib/pleroma/web/mastodon_api/websocket_handler.ex
index bd7c56243..9dae7bf93 100644
--- a/lib/pleroma/web/mastodon_api/websocket_handler.ex
+++ b/lib/pleroma/web/mastodon_api/websocket_handler.ex
@@ -18,6 +18,8 @@ defmodule Pleroma.Web.MastodonAPI.WebsocketHandler do
   @timeout :timer.seconds(60)
   # Hibernate every X messages
   @hibernate_every 100
+  # Tune garbage collection for long-lived websocket processes
+  @fullsweep_after 20
 
   def init(%{qs: qs} = req, state) do
     with params <- Enum.into(:cow_qs.parse_qs(qs), %{}),
@@ -59,6 +61,10 @@ defmodule Pleroma.Web.MastodonAPI.WebsocketHandler do
       "#{__MODULE__} accepted websocket connection for user #{(state.user || %{id: "anonymous"}).id}, topic #{state.topic}"
     )
 
+    # The process is long-lived and can accumulate stale data in such a way that
+    # it is not freed by young-generation collections, so force full sweeps more often
+    :erlang.process_flag(:fullsweep_after, @fullsweep_after)
+
     Streamer.add_socket(state.topic, state.oauth_token)
     {:ok, %{state | timer: timer()}}
   end