[bug] High RAM usage / OOM when ulimit -n is high #1020
Reference
AkkomaGang/akkoma#1020
Your setup
Docker
Extra details
Docker on Debian
Version
3.16.0-60-g6d88834f
PostgreSQL version
14
What were you trying to do?
I upgraded my host + Docker and was surprised to find out that the container got OOM-killed immediately upon starting. See detailed troubleshooting and root-causing below.
What did you expect to happen?
I expected memory use to stay around the ~500MB my very small instance used before the upgrade.
What actually happened?
I switched to a larger host with 4GB RAM and found out that Akkoma now allocates 2.2GB+ on start.
Logs
Logs are as usual until the process OOMs. However, I observed an immediate RAM allocation right upon starting, even when just running `mix` commands without starting the server.

After a whole day of debugging and searching the Internet, I found out that the Erlang runtime (ERTS/BEAM) allocates RAM proportional to `ulimit -n` (i.e. the max number of file descriptors). Somehow my latest setup has a limit of `1073741824` inside the container, which causes the 2GB+ allocations. After changing the command to `ulimit -n 65536 && mix ecto.migrate && mix phx.server`, the problem is resolved.

Similar issues I've found helpful: https://github.com/teslamate-org/teslamate/discussions/3045 and https://github.com/docker-library/rabbitmq/issues/545
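For anyone hitting the same thing, the effect is easy to check from a plain shell before involving Docker at all (a minimal sketch; `65536` is just the value that worked for me):

```shell
# Show the soft fd limit the BEAM (and its port programs) would inherit;
# a value like 1073741824 is what triggers the multi-GB allocations.
ulimit -Sn

# Cap the soft limit for this shell and its children. Lowering it never
# needs privileges; raising it works only up to the hard limit.
ulimit -Sn 65536
ulimit -Sn
```

Prefixing the container command with `ulimit -n 65536 && ...` applies exactly this cap to everything the shell then launches.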
Severity
I cannot use the software
Have you searched for this issue?
Q for project maintainers:
1. Do we consider this a bug? To me, it seems like a bug in the Docker image, or at least in the Docker installation documentation. But I can see OTP installations on distros with high FD limits also running into this. It's not a bug per se in the main Akkoma code base, though.
I believe more and more people will run into this as Docker / k8s / etc. roll out higher or unlimited defaults. Many people, like me, choose Akkoma for its light weight. The high-RAM/OOM experience out of the box may scare people away, or make them believe Akkoma won't fit on their small box. And it's not an easy one to troubleshoot or fix for people who are not familiar with Erlang.
2. How shall we approach this? I'd like to contribute, but I'd appreciate some thoughts on the direction first. Shall we:

- Add `ulimit -n 65536` to the Docker command? (Downside: it is difficult for users to increase this limit if they somehow need it; see the RabbitMQ discussion linked above.)
- Add `ERL_MAX_PORTS` to the Docker image's env? This is easier to override, but in my testing it only affects the main process. Even with the env var set, `libmagic_port` and `fasthtml_worker` still take 256MB each, adding up to 1.5GB.
- Warn when `nofile` is unreasonably high, so that at least there's some logging pointing to the issue?

Let me know which option, or combination of options, sounds good to you and I'd be happy to send some PRs.
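For context, the `ERL_MAX_PORTS` option would amount to something like this in the image (a sketch; the value and placement are assumptions, not what the image ships today):

```shell
# Cap the BEAM's port table directly instead of relying on ulimit. The VM
# sizes its port structures from this variable rather than from the fd
# limit, but native port programs spawned by the VM still size their own
# tables from ulimit, so they are unaffected -- matching what I observed.
export ERL_MAX_PORTS=65536
echo "ERL_MAX_PORTS=$ERL_MAX_PORTS"
```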
This sounds best to me. As far as I recall, some large instances needed to increase the OS default file-descriptor limit for Akkoma.
`fasthtml_worker` and `libmagic_port` are both native binaries, not BEAM programs, and even so an environment variable should be available to them too. Not sure why you are seeing such large memory consumption from them.

Regardless of what we end up going with, a note should also be added to the docs about the setting and how to override its default.
I think they somehow allocate memory proportional to the fd limit through a different mechanism unknown to me; I was unable to get to the bottom of it. (And as far as I can tell, `ERL_MAX_PORTS` only controls ports, not fds in general -- it's just that the number of ports defaults to the max number of fds in this particular case. Other programs may use a different env var, or no env var at all.) Maybe someone more familiar can chime in here.

Sounds good!
One idea that occurred to me: we could actually add `ulimit -n $AKKOMA_MAX_FDS` (or whatever we want to call it) to the Docker command. This combines the benefit of controlling all fd-related things with still being easy to override. The Docker image can set `AKKOMA_MAX_FDS=65536` by default, but users can override this or even unset it. The best part? It can also be used to *increase* soft limits as long as hard limits permit, e.g. for the large instances. Bumping `ERL_MAX_PORTS`, on the other hand, doesn't do a lot and won't work without `ulimit`.

Sorry for the long delay. I've been testing #1079 on my server and it's been working flawlessly. If we think this approach looks good, I can work on the docs next.
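Sketched as an entrypoint, the idea could look like this (hypothetical: the `AKKOMA_MAX_FDS` name and default come from this thread, not from anything shipped, and I haven't checked #1079's exact implementation):

```shell
#!/bin/sh
# Hypothetical entrypoint: apply a configurable fd cap, then hand off to
# the real command so every child (BEAM, libmagic_port, fasthtml_worker)
# inherits the sane limit.
: "${AKKOMA_MAX_FDS=65536}"      # default only if unset; set it empty to skip the cap
if [ -n "$AKKOMA_MAX_FDS" ]; then
    ulimit -n "$AKKOMA_MAX_FDS"  # lowers the soft limit, or raises it up to the hard limit
fi
exec "$@"                        # e.g. mix phx.server
```

Because the cap is applied via `ulimit` before `exec`, it covers the native port programs too, which `ERL_MAX_PORTS` alone does not.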