[bug] High RAM usage / OOM when ulimit -n is high #1020

Open
opened 2025-11-25 03:25:54 +00:00 by provable_ascent · 4 comments
Contributor

Your setup

Docker

Extra details

Docker on Debian

Version

3.16.0-60-g6d88834f

PostgreSQL version

14

What were you trying to do?

I upgraded my host + Docker and was surprised to find out that the container got OOM-killed immediately upon starting. See detailed troubleshooting and root-causing below.

What did you expect to happen?

I have a very small instance that only used 500MB before the upgrade.

What actually happened?

I switched to a larger host with 4GB RAM and found out that Akkoma now allocates 2.2GB+ on start.

Logs

Logs are as usual until the process OOMs. However, I observed immediate RAM allocation right at startup, even when just running `mix` commands without starting the server.

After a whole day of debugging and searching the Internet, I found out that the Erlang runtime (ERTS, i.e. the BEAM VM) allocates RAM proportional to `ulimit -n` (i.e. the maximum number of open file descriptors). Somehow my latest setup has a limit of `1073741824` in the container, which causes 2GB+ allocations. After changing the command to `ulimit -n 65536 && mix ecto.migrate && mix phx.server`, the problem is resolved.
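For anyone wanting to check whether they are affected, here is a minimal sketch of how the limit can be inspected and lowered for a single command. The helper name `run_with_nofile` and the demo value `256` are made up for illustration. It deliberately changes only the soft limit (`-S`), which differs slightly from the plain `ulimit -n` used above.

```shell
#!/bin/sh
# Show the limits the current shell (and therefore any child process) runs with.
echo "soft nofile: $(ulimit -Sn), hard nofile: $(ulimit -Hn)"

# run_with_nofile LIMIT CMD [ARGS...]
# Hypothetical helper: runs CMD with the soft nofile limit set to LIMIT.
# The subshell keeps the change from leaking into the calling shell.
run_with_nofile() {
  limit=$1
  shift
  (
    ulimit -Sn "$limit" && exec "$@"
  )
}

# Demo: the child process sees the lowered limit.
run_with_nofile 256 sh -c 'echo "child sees nofile: $(ulimit -Sn)"'
```

With this helper the workaround above would read `run_with_nofile 65536 mix phx.server`. Raising the limit this way only works up to the hard limit.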

Similar issues I've found helpful: https://github.com/teslamate-org/teslamate/discussions/3045 and https://github.com/docker-library/rabbitmq/issues/545

Severity

I cannot use the software

Have you searched for this issue?

  • [x] I have double-checked and have not found this issue mentioned anywhere.
Author
Contributor

Q for project maintainers:

1. Do we consider this a bug? To me, it seems like a bug in the Docker image or at least the documentation on Docker installation instructions. But I can see OTP installations also running into this on distros with high FD limits. It's not a bug per se in the main Akkoma code base though.

I believe more and more people will run into this as Docker / k8s / etc. roll out higher / unlimited defaults. Many people like me choose Akkoma due to its light weight. The high RAM/OOM experience out of the box may scare people away or make people believe it won't fit on their small box. And it's not an easy one to troubleshoot / fix for people who are not familiar with Erlang.

2. How shall we approach this? I'd like to contribute but I'd like some thoughts on how. Shall we:

  • Prepend `ulimit -n 65536` to the docker command? (Downside: it is difficult for users to increase this limit if they somehow need it, see the RabbitMQ discussion linked above.)
  • Add `ERL_MAX_PORTS` to the Docker env image? This is easier to override, but in my testing it only works for the main process. Even with the env var, `libmagic_port` and `fasthtml_worker` still take 256MB each, adding up to 1.5GB.
  • Add a note about ulimit to the installation guide, esp. Docker (https://docs.akkoma.dev/stable/installation/docker_en/)?
  • Add a FAQ / troubleshooting section?
  • Add some warning when `nofile` is unreasonably high? So at least there's some logging pointing to the issue.

Lmk which option or combination of options sounds good to you and I'd be happy to send some PRs.
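To make the last option more concrete, here is a rough sketch of what such a warning could look like if it lived in a container start script. This is an assumption for illustration only: the function name `warn_if_nofile_high`, the variable `NOFILE_WARN_THRESHOLD`, and its default value are all hypothetical. A real implementation might just as well live in Akkoma itself.

```shell
#!/bin/sh
# Hypothetical threshold above which we consider the limit suspicious.
# This value is an assumption, not something taken from Akkoma.
NOFILE_WARN_THRESHOLD=${NOFILE_WARN_THRESHOLD:-1048576}

# warn_if_nofile_high LIMIT
# Prints a warning to stderr if LIMIT is "unlimited" or above the threshold.
# Returns 0 if a warning was printed, 1 otherwise.
warn_if_nofile_high() {
  limit=$1
  if [ "$limit" = "unlimited" ] || [ "$limit" -gt "$NOFILE_WARN_THRESHOLD" ]; then
    echo "warning: nofile limit is $limit; the Erlang VM may allocate memory proportional to it. Consider lowering it with ulimit -n." >&2
    return 0
  fi
  return 1
}

# Check the soft limit of the current shell; never fail startup because of it.
warn_if_nofile_high "$(ulimit -Sn)" || true
```

This only logs and never changes the limit, so it could be combined with any of the other options.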

Owner

> Add `ERL_MAX_PORTS` to the Docker env image? This is easier to override,

This sounds best to me. As far as I recall some large instances needed to increase the OS-default file-descriptor limit for akkoma.
`fasthtml_worker` and `libmagic_port` are both native binaries, not BEAM programs, and in any case an environment variable should be available to them too. Not sure why you are seeing such large memory consumption from them.

Regardless of what we end up going with, a note should also be added to the docs about the setting and how to override its default.

Author
Contributor

> `fasthtml_worker` and `libmagic_port` are both native binaries, not BEAM programs, and in any case an environment variable should be available to them too. Not sure why you are seeing such large memory consumption from them.

I think they somehow allocate memory proportional to the fd limit through a different mechanism unknown to me. I was unable to get to the bottom of it. (And as far as I can tell, `ERL_MAX_PORTS` only controls ports, not fds in general; it's just that the number of ports defaults to the max fd count in this particular case. Other programs may use a different env var or no env var at all.) Maybe someone more familiar can chime in here.

> Regardless of what we end up going with, a note should also be added to the docs about the setting and how to override its default.

Sounds good!


One idea that occurred to me: we can add `ulimit -n $AKKOMA_MAX_FDS` (or whatever we want to call it) to the Docker command. This combines the benefit of controlling all fd-related things with still being easy to override. The Docker image can set `AKKOMA_MAX_FDS=65536` by default, but users can override this or even unset it. The best part about it? This can be used to increase soft limits as long as hard limits permit, e.g. for the large instances. Bumping `ERL_MAX_PORTS`, on the other hand, doesn't do a lot and won't work without `ulimit`.
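A minimal sketch of what this could look like as a wrapper script follows. It is an illustration of the idea only, not the actual change proposed in any PR. `AKKOMA_MAX_FDS` is the placeholder name from above and `apply_max_fds` is a made-up helper. The sketch sets only the soft limit (`-S`) so that the hard limit stays untouched, which is an assumption on my part.

```shell
#!/bin/sh
# apply_max_fds
# If AKKOMA_MAX_FDS (placeholder name) is set and non-empty, use it as the
# soft nofile limit. If it is unset, the inherited limit is left alone.
apply_max_fds() {
  if [ -n "${AKKOMA_MAX_FDS:-}" ]; then
    ulimit -Sn "$AKKOMA_MAX_FDS" ||
      echo "warning: could not set nofile limit to $AKKOMA_MAX_FDS" >&2
  fi
}

apply_max_fds

# Hand over to the real command (e.g. mix phx.server) if one was given.
# The limit set above is inherited by it and by any native helper it spawns.
if [ "$#" -gt 0 ]; then
  exec "$@"
fi
```

Because the limit is applied in the shell before the command starts, it is inherited by every child process, not just the BEAM. The image would set `AKKOMA_MAX_FDS=65536` in its environment, and users could override or unset it at run time.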

Author
Contributor

Sorry for the long delay. I've been testing #1079 on my server and it's been working flawlessly. If we think this approach looks good, I can work on the docs next.

Reference
AkkomaGang/akkoma#1020