From 60fe4a8393fe874284fbf2f46d598119a5207296 Mon Sep 17 00:00:00 2001 From: Mark Felder Date: Fri, 6 Nov 2020 13:00:31 -0600 Subject: [PATCH 01/11] First draft of tips for optimizing BEAM --- docs/configuration/optimizing_beam.md | 60 +++++++++++++++++++++++++++ 1 file changed, 60 insertions(+) create mode 100644 docs/configuration/optimizing_beam.md diff --git a/docs/configuration/optimizing_beam.md b/docs/configuration/optimizing_beam.md new file mode 100644 index 000000000..7a3cfda4b --- /dev/null +++ b/docs/configuration/optimizing_beam.md @@ -0,0 +1,60 @@ +# Optimizing the BEAM + +Pleroma is built upon the Erlang/OTP VM known as BEAM. The BEAM VM is highly optimized for latency, but this has drawbacks in environments without dedicated hardware. One of the tricks used by the BEAM VM is [busy waiting](https://en.wikipedia.org/wiki/Busy_waiting). This allows the application to pretend to be busy working so the OS kernel does not pause the application process and switch to another process waiting for the CPU to execute its workload. It does this by busy waiting or spinning for a period of time which inflates the apparent CPU usage of the application as seen in utilities like **top(1)** so it is immediately ready to execute another task. Switching between procesess is a rather expensive operation and also clears CPU caches which further affects latency and performance. + +This strategy is very successful in making a performant and responsive application, but is not desirable on Virtual Machines or hardware with few CPU cores. Pleroma instances are often deployed on the same server as the required PostgreSQL database which can lead to situations where the Pleroma application is holding the CPU in a busy-wait loop and as a result the database cannot process requests in a timely manner. The fewer CPUs available, the more this problem is exacerbated. The latency is further amplified by the OS being installed on a Virtual Machine as the Hypervisor uses CPU time-slicing to pause the entire OS and switch between other tasks. + +More adventerous admins can be creative with CPU affinity (e.g., *taskset* for Linux and *cpuset* on FreeBSD) to pin processes to specific CPUs and eliminate much of this contention. The most important advice is to run as few processes as possible on your server to achieve the best performance. Even idle background processes can occasionally create [software interrupts](https://en.wikipedia.org/wiki/Interrupt) and take attention away from the executing process and latency spikes and invalidation of the CPU caches as they must be cleared when switching between processes for security. + +## VPS Provider Recommendations + +### Good + +* ???? + +### Bad + +* AWS (known to use burst scheduling) + + +## Example configurations + +Tuning the BEAM requires you provide a config file normally called [vm.args](http://erlang.org/doc/man/erl.html#emulator-flags). If you are using systemd to manage the service you can modify the unit file as such: + +`ExecStart=/usr/bin/elixir --erl '-args_file /opt/pleroma/config/vm.args' -S /usr/bin/mix phx.server` + +Check your OS documentation to adopt a similar strategy on other platforms. + +### Virtual Machine and/or few CPU cores + +Disable the busy-waiting + +``` ++sbwt none ++sbwtdcpu none ++sbwtdio none +``` + +### Dedicated Hardware + +Enable more busy waiting, increase the internal maximum limit of BEAM processes and ports + +``` ++P 16777216 ++Q 16777216 ++K true ++A 128 ++sbt db ++sbwt very_long ++swt very_low ++sub true ++Mulmbcs 32767 ++Mumbcgs 1 ++Musmbcs 2047 +``` + +## Additional Reading + +* [WhatsApp: Scaling to Millions of Simultaneous Connections](https://www.erlang-factory.com/upload/presentations/558/efsf2012-whatsapp-scaling.pdf) +* [Preemptive Scheduling and Spinlocks](https://www.uio.no/studier/emner/matnat/ifi/nedlagte-emner/INF3150/h03/annet/slides/preemptive.pdf) +* [The Curious Case of BEAM CPU Usage](https://stressgrid.com/blog/beam_cpu_usage/) From 9e90e49ad2d01744bd5385473f2ebd4b2a442e8a Mon Sep 17 00:00:00 2001 From: Mark Felder Date: Fri, 6 Nov 2020 13:02:07 -0600 Subject: [PATCH 02/11] Grammar --- docs/configuration/optimizing_beam.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/configuration/optimizing_beam.md b/docs/configuration/optimizing_beam.md index 7a3cfda4b..40fa7bfbf 100644 --- a/docs/configuration/optimizing_beam.md +++ b/docs/configuration/optimizing_beam.md @@ -4,7 +4,7 @@ Pleroma is built upon the Erlang/OTP VM known as BEAM. The BEAM VM is highly opt This strategy is very successful in making a performant and responsive application, but is not desirable on Virtual Machines or hardware with few CPU cores. Pleroma instances are often deployed on the same server as the required PostgreSQL database which can lead to situations where the Pleroma application is holding the CPU in a busy-wait loop and as a result the database cannot process requests in a timely manner. The fewer CPUs available, the more this problem is exacerbated. The latency is further amplified by the OS being installed on a Virtual Machine as the Hypervisor uses CPU time-slicing to pause the entire OS and switch between other tasks. -More adventerous admins can be creative with CPU affinity (e.g., *taskset* for Linux and *cpuset* on FreeBSD) to pin processes to specific CPUs and eliminate much of this contention. The most important advice is to run as few processes as possible on your server to achieve the best performance. Even idle background processes can occasionally create [software interrupts](https://en.wikipedia.org/wiki/Interrupt) and take attention away from the executing process and latency spikes and invalidation of the CPU caches as they must be cleared when switching between processes for security. +More adventerous admins can be creative with CPU affinity (e.g., *taskset* for Linux and *cpuset* on FreeBSD) to pin processes to specific CPUs and eliminate much of this contention. The most important advice is to run as few processes as possible on your server to achieve the best performance. Even idle background processes can occasionally create [software interrupts](https://en.wikipedia.org/wiki/Interrupt) and take attention away from the executing process creating latency spikes and invalidation of the CPU caches as they must be cleared when switching between processes for security. ## VPS Provider Recommendations From da1862e1d3fea5488b80bfc46ea878468940238c Mon Sep 17 00:00:00 2001 From: Mark Felder Date: Fri, 6 Nov 2020 13:04:13 -0600 Subject: [PATCH 03/11] Less confusing I hope --- docs/configuration/optimizing_beam.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/configuration/optimizing_beam.md b/docs/configuration/optimizing_beam.md index 40fa7bfbf..00620b35d 100644 --- a/docs/configuration/optimizing_beam.md +++ b/docs/configuration/optimizing_beam.md @@ -1,6 +1,6 @@ # Optimizing the BEAM -Pleroma is built upon the Erlang/OTP VM known as BEAM. The BEAM VM is highly optimized for latency, but this has drawbacks in environments without dedicated hardware. One of the tricks used by the BEAM VM is [busy waiting](https://en.wikipedia.org/wiki/Busy_waiting). This allows the application to pretend to be busy working so the OS kernel does not pause the application process and switch to another process waiting for the CPU to execute its workload. It does this by busy waiting or spinning for a period of time which inflates the apparent CPU usage of the application as seen in utilities like **top(1)** so it is immediately ready to execute another task. Switching between procesess is a rather expensive operation and also clears CPU caches which further affects latency and performance. +Pleroma is built upon the Erlang/OTP VM known as BEAM. The BEAM VM is highly optimized for latency, but this has drawbacks in environments without dedicated hardware. One of the tricks used by the BEAM VM is [busy waiting](https://en.wikipedia.org/wiki/Busy_waiting). This allows the application to pretend to be busy working so the OS kernel does not pause the application process and switch to another process waiting for the CPU to execute its workload. It does this by spinning for a period of time which inflates the apparent CPU usage of the application as seen in utilities like **top(1)** so it is immediately ready to execute another task. Switching between procesess is a rather expensive operation and also clears CPU caches which further affects latency and performance. This strategy is very successful in making a performant and responsive application, but is not desirable on Virtual Machines or hardware with few CPU cores. Pleroma instances are often deployed on the same server as the required PostgreSQL database which can lead to situations where the Pleroma application is holding the CPU in a busy-wait loop and as a result the database cannot process requests in a timely manner. The fewer CPUs available, the more this problem is exacerbated. The latency is further amplified by the OS being installed on a Virtual Machine as the Hypervisor uses CPU time-slicing to pause the entire OS and switch between other tasks. From 620f1d723781ef39079add6326aec24112275253 Mon Sep 17 00:00:00 2001 From: Mark Felder Date: Fri, 6 Nov 2020 13:12:13 -0600 Subject: [PATCH 04/11] More grammar fixes --- docs/configuration/optimizing_beam.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/configuration/optimizing_beam.md b/docs/configuration/optimizing_beam.md index 00620b35d..fd89c4e99 100644 --- a/docs/configuration/optimizing_beam.md +++ b/docs/configuration/optimizing_beam.md @@ -1,6 +1,6 @@ # Optimizing the BEAM -Pleroma is built upon the Erlang/OTP VM known as BEAM. The BEAM VM is highly optimized for latency, but this has drawbacks in environments without dedicated hardware. One of the tricks used by the BEAM VM is [busy waiting](https://en.wikipedia.org/wiki/Busy_waiting). This allows the application to pretend to be busy working so the OS kernel does not pause the application process and switch to another process waiting for the CPU to execute its workload. It does this by spinning for a period of time which inflates the apparent CPU usage of the application as seen in utilities like **top(1)** so it is immediately ready to execute another task. Switching between procesess is a rather expensive operation and also clears CPU caches which further affects latency and performance. +Pleroma is built upon the Erlang/OTP VM known as BEAM. The BEAM VM is highly optimized for latency, but this has drawbacks in environments without dedicated hardware. One of the tricks used by the BEAM VM is [busy waiting](https://en.wikipedia.org/wiki/Busy_waiting). This allows the application to pretend to be busy working so the OS kernel does not pause the application process and switch to another process waiting for the CPU to execute its workload. It does this by spinning for a period of time which inflates the apparent CPU usage of the application so it is immediately ready to execute another task. This can be observed with utilities like **top(1)** which will show consistently high CPU usage for the process. Switching between procesess is a rather expensive operation and also clears CPU caches further affecting latency and performance. The goal of busy waiting is to avoid this penalty. This strategy is very successful in making a performant and responsive application, but is not desirable on Virtual Machines or hardware with few CPU cores. Pleroma instances are often deployed on the same server as the required PostgreSQL database which can lead to situations where the Pleroma application is holding the CPU in a busy-wait loop and as a result the database cannot process requests in a timely manner. The fewer CPUs available, the more this problem is exacerbated. The latency is further amplified by the OS being installed on a Virtual Machine as the Hypervisor uses CPU time-slicing to pause the entire OS and switch between other tasks. From 4999549191b1ac7c50bb3a6398b0bad0f0957b73 Mon Sep 17 00:00:00 2001 From: Mark Felder Date: Fri, 6 Nov 2020 13:15:21 -0600 Subject: [PATCH 05/11] Make it clearer the settings go into the vm.args file --- docs/configuration/optimizing_beam.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/configuration/optimizing_beam.md b/docs/configuration/optimizing_beam.md index fd89c4e99..64d35ad2c 100644 --- a/docs/configuration/optimizing_beam.md +++ b/docs/configuration/optimizing_beam.md @@ -29,6 +29,7 @@ Check your OS documentation to adopt a similar strategy on other platforms. Disable the busy-waiting +**vm.args:** ``` +sbwt none +sbwtdcpu none @@ -39,6 +40,7 @@ Disable the busy-waiting Enable more busy waiting, increase the internal maximum limit of BEAM processes and ports +**vm.args:** ``` +P 16777216 +Q 16777216 From a9c1f83fd8b2793e6474b14d246e1ef362892467 Mon Sep 17 00:00:00 2001 From: Mark Felder Date: Fri, 6 Nov 2020 13:16:22 -0600 Subject: [PATCH 06/11] Markdown, you're drunk --- docs/configuration/optimizing_beam.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/configuration/optimizing_beam.md b/docs/configuration/optimizing_beam.md index 64d35ad2c..de76086f7 100644 --- a/docs/configuration/optimizing_beam.md +++ b/docs/configuration/optimizing_beam.md @@ -30,6 +30,7 @@ Check your OS documentation to adopt a similar strategy on other platforms. Disable the busy-waiting **vm.args:** + ``` +sbwt none +sbwtdcpu none @@ -41,6 +42,7 @@ Disable the busy-waiting Enable more busy waiting, increase the internal maximum limit of BEAM processes and ports **vm.args:** + ``` +P 16777216 +Q 16777216 From abf2ec2bbe5226d6558ca1918779ee68df1cab84 Mon Sep 17 00:00:00 2001 From: lain Date: Sun, 8 Nov 2020 09:45:35 +0000 Subject: [PATCH 07/11] Update optimizing_beam.md --- docs/configuration/optimizing_beam.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/configuration/optimizing_beam.md b/docs/configuration/optimizing_beam.md index de76086f7..545392f8f 100644 --- a/docs/configuration/optimizing_beam.md +++ b/docs/configuration/optimizing_beam.md @@ -4,7 +4,7 @@ Pleroma is built upon the Erlang/OTP VM known as BEAM. The BEAM VM is highly opt This strategy is very successful in making a performant and responsive application, but is not desirable on Virtual Machines or hardware with few CPU cores. Pleroma instances are often deployed on the same server as the required PostgreSQL database which can lead to situations where the Pleroma application is holding the CPU in a busy-wait loop and as a result the database cannot process requests in a timely manner. The fewer CPUs available, the more this problem is exacerbated. The latency is further amplified by the OS being installed on a Virtual Machine as the Hypervisor uses CPU time-slicing to pause the entire OS and switch between other tasks. -More adventerous admins can be creative with CPU affinity (e.g., *taskset* for Linux and *cpuset* on FreeBSD) to pin processes to specific CPUs and eliminate much of this contention. The most important advice is to run as few processes as possible on your server to achieve the best performance. Even idle background processes can occasionally create [software interrupts](https://en.wikipedia.org/wiki/Interrupt) and take attention away from the executing process creating latency spikes and invalidation of the CPU caches as they must be cleared when switching between processes for security. +More adventurous admins can be creative with CPU affinity (e.g., *taskset* for Linux and *cpuset* on FreeBSD) to pin processes to specific CPUs and eliminate much of this contention. The most important advice is to run as few processes as possible on your server to achieve the best performance. Even idle background processes can occasionally create [software interrupts](https://en.wikipedia.org/wiki/Interrupt) and take attention away from the executing process creating latency spikes and invalidation of the CPU caches as they must be cleared when switching between processes for security. ## VPS Provider Recommendations From 2933658446f4172bb77b4d294d032fa4e2d93237 Mon Sep 17 00:00:00 2001 From: feld Date: Tue, 10 Nov 2020 16:44:00 +0000 Subject: [PATCH 08/11] Apply 1 suggestion(s) to 1 file(s) --- docs/configuration/optimizing_beam.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/configuration/optimizing_beam.md b/docs/configuration/optimizing_beam.md index 545392f8f..1753620c9 100644 --- a/docs/configuration/optimizing_beam.md +++ b/docs/configuration/optimizing_beam.md @@ -27,7 +27,7 @@ Check your OS documentation to adopt a similar strategy on other platforms. ### Virtual Machine and/or few CPU cores -Disable the busy-waiting +Disable the busy-waiting. This should generally only be done if you're on a platform that does burst scheduling, like AWS. **vm.args:** From 952a8c213e37b38c48890813f80e8792cb0cebe2 Mon Sep 17 00:00:00 2001 From: feld Date: Tue, 10 Nov 2020 16:44:08 +0000 Subject: [PATCH 09/11] Apply 1 suggestion(s) to 1 file(s) --- docs/configuration/optimizing_beam.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/configuration/optimizing_beam.md b/docs/configuration/optimizing_beam.md index 1753620c9..ecfae86d9 100644 --- a/docs/configuration/optimizing_beam.md +++ b/docs/configuration/optimizing_beam.md @@ -39,7 +39,7 @@ Disable the busy-waiting. This should generally only be done if you're on a plat ### Dedicated Hardware -Enable more busy waiting, increase the internal maximum limit of BEAM processes and ports +Enable more busy waiting, increase the internal maximum limit of BEAM processes and ports. You can use this if you run on dedicated hardware, but it is not necessary. **vm.args:** From 776067a9a32fd2b536c6782a0aa7c2dfd8444280 Mon Sep 17 00:00:00 2001 From: feld Date: Tue, 10 Nov 2020 16:44:17 +0000 Subject: [PATCH 10/11] Apply 1 suggestion(s) to 1 file(s) --- docs/configuration/optimizing_beam.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/configuration/optimizing_beam.md b/docs/configuration/optimizing_beam.md index ecfae86d9..a79d188ca 100644 --- a/docs/configuration/optimizing_beam.md +++ b/docs/configuration/optimizing_beam.md @@ -6,6 +6,8 @@ This strategy is very successful in making a performant and responsive applicati More adventurous admins can be creative with CPU affinity (e.g., *taskset* for Linux and *cpuset* on FreeBSD) to pin processes to specific CPUs and eliminate much of this contention. The most important advice is to run as few processes as possible on your server to achieve the best performance. Even idle background processes can occasionally create [software interrupts](https://en.wikipedia.org/wiki/Interrupt) and take attention away from the executing process creating latency spikes and invalidation of the CPU caches as they must be cleared when switching between processes for security. +Please only change these settings if you are experiencing issues or really know what you are doing. In general, there's no need to change these settings. + ## VPS Provider Recommendations ### Good From 7681b4c5cd683bcc8c91b1d296261e7e2b11fd88 Mon Sep 17 00:00:00 2001 From: feld Date: Tue, 10 Nov 2020 16:44:23 +0000 Subject: [PATCH 11/11] Apply 1 suggestion(s) to 1 file(s) --- docs/configuration/optimizing_beam.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/configuration/optimizing_beam.md b/docs/configuration/optimizing_beam.md index a79d188ca..e336bd36c 100644 --- a/docs/configuration/optimizing_beam.md +++ b/docs/configuration/optimizing_beam.md @@ -12,7 +12,7 @@ Please only change these settings if you are experiencing issues or really know ### Good -* ???? +* Hetzner Cloud ### Bad