emqx/lc

load control erlang application

This is a helper application that enables load control on an Erlang node with minimal overhead.

The application interface module is load_ctl; all API calls should go through it.

It monitors:

  • Length of the Erlang run queue
  • System memory utilization

APIs

is_overloaded

load_ctl:is_overloaded() -> boolean().

Returns true when the system is overloaded because the Erlang run queue is too long.

Ideal for real-time workloads, for example as a check before spawning a new process to handle an incoming connection. If the system is overloaded, the new connection can be closed or rejected so that the peer client can retry with another node in the same cluster. Existing connections and work are kept, so service quality (latency, throughput) is preserved.

This check is cheap.
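The admission check described above could be sketched as follows. The module name lc_admit_example and the Handler fun are hypothetical; only load_ctl:is_overloaded/0 comes from this application. The check is passed in as a fun so the logic can be exercised without the lc application running.

```erlang
-module(lc_admit_example).
-export([admit/1, admit/2]).

%% Handler is a hypothetical zero-arity fun that serves one connection.
admit(Handler) ->
    admit(Handler, fun load_ctl:is_overloaded/0).

%% IsOverloaded is injectable; by default it is load_ctl:is_overloaded/0.
admit(Handler, IsOverloaded)
  when is_function(Handler, 0), is_function(IsOverloaded, 0) ->
    case IsOverloaded() of
        true ->
            %% Reject so the client can retry on another node.
            {error, overloaded};
        false ->
            {ok, spawn(Handler)}
    end.
```

Existing handler processes are untouched by this check; only new work is turned away.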

is_high_mem

load_ctl:is_high_mem() -> boolean().

Returns true if memory usage is high (over the threshold). Memory usage is read from /proc/meminfo, or from the Linux cgroup filesystem mount if the process runs in a container.

This check is cheap.
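One possible use of this check is to shrink memory-hungry work while usage is high. The sketch below is illustrative only: the module name lc_mem_example and both batch sizes are assumptions, not part of this application.

```erlang
-module(lc_mem_example).
-export([batch_size/0, batch_size/1]).

%% Hypothetical batch sizes, chosen only for illustration.
-define(NORMAL_BATCH, 1000).
-define(SMALL_BATCH, 100).

batch_size() ->
    batch_size(fun load_ctl:is_high_mem/0).

%% IsHighMem is injectable; by default it is load_ctl:is_high_mem/0.
batch_size(IsHighMem) when is_function(IsHighMem, 0) ->
    case IsHighMem() of
        true  -> ?SMALL_BATCH;
        false -> ?NORMAL_BATCH
    end.
```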

maydelay

load_ctl:maydelay(timer:timeout()) -> false | ok | timeout.

Blocks the caller until the system is no longer overloaded or the timeout expires.

Ideal for heavy-lifting workloads that are unimportant and can be deferred.

Returns false when there was no delay. Returns ok when the caller was delayed and then unblocked because the system cooled down. Returns timeout when the caller was delayed and then unblocked because the timeout expired.
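A minimal sketch of handling the three return values, assuming a hypothetical module lc_delay_example and a caller-supplied Work fun. Skipping the work on timeout is a choice made for this example; a caller could just as well run it anyway.

```erlang
-module(lc_delay_example).
-export([run_deferred/2, run_deferred/3]).

%% Work is a hypothetical zero-arity fun doing the heavy lifting.
run_deferred(Work, Timeout) ->
    run_deferred(Work, Timeout, fun load_ctl:maydelay/1).

%% MayDelay is injectable; by default it is load_ctl:maydelay/1.
run_deferred(Work, Timeout, MayDelay)
  when is_function(Work, 0), is_function(MayDelay, 1) ->
    case MayDelay(Timeout) of
        false   -> {not_delayed, Work()};     %% no overload, ran at once
        ok      -> {delayed, Work()};         %% waited for the cooldown
        timeout -> {error, still_overloaded}  %% gave up waiting, work skipped
    end.
```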

join/leave a priority process group

load_ctl:join(Priority::non_neg_integer()) -> ok
do_work()
load_ctl:leave(Priority::non_neg_integer()) -> ok 

Caller joins/leaves a priority process group managed by this application.

While the system is overloaded, processes in the group that are below the priority threshold are killed forcefully.

It can also be configured so that very low priority processes are killed when the system is trending toward overload.

IMPORTANT: Double-check the side effects of killing these processes. If the caller processes are supervised by the same supervisor, a burst of restarts could make the supervisor reach its intensity and period limit and cause the supervisor itself to be terminated.
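The join, work, leave sequence can be wrapped so that the caller leaves the group even when the work raises. This is a sketch under assumptions: the module name lc_group_example and the Work fun are hypothetical, and only load_ctl:join/1 and load_ctl:leave/1 come from this application.

```erlang
-module(lc_group_example).
-export([with_priority/2, with_priority/4]).

%% Run Work (a hypothetical zero-arity fun) as a member of the given
%% priority group, leaving the group even if Work raises.
with_priority(Priority, Work) ->
    with_priority(Priority, Work,
                  fun load_ctl:join/1, fun load_ctl:leave/1).

%% Join and Leave are injectable so the flow can be tested without lc.
with_priority(Priority, Work, Join, Leave)
  when is_integer(Priority), Priority >= 0, is_function(Work, 0) ->
    ok = Join(Priority),
    try
        Work()
    after
        Leave(Priority)
    end.
```

Code that runs inside Work should be written with the expectation that the process may be killed at any point while the system is overloaded.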

enable/disable load control: stop/restart flagman

flagman is a daemon process that monitors the system load. It raises the overload flag when the system is overloaded and clears the flag once the system has cooled down.

flagman starts when the application is started.

It can be temporarily disabled:

%% for runq flagman
load_ctl:stop_runq_flagman() -> ok.

%% for memory flagman
load_ctl:stop_mem_flagman() -> ok.

%% Or with a TIMEOUT
load_ctl:stop_runq_flagman(timer:timeout()) -> {error, timeout} | ok.
load_ctl:stop_mem_flagman(timer:timeout()) -> {error, timeout} | ok.

It can be restarted after being disabled:

load_ctl:restart_runq_flagman() -> ok.

load_ctl:restart_mem_flagman() -> ok.
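A sketch of pausing run queue load control around a task and restarting it afterwards. The module name lc_flagman_example and the Work fun are hypothetical; the stop and restart calls are the ones listed above. The same shape would apply to the memory flagman.

```erlang
-module(lc_flagman_example).
-export([without_runq_control/2, without_runq_control/4]).

%% Work is a hypothetical zero-arity fun to run while the flagman is stopped.
without_runq_control(Work, Timeout) ->
    without_runq_control(Work, Timeout,
                         fun load_ctl:stop_runq_flagman/1,
                         fun load_ctl:restart_runq_flagman/0).

%% Stop and Restart are injectable so the flow can be tested without lc.
without_runq_control(Work, Timeout, Stop, Restart)
  when is_function(Work, 0) ->
    case Stop(Timeout) of
        ok ->
            try
                {ok, Work()}
            after
                %% Restart even if Work raises.
                Restart()
            end;
        {error, timeout} = Err ->
            %% The stop request did not complete in time; Work is not run.
            Err
    end.
```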

Configuration

Get current config

load_ctl:get_config() -> Config::map().

Overwrite the current config

load_ctl:put_config(Config::map()) -> ok | {error, badarg}.

For the config keys in the Config map, refer to the Internals documentation.
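Because put_config overwrites the current config, a read-merge-write helper may be convenient when only some keys should change. The sketch below assumes a hypothetical module lc_config_example; it names no config keys, since those are documented elsewhere.

```erlang
-module(lc_config_example).
-export([update_config/1, update_config/3]).

%% Overrides is a map holding only the keys the caller wants to change.
update_config(Overrides) ->
    update_config(Overrides,
                  fun load_ctl:get_config/0, fun load_ctl:put_config/1).

%% Get and Put are injectable so the merge can be tested without lc.
update_config(Overrides, Get, Put) when is_map(Overrides) ->
    Current = Get(),
    %% Values in Overrides win over the current values.
    New = maps:merge(Current, Overrides),
    case Put(New) of
        ok -> {ok, New};
        {error, badarg} = Err -> Err
    end.
```

The read and the write are two separate calls, so concurrent updaters could overwrite each other's changes.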

Alarm

Once the system is overloaded, the alarm lc_runq_alarm is raised via alarm_handler with the following alarm info:

#{ node         %% node()
 , runq_length  %% run queue length when the alarm is raised
 }

Once the system has high memory usage, the alarm lc_mem_alarm is raised with the following alarm info:

#{ node       %% node()
 , mem_usage  %% float(), memory usage when the alarm is raised
 }

The alarm is cleared when the system cools down or the flagman is stopped.
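Active alarms can be inspected through SASL's alarm_handler. The sketch below assumes the default alarm handler is installed, so that alarm_handler:get_alarms/0 returns {AlarmId, AlarmInfo} tuples, and that the alarm ids are the atoms lc_runq_alarm and lc_mem_alarm as named above. The module name lc_alarm_example is hypothetical.

```erlang
-module(lc_alarm_example).
-export([active_lc_alarms/0, filter_lc_alarms/1]).

%% Return the load control alarms that are currently set.
%% Requires the sasl application to be running.
active_lc_alarms() ->
    filter_lc_alarms(alarm_handler:get_alarms()).

%% Keep only the two alarms raised by this application;
%% entries that are not {Id, Info} tuples are skipped.
filter_lc_alarms(Alarms) ->
    [A || {Id, _Info} = A <- Alarms,
          Id =:= lc_runq_alarm orelse Id =:= lc_mem_alarm].
```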

Dependencies

  1. OTP 23+
  2. rebar3

Build

$ make