trace-load-limits: restrict trace load for applications
In a shared system, many developers log as much as they
want into DLT. This can overload the logging system,
resulting in bad performance and dropped logs.
This commit introduces trace load limits to restrict
applications to a certain log volume measured in bytes/s.
It is based on #134 but extends it heavily.

Trace load limits are configured via a space-separated
configuration file.
The format of the file is as follows:

APPID [CTXID] SOFT_LIMIT HARD_LIMIT
The best matching entry is selected for each log: either both the app
id and the context id match, or at least the app id matches. For the
app id a default entry is created when none is configured.
This allows configuring the trace load for single contexts, which can
be used, for example, to limit different applications in the qnx slog
to a given budget without affecting others, or to give a file transfer
unlimited bandwidth.
It is recommended to always specify a budget for the application id
without a context to ensure that new contexts and internal logs like
DLTL can be logged.
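
As an illustration of the format and the matching rule above, here is a minimal sketch in C. It is not the parser from this commit: `LimitEntry`, `parse_limit_line` and `find_best_entry` are hypothetical names, and the sketch assumes ids of at most four characters and decimal limits.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for one configuration line (not the daemon's type). */
typedef struct {
    char apid[5];        /* application id, up to 4 characters */
    char ctid[5];        /* context id, empty string when omitted */
    uint32_t soft_limit; /* bytes/s */
    uint32_t hard_limit; /* bytes/s */
} LimitEntry;

/* Convert a decimal token to uint32_t. Returns 0 on success, -1 on error. */
static int parse_u32(const char *s, uint32_t *out)
{
    char *end = NULL;
    unsigned long v;

    if (!isdigit((unsigned char)s[0]))
        return -1; /* rejects signs, empty tokens and words */
    errno = 0;
    v = strtoul(s, &end, 10);
    if (errno != 0 || *end != '\0' || v > 0xFFFFFFFFUL)
        return -1;
    *out = (uint32_t)v;
    return 0;
}

/* Parse "APPID [CTXID] SOFT_LIMIT HARD_LIMIT". Returns 0 on success. */
static int parse_limit_line(const char *line, LimitEntry *out)
{
    char t[5][21] = {{0}};
    const char *ctid;
    int n;

    assert(line != NULL && out != NULL);
    n = sscanf(line, "%20s %20s %20s %20s %20s", t[0], t[1], t[2], t[3], t[4]);
    if (n != 3 && n != 4)
        return -1; /* too few or too many fields */
    ctid = (n == 4) ? t[1] : "";
    if (strlen(t[0]) > 4 || strlen(ctid) > 4)
        return -1;
    if (parse_u32(t[n - 2], &out->soft_limit) != 0 ||
        parse_u32(t[n - 1], &out->hard_limit) != 0)
        return -1;
    strcpy(out->apid, t[0]);
    strcpy(out->ctid, ctid);
    return 0;
}

/* Prefer an entry matching app id and context id, else the app-only entry. */
static const LimitEntry *find_best_entry(const LimitEntry *e, size_t count,
                                         const char *apid, const char *ctid)
{
    const LimitEntry *app_only = NULL;
    size_t i;

    for (i = 0; i < count; i++) {
        if (strcmp(e[i].apid, apid) != 0)
            continue;
        if (e[i].ctid[0] == '\0')
            app_only = &e[i];
        else if (ctid != NULL && strcmp(e[i].ctid, ctid) == 0)
            return &e[i]; /* exact match wins */
    }
    return app_only; /* NULL when the app id is not configured at all */
}
```

With the two lines `DEMO 2000 3000` and `DEMO TEST 150 200` loaded, a lookup for `DEMO`/`TEST` returns the context entry, while any other context of `DEMO` falls back to the app-only entry.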

Applications start up with a default limit defined via the
CMake variables TRACE_LOAD_USER_HARD_LIMIT and TRACE_LOAD_USER_SOFT_LIMIT.
As soon as the connection to the daemon is established, each
configuration entry matching the app id is sent to the client via an
application message.
If no configuration is found, TRACE_LOAD_DAEMON_HARD_LIMIT and
TRACE_LOAD_DAEMON_SOFT_LIMIT will be used instead.
The two-stage configuration process makes sure that the daemon default
can be set to 0, forcing developers to define a limit for their
application, while making sure that applications are able to log before
they have received the log levels.

The trace load is measured in both the daemon and the client.
Dropping messages on the client side is the primary mechanism: it
avoids sending logs to the daemon only to have them dropped there, which
reduces the load on the IPC. Measuring again on the daemon side makes
sure that rogue clients are still limited to the trace load defined.

Exceeding the soft limit will produce the following logs:
ECU1- DEMO DLTL log warn V 1 [Trace load exceeded trace soft limit on apid: DEMO. (soft limit: 2000 bytes/sec, current: 2414 bytes/sec)]
ECU1- DEMO DLTL log warn V 1 [Trace load exceeded trace soft limit on apid: DEMO, ctid TEST.(soft limit: 150 bytes/sec, current: 191 bytes/sec)]

Exceeding the hard limit will produce the same message but the text
'42 messages discarded.' is appended and messages will be dropped.
Dropped messages are lost and cannot be recovered, which forces
developers to get their logging volume under control.

As debug and trace logs are usually disabled for production and thus do
not impact the performance of actual systems, these logs are not
accounted for in trace load limits. In practice this turned out to be
very useful for improving developer experience while maintaining good
performance, as developers know that debug and trace logs are only
available during development and that important information has to be
logged on a higher level.
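
The soft/hard limit behaviour and the exemption for debug and trace logs can be summarised in a small decision sketch. This is an illustration only, not the code from this commit: the `Level` and `LoadVerdict` enums and all function names are hypothetical, and it assumes that "exceeding" means the measured load is strictly above the limit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical level numbering for this sketch: higher value = more verbose. */
typedef enum {
    LVL_FATAL = 1,
    LVL_ERROR,
    LVL_WARN,
    LVL_INFO,
    LVL_DEBUG,
    LVL_VERBOSE
} Level;

typedef enum { LOAD_OK, LOAD_SOFT_EXCEEDED, LOAD_HARD_EXCEEDED } LoadVerdict;

/* Compare the measured average load (bytes/s) against both limits. */
static LoadVerdict classify_load(uint64_t avg, uint32_t soft, uint32_t hard)
{
    if (avg > hard)
        return LOAD_HARD_EXCEEDED; /* warn and discard */
    if (avg > soft)
        return LOAD_SOFT_EXCEEDED; /* warn only */
    return LOAD_OK;
}

/* Debug and verbose messages are not accounted for in the trace load. */
static bool is_accounted(Level level)
{
    return level < LVL_DEBUG;
}

/* A message is dropped only if it is accounted and the hard limit is exceeded. */
static bool should_drop(Level level, uint64_t avg, uint32_t soft, uint32_t hard)
{
    return is_accounted(level) &&
           classify_load(avg, soft, hard) == LOAD_HARD_EXCEEDED;
}
```

With a soft limit of 2000 bytes/s, the 2414 bytes/s from the example log above yields a warning but no drop; a hard limit of 0 drops every accounted message as soon as any load is measured.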

To simplify creating a trace limit baseline, the script
'utils/calculate-load.py' is provided, which suggests
limits based on the actual log volume.

Signed-off-by: Alexander Mohr <[email protected]>
alexmohr committed Aug 8, 2024
1 parent 358ab08 commit 0496fe8
Showing 24 changed files with 2,813 additions and 35 deletions.
41 changes: 41 additions & 0 deletions CMakeLists.txt
@@ -93,6 +93,8 @@ option(WITH_DLT_QNX_SYSTEM "Set to ON to build QNX system binary dlt-qnx-system"
option(WITH_DLT_FILE_LOGGING_SYSLOG_FALLBACK "Set to ON to enable fallback to syslog if dlt logging to file fails" OFF)
option(WITH_DLT_NETWORK_TRACE "Set to ON to enable network trace (if message queue is supported)" ON)
option(WITH_DLT_LOG_LEVEL_APP_CONFIG "Set to ON to enable default log levels based on application ids" OFF)
option(WITH_DLT_TRACE_LOAD_CTRL "Set to ON to enable trace load control in libdlt and dlt-daemon" OFF)


set(DLT_IPC "FIFO"
CACHE STRING "UNIX_SOCKET,FIFO")
@@ -370,6 +372,44 @@ if(WITH_DLT_LOG_LEVEL_APP_CONFIG)
add_definitions(-DDLT_LOG_LEVEL_APP_CONFIG)
endif()


if(WITH_DLT_TRACE_LOAD_CTRL)
add_definitions(-DDLT_TRACE_LOAD_CTRL_ENABLE)

# Configure limits for client
if(NOT TRACE_LOAD_USER_SOFT_LIMIT)
set(TRACE_LOAD_USER_SOFT_LIMIT 83333)
endif()

if(NOT TRACE_LOAD_USER_HARD_LIMIT)
set(TRACE_LOAD_USER_HARD_LIMIT 100000)
endif()

if (TRACE_LOAD_USER_SOFT_LIMIT GREATER TRACE_LOAD_USER_HARD_LIMIT)
message(FATAL_ERROR "TRACE_LOAD_USER_SOFT_LIMIT must be less or equal than TRACE_LOAD_USER_HARD_LIMIT")
endif()

add_definitions(-DDLT_TRACE_LOAD_CLIENT_HARD_LIMIT_DEFAULT=${TRACE_LOAD_USER_HARD_LIMIT})
add_definitions(-DDLT_TRACE_LOAD_CLIENT_SOFT_LIMIT_DEFAULT=${TRACE_LOAD_USER_SOFT_LIMIT})

# Configure limits for daemon
if(NOT TRACE_LOAD_DAEMON_SOFT_LIMIT)
set(TRACE_LOAD_DAEMON_SOFT_LIMIT 0)
endif()

if(NOT TRACE_LOAD_DAEMON_HARD_LIMIT)
set(TRACE_LOAD_DAEMON_HARD_LIMIT 0)
endif()

if (TRACE_LOAD_DAEMON_SOFT_LIMIT GREATER TRACE_LOAD_DAEMON_HARD_LIMIT)
message(FATAL_ERROR "TRACE_LOAD_DAEMON_SOFT_LIMIT must be less or equal than TRACE_LOAD_DAEMON_HARD_LIMIT")
endif()

add_definitions(-DDLT_TRACE_LOAD_DAEMON_HARD_LIMIT_DEFAULT=${TRACE_LOAD_DAEMON_HARD_LIMIT})
add_definitions(-DDLT_TRACE_LOAD_DAEMON_SOFT_LIMIT_DEFAULT=${TRACE_LOAD_DAEMON_SOFT_LIMIT})

endif(WITH_DLT_TRACE_LOAD_CTRL)

add_subdirectory(doc)
add_subdirectory(src)
add_subdirectory(include)
@@ -489,6 +529,7 @@ message(STATUS "WITH_EXTENDED_FILTERING = ${WITH_EXTENDED_FILTERING}")
message(STATUS "WITH_DLT_DISABLE_MACRO = ${WITH_DLT_DISABLE_MACRO}")
message(STATUS "WITH_DLT_FILE_LOGGING_SYSLOG_FALLBACK = ${WITH_DLT_FILE_LOGGING_SYSLOG_FALLBACK}")
message(STATUS "WITH_DLT_LOG_LEVEL_APP_CONFIG = ${WITH_DLT_LOG_LEVEL_APP_CONFIG}")
message(STATUS "WITH_DLT_TRACE_LOAD_CTRL = ${WITH_DLT_TRACE_LOAD_CTRL}" )
message(STATUS "Change a value with: cmake -D<Variable>=<Value>")
message(STATUS "-------------------------------------------------------------------------------")
message(STATUS)
4 changes: 4 additions & 0 deletions doc/dlt-daemon.1.md
@@ -46,6 +46,10 @@ COVESA system or more likely on a external tester device.
: Load an alternative configuration for app id log level defaults.
By default, the configuration file /etc/dlt-log-levels.conf is loaded.

-l

: Load an alternative trace load configuration file. By default the configuration file /etc/dlt-trace-load.conf is loaded.


# EXAMPLES

110 changes: 110 additions & 0 deletions include/dlt/dlt_common.h
@@ -828,6 +828,116 @@ extern "C"
{
# endif


#ifdef DLT_TRACE_LOAD_CTRL_ENABLE
/* For trace load control feature */

#include <pthread.h>
/* For trace load control */
#ifdef DLT_TRACE_LOAD_CTRL_ENABLE

/* Number of slots in window for recording trace load (Default: 60)
* Average trace load in this window will be used as trace load
* Older time data than this size will be removed from trace load
*/
#define DLT_TRACE_LOAD_WINDOW_SIZE (60)

/* Window resolution in unit of timestamp (Default: 10000 x 0.1 msec = 1 sec)
* This value is same as size of 1 slot of window.
* Actual window size in sec can be calculated by
* DLT_TRACE_LOAD_WINDOW_SIZE x DLT_TRACE_LOAD_WINDOW_RESOLUTION / DLT_TIMESTAMP_RESOLUTION.
* (Default: 60 x 10000 / 10000 = 60 sec)
* FIXME: When timestamp resolution of dlt is changed from 0.1 msec,
* then DLT_TRACE_LOAD_WINDOW_RESOLUTION value also has to be updated accordingly.
*/
#define DLT_TRACE_LOAD_WINDOW_RESOLUTION (10000)

/* Special Context ID for output soft_limit/hard_limit over warning message (DLT LIMITS) */
#define DLT_INTERNAL_CONTEXT_ID ("DLTL")

/* Frequency in which warning messages are logged in seconds when an application is over the soft limit
* Unit of this value is Number of slot of window.
* NOTE: Size of the slot depends on value of DLT_TRACE_LOAD_WINDOW_RESOLUTION
* (Default: 10 slots = 10 x 10000 x 0.1 msec = 10 sec)
*/
#define DLT_SOFT_LIMIT_WARN_FREQUENCY (10)

/* Frequency in which warning messages are logged in seconds when an application is over the hard limit
* Unit of this value is Number of slot of window.
* NOTE: Size of the slot depends on value of DLT_TRACE_LOAD_WINDOW_RESOLUTION
* (Default: 10 slots = 10 x 10000 x 0.1 msec = 10 sec)
*/
#define DLT_HARD_LIMIT_WARN_FREQUENCY (10)

/* Timestamp resolution of 1 second (Default: 10000 -> 1/10000 = 0.0001sec = 0.1msec)
* This value is defined as reciprocal of the resolution (1 / DLT_TIMESTAMP_RESOLUTION)
* FIXME: When timestamp resolution of dlt is changed from 0.1 msec,
* then DLT_TIMESTAMP_RESOLUTION value also has to be updated accordingly.
*/
#define DLT_TIMESTAMP_RESOLUTION (10000)

#endif

typedef struct
{
// Window for recording total bytes for each slots [bytes]
uint64_t window[DLT_TRACE_LOAD_WINDOW_SIZE];
uint64_t total_bytes_of_window; // Grand total bytes of whole window [bytes]
uint32_t curr_slot; // Current slot No. of window [slot No.]
uint32_t last_slot; // Last slot No. of window [slot No.]
uint32_t curr_abs_slot; // Current absolute slot No. of window [slot No.]
uint32_t last_abs_slot; // Last absolute slot No. of window [slot No.]
uint64_t avg_trace_load; // Average trace load of whole window [bytes/sec]
uint32_t hard_limit_over_counter; // Discarded message counter due to hard limit over [msg]
uint32_t hard_limit_over_bytes; // Discarded message bytes due to hard limit over [bytes]
uint32_t slot_left_soft_limit_warn; // Slot left to output next warning of soft limit over [slot No.]
uint32_t slot_left_hard_limit_warn; // Slot left to output next warning of hard limit over [slot No.]
bool is_over_soft_limit; // Flag if trace load has been over soft limit
bool is_over_hard_limit; // Flag if trace load has been over hard limit
} DltTraceLoadStat;

/*
* The parameter of trace load settings
*/
typedef struct
{
char apid[DLT_ID_SIZE]; /**< Application id for which the settings are valid */
char ctid[DLT_ID_SIZE]; /**< Context id for which the settings are valid, this is optional */

uint32_t soft_limit; /**< Warning threshold, if load is above soft limit a warning will be logged but message won't be discarded */
uint32_t hard_limit; /**< limit threshold, if load is above hard limit a warning will be logged and message will be discarded */

DltTraceLoadStat tl_stat;
} DltTraceLoadSettings;

extern pthread_rwlock_t trace_load_rw_lock;

#ifndef UINT32_MAX
#define UINT32_MAX 0xFFFFFFFF
#endif

/* Precomputation */
static const uint64_t TIMESTAMP_BASED_WINDOW_SIZE = DLT_TRACE_LOAD_WINDOW_SIZE * DLT_TRACE_LOAD_WINDOW_RESOLUTION;
typedef DltReturnValue (DltLogInternal)(DltLogLevelType loglevel, const char *text, void* params);
bool dlt_check_trace_load(
DltTraceLoadSettings* tl_settings,
int32_t log_level,
uint32_t timestamp,
int32_t size,
DltLogInternal internal_dlt_log,
void *internal_dlt_log_params);

/**
* Find the runtime trace load settings for the given application id and context id.
* @param settings Array with all settings
* @param settings_count Size of settings
* @param apid The apid to search for
* @param ctid The context id to search for, can be NULL
* @return A sorted array with all settings that match the given apid and ctid
*/
DltTraceLoadSettings* dlt_find_runtime_trace_load_settings(DltTraceLoadSettings *settings, uint32_t settings_count, const char* apid, const char* ctid);
#endif

/**
* Helper function to print a byte array in hex.
* @param ptr pointer to the byte array.
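
The window comments in dlt_common.h above describe a ring of slots whose byte totals are averaged into a load in bytes/sec. A minimal sketch of that bookkeeping follows. It is not the implementation of `dlt_check_trace_load`: `LoadWindow` and `record_bytes` are hypothetical, and the three constants only mirror the defaults of `DLT_TRACE_LOAD_WINDOW_SIZE`, `DLT_TRACE_LOAD_WINDOW_RESOLUTION` and `DLT_TIMESTAMP_RESOLUTION`.

```c
#include <stdint.h>
#include <string.h>

#define WINDOW_SIZE 60u             /* slots in the window */
#define WINDOW_RESOLUTION 10000u    /* timestamp ticks per slot (1 sec) */
#define TIMESTAMP_RESOLUTION 10000u /* timestamp ticks per second */

/* Hypothetical, reduced counterpart of DltTraceLoadStat. */
typedef struct {
    uint64_t window[WINDOW_SIZE]; /* bytes recorded per slot */
    uint64_t total_bytes;         /* sum over all slots */
    uint32_t last_abs_slot;       /* absolute slot of the previous call */
} LoadWindow;

/* Account `size` bytes at `timestamp` (0.1 msec ticks) and return the
 * average load over the whole window in bytes/sec. */
static uint64_t record_bytes(LoadWindow *w, uint32_t timestamp, uint32_t size)
{
    uint32_t abs_slot = timestamp / WINDOW_RESOLUTION;
    /* Unsigned difference: a timestamp that goes backwards wraps to a
     * large value and is handled like a full window reset. */
    uint32_t elapsed = abs_slot - w->last_abs_slot;
    uint32_t i;

    if (elapsed >= WINDOW_SIZE) {
        /* Everything in the window is older than the window length. */
        memset(w->window, 0, sizeof(w->window));
        w->total_bytes = 0;
    } else {
        /* Expire the slots that are being reused since the last call. */
        for (i = 1; i <= elapsed; i++) {
            uint32_t slot = (w->last_abs_slot + i) % WINDOW_SIZE;
            w->total_bytes -= w->window[slot];
            w->window[slot] = 0;
        }
    }

    w->last_abs_slot = abs_slot;
    w->window[abs_slot % WINDOW_SIZE] += size;
    w->total_bytes += size;

    /* Window length in seconds is
     * WINDOW_SIZE x WINDOW_RESOLUTION / TIMESTAMP_RESOLUTION (60 sec). */
    return w->total_bytes * TIMESTAMP_RESOLUTION /
           ((uint64_t)WINDOW_SIZE * WINDOW_RESOLUTION);
}
```

Because the average always spans the full 60 second window, a short burst is smoothed out: 60000 bytes logged within one second count as 1000 bytes/sec until that slot expires.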
3 changes: 3 additions & 0 deletions include/dlt/dlt_types.h
@@ -83,6 +83,9 @@ typedef unsigned int speed_t;
*/
typedef enum
{
#ifdef DLT_TRACE_LOAD_CTRL_ENABLE
DLT_RETURN_LOAD_EXCEEDED = -9,
#endif
DLT_RETURN_FILESZERR = -8,
DLT_RETURN_LOGGING_DISABLED = -7,
DLT_RETURN_USER_BUFFER_FULL = -6,
3 changes: 3 additions & 0 deletions include/dlt/dlt_user.h.in
@@ -268,6 +268,9 @@ typedef struct
DltUserConnectionState connection_state;
# endif
uint16_t log_buf_len; /**< length of message buffer, by default: DLT_USER_BUF_MAX_SIZE */
#ifdef DLT_TRACE_LOAD_CTRL_ENABLE
pthread_rwlock_t trace_load_limit_lock;
#endif
} DltUser;

typedef int (*dlt_injection_callback_id)(uint32_t, void *, uint32_t, void *);
6 changes: 6 additions & 0 deletions src/daemon/CMakeLists.txt
@@ -104,3 +104,9 @@ if (WITH_DLT_LOG_LEVEL_APP_CONFIG)
DESTINATION ${CONFIGURATION_FILES_DIR}
COMPONENT base)
endif()

if (WITH_DLT_TRACE_LOAD_CTRL)
INSTALL(FILES dlt-trace-load.conf
DESTINATION ${CONFIGURATION_FILES_DIR}
COMPONENT base)
endif()