-
Notifications
You must be signed in to change notification settings - Fork 298
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
trace-load-limits: restrict trace load for applications #671
trace-load-limits: restrict trace load for applications #671
Conversation
On an unrelated note: Maybe it would make sense for the project to create different pipelines where features like test are actually tested. We have some tests that are only enabled when a feature is turned on but in the pipeline never is. |
345ffd2
to
0496fe8
Compare
In a shared system many developers log as much as they want into DLT. This can lead into overloading the logging system resulting in bad performance and dropped logs. This commit introduces trace load limits to restrict applications to a certain log volume measured in bytes/s. It is based on COVESA#134 but extends this heavily. Trace load limits are configured via a space separted configuraiton file. The format of the file follows: APPID [CTXID] SOFT_LIMIT HARD_LIMIT The most matching entry will be selected for each log, meaning that either app and context must match or at least the app id, for which a default is created when not configured. This allows to configure trace load for single contexts which can be used for example to limit different applications in the qnx slog to a given budget without affecting others or to give a file transfer unlimited bandwidth. It is recommended to always specify a budget for the application id without the contexts to ensure new contexts and internal logs like DLTL can be logged. Applications are starting up with a default limit defined via CMake variables TRACE_LOAD_USER_HARD_LIMIT, TRACE_LOAD_USER_SOFT_LIMIT. As soon as the connection to the daemon was succesull each configuration entry matching the app id will be sent to the client via an application message. If no configuration is found TRACE_LOAD_DAEMON_HARD_LIMIT and TRACE_LOAD_USER_SOFT_LIMIT will be used instead. The two staged configuration process makes sure that the daemon default can be set to 0, forcing developers to define a limit for their application while making sure that applications are able to log before they received the log levels. Measuring the trace load is done in the daemon and the client. Dropping messages on the client side is the primary mechanism and prevents sending logs to the daemon only to be dropped there, which reduces the load of the IPC. Measuring again on daemon side makes sure that rouge clients are still limited to the trace load defined. Exceeding the limit soft will produce the following logs: ECU1- DEMO DLTL log warn V 1 [Trace load exceeded trace soft limit on apid: DEMO. (soft limit: 2000 bytes/sec, current: 2414 bytes/sec)] ECU1- DEMO DLTL log warn V 1 [Trace load exceeded trace soft limit on apid: DEMO, ctid TEST.(soft limit: 150 bytes/sec, current: 191 bytes/sec)] Exceeding the hard limit will produce the same message but the text '42 messages discarded.' is appended and messages will be dropped. Dropped messages are lost and cannot be recovered, which forces developers to get their logging volume under control. As debug and trace load are usually disabled for production and thus do not impact the performance of actual systems these logs are not accounted for in trace load limits. In practice this turned out to be very usefull to improve developer experience while maintaining good performance, as devs know that debug and trace logs are only available during development and important information has to be logged on a higher level. To simplify creating a trace limit base line the script 'utils/calculate-load.py' is provided which makes suggestions for the limits based on actual log volume. Signed-off-by: Alexander Mohr <[email protected]>
0496fe8
to
398ac45
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @alexmohr, I've provided a partial feedback from my side ( I've read around 25% of PR ). I will continue to review this change next week.
@svlad-90 Thanks for the review, all good findings. I will apply them once your done completely with the review or when our internal review is done, so I don't have to start multiple times :) |
Hi @alexmohr, I've finished my review. I have no additional findings in the source code. I know this change is functional, as I used it in at least two production projects in the past ( without the context ID extension ). Also, I confirm that the developed unit tests are meaningful. However, I'm not an expert in the dlt-daemon code. I could miss something in the proposed C code with all those memset-s, etc. I'm a C++ developer who works at a slightly higher level. )) @minminlittleshrimp, can someone from maintainers also have a look at this change? I ask for that to be on the safe side. @alexmohr, I also have the following request for you - it would be great if you could:
That's all from my side as of now. |
@svlad-90 Fixed review findings. I'll remove the draft as soon as we're done with the internal review (probably a couple of days) |
Hi @alexmohr, I've checked the dlt_trace_limit.md in your PR and slightly adjusted its content: dlt_trace_limit.md. I've fixed some grammar and logical issues and added the table of contents. Feel free to add it to your PR instead of the original document if you find the updated version more suitable. |
* Add documentation * remove broken statistics * set defaults for unit tests differently.
49e846c
to
6aaae63
Compare
Thank you, that's way better 👍 |
Hi @minminlittleshrimp, do you have any updates regarding this PR? It was marked as "Ready for review" over a week ago. |
Hello @svlad-90 |
Hello @Le-Tin Could you kindly support this topic? Thanks in advance |
Hi @Le-Tin, @Bichdao021195. Kind reminder. Are there any updates from your side? )) |
Hello @svlad-90 , Sorry for my missing the attach name to this Pull request. I have just looked and seen you changed a lot of codes. I need the time to check it and response you later |
Hello all, |
Hi @lti9hc, Thank you so much for the update! We are looking forward to getting the review results. |
Hello all, |
I can spend a PR to do that, and yes, all features should be tested somehow. |
Converting this to draft again, because we found another issue that may lead to a segfault in libdlt. |
added a new commit which addresses the issue I mentioned yesterday. |
Hello @alexmohr @Bichdao021195 |
I'm not 100% sure what you want to see here.
Above is the output of dlt when application run into the trace load limits.
So from a user a perspective if logs show up with context id DLTL and something like "trace load exceeded" they know that messages have discarded or if the soft limit exceeded that they are close to it. |
A live demo is great, we would like to see how this feature takes action in a heavily logging system. |
Hi @minminlittleshrimp
|
I can't show it to you on an actual system, as the logging data from real systems is confidential. I would have shown a demo pretty much the same way @Bichdao021195 already did. |
pthread_rwlock_init(&trace_load_rw_lock, NULL); | ||
pthread_rwlock_wrlock(&trace_load_rw_lock); | ||
|
||
trace_load_settings = malloc(sizeof(DltTraceLoadSettings)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kindly please ensure that the memory is allocated successfully before accessing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -318,6 +386,12 @@ int dlt_daemon_init(DltDaemon *daemon, | |||
|
|||
daemon->state = DLT_DAEMON_STATE_INIT; /* initial logging state */ | |||
|
|||
#ifdef DLT_TRACE_LOAD_CTRL_ENABLE | |||
daemon->preconfigured_trace_load_settings = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need init value for daemon->preconfigured_trace_load_settings_count?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is good practice to initialize memory, so Nullchecks work
#ifdef DLT_TRACE_LOAD_CTRL_ENABLE | ||
if (daemon->preconfigured_trace_load_settings != NULL) { | ||
free(daemon->preconfigured_trace_load_settings); | ||
daemon->preconfigured_trace_load_settings = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it required to reset preconfigured_trace_load_settings_count to 0 after deallocating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is good practice, to make sure we are not using empty trace load settings
src/daemon/dlt_daemon_common.c
Outdated
if (application->trace_load_settings == NULL) { | ||
DltTraceLoadSettings* pre_configured_trace_load_settings = NULL; | ||
int num_settings = 0; | ||
DltReturnValue rv = dlt_daemon_find_preconfigured_trace_load_settings( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it is possible, kindly please using the meaning name instead of "rv"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RV is pretty common for return value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to find_trace_settings_return_value
(int32_t) (sizeof(DltUserHeader) + sizeof(uint32_t )); | ||
if (receiver->bytesRcvd < (int32_t)trace_load_settings_user_message_bytes_required) { | ||
// Not enough data to read the message length | ||
leave_while = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it is possible, kindly please using the meaning name instead of "leave_while".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't do, leave_while is pr existing code. This PR introduces enough changes already so I won't be doing refactorings here.
@minminlittleshrimp does that work for you ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, I am okay with current changes and I think it can be delivered/merged anytime.
Further refactoring and investigating can be spent on other PRs later under enhancement/refactor labels.
TBD from me to Alex in any future PR: why we dont make use of original dlt parser and using filter section concept like in dltlogstorage.conf, dlt gateway.conf but to reimplement new one just for trace limit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, I am okay with current changes and I think it can be delivered/merged anytime.
I'll fix the remaining findings from @Bichdao021195, especially the malloc comments.
using filter section concept like in dltlogstorage.conf
First of all because I wasn't aware of the concept. But isn't this key value only? For trace load limits we have up to three values per key
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @alexmohr
The parser supports detecting the section and parsing the field, for the field it can be,saying:
[LIMIT1]
Apid=LOG
Ctid=TEST
Soft=
Hard=
I might think of smt like this, but yeah, just get this huge feature merged because I dont see any reasons to hold it for now. I can then raise a PR for this if I have time to do so.
I recommend to soon merge this change as it looks useful to dlt to have this feature added.
Hello @michael-methner
Could you spend sometimes review before we merge it? Your comment will be valuable to us.
To DLT team, if you guys still need times then we can prepare some questions for Q&A after the demo section, when everything is clear and we can see how the actual working limit from the author perspective. For now I suggest we go applying this change and test under high traffic of, e.g, QNX, Android. A Rasp4 already up and run on my desk with AOSP14 so I also try this myself 😋
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remaining findings from @Bichdao021195 are done and the ones I don't want to change because I think it's good practice to properly init and reset values are commented.
Looking forward to the demo next week :)
|
||
char msg[255]; | ||
trace_load_settings_alloc_size = sizeof(DltTraceLoadSettings) * trace_load_settings_user_messages_count; | ||
trace_load_settings = malloc(trace_load_settings_alloc_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kindly please ensure that the memory is allocated successfully before accessing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Hi @alexmohr |
26th sept 1pm-2pm CEST works for me.
26th sept 1pm-2pm CEST works for me. |
Sorry to quote again 😅 |
732a3f9
to
8f332d0
Compare
Hi @alexmohr @minminlittleshrimp |
HI @alexmohr @minminlittleshrimp |
Hi, you can set the defaults in cmake for user and daemon to uint32_max (2^32-1) and configure apps you want to restrict with a lower limit via the configuration file. |
IMPORTANT: Build fail in local!
|
Signed-off-by: Alexander Mohr <[email protected]>
Signed-off-by: Alexander Mohr <[email protected]>
8f332d0
to
cf63c53
Compare
oops... fixed. |
I know how this feature works, so my participation in the demo is not mandatory. I've already seen it in action. )) |
Thanks @svlad-90 for dropping fb For the live demo DLT would like to have an insight of the feature and also the extensibility and future plan so it is optional for you. Thanks for your participation in building up this feature and hope to cooperate with you in the future in more dlt topics. Minh |
Signed-off-by: Alexander Mohr <[email protected]>
Yesterday we already had live debug session, to me the session was good. |
In a shared system many developers log as much as they want into DLT. This can lead into overloading the logging system resulting in bad performance and dropped logs. This commit introduces trace load limits to restrict applications to a certain log volume measured in bytes/s. It is based on #134 but extends this heavily.
Trace load limits are configured via a space separted configuraiton file.
The format of the file follows:
APPID [CTXID] SOFT_LIMIT HARD_LIMIT
The most matching entry will be selected for each log, meaning that either app and context must match or at least the app id, for which a default is created when not configured.
This allows to configure trace load for single contexts which can be used for example to limit different applications in the qnx slog to a given budget without affecting others or to give a file transfer unlimited bandwidth.
It is recommended to always specify a budget for the application id without the contexts to ensure new contexts and internal logs like DLTL can be logged.
Applications are starting up with a default limit defined via CMake variables TRACE_LOAD_USER_HARD_LIMIT, TRACE_LOAD_USER_SOFT_LIMIT. As soon as the connection to the daemon was succesull each configuration entry matching the app id will be sent to the client via an application message.
If no configuration is found TRACE_LOAD_DAEMON_HARD_LIMIT and TRACE_LOAD_USER_SOFT_LIMIT will be used instead.
The two staged configuration process makes sure that the daemon default can be set to 0, forcing developers to define a limit for their application while making sure that applications are able to log before they received the log levels.
Measuring the trace load is done in the daemon and the client. Dropping messages on the client side is the primary mechanism and prevents sending logs to the daemon only to be dropped there, which reduces the load of the IPC. Measuring again on daemon side makes sure that rouge clients are still limited to the trace load defined.
Exceeding the limit soft will produce the following logs: ECU1- DEMO DLTL log warn V 1 [Trace load exceeded trace soft limit on apid: DEMO. (soft limit: 2000 bytes/sec, current: 2414 bytes/sec)] ECU1- DEMO DLTL log warn V 1 [Trace load exceeded trace soft limit on apid: DEMO, ctid TEST.(soft limit: 150 bytes/sec, current: 191 bytes/sec)]
Exceeding the hard limit will produce the same message but the text '42 messages discarded.' is appended and messages will be dropped. Dropped messages are lost and cannot be recovered, which forces developers to get their logging volume under control.
As debug and trace load are usually disabled for production and thus do not impact the performance of actual systems these logs are not accounted for in trace load limits. In practice this turned out to be very usefull to improve developer experience while maintaining good performance, as devs know that debug and trace logs are only available during development and important information has to be logged on a higher level.
To simplify creating a trace limit base line the script 'utils/calculate-load.py' is provided which makes suggestions for the limits based on actual log volume.
fixes #643
@svlad-90
Please note that the final internal review is not done yet, so details might still change here and therefore the PR is marked as draft. I'm opening this already as we will benefit internally from the review here on the upstream project as well.
The program was tested solely for our own use cases, which might differ from yours.
Licensed under Mozilla Public License Version 2.0
Alexander Mohr, [email protected], Mercedes-Benz Tech Innovation GmbH, imprint