Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

trace-load-limits: restrict trace load for applications #671

Merged
merged 6 commits into from
Sep 27, 2024

Conversation

alexmohr
Copy link
Contributor

In a shared system many developers log as much as they want into DLT. This can lead into overloading the logging system resulting in bad performance and dropped logs. This commit introduces trace load limits to restrict applications to a certain log volume measured in bytes/s. It is based on #134 but extends this heavily.

Trace load limits are configured via a space separted configuraiton file.
The format of the file follows:

APPID [CTXID] SOFT_LIMIT HARD_LIMIT
The most matching entry will be selected for each log, meaning that either app and context must match or at least the app id, for which a default is created when not configured.
This allows to configure trace load for single contexts which can be used for example to limit different applications in the qnx slog to a given budget without affecting others or to give a file transfer unlimited bandwidth.
It is recommended to always specify a budget for the application id without the contexts to ensure new contexts and internal logs like DLTL can be logged.

Applications are starting up with a default limit defined via CMake variables TRACE_LOAD_USER_HARD_LIMIT, TRACE_LOAD_USER_SOFT_LIMIT. As soon as the connection to the daemon was succesull each configuration entry matching the app id will be sent to the client via an application message.
If no configuration is found TRACE_LOAD_DAEMON_HARD_LIMIT and TRACE_LOAD_USER_SOFT_LIMIT will be used instead.
The two staged configuration process makes sure that the daemon default can be set to 0, forcing developers to define a limit for their application while making sure that applications are able to log before they received the log levels.

Measuring the trace load is done in the daemon and the client. Dropping messages on the client side is the primary mechanism and prevents sending logs to the daemon only to be dropped there, which reduces the load of the IPC. Measuring again on daemon side makes sure that rouge clients are still limited to the trace load defined.

Exceeding the limit soft will produce the following logs: ECU1- DEMO DLTL log warn V 1 [Trace load exceeded trace soft limit on apid: DEMO. (soft limit: 2000 bytes/sec, current: 2414 bytes/sec)] ECU1- DEMO DLTL log warn V 1 [Trace load exceeded trace soft limit on apid: DEMO, ctid TEST.(soft limit: 150 bytes/sec, current: 191 bytes/sec)]

Exceeding the hard limit will produce the same message but the text '42 messages discarded.' is appended and messages will be dropped. Dropped messages are lost and cannot be recovered, which forces developers to get their logging volume under control.

As debug and trace load are usually disabled for production and thus do not impact the performance of actual systems these logs are not accounted for in trace load limits. In practice this turned out to be very usefull to improve developer experience while maintaining good performance, as devs know that debug and trace logs are only available during development and important information has to be logged on a higher level.

To simplify creating a trace limit base line the script 'utils/calculate-load.py' is provided which makes suggestions for the limits based on actual log volume.


fixes #643
@svlad-90
Please note that the final internal review is not done yet, so details might still change here and therefore the PR is marked as draft. I'm opening this already as we will benefit internally from the review here on the upstream project as well.

The program was tested solely for our own use cases, which might differ from yours.
Licensed under Mozilla Public License Version 2.0

Alexander Mohr, [email protected], Mercedes-Benz Tech Innovation GmbH, imprint

@alexmohr
Copy link
Contributor Author

On an unrelated note: Maybe it would make sense for the project to create different pipelines where features like test are actually tested. We have some tests that are only enabled when a feature is turned on but in the pipeline never is.
Let me know if I should a new github pipeline where trace load limits are enabled in the scope of this PR or if this should be solved separately as this change already is huge.

@alexmohr alexmohr force-pushed the add-trace-load-limits branch 3 times, most recently from 345ffd2 to 0496fe8 Compare August 8, 2024 15:54
In a shared system many developers log as much as they
want into DLT. This can lead into overloading the logging
system resulting in bad performance and dropped logs.
This commit introduces trace load limits to restrict
applications to a certain log volume measured in bytes/s.
It is based on COVESA#134 but extends this heavily.

Trace load limits are configured via a space separted
configuraiton file.
The format of the file follows:

APPID [CTXID] SOFT_LIMIT HARD_LIMIT
The most matching entry will be selected for each log, meaning that
either app and context must match or at least the app id, for which a
default is created when not configured.
This allows to configure trace load for single contexts which can be
used for example to limit different applications in the qnx slog to a
given budget without affecting others or to give a file transfer
unlimited bandwidth.
It is recommended to always specify a budget for the application id
without the contexts to ensure new contexts and internal logs like DLTL
can be logged.

Applications are starting up with a default limit defined
via CMake variables TRACE_LOAD_USER_HARD_LIMIT, TRACE_LOAD_USER_SOFT_LIMIT.
As soon as the connection to the daemon was succesull each configuration
entry matching the app id will be sent to the client via an
application message.
If no configuration is found TRACE_LOAD_DAEMON_HARD_LIMIT and
TRACE_LOAD_USER_SOFT_LIMIT will be used instead.
The two staged configuration process makes sure that the daemon default
can be set to 0, forcing developers to define a limit for their
application while making sure that applications are able to log before
they received the log levels.

Measuring the trace load is done in the daemon and the client.
Dropping messages on the client side is the primary mechanism and
prevents sending logs to the daemon only to be dropped there, which
reduces the load of the IPC. Measuring again on daemon side makes
sure that rouge clients are still limited to the trace load defined.

Exceeding the limit soft will produce the following logs:
ECU1- DEMO DLTL log warn V 1 [Trace load exceeded trace soft limit on apid: DEMO. (soft limit: 2000 bytes/sec, current: 2414 bytes/sec)]
ECU1- DEMO DLTL log warn V 1 [Trace load exceeded trace soft limit on apid: DEMO, ctid TEST.(soft limit: 150 bytes/sec, current: 191 bytes/sec)]

Exceeding the hard limit will produce the same message but the text
'42 messages discarded.' is appended and messages will be dropped.
Dropped messages are lost and cannot be recovered, which forces
developers to get their logging volume under control.

As debug and trace load are usually disabled for production and thus do
not impact the performance of actual systems these logs are not
accounted for in trace load limits. In practice this turned out to be
very usefull to improve developer experience while maintaining good
performance, as devs know that debug and trace logs are only available
during development and important information has to be logged on a
higher level.

To simplify creating a trace limit base line the script
'utils/calculate-load.py' is provided which makes suggestions
for the limits based on actual log volume.

Signed-off-by: Alexander Mohr <[email protected]>
@alexmohr alexmohr force-pushed the add-trace-load-limits branch from 0496fe8 to 398ac45 Compare August 8, 2024 15:59
Copy link

@svlad-90 svlad-90 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @alexmohr, I've provided a partial feedback from my side ( I've read around 25% of PR ). I will continue to review this change next week.

include/dlt/dlt_common.h Outdated Show resolved Hide resolved
include/dlt/dlt_common.h Show resolved Hide resolved
src/daemon/dlt-daemon.c Outdated Show resolved Hide resolved
src/daemon/dlt-daemon.c Show resolved Hide resolved
@alexmohr
Copy link
Contributor Author

alexmohr commented Aug 9, 2024

@svlad-90 Thanks for the review, all good findings. I will apply them once your done completely with the review or when our internal review is done, so I don't have to start multiple times :)

@svlad-90
Copy link

Hi @alexmohr, I've finished my review. I have no additional findings in the source code. I know this change is functional, as I used it in at least two production projects in the past ( without the context ID extension ). Also, I confirm that the developed unit tests are meaningful.

However, I'm not an expert in the dlt-daemon code. I could miss something in the proposed C code with all those memset-s, etc. I'm a C++ developer who works at a slightly higher level. ))

@minminlittleshrimp, can someone from maintainers also have a look at this change? I ask for that to be on the safe side.


@alexmohr, I also have the following request for you - it would be great if you could:

  • Create a separate "/doc/dlt_trace_limit.md" file
  • Place there the extended version of the first comment in this PR ( the big one ) that describes the whole feature
  • Extend the document with details regarding how the 'slots' and 'window' concepts work. It is hard to get that idea from the source code. Some high-level descriptions would be much easier to understand.

That's all from my side as of now.

@alexmohr
Copy link
Contributor Author

@svlad-90 Fixed review findings.
I also removed the statics things, as it only worked for FIFO and compilation was broken.
We didn't notice this internally as FIFO isn't used. It still can be added in a later PR if we want to have it upstream, but makes review of the PR a bit easier.

I'll remove the draft as soon as we're done with the internal review (probably a couple of days)

@minminlittleshrimp minminlittleshrimp self-assigned this Aug 13, 2024
@svlad-90
Copy link

Hi @alexmohr, I've checked the dlt_trace_limit.md in your PR and slightly adjusted its content: dlt_trace_limit.md. I've fixed some grammar and logical issues and added the table of contents. Feel free to add it to your PR instead of the original document if you find the updated version more suitable.

* Add documentation
* remove broken statistics
* set defaults for unit tests differently.
@alexmohr alexmohr force-pushed the add-trace-load-limits branch from 49e846c to 6aaae63 Compare August 15, 2024 09:54
@alexmohr alexmohr marked this pull request as ready for review August 15, 2024 09:55
@alexmohr
Copy link
Contributor Author

Hi @alexmohr, I've checked the dlt_trace_limit.md in your PR and slightly adjusted its content: dlt_trace_limit.md. I've fixed some grammar and logical issues and added the table of contents. Feel free to add it to your PR instead of the original document if you find the updated version more suitable.

Thank you, that's way better 👍

@svlad-90
Copy link

svlad-90 commented Aug 23, 2024

Hi @minminlittleshrimp, do you have any updates regarding this PR? It was marked as "Ready for review" over a week ago.

@minminlittleshrimp
Copy link
Collaborator

Hello @svlad-90
sorry I dont have time for reviewing this PR. I will ask other DLT maintainers to support you.
Thanks for your understanding

@minminlittleshrimp
Copy link
Collaborator

Hello @Le-Tin
Hello @Bichdao021195

Could you kindly support this topic?

Thanks in advance

@svlad-90
Copy link

Hi @Le-Tin, @Bichdao021195. Kind reminder. Are there any updates from your side? ))

@lti9hc
Copy link
Collaborator

lti9hc commented Aug 30, 2024

Hello @svlad-90 ,

Sorry for my missing the attach name to this Pull request.

I have just looked and seen you changed a lot of codes.

I need the time to check it and response you later

@lti9hc
Copy link
Collaborator

lti9hc commented Sep 5, 2024

Hello all,
I have just completed for review about 70%, I will continue review later

@svlad-90
Copy link

svlad-90 commented Sep 5, 2024

Hi @lti9hc, Thank you so much for the update! We are looking forward to getting the review results.

@minminlittleshrimp
Copy link
Collaborator

minminlittleshrimp commented Sep 6, 2024

Hello all,
Just friendly say thanks for this nice work, especially @alexmohr and @svlad-90, I was quite busy and rarely had time to work with you on this interesting topic, sorry for that. I am back now for some learning your work and will add some personal insight to your great job.
Once again thanks for your understanding and being patient.

@minminlittleshrimp
Copy link
Collaborator

a new github pipeline where trace load limits are enabled in the scope of this PR or if this should be solved separately as this change already is huge.

I can spend a PR to do that, and yes, all features should be tested somehow.
We gonna find the way.

@alexmohr alexmohr marked this pull request as draft September 10, 2024 08:02
@alexmohr
Copy link
Contributor Author

Converting this to draft again, because we found another issue that may lead to a segfault in libdlt.

@alexmohr alexmohr marked this pull request as ready for review September 11, 2024 08:43
@alexmohr
Copy link
Contributor Author

added a new commit which addresses the issue I mentioned yesterday.
This will happen when the message from the daemon is received while trace load settings are processed.
We passed the nullpointer checks, but after that the trace load settings will be set to NULL and be used resulting in a segfault.
The commit protects the trace load settings with a write lock.
A write lock has been used, to make sure trace load settings are only modified in one thread at a time, to prevent inconsistencies.

@minminlittleshrimp
Copy link
Collaborator

Hello @alexmohr @Bichdao021195
Could anyone please give some demo using dlt binaries to observe the trace log limit in action?
I would like to have a tester/user-experience view of this feature.
Thanks in advance

@alexmohr
Copy link
Contributor Author

alexmohr commented Sep 15, 2024

Hello @alexmohr @Bichdao021195 Could anyone please give some demo using dlt binaries to observe the trace log limit in action? I would like to have a tester/user-experience view of this feature. Thanks in advance

I'm not 100% sure what you want to see here.

512434 2024/09/14 18:58:22.036219   21511302 149 LNX- SYS- DLTL log warn V 1 [Trace load exceeded trace hard limit.(hard limit: 200000 bytes/sec, current: 200924 bytes/sec) 301 messages discarded.]

Above is the output of dlt when application run into the trace load limits.
The limits for SYS are configured like this

SYS 170000 200000

So from a user a perspective if logs show up with context id DLTL and something like "trace load exceeded" they know that messages have discarded or if the soft limit exceeded that they are close to it.
I'll gladly provide more information / demo if you can be a bit more precise.
I could also do a live demo if you prefer this.

@minminlittleshrimp
Copy link
Collaborator

Hello @alexmohr @Bichdao021195 Could anyone please give some demo using dlt binaries to observe the trace log limit in action? I would like to have a tester/user-experience view of this feature. Thanks in advance

I'm not 100% sure what you want to see here.

512434 2024/09/14 18:58:22.036219   21511302 149 LNX- SYS- DLTL log warn V 1 [Trace load exceeded trace hard limit.(hard limit: 200000 bytes/sec, current: 200924 bytes/sec) 301 messages discarded.]

Above is the output of dlt when application run into the trace load limits. The limits for SYS are configured like this

SYS 170000 200000

So from a user a perspective if logs show up with context id DLTL and something like "trace load exceeded" they know that messages have discarded or if the soft limit exceeded that they are close to it. I'll gladly provide more information / demo if you can be a bit more precise. I could also do a live demo if you prefer this.

A live demo is great, we would like to see how this feature takes action in a heavily logging system.
Could you suggest how we proceed this activity? We can invite @svlad-90 and DLT members if they have time?
Thanks

@Bichdao021195
Copy link
Contributor

Bichdao021195 commented Sep 17, 2024

Hello @alexmohr @Bichdao021195 Could anyone please give some demo using dlt binaries to observe the trace log limit in action? I would like to have a tester/user-experience view of this feature. Thanks in advance

Hi @minminlittleshrimp
The live demo is good idea. But in this mean time , i will share with you
the basic steps to experience the trace load limit from my side are:

**_Pull the request to local:_**
git pull origin pull/671/head:refs/for/master;

**_build and install  with cmd:_** 
cmake -DWITH_DLT_TRACE_LOAD_CTRL=ON .. && sudo make install. 

**_config trace load limits for your app:_**
nad8hc@nad8hc-VirtualBox:~$ cat /usr/local/etc/dlt-trace-load.conf
#Allow LOG app with TEST context to log 100 bytes
#The other context not log.  
LOG 100 100

_**start dlt-daemon: you can see the dlt-daemon config limits as your expectation.** 
nad8hc@nad8hc-VirtualBox:~$ dlt-daemon 
[ 1554.266932]~DLT~ 8463~NOTICE   ~Starting DLT Daemon; DLT Package Version: 2.18.10 STABLE, Package Revision: v2.18.10_50_g6aaae63, build on Sep 17 2024 10:16:58
-SYSTEMD -SYSTEMD_WATCHDOG -TEST -SHM

[ 1554.267034]~DLT~ 8463~INFO     ~FIFO size: 65536
[ 1554.267069]~DLT~ 8463~WARNING  ~Unable to set send timeout Socket operation on non-socket.
[ 1554.267085]~DLT~ 8463~INFO     ~Activate connection type: 5
[ 1554.267095]~DLT~ 8463~INFO     ~dlt_daemon_socket_open: Socket created
[ 1554.267102]~DLT~ 8463~INFO     ~dlt_daemon_socket_open: Listening on ip 0.0.0.0 and port: 3490
[ 1554.267105]~DLT~ 8463~INFO     ~dlt_daemon_socket_open: Socket send queue size: 16384
[ 1554.267115]~DLT~ 8463~INFO     ~Activate connection type: 1
[ 1554.267143]~DLT~ 8463~WARNING  ~dlt_daemon_unix_socket_open: unlink() failed: No such file or directory
[ 1554.267182]~DLT~ 8463~INFO     ~Activate connection type: 9
[ 1554.267205]~DLT~ 8463~INFO     ~Cannot open configuration file: /tmp/dlt-runtime.cfg
[ 1554.267208]~DLT~ 8463~INFO     ~Ringbuffer configuration: 500000/10000000/500000
[ 1554.267378]~DLT~ 8463~NOTICE   ~Failed to open ECU Software version file.
[ 1554.267422]~DLT~ 8463~INFO     ~Configured trace limits for app id 'SYS', soft limit: 83333, hard_limit: 100000
[ 1554.267429]~DLT~ 8463~INFO     ~Configured trace limits for app id 'QSYM', ctx id 'QSLA', soft limit: 83333, hard_limit: 100000
[ 1554.267433]~DLT~ 8463~INFO     ~Configured trace limits for app id 'TEST', soft limit: 2222, hard_limit: 5555
[ 1554.267436]~DLT~ 8463~INFO     ~Configured trace limits for app id 'TEST', ctx id 'FOO', soft limit: 42, hard_limit: 100
[ 1554.267440]~DLT~ 8463~INFO     ~Configured trace limits for app id 'BAR', soft limit: 42, hard_limit: 42
[ 1554.267443]~DLT~ 8463~INFO     ~Configured trace limits for app id 'BAR', ctx id 'BAR', soft limit: 84, hard_limit: 84
**[ 1554.267446]~DLT~ 8463~INFO     ~Configured trace limits for app id 'LOG', soft limit: 100, hard_limit: 100**

**_start dlt-example-user with option -d (delay) to flexible adjust the  sending logs speed:_**
nad8hc@nad8hc-VirtualBox:~$ dlt-example-user -n 10000000000 hello -d 10

_start dlt-recceive to observe how trace load limits works: the warning log will pop up if exceed the limits. 
**nad8hc@nad8hc-VirtualBox:~$ dlt-receive -a localhost**
2024/09/17 10:24:52.382604   15613275 153 ECU1 LOG- TEST log warn V 2 [153 hello]
2024/09/17 10:24:52.393643   15613386 154 ECU1 LOG- TEST log warn V 2 [154 hello]
2024/09/17 10:24:52.403741   15613489 155 ECU1 LOG- TEST log warn V 2 [155 hello]
2024/09/17 10:24:52.414533   15613595 156 ECU1 LOG- TEST log warn V 2 [156 hello]
2024/09/17 10:24:52.424876   15613698 157 ECU1 LOG- TEST log warn V 2 [157 hello]
2024/09/17 10:24:53.057457   15620025 001 ECU1 LOG- DLTL log warn V 1 [Trace load exceeded trace soft limit on apid LOG (soft limit: 100 bytes/sec, current: 151 bytes/sec)]
2024/09/17 10:24:53.057620   15620026 002 ECU1 LOG- DLTL log warn V 1 [Trace load exceeded trace hard limit on apid LOG (hard limit: 100 bytes/sec, current: 151 bytes/sec) 56 messages discarded. ]
2024/09/17 10:25:03.055936   15720011 003 ECU1 LOG- DLTL log warn V 1 [Trace load exceeded trace soft limit on apid LOG (soft limit: 100 bytes/sec, current: 925 bytes/sec)]
**2024/09/17 10:25:03.056007   15720011 004 ECU1 LOG- DLTL log warn V 1 [Trace load exceeded trace hard limit on apid LOG (hard limit: 100 bytes/sec, current: 925 bytes/sec) 910 messages discarded. ]**```
Thank you


@alexmohr
Copy link
Contributor Author

Hello @alexmohr @Bichdao021195 Could anyone please give some demo using dlt binaries to observe the trace log limit in action? I would like to have a tester/user-experience view of this feature. Thanks in advance

I'm not 100% sure what you want to see here.

512434 2024/09/14 18:58:22.036219   21511302 149 LNX- SYS- DLTL log warn V 1 [Trace load exceeded trace hard limit.(hard limit: 200000 bytes/sec, current: 200924 bytes/sec) 301 messages discarded.]

Above is the output of dlt when application run into the trace load limits. The limits for SYS are configured like this

SYS 170000 200000

So from a user a perspective if logs show up with context id DLTL and something like "trace load exceeded" they know that messages have discarded or if the soft limit exceeded that they are close to it. I'll gladly provide more information / demo if you can be a bit more precise. I could also do a live demo if you prefer this.

A live demo is great, we would like to see how this feature takes action in a heavily logging system. Could you suggest how we proceed this activity? We can invite @svlad-90 and DLT members if they have time? Thanks

I can't show it to you on an actual system, as the logging data from real systems is confidential. I would have shown a demo pretty much the same way @Bichdao021195 already did.
If that's cool w/ you on 24th and 26th September my calendar is pretty empty. Im located in Germany so something between 10am and 6pm CEST would work for me.
I could also provide a redacted DLT offline trace.

pthread_rwlock_init(&trace_load_rw_lock, NULL);
pthread_rwlock_wrlock(&trace_load_rw_lock);

trace_load_settings = malloc(sizeof(DltTraceLoadSettings));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kindly please ensure that the memory is allocated successfully before accessing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -318,6 +386,12 @@ int dlt_daemon_init(DltDaemon *daemon,

daemon->state = DLT_DAEMON_STATE_INIT; /* initial logging state */

#ifdef DLT_TRACE_LOAD_CTRL_ENABLE
daemon->preconfigured_trace_load_settings = NULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need init value for daemon->preconfigured_trace_load_settings_count?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is good practice to initialize memory, so Nullchecks work

#ifdef DLT_TRACE_LOAD_CTRL_ENABLE
if (daemon->preconfigured_trace_load_settings != NULL) {
free(daemon->preconfigured_trace_load_settings);
daemon->preconfigured_trace_load_settings = NULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it required to reset preconfigured_trace_load_settings_count to 0 after deallocating.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is good practice, to make sure we are not using empty trace load settings

if (application->trace_load_settings == NULL) {
DltTraceLoadSettings* pre_configured_trace_load_settings = NULL;
int num_settings = 0;
DltReturnValue rv = dlt_daemon_find_preconfigured_trace_load_settings(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it is possible, kindly please using the meaning name instead of "rv"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RV is pretty common for return value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to find_trace_settings_return_value

(int32_t) (sizeof(DltUserHeader) + sizeof(uint32_t ));
if (receiver->bytesRcvd < (int32_t)trace_load_settings_user_message_bytes_required) {
// Not enough data to read the message length
leave_while = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it is possible, kindly please using the meaning name instead of "leave_while".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't do, leave_while is pr existing code. This PR introduces enough changes already so I won't be doing refactorings here.
@minminlittleshrimp does that work for you ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, I am okay with current changes and I think it can be delivered/merged anytime.
Further refactoring and investigating can be spent on other PRs later under enhancement/refactor labels.

TBD from me to Alex in any future PR: why we dont make use of original dlt parser and using filter section concept like in dltlogstorage.conf, dlt gateway.conf but to reimplement new one just for trace limit?

Copy link
Contributor Author

@alexmohr alexmohr Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, I am okay with current changes and I think it can be delivered/merged anytime.

I'll fix the remaining findings from @Bichdao021195, especially the malloc comments.

using filter section concept like in dltlogstorage.conf

First of all because I wasn't aware of the concept. But isn't this key value only? For trace load limits we have up to three values per key

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @alexmohr

The parser supports detecting the section and parsing the field, for the field it can be,saying:

[LIMIT1]
Apid=LOG
Ctid=TEST
Soft=
Hard=

I might think of smt like this, but yeah, just get this huge feature merged because I dont see any reasons to hold it for now. I can then raise a PR for this if I have time to do so.

I recommend to soon merge this change as it looks useful to dlt to have this feature added.

Hello @michael-methner
Could you spend sometimes review before we merge it? Your comment will be valuable to us.

To DLT team, if you guys still need times then we can prepare some questions for Q&A after the demo section, when everything is clear and we can see how the actual working limit from the author perspective. For now I suggest we go applying this change and test under high traffic of, e.g, QNX, Android. A Rasp4 already up and run on my desk with AOSP14 so I also try this myself 😋

Copy link
Contributor Author

@alexmohr alexmohr Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remaining findings from @Bichdao021195 are done and the ones I don't want to change because I think it's good practice to properly init and reset values are commented.
Looking forward to the demo next week :)


char msg[255];
trace_load_settings_alloc_size = sizeof(DltTraceLoadSettings) * trace_load_settings_user_messages_count;
trace_load_settings = malloc(trace_load_settings_alloc_size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kindly please ensure that the memory is allocated successfully before accessing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@Bichdao021195
Copy link
Contributor

Bichdao021195 commented Sep 18, 2024

Hello @alexmohr @Bichdao021195 Could anyone please give some demo using dlt binaries to observe the trace log limit in action? I would like to have a tester/user-experience view of this feature. Thanks in advance

I'm not 100% sure what you want to see here.

512434 2024/09/14 18:58:22.036219   21511302 149 LNX- SYS- DLTL log warn V 1 [Trace load exceeded trace hard limit.(hard limit: 200000 bytes/sec, current: 200924 bytes/sec) 301 messages discarded.]

Above is the output of dlt when application run into the trace load limits. The limits for SYS are configured like this

SYS 170000 200000

So from a user a perspective if logs show up with context id DLTL and something like "trace load exceeded" they know that messages have discarded or if the soft limit exceeded that they are close to it. I'll gladly provide more information / demo if you can be a bit more precise. I could also do a live demo if you prefer this.

A live demo is great, we would like to see how this feature takes action in a heavily logging system. Could you suggest how we proceed this activity? We can invite @svlad-90 and DLT members if they have time? Thanks

I can't show it to you on an actual system, as the logging data from real systems is confidential. I would have shown a demo pretty much the same way @Bichdao021195 already did. If that's cool w/ you on 24th and 26th September my calendar is pretty empty. Im located in Germany so something between 10am and 6pm CEST would work for me. I could also provide a redacted DLT offline trace.

Hi @alexmohr
Is it possible to take part in the meeting at 1pm-2pm CEST 26th Sept ?
Are you using the microsoft team meeting?
Can you give me your mail to send the invitation?

@alexmohr
Copy link
Contributor Author

Hello @alexmohr @Bichdao021195 Could anyone please give some demo using dlt binaries to observe the trace log limit in action? I would like to have a tester/user-experience view of this feature. Thanks in advance

I'm not 100% sure what you want to see here.

512434 2024/09/14 18:58:22.036219   21511302 149 LNX- SYS- DLTL log warn V 1 [Trace load exceeded trace hard limit.(hard limit: 200000 bytes/sec, current: 200924 bytes/sec) 301 messages discarded.]

Above is the output of dlt when application run into the trace load limits. The limits for SYS are configured like this

SYS 170000 200000

So from a user a perspective if logs show up with context id DLTL and something like "trace load exceeded" they know that messages have discarded or if the soft limit exceeded that they are close to it. I'll gladly provide more information / demo if you can be a bit more precise. I could also do a live demo if you prefer this.

A live demo is great, we would like to see how this feature takes action in a heavily logging system. Could you suggest how we proceed this activity? We can invite @svlad-90 and DLT members if they have time? Thanks

I can't show it to you on an actual system, as the logging data from real systems is confidential. I would have shown a demo pretty much the same way @Bichdao021195 already did. If that's cool w/ you on 24th and 26th September my calendar is pretty empty. Im located in Germany so something between 10am and 6pm CEST would work for me. I could also provide a redacted DLT offline trace.

Hi @alexmohr Is it possible to take part in the meeting at 1pm-2pm CEST 26th Sept ? Are you using the microsoft team meeting? Can you give me your mail to send the invitation?

Hi @Bichdao021195

26th sept 1pm-2pm CEST works for me.
If you want to send an invite my mail address is '[email protected]'
@minminlittleshrimp Does this work for you as well? If yes could you post your

Hello @alexmohr @Bichdao021195 Could anyone please give some demo using dlt binaries to observe the trace log limit in action? I would like to have a tester/user-experience view of this feature. Thanks in advance

I'm not 100% sure what you want to see here.

512434 2024/09/14 18:58:22.036219   21511302 149 LNX- SYS- DLTL log warn V 1 [Trace load exceeded trace hard limit.(hard limit: 200000 bytes/sec, current: 200924 bytes/sec) 301 messages discarded.]

Above is the output of dlt when application run into the trace load limits. The limits for SYS are configured like this

SYS 170000 200000

So from a user a perspective if logs show up with context id DLTL and something like "trace load exceeded" they know that messages have discarded or if the soft limit exceeded that they are close to it. I'll gladly provide more information / demo if you can be a bit more precise. I could also do a live demo if you prefer this.

A live demo is great, we would like to see how this feature takes action in a heavily logging system. Could you suggest how we proceed this activity? We can invite @svlad-90 and DLT members if they have time? Thanks

I can't show it to you on an actual system, as the logging data from real systems is confidential. I would have shown a demo pretty much the same way @Bichdao021195 already did. If that's cool w/ you on 24th and 26th September my calendar is pretty empty. Im located in Germany so something between 10am and 6pm CEST would work for me. I could also provide a redacted DLT offline trace.

Hi @alexmohr Is it possible to take part in the meeting at 1pm-2pm CEST 26th Sept ? Are you using the microsoft team meeting? Can you give me your mail to send the invitation?

Hi @Bichdao021195

26th sept 1pm-2pm CEST works for me.
If you want to send an invite my mail address is '[email protected]'. Otherwise you can sent me your mail address and I'll send an invite.
@minminlittleshrimp Does this work for you as well? If yes could you post your mail address.

@minminlittleshrimp
Copy link
Collaborator

minminlittleshrimp commented Sep 18, 2024

If you want to send an invite my mail address is '[email protected]'. Otherwise you can sent me your mail address and I'll send an invite.

@minminlittleshrimp Does this work for you as well? If yes could you post your mail address.

Sorry to quote again 😅
Das ist super! Yes I am okay with this timeline. We actually work in the same team in Bosch, Dao already has my email so @Bichdao021195 please add me to the invitation list: [email protected]
Thanks in advance

@alexmohr alexmohr force-pushed the add-trace-load-limits branch from 732a3f9 to 8f332d0 Compare September 18, 2024 17:53
@Bichdao021195
Copy link
Contributor

Hi @alexmohr @minminlittleshrimp
I have already scheduled the meeting as our aligned time.
Kindly let me know if you have received the invitation successfully.
Please inform me if there are any issues on your end.
Thank you

@Bichdao021195
Copy link
Contributor

HI @alexmohr @minminlittleshrimp
It seems that all apps need to configure the limit when this feature is enabled. However, there is a use case where some apps may not want to apply the trace load limit, while others prefer to use the trace load limit. For a specific project, we would like to disable the trace load limit if an app hasn’t configured the limit in the configuration file. Do you have any suggestions on how to handle this case?"

@alexmohr
Copy link
Contributor Author

HI @alexmohr @minminlittleshrimp It seems that all apps need to configure the limit when this feature is enabled. However, there is a use case where some apps may not want to apply the trace load limit, while others prefer to use the trace load limit. For a specific project, we would like to disable the trace load limit if an app hasn’t configured the limit in the configuration file. Do you have any suggestions on how to handle this case?"

Hi, you can set the defaults in cmake for user and daemon to uint32_max (2^32-1) and configure apps you want to restrict with a lower limit via the configuration file.
While technically this is not unlimited in practice you won't reach this bandwidth anyways.

@minminlittleshrimp
Copy link
Collaborator

minminlittleshrimp commented Sep 19, 2024

IMPORTANT: Build fail in local!
Root cause: Current parser is using DLT local struct, this is different than the original mechanism of file parser dlt.
Solution: @alexmohr please correct changes in the commit review: fix findings 2024/09/15 by only keep removing the ifdef redundant. To reproduce you can build with a make clean first.

diff --git a/include/dlt/dlt_common.h b/include/dlt/dlt_common.h
index 42d9563..fcc569f 100644
--- a/include/dlt/dlt_common.h
+++ b/include/dlt/dlt_common.h
@@ -833,8 +833,6 @@ extern "C"
 /* For trace load control feature */
 
 #include <pthread.h>
-/* For trace load control */
-#ifdef DLT_TRACE_LOAD_CTRL_ENABLE
 
 /* Number of slots in window for recording trace load (Default: 60)
  * Average trace load in this window will be used as trace load
@@ -876,8 +874,6 @@ extern "C"
  */
 #define DLT_TIMESTAMP_RESOLUTION          (10000)
 
-#endif
-
 typedef struct
 {
     // Window for recording total bytes for each slots [bytes]

Signed-off-by: Alexander Mohr <[email protected]>
Signed-off-by: Alexander Mohr <[email protected]>
@alexmohr alexmohr force-pushed the add-trace-load-limits branch from 8f332d0 to cf63c53 Compare September 19, 2024 08:21
@alexmohr
Copy link
Contributor Author

IMPORTANT: Build fail in local!

oops... fixed.

@svlad-90
Copy link

A live demo is great, we would like to see how this feature takes action in a heavily logging system. Could you suggest how we proceed this activity? We can invite @svlad-90 and DLT members if they have time? Thanks

I know how this feature works, so my participation in the demo is not mandatory. I've already seen it in action. ))

@minminlittleshrimp
Copy link
Collaborator

Thanks @svlad-90 for dropping fb

For the live demo DLT would like to have an insight of the feature and also the extensibility and future plan so it is optional for you.

Thanks for your participation in building up this feature and hope to cooperate with you in the future in more dlt topics.

Minh

@minminlittleshrimp minminlittleshrimp merged commit 1533d0f into COVESA:master Sep 27, 2024
3 checks passed
@minminlittleshrimp
Copy link
Collaborator

minminlittleshrimp commented Sep 27, 2024

Yesterday we already had live debug session, to me the session was good.
Hence, finally merged, thank you all for your dedication.
Regards

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Trace budgeting mechanism
5 participants