[feedback] Getting repos analysed #9
Comments
Hi @valeriocos, thanks for the feedback. We are well aware of the performance and memory-consumption issues for large projects with some Rascal metrics (mostly the dependency- and API-related ones). I have started pushing fixes that improve the situation (usethesource/rascal@78e0b60) and will continue in this direction. I'm hopeful we can make it a lot better, though memory consumption will unavoidably remain an issue: analyzing large projects will always consume a lot of resources. I'll keep you updated on our progress.
Thank you @tdegueul for the quick reply!
Thanks a lot @valeriocos for the feedback and the hints. I've also gathered some information in #338 (submitted in the scava repo [1]). Regarding performance, I often see a huge load on the ci4 server (I saw a load of 192 the other day; I didn't even know it could go that high! ;-) and the computations take a lot of time. I've set up several workers and that works quite well, but the worker that is stuck (see the issue mentioned above) takes up a lot of resources and can't be stopped through the UI. As a result, the host (an 8-CPU machine with 64 GB of RAM) is always at 100%. Good point: multi-threading works well. Since we have a lot of projects to analyse, this will be problematic; we've decided to use a very short range (starting from the beginning of 2018) for all projects, and we'll probably use your metrics list. Any other hint is welcome.
As suggested by @phkrief, the idea of this issue is to share experiences when analysing projects with CROSSMINER and to trigger discussions that help identify possible bugs/limitations in the platform. This issue is related to:
The default docker-compose setup may exhaust the resources of a machine (even a powerful one). Thus, in order to get projects analysed, a solution (more details here) consists of:
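One way to keep the default compose setup from exhausting the host is to cap container memory. The fragment below is a sketch only, not the solution linked above; the `mem_limit` values are illustrative, and the `oss-app`/`oss-db` service names are taken from this thread:

```yaml
services:
  oss-app:
    # Cap the application container so a runaway analysis cannot
    # take the whole host down (8g is an example value).
    mem_limit: 8g
  oss-db:
    mem_limit: 4g
```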
Nevertheless, in my specific case, sometimes oss-app froze (probably due to the limitations of my machine, something @MarcioMateus agreed on) and I had to delete the oss-db container. Furthermore, queuing a new analysis task caused the running task to stop, so I waited for each task to finish before adding a new one. I then imported the data into Elasticsearch with the script available at: https://github.com/valeriocos/scava/blob/bit/web-dashboards/scava-metrics/scava2es_battery.py (which calls scava2es on a battery of repos). With the setup described above, I was able to analyse all CHAOSS repos plus puppet-elasticsearch from 01/01/2019 to 30/06/2019 using the following metric providers:
As suggested by @creat89 here, I had a quick look at the metrics. I noticed that the dependency-related metric providers (e.g. OSGi and Maven) seem to eat up a considerable amount of memory. I checked this by selecting https://github.com/elastic/elasticsearch as the target project (from 01/01/2019 to 30/06/2019) and watching the memory consumption with
slimbook@slimbook-KATANA:~$ top
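Besides eyeballing `top`, the resident memory of a specific worker process can be sampled programmatically. The helper below is a sketch, Linux-specific (it reads `/proc/<pid>/status`); the PID of the metric-provider process would be found via `top` first:

```python
"""Sketch: read a process's resident set size (VmRSS) from /proc on Linux."""
import re


def parse_vmrss_kb(status_text):
    """Extract the VmRSS value (in kB) from /proc/<pid>/status content."""
    m = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(m.group(1)) if m else None


def rss_kb(pid):
    """Return the current resident memory of `pid` in kB (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kb(f.read())
```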