[feedback] Getting repos analysed #9
Comments
Hi @valeriocos, thanks for the feedback. We are well aware of the performance and memory-consumption issues for large projects with some Rascal metrics (mostly the dependency- and API-related ones). I have started pushing fixes that improve the situation (usethesource/rascal@78e0b60) and will continue in this direction. I'm hopeful we can make it a lot better, though memory consumption will unavoidably remain an issue: analyzing large projects will always consume a lot of resources. I'll keep you updated on our progress.
Thank you @tdegueul for the quick reply!
Thanks a lot @valeriocos for the feedback and the hints. I've also gathered some information in #338 (submitted in the scava repo [1]). Regarding performance, I often see a huge load on the ci4 server (I saw a load of 192 the other day; I didn't even know it could go that high! ;-) and the computations take a lot of time. I've set up several workers and that works quite well, but the worker that is stuck (see the issue mentioned above) takes up a lot of resources and can't be stopped through the UI. As a result, the host (an 8-CPU machine with 64 GB of RAM) is always at 100%. Good point: multi-threading works well. Since we have a lot of projects to analyse, this will be problematic; we've decided to use a very short range (starting from the beginning of 2018) for all projects, and we'll probably use your metrics list. Any other hint is welcome.
As suggested by @phkrief, the idea of this issue is to share experiences when analysing projects with CROSSMINER and to trigger discussions that help identify possible bugs/limitations in the platform. This issue is related to:
The default docker-compose setup may exhaust the resources of a machine (even a powerful one). Thus, in order to get projects analysed, a solution (more details here) consists of:
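One way to keep the default compose setup from exhausting the host is to cap container memory. The fragment below is a sketch only, not the solution linked above; the `mem_limit` values are illustrative, and the `oss-app`/`oss-db` service names are taken from this thread:

```yaml
services:
  oss-app:
    # Cap the application container so a runaway analysis cannot
    # take the whole host down (8g is an example value).
    mem_limit: 8g
  oss-db:
    mem_limit: 4g
```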
Nevertheless, in my specific case, sometimes oss-app froze (probably due to the limitations of my machine, something @MarcioMateus agreed on) and I had to delete the oss-db container. Furthermore, queuing a new analysis task caused the running task to stop, so I waited for each task to finish before adding a new one. I then imported the data into Elasticsearch with the script available at: https://github.com/valeriocos/scava/blob/bit/web-dashboards/scava-metrics/scava2es_battery.py (which calls scava2es on a battery of repos). With the setup described above, I was able to analyse all CHAOSS repos plus puppet-elasticsearch from 01/01/2019 to 30/06/2019 using the following metric providers:
As suggested by @creat89 here, I had a quick look at the metrics. I noticed that the dependency-related metric providers (e.g. OSGi and Maven) seem to eat up a considerable amount of memory. I checked this by selecting https://github.com/elastic/elasticsearch as the target project (from 01/01/2019 to 30/06/2019) and watching the memory consumption with
slimbook@slimbook-KATANA:~$ top
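Besides eyeballing `top`, the resident memory of a specific worker process can be sampled programmatically. The helper below is a sketch, Linux-specific (it reads `/proc/<pid>/status`); the PID of the metric-provider process would be found via `top` first:

```python
"""Sketch: read a process's resident set size (VmRSS) from /proc on Linux."""
import re


def parse_vmrss_kb(status_text):
    """Extract the VmRSS value (in kB) from /proc/<pid>/status content."""
    m = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(m.group(1)) if m else None


def rss_kb(pid):
    """Return the current resident memory of `pid` in kB (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kb(f.read())
```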