[feedback] Getting repos analysed #9

Open
davidediruscio opened this issue Jan 6, 2020 · 3 comments

@davidediruscio

As suggested by @phkrief, the idea of this issue is to share experience from analysing projects with CROSSMINER and to trigger discussions that help identify possible bugs and limitations of the platform. This issue is related to:

The default docker-compose may exhaust the resources of a machine (even a powerful one). Thus, to get projects analysed, a solution (more details here) consists of:

  • limiting the services in the docker-compose file. A reduced version of the docker-compose file is available here
  • limiting the metric providers selected when defining an analysis task
  • limiting the time interval when defining an analysis task

Nevertheless, in my specific case, oss-app sometimes froze (probably due to the limitations of my machine, something @MarcioMateus agreed on) and I had to delete the oss-db container. Furthermore, queuing a new analysis task caused the current task to stop, so I waited for each task to finish before adding a new one. I then imported the data into Elasticsearch with the script available at: https://github.com/valeriocos/scava/blob/bit/web-dashboards/scava-metrics/scava2es_battery.py (which calls scava2es on a battery of repos).
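The "one task at a time" workaround can be sketched roughly as follows. This is only an illustration: `submit` and `is_finished` are hypothetical callables standing in for whatever mechanism is used to create a task and check its status; they are not part of the CROSSMINER API.

```python
import time
from typing import Callable, Iterable, List


def run_sequentially(
    projects: Iterable[str],
    submit: Callable[[str], str],
    is_finished: Callable[[str], bool],
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Submit one analysis task at a time and wait for it to finish.

    `submit` and `is_finished` are hypothetical placeholders supplied by
    the caller; nothing here talks to the platform directly.
    """
    completed = []
    for project in projects:
        task_id = submit(project)
        # Never queue the next task while the current one is still running.
        while not is_finished(task_id):
            sleep(poll_seconds)
        completed.append(task_id)
    return completed
```

The `sleep` parameter is injectable only so that the loop can be exercised without real waiting.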

With the approach described above, I was able to analyse all CHAOSS repos plus puppet-elasticsearch from 01/01/2019 to 30/06/2019 using the following metric providers:

sentiment.SentimentHistoricMetricProvider
severity.SeverityHistoricMetricProvider
severitybugstatus.SeverityBugStatusHistoricMetricProvider
severityresponsetime.SeverityResponseTimeHistoricMetricProvider
severitysentiment.SeveritySentimentHistoricMetricProvider
newbugs.NewBugsHistoricMetricProvider
comments.CommentsHistoricMetricProvider
patches.PatchesHistoricMetricProvider
emotions.EmotionsHistoricMetricProvider
docker.dependencies
docker.smells
topics.TopicsHistoricMetricProvider
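For anyone who wants to keep this restricted setup in a script, here is a minimal sketch that records the provider list and the date range in one place. The dictionary keys and the `build_task` helper are assumptions made for illustration; they do not mirror the actual task definition format of the platform.

```python
from datetime import date

# Metric providers copied verbatim from the list above.
METRIC_PROVIDERS = [
    "sentiment.SentimentHistoricMetricProvider",
    "severity.SeverityHistoricMetricProvider",
    "severitybugstatus.SeverityBugStatusHistoricMetricProvider",
    "severityresponsetime.SeverityResponseTimeHistoricMetricProvider",
    "severitysentiment.SeveritySentimentHistoricMetricProvider",
    "newbugs.NewBugsHistoricMetricProvider",
    "comments.CommentsHistoricMetricProvider",
    "patches.PatchesHistoricMetricProvider",
    "emotions.EmotionsHistoricMetricProvider",
    "docker.dependencies",
    "docker.smells",
    "topics.TopicsHistoricMetricProvider",
]


def build_task(project, start, end, providers):
    """Return a plain dict describing one restricted analysis task.

    The keys are hypothetical and only serve to keep the settings together.
    """
    if start > end:
        raise ValueError("start must not be after end")
    if not providers:
        raise ValueError("at least one metric provider is required")
    return {
        "project": project,
        "start": start.isoformat(),
        "end": end.isoformat(),
        "days": (end - start).days + 1,  # inclusive length of the interval
        "metric_providers": list(providers),
    }


# The interval used above: 01/01/2019 to 30/06/2019.
task = build_task("puppet-elasticsearch", date(2019, 1, 1), date(2019, 6, 30), METRIC_PROVIDERS)
```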

As suggested by @creat89 here, I had a quick look at the metrics. I noticed that the metric providers related to dev dependencies (e.g. osgi and maven) seem to consume a considerable amount of memory. I checked this by selecting https://github.com/elastic/elasticsearch as the target project (from 01/01/2019 to 30/06/2019) and watching the memory consumption with top: slimbook@slimbook-KATANA:~$ top
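As an alternative to watching top interactively, the resident memory of a single process can be sampled from a script. The sketch below assumes a Linux host, where `/proc/<pid>/status` contains a `VmRSS` line reported in kB; the function names are my own and the PID has to be supplied by the caller.

```python
from pathlib import Path
from typing import Optional


def parse_rss_kib(status_text: str) -> Optional[int]:
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status.

    Returns None when no VmRSS line is present (e.g. for kernel threads).
    """
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:      2048 kB".
            return int(line.split()[1])
    return None


def read_rss_kib(pid: int) -> Optional[int]:
    """Read the current resident set size of a process (Linux only)."""
    return parse_rss_kib(Path(f"/proc/{pid}/status").read_text())
```

Calling `read_rss_kib` periodically during an analysis and logging the result would give a rough memory profile per metric provider selection.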

@davidediruscio
Author

Hi @valeriocos, thanks for the feedback.

We are well aware of the performance and memory-consumption issues that some Rascal metrics (mostly the dependencies and API-related ones) have on large projects. I have started to push some fixes that improve the situation (usethesource/rascal@78e0b60) and will continue in this direction. I'm hopeful we can make it a lot better, though memory consumption will unavoidably remain an issue: analyzing large projects will always consume a lot of resources. I'll keep you updated on our progress.

@davidediruscio
Author

Thank you @tdegueul for the quick reply!

@davidediruscio
Author

Thanks a lot @valeriocos for the feedback and the hints.

I've also gathered some information in #338 (submitted in the scava repo [1]).
[1] crossminer/scava#338

Regarding performance, I often see a huge load on the ci4 server (I saw a load of 192 the other day; I didn't even know it could go that high! ;-) and the computations take a lot of time.

I've set up several workers and it works quite well, but the worker that's stuck (see the issue mentioned above) takes up a lot of resources, and it can't be stopped through the UI. As a result the host (a powerful machine with 8 CPUs and 64 GB of RAM) is always at 100%. Good point: multi-threading works well.
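To put the load figure in perspective, the load average can be divided by the number of CPUs. The sketch below assumes the load of 192 was observed on the 8-CPU host described above, which would mean roughly 24 runnable or waiting tasks per CPU. `load_per_cpu` is a helper name of my own; `os.getloadavg()` is only available on Unix-like systems.

```python
import os


def load_per_cpu(load_average: float, cpu_count: int) -> float:
    """Normalise a load average by the number of CPUs."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_average / cpu_count


if __name__ == "__main__":
    # One-minute load average of the current machine (Unix only).
    one_minute, _, _ = os.getloadavg()
    print(load_per_cpu(one_minute, os.cpu_count() or 1))
```

A value well above 1 per CPU over a long period suggests the workers are competing for the same cores.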

[Screenshot: Capture du 2019-08-29 17-39-45]

Since we have a lot of projects to analyse, this will be problematic; we've decided to use a very short range (starting from the beginning of 2018) for all projects, and we'll probably use your metrics list. Any other hints are welcome.
