Reduce package(s) size #87
Comments
Reducing the logo file size is relatively easy (an online tool reduced the size by 70%, to ~500 KB). I also think the dimensions of the logos should be checked: the formatters logo is quite big at 1299x1501 pixels (compared with the teal logo at teal/man/figures/teal.png, which is 278x321, it has >20 times as many pixels). Dimensions of the logos found:
I think that compressing alone can be enough, but we should reduce the file size of the logos for rtables and formatters. Checking for unused objects/datasets is more complicated: it is harder to automate because they might be used internally or by other repositories, and I don't know a good method to automatically extract which objects in an R package are datasets (https://pharmaverse.r-universe.dev/datasets reports 33 datasets from teal.* packages, but it is hard to explore). |
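The dimension comparison above can be reproduced without an image library. The following is a minimal sketch in Python, assuming the logos are standard PNG files; the helper names `png_dimensions` and `pixel_ratio` are made up for illustration and are not part of any package discussed here.

```python
import struct

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(path):
    """Return (width, height) read from the IHDR chunk of a PNG file."""
    with open(path, "rb") as fh:
        header = fh.read(24)
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type,
    # then width and height as big-endian unsigned 32-bit integers.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} does not look like a PNG file")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def pixel_ratio(big, small):
    """How many times more pixels `big` has than `small` (both are (w, h))."""
    return (big[0] * big[1]) / (small[0] * small[1])
```

For the two sizes quoted above, `pixel_ratio((1299, 1501), (278, 321))` is about 21.8, which matches the ">20 times" estimate.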
[It has been detected](insightsengineering/nestdevs-tasks#87) that the formatters logo is quite big in comparison to other packages. This PR compresses the logo so it no longer takes up so much of the package size.
Logos were compressed to reduce size. After the two compressors only a third of the size was left (~500 KB). As described above, the logos have different dimensions; reducing them to the minimum would also help avoid bloating packages.

Manual check of datasets & usage

Go to r-universe.dev/datasets and filter by "teal." => There are 33 datasets
From the dataset names (and sizes) there seems to be some data duplication or re-export (for example, rADTTE is in 4 different packages). One way to reduce package size would be to move all these datasets to a single data package. |
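The duplication check described above can be sketched in a few lines of Python. This assumes the dataset listing has already been collected by hand into a mapping from package name to dataset names; the function name `duplicated_datasets` and the package names in the usage note are placeholders, not the real listing.

```python
from collections import defaultdict


def duplicated_datasets(datasets_by_package):
    """Map each dataset name found in more than one package to its packages."""
    owners = defaultdict(set)
    for package, names in datasets_by_package.items():
        for name in names:
            owners[name].add(package)
    # Keep only names that are shipped by at least two packages.
    return {name: sorted(pkgs) for name, pkgs in owners.items() if len(pkgs) > 1}
```

Called with a hypothetical listing such as `{"pkg_a": ["rADTTE", "rADSL"], "pkg_b": ["rADTTE"]}`, it reports `rADTTE` as shared by `pkg_a` and `pkg_b`. Matching by name only is a first pass: datasets with the same name can still differ in rows or columns.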
@llrs-roche in the past we had all datasets in one package, but we decided to delete this package and just move data directly to packages where the data is used. That's why you see some duplications |
@llrs-roche 3 PRs where we can remove datasets because they all exist in teal.data already |
@llrs-roche thanks for your work on this! |
@pawelru there are ~40 logos to deal with on the organization, and I don't have a script to automate this... |
Yes, that's somewhat expected. I don't think it would be ~40, rather closer to ~20, but that is still quite a lot of work. I'm aware. Hopefully not all of them will require action, but I wanted to do this completely and never come back to it. |
In the spirit of package size reduction insightsengineering/nestdevs-tasks#87 I replaced all the calls for tmg datasets with their respective datasets from teal.data. This way we no longer need to have datasets copies in tmg. --------- Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
Alrighty, @pawelru and @llrs-roche, I think we can reduce logos during #93. And when it comes to datasets, 3 packages can be simplified and reuse data from teal.data #87 (comment). Lastly, @llrs-roche, I need to know your recommendations on those formatters datasets |
Those datasets are from formatters, and some overlap by name with those in tmc. However, there are some differences between tmc and formatters: tmc datasets often have fewer rows, and some have fewer columns. Sometimes a similar dataset is present in tern. formatters data is used by other packages, such as rlistings in its vignettes. formatters and tmc share no dependency that we control, so I wouldn't remove the data from either of them and add a new dependency from one to the other. This would complicate installation for minimal gains. |
Alrighty, thanks @llrs-roche for the review. I guess we can close this issue for now? |
Yes, we can close and keep track of the logo sizes on #93 |
Have a look at `formatters` when untarred: [...] `ex_` datasets at all. But they might be used elsewhere. There are a few packages that are using `ex_adlb`, but I cannot find any usage of `ex_advs` and `ex_adqs`.

Adding this in the general task repo to repeat the analysis for multiple packages and find a common solution.

It's important because of: [...] `.tgz` files and order by size.
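Inspecting a package tarball and ordering its contents by size, as suggested above, can be sketched with Python's standard `tarfile` module. This is only an illustration: the function name `largest_members` is made up, and the sizes reported are uncompressed sizes of the archive members, not their share of the compressed `.tgz`.

```python
import tarfile


def largest_members(tarball, top=10):
    """Return the `top` largest regular files in a tarball as (size, name)."""
    # "r:*" lets tarfile detect the compression (gzip for .tgz / .tar.gz).
    with tarfile.open(tarball, "r:*") as archive:
        sizes = [(m.size, m.name) for m in archive.getmembers() if m.isfile()]
    # Largest first; ties are broken by name in descending order.
    return sorted(sizes, reverse=True)[:top]
```

Running this over each package archive and comparing the top entries is one way to repeat the analysis for multiple packages and see whether logos or datasets dominate.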