Pdstools V4 #260

Merged
merged 171 commits into from
Oct 30, 2024

StijnKas
Collaborator

@StijnKas StijnKas commented Sep 17, 2024

This PR contains the initial version of pdstools V4. Because this is a pretty big upgrade, we will publish an alpha release of version 4 first.

V4 brings some pretty major (and necessary) changes. A lot of them are, unfortunately, breaking - but it's for the best. pdstools is now much easier to maintain and keep consistent, and new functionality now has a much more logical place to go.

My goal is for the initial V4 release to contain most of the breaking (API-centric) changes we foresee for a long time. After that, we can of course still change the inner functionality and/or add new functions, but hopefully the most important function signatures and APIs won't need more changes anytime soon.

✨Highlights

  • Farewell R - you've served us well, but pdstools is now Python-only
  • Introducing the Pega DX API Client
    • Starting out with support for the 24.2 Prediction Studio and Knowledge Buddy APIs
  • Major refactor of the entire codebase: consistent Python naming, optional dependency groups, well-defined type hints

❌Deprecations/removals

  • The R version of pdstools has been removed. If you still want to use the R tools, manually clone the repo at the V3.x tag.
  • The legacy IH utilities have been dropped. These were old, untested and unused parts of the codebase. New IH utilities are on their way!

🔨Changes

  • Consistent Pythonic casing, meaning PascalCase for classes & snake_case for methods, variables & arguments
  • Much improved type hints, so it's much more obvious what a given function returns
  • Fewer 'base' dependencies; different functionality is split up into 'namespaces' that all have their own set of requirements
    • The first time you invoke a method in a 'namespace', it verifies the dependencies and gives a clear warning if any are missing
  • To expand on the previous point: functionality is split up much more logically. Taking the ADMDatamart class as an example:
    • Plotting functionality is part of ADMDatamart.plot.bubble_chart() (or any other plot of course)
    • The health check and other reports are part of ADMDatamart.generate.health_check() (for instance)
    • The intermediate aggregations needed are part of ADMDatamart.aggregations.pivot() (for instance)
  • Using classmethods, we can initialize classes, and the ADMDatamart class in particular, in a much more flexible way.
    • The main __init__ method of the ADMDatamart class is very simple: it expects two polars.LazyFrames, one for the model data and one for the predictor data. If you've already read in your data, simply use this
    • If, instead, you want to use the previous functionality which automatically found the most recent file in a folder, you should initialize the datamart class like ADMDatamart.from_ds_export()
    • Or, if you are consuming the results of a data flow (including the OOTB Prediction Studio export), you can initialize the datamart class like ADMDatamart.from_dataflow_export(model_data="pattern_for_model_files*.json", predictor_data="pattern_for_predictor_files*.json"). Files we've read in before can also be cached automatically by writing to a 'cache' file, which keeps things fast. This also closes #205 (Glob method for combining multiple zip files in readMultizip in Pega_IO).
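
The two structural ideas in the list above (namespaces that verify their own optional dependencies on first use, and classmethod constructors next to a deliberately simple __init__) can be sketched in plain Python. This is a toy illustration, not the pdstools implementation: ToyDatamart, Namespace, MissingDependenciesError, from_records and hypothetical_plotting_backend are all made-up names, plain lists stand in for polars.LazyFrames, and the sketch raises an error where the real library may only warn.

```python
import importlib.util


class MissingDependenciesError(ImportError):
    """Raised when a namespace is used without its optional packages."""


class Namespace:
    """A group of related methods that shares one set of requirements."""

    dependencies = ()  # top-level module names this namespace needs

    def __init__(self, datamart):
        self.datamart = datamart
        self._verified = False

    def _require_dependencies(self):
        # Verify once, on first use, so constructing the datamart stays cheap.
        if self._verified:
            return
        missing = [
            name for name in self.dependencies
            if importlib.util.find_spec(name) is None
        ]
        if missing:
            raise MissingDependenciesError(
                f"{type(self).__name__} needs missing package(s): {', '.join(missing)}"
            )
        self._verified = True


class Aggregates(Namespace):
    dependencies = ("json",)  # stdlib stand-in so the sketch runs anywhere

    def total_responses(self):
        self._require_dependencies()
        return sum(row["responses"] for row in self.datamart.model_data)


class Plots(Namespace):
    # Hypothetical package name, assumed not to be installed.
    dependencies = ("hypothetical_plotting_backend",)

    def bubble_chart(self):
        self._require_dependencies()
        return "chart"


class ToyDatamart:
    def __init__(self, model_data, predictor_data):
        # Simple constructor: takes data that has already been loaded.
        self.model_data = model_data
        self.predictor_data = predictor_data
        self.aggregates = Aggregates(self)
        self.plot = Plots(self)

    @classmethod
    def from_records(cls, model_records, predictor_records=()):
        # Alternative constructor: does the loading, then defers to __init__.
        return cls(list(model_records), list(predictor_records))


dm = ToyDatamart.from_records([{"responses": 10}, {"responses": 5}])
print(dm.aggregates.total_responses())  # 15

try:
    dm.plot.bubble_chart()
except MissingDependenciesError as err:
    print(err)
```

Keeping the check inside each namespace means a user who only needs aggregations never has to install the plotting requirements, which is the point of the smaller set of 'base' dependencies.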

StijnKas and others added 30 commits June 18, 2024 14:23
@StijnKas StijnKas added this to the V4 milestone Oct 28, 2024

codecov bot commented Oct 30, 2024

Codecov Report

Attention: Patch coverage is 78.30792% with 441 lines in your changes missing coverage. Please review.

Project coverage is 58.90%. Comparing base (a76a11e) to head (1025c36).
Report is 1 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| python/pdstools/adm/Reports.py | 21.32% | 107 Missing ⚠️ |
| python/pdstools/adm/Plots.py | 84.22% | 53 Missing ⚠️ |
| python/pdstools/adm/ADMTrees.py | 44.94% | 49 Missing ⚠️ |
| ...ces/prediction_studio/v24_2/champion_challenger.py | 78.96% | 49 Missing ⚠️ |
| python/pdstools/infinity/internal/_pagination.py | 63.71% | 41 Missing ⚠️ |
| ...infinity/resources/prediction_studio/async_base.py | 0.00% | 39 Missing ⚠️ |
| python/pdstools/infinity/internal/_resource.py | 65.21% | 24 Missing ⚠️ |
| ...y/resources/prediction_studio/local_model_utils.py | 90.08% | 24 Missing ⚠️ |
| python/pdstools/adm/Aggregates.py | 78.40% | 19 Missing ⚠️ |
| ...inity/resources/knowledge_buddy/knowledge_buddy.py | 82.35% | 9 Missing ⚠️ |

... and 9 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #260      +/-   ##
==========================================
+ Coverage   54.21%   58.90%   +4.69%     
==========================================
  Files          29       56      +27     
  Lines        3805     4877    +1072     
==========================================
+ Hits         2063     2873     +810     
- Misses       1742     2004     +262     


@StijnKas StijnKas merged commit 5a42b5c into master Oct 30, 2024
9 of 12 checks passed
@StijnKas StijnKas deleted the pdstools_v4 branch December 4, 2024 10:59
@StijnKas StijnKas mentioned this pull request Dec 9, 2024
2 tasks