Pega Data Scientist Tools V4

@StijnKas StijnKas released this 09 Dec 15:47
· 72 commits to master since this release

This release of pdstools is a big cleanup from version 3. Many of the changes are breaking, but that's for the best: pdstools is now much easier to maintain, new functionality has a more logical place to go, and the API should be a lot more intuitive. The goal is for the initial V4 release to contain most of the breaking API changes we foresee for a long time. We can of course still change the inner workings and add new functions, but hopefully the most important function signatures and the API won't need more changes anytime soon.

✨Highlights

  • Farewell R - you've served us well, but pdstools is now Python only
  • Introducing the Pega DX API Client
    • Starting out with support for the 24.2 Prediction Studio and Knowledge Buddy APIs
  • Major refactor of the entire codebase: consistent Python naming, optional dependency groups, well-defined type hints

❌Deprecations/removals

  • The R version of pdstools has been removed. In case you still want to use the R tools, you should manually clone the repo at the V3.x tag.
  • The legacy IH utilities have been dropped. These were old parts of the codebase and untested/unused. New IH utilities are on their way!
  • The Wiki documentation has been ported to the (tracked) Python documentation. We'll deprecate the wiki, but keep it live for a while so that external links have time to be updated to point to the documentation instead.

🔨Changes

  • Consistent pythonic casing, meaning PascalCase for classes & snake_case for methods, variables & arguments
  • Much improved type hints, so it's much more obvious what a given function returns
  • Fewer 'base' dependencies; different functionality is split up into 'namespaces' that all have their own set of requirements
    • The first time you invoke a method in a 'namespace', it verifies the dependencies and gives a clear warning if any are missing
  • To expand on the previous point: functionality is split up much more logically. Taking the ADMDatamart class as an example:
    • Plotting functionality is part of ADMDatamart.plot.bubble_chart() (or any other plot of course)
    • The health check and other reports are part of ADMDatamart.generate.health_check() (for instance)
    • The intermediate aggregations needed are part of ADMDatamart.aggregations.pivot() (for instance)
  • Using classmethods, we can initialize the ADMDatamart class in particular in a much more flexible way.
    • The main __init__ method of the ADMDatamart class is very simple: it expects two polars.LazyFrames, one for the model data and one for the predictor data. If you've already read in your data, simply use this
    • If, instead, you want to use the previous functionality which automatically found the most recent file in a folder, you should initialize the datamart class like ADMDatamart.from_ds_export()
    • Or, if you are consuming the results of a data flow (including the OOTB Prediction Studio export), you can initialize the datamart class with ADMDatamart.from_dataflow_export(model_data="pattern_for_model_files*.json", predictor_data="pattern_for_predictor_files*.json"). Files that have been read before can also be cached automatically by writing them to a 'cache' file, which makes later reads quick. This closes #205 as well.
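
As a rough illustration of the classmethod-based initialization described above, here is a minimal sketch using only the standard library. It is not the pdstools implementation: ToyDatamart, from_latest_export, from_file_patterns and the "model*.json" / "predictor*.json" file patterns are hypothetical names chosen for this example, and plain lists of dicts stand in for polars.LazyFrames.

```python
import glob
import json
import os
from dataclasses import dataclass
from typing import Dict, List

Rows = List[Dict]  # stand-in for a polars.LazyFrame in this sketch


def _read_json_lines(path: str) -> Rows:
    """Read one newline-delimited JSON file into a list of records."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


@dataclass
class ToyDatamart:
    """Hypothetical class showing the constructor layout, not pdstools itself."""

    # The plain constructor only accepts data that is already loaded.
    model_data: Rows
    predictor_data: Rows

    @classmethod
    def from_latest_export(cls, folder: str) -> "ToyDatamart":
        """Use the most recently modified model and predictor file in a folder."""

        def latest(pattern: str) -> str:
            candidates = glob.glob(os.path.join(folder, pattern))
            if not candidates:
                raise FileNotFoundError(f"no file matching {pattern!r} in {folder!r}")
            return max(candidates, key=os.path.getmtime)

        return cls(
            _read_json_lines(latest("model*.json")),
            _read_json_lines(latest("predictor*.json")),
        )

    @classmethod
    def from_file_patterns(cls, model_data: str, predictor_data: str) -> "ToyDatamart":
        """Read and concatenate every file matching each glob pattern."""

        def read_all(pattern: str) -> Rows:
            rows: Rows = []
            for path in sorted(glob.glob(pattern)):
                rows.extend(_read_json_lines(path))
            return rows

        return cls(read_all(model_data), read_all(predictor_data))
```

The point of this layout is that file discovery and parsing live in the alternate constructors, so the main constructor stays trivial and can be used directly with data that was loaded some other way.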

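The 'namespace' split with a dependency check on first use, as described in the changes above, could look roughly like the sketch below. This is an assumption about the general pattern, not the actual pdstools code: Namespace, Plots, NamespacedDatamart and missing_modules are hypothetical names, and the standard-library json module stands in for a real plotting dependency.

```python
import importlib.util
import warnings
from typing import Sequence, Tuple


def missing_modules(names: Sequence[str]) -> Tuple[str, ...]:
    """Return the top-level module names that cannot be found for import."""
    return tuple(n for n in names if importlib.util.find_spec(n) is None)


class Namespace:
    """A group of related methods that share one set of optional dependencies."""

    requires: Tuple[str, ...] = ()

    def __init__(self) -> None:
        self._checked = False

    def _check(self) -> None:
        """Verify dependencies once, on the first call into this namespace."""
        if self._checked:
            return
        self._checked = True
        missing = missing_modules(self.requires)
        if missing:
            warnings.warn(
                f"{type(self).__name__} needs missing packages: {', '.join(missing)}",
                stacklevel=3,
            )


class Plots(Namespace):
    requires = ("json",)  # stdlib stand-in for a plotting library

    def bubble_chart(self) -> str:
        self._check()
        return "bubble chart placeholder"


class NamespacedDatamart:
    """Hypothetical host class exposing its plots as `.plot.<name>()`."""

    def __init__(self) -> None:
        self.plot = Plots()


datamart = NamespacedDatamart()
result = datamart.plot.bubble_chart()
```

Because the check runs inside the namespace rather than at import time, a user who never touches plotting never needs the plotting dependencies installed.
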
Full Changelog: V3.5.2...V4.0.0