Real Storage asv perf POC #2165
base: master
Conversation
python/arcticdb/util/utils.py
        self.__types[name] = dtype
        return self

    def add_float_col_ex(self, name: str, min: float, max: float, round_at: int, dtype: ArcticFloatType = np.float64
There is a lot of duplicated code here; why do we need both the _col and _col_ex versions of these functions?
Also, we already have similar functionality here; can we reuse/extend that?
You cannot have two functions with different parameter sets in Python, hence there are two main usage scenarios for the float and integer functions: one just gets random values (implicitly the whole range of values from -min to +max), and the other is when I need values only within a certain interval, plus, for floats, the additional option that you do not care about more than X digits, which can be important.
For each scenario there is a separate function that limits the options to only those that matter. In C# and similar languages, the functions that take the most arguments usually end in "_ex"; here I was not exactly sure what suffix to add. But from a supportability perspective, the standard Python pattern of handling all cases inside one function is perhaps not the best.
Also we already have similar functionality [here]

What I could, I reused; what I should not touch, I tried not to touch. Instead, it is a cleaner approach to start from a clean slate, especially if you look at the floats.
Overall, what is there is used here and there and should be changed with quite a lot of care.
On the other hand, I think we need a new, more systematic approach that is aligned with IntelliSense and groups related operations in utility classes. That would be more evident and readily available and would prevent mistakes.
Therefore, going forward, in more complex scenarios this approach will show its benefits through faster writing of tests with more steps.
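As a minimal sketch of the two-variant pattern being defended here (only add_float_col_ex's signature comes from the diff; the builder name, add_float_col, and all bodies are hypothetical):

import numpy as np

class DFGenerator:
    # hypothetical sketch of the column builder discussed above

    def add_float_col(self, name: str, dtype=np.float64):
        # simple scenario: random values over the full range of the dtype
        return self._add_col(name, dtype, min=None, max=None, round_at=None)

    def add_float_col_ex(self, name: str, min: float, max: float,
                         round_at: int, dtype=np.float64):
        # extended ("_ex") scenario: constrain values to [min, max] and
        # round them to a fixed number of digits
        return self._add_col(name, dtype, min=min, max=max, round_at=round_at)

    def _add_col(self, name, dtype, min, max, round_at):
        ...  # shared implementation goes here
        return self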
        for lib in self.__libraries:
            ac.delete_library(lib)

    def get_arctic(self):
I'd call this get_arctic_client
## Amazon s3 storage URL
AWS_URL_TEST = 's3s://s3.eu-west-1.amazonaws.com:arcticdb-asv-real-storage?aws_auth=true'
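For context, a URI like this is passed straight to the Arctic constructor; a minimal usage sketch, assuming the arcticdb package is installed and valid AWS credentials are available:

from arcticdb import Arctic

AWS_URL_TEST = 's3s://s3.eu-west-1.amazonaws.com:arcticdb-asv-real-storage?aws_auth=true'

# aws_auth=true defers credential lookup to the standard AWS chain
# (environment variables, config files, instance profile, ...)
ac = Arctic(AWS_URL_TEST)
print(ac.list_libraries())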
I'd prefer if you use the s3 fixture that we have https://github.com/man-group/ArcticDB/blob/master/build_tooling/transform_asv_results.py#L94
The reason for using a separate fixture is that there will be cases when we delete libraries; by accident we might delete the library holding the asv results. Thus keeping this set separate from other data is best. There could also be times when we wipe all data; if we combine the two, we would have to keep track of what is used and for what reason, which complicates things.
        lib_name = self.get_library_names(num_symbols)[1]
        self.__libraries.add(lib_name)
        if lib_opts is None:
            return ac.get_library(lib_name, create_if_missing=True)
There is no need for the if here; it can just be:

    return ac.get_library(lib_name, create_if_missing=True, library_options=lib_opts)
True - old Java habits.
""" | ||
return None | ||
|
||
def get_library_names(self, num_symbols: int = 1) -> List[str]: |
shouldn't this be an abstractmethod?
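For reference, a minimal sketch of what marking it abstract could look like (the class and method names are from the diff; the rest is assumed):

from abc import ABC, abstractmethod
from typing import List

class LibrariesBase(ABC):

    @abstractmethod
    def get_library_names(self, num_symbols: int = 1) -> List[str]:
        # each concrete setup class must supply its own naming scheme
        ...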
        ## Remove only LMDB libs as they need to
        if self.type == Storage.LMDB:
            ac = self.get_arctic()
            for lib in self.__libraries:
I am not sure what is the point of __libraries?
Wouldn't it be easier to just call ac.list_libraries()?
It keeps track of which libraries are created, so that they can eventually be deleted at the end of the asv tests, once I find the right place to do that, so as not to clutter the shared storages (or those on our WSLs).
Currently libs are not deleted when LMDB is used, which is OK for the transient GH runner but not yet optimal for us.
For shared storages we do have to take care of libs we no longer need, and the first moment for that is the end of the asv tests. That, however, might be tricky to do:
Option 1: The class wipes the data after the tests are done (this example). Yet, because asv opens many processes, there is no "at end of all tests" hook known to me. That work is still in progress; if there is any way to do it, I will.
Option 2: If no solution is found, we might need a new job, invoked after this one, that wipes out all "MOD_" and "TEMP_" libraries, or keeps only the "PERM_" libs; the last option is best IMO. (It might not be a job but a Python script in the current job, activated when real storage is used; see the sketch after this list.)
Option 2a: For personal WSLs, wiping out /tmp on a regular basis should be practiced, but I will leave that to the engineers for now.
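A minimal sketch of what such an Option 2 cleanup script could look like, assuming only the public arcticdb API; the function name and dry_run flag are illustrative:

from arcticdb import Arctic

def wipe_modifiable_libraries(uri: str, dry_run: bool = True) -> None:
    """Delete every 'MOD_'/'TEMP_' library, keeping the 'PERM_' ones."""
    ac = Arctic(uri)
    doomed = [lib for lib in ac.list_libraries()
              if lib.startswith(("MOD_", "TEMP_"))]
    for lib in doomed:
        print(f"deleting library: {lib}")  # destructive actions are always logged
        if not dry_run:
            ac.delete_library(lib)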
""" | ||
assert confirm, "deletion not confirmed!" | ||
ac = self.get_arctic() | ||
list = ac.list_libraries() |
nit: library_list or lib_list would be a better name.
Also, this can be written as lib_list = [lib for lib in ac.list_libraries() if lib.startswith(nameStarts)]
Then you won't need the other check.
Renamed the var, but I prefer to keep the current implementation. Reason:
- delete is a destructive operation and must always print what is deleted. That is why I added the 'assert confirm' first line, and I will also add a check that the required prefix is longer than 4 characters, so that we do not delete anything we did not intend to by accident (a sketch is below).
This function is not used yet, but it will be at the proper time, if I find a solution to the problem above.
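Taken together, the guarded delete described here could look roughly like this (the assert and length check are as described above; the method name and the rest of the body are assumed):

def delete_libraries(self, nameStarts: str, confirm: bool = False) -> None:
    """Destructive: deletes every library whose name starts with nameStarts."""
    assert confirm, "deletion not confirmed!"
    # a too-short prefix could match libraries we never meant to touch
    assert len(nameStarts) > 4, "prefix too short - refusing to delete"
    ac = self.get_arctic()
    lib_list = [lib for lib in ac.list_libraries() if lib.startswith(nameStarts)]
    for lib in lib_list:
        print(f"deleting: {lib}")  # always print what is deleted
        ac.delete_library(lib)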
fixed
python/benchmarks/real_read_write.py
SETUP_CLASS: ReadBenchmarkLibraries = ReadBenchmarkLibraries(Storage.LMDB)

params = SETUP_CLASS.get_parameter_list()
param_names = ["num_rows"]
If you are going to have get_parameter_list(), it might be more consistent to also have get_parameter_names_list().
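For illustration, the symmetric pair could look like this (the row counts and class body are hypothetical; only the naming convention is the point):

class ReadBenchmarkLibraries:
    # sketch only; the real class derives from LibrariesBase

    def get_parameter_list(self):
        return [1_000_000, 10_000_000]  # hypothetical row counts

    def get_parameter_names_list(self):
        return ["num_rows"]

SETUP_CLASS = ReadBenchmarkLibraries()

# asv picks these up as class attributes of the benchmark class:
params = SETUP_CLASS.get_parameter_list()
param_names = SETUP_CLASS.get_parameter_names_list()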
#region Setup classes

class SymbolLibraries(LibrariesBase):
I am confused as to why we need these 2 classes, SymbolLibraries and VersionLibraries.
They seem to do the same thing, but the versions one just writes more versions and has a different set of parameters.
I think this can be refactored to be more concise.
Because the versions currently depend on the number of symbols.
While the number of symbols grows linearly, the number of versions grows as sum(1..n), where n is the number of symbols. With just 50 symbols you get more than 300 versions; with 100, more than 5000 (see the worked numbers below).
For a test of listing symbols you usually need more symbols, typically thousands.
Creating a version library for 1000 symbols is very time consuming and expensive. We could do that once, and perhaps we will, as we have permanent storage, but the initial generation will be huge.
Why have I done the versions in this model?
- it is close to real life: each symbol has a different number of versions
- you get a symbol with the maximum number of versions, which is another test; a test for n/2 versions per symbol can easily be added too
Still, we have not decided yet whether to use the persistence feature or to regenerate the test data each time we need it (I think we should use the persistence feature).
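To make that growth concrete, a quick check of the triangular-number formula assumed above (symbol i gets i versions):

def total_versions(num_symbols: int) -> int:
    # if symbol i receives i versions, the total is 1 + 2 + ... + n = n(n+1)/2
    return num_symbols * (num_symbols + 1) // 2

print(total_versions(50))    # 1275   -> already well over 300 versions
print(total_versions(100))   # 5050   -> over 5000
print(total_versions(1000))  # 500500 -> why one-off generation is so expensive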
Reference Issues/PRs
The main idea of this set is to provide a repeatable set of performance measurements against persistent storage, initially AWS S3 and later any other storage that we support.
Different types of storage have different performance characteristics, so having one and the same asv benchmark test cover both is far from optimal.
Persistent storages provide a way to set up the data needed for the tests once (especially for read tests, i.e. those that do not modify anything).
At the same time they should isolate that data from the other tests (those that do writes) by providing temporary storage for their use.
The approach to building this kind of test setup is first to separate the logic for setting up the libraries and symbol data from the actual asv test. In other words, each test should use another class which is specialized in setting up data, making some checks, and wiping the data out if necessary.
Having a setup class that is not part of the asv test provides a way to set up the environment on demand in a controlled way, wipe it out, and so on. This class can easily facilitate that logic for all types of storage.
In this model the asv test becomes simple to write.
Most importantly, the tests are now ASV independent; they can be quickly moved to another tool if needed.
It also gives the opportunity to use parts of the pre-setup logic for tests outside of asv itself. In other words, the persistent part can be shared across different types of tests.
Adding a new storage type later (Azure etc.) is very easy, provided that it is a shared type of storage similar to AWS.
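As a rough illustration of the shape described above (all names here are illustrative, not the PR's actual classes):

from enum import Enum

class Storage(Enum):
    LMDB = 1
    AMAZON = 2

class ReadSetup:
    # owns library/symbol creation, checks, and cleanup for one storage type

    def __init__(self, storage: Storage):
        self.storage = storage

    def setup_environment(self):
        ...  # create libraries and pre-populate symbols, once per storage

    def teardown_environment(self):
        ...  # wipe temporary libraries on shared storages

class ReadBenchmark:
    # the asv test itself stays thin and tool-agnostic

    SETUP = ReadSetup(Storage.AMAZON)

    def setup(self):
        self.lib = ...  # fetch a pre-created library from SETUP

    def time_read(self):
        ...  # only the measured operation lives here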