Use StackBased Vectors and Matrices in ConstitutiveLaw interface #8171

RiccardoRossi · 2021-01-28T11:07:25Z

Description
In @KratosMultiphysics/technical-committee we are in principle in favor of changing the interface of ConstitutiveLaw to use stack-based vectors (of variable size) for Strain, Stress and F instead of heap-allocated vectors.

This would imply an API-breaking change in the ConstitutiveLawParameters.

Details to be discussed further.
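
As a rough illustration of what a stack-based vector of variable size could look like, here is a minimal sketch. `StackVector` and `VoigtSize` are hypothetical names introduced only for this example; they are not Kratos or Boost classes, and the type to actually adopt is still open.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical illustration type (not a Kratos class): the storage has a
// compile-time capacity and lives inside the object, so no heap allocation
// happens, while the logical size can still change at runtime.
template <typename T, std::size_t Capacity>
class StackVector {
public:
    explicit StackVector(std::size_t size = 0) : mSize(size) {
        assert(size <= Capacity);
    }

    void resize(std::size_t new_size) {
        assert(new_size <= Capacity);
        // Zero any newly exposed entries so their values are well defined.
        for (std::size_t i = mSize; i < new_size; ++i) mData[i] = T{};
        mSize = new_size;
    }

    std::size_t size() const { return mSize; }
    static constexpr std::size_t capacity() { return Capacity; }

    T& operator[](std::size_t i) { return mData[i]; }
    const T& operator[](std::size_t i) const { return mData[i]; }

private:
    std::array<T, Capacity> mData{};  // value-initialized, stored in place
    std::size_t mSize;
};

// Voigt strain size: 3 components in 2D, 6 components in 3D.
inline std::size_t VoigtSize(std::size_t dimension) {
    return dimension == 2 ? 3 : 6;
}
```

With a capacity of 6, the same `StackVector<double, 6>` type could hold both the 2D (3 components) and the 3D (6 components) strain, which is the "variable size" part of the proposal.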

loumalouomega · 2021-01-28T11:18:10Z

I am in favour. BTW, I suggested something similar some time ago, using bounded vectors and matrices through a template. Which class is the stack-based vector of variable size?

pooyan-dadvand · 2021-11-15T09:25:55Z

@KratosMultiphysics/technical-committee forwards this to the @KratosMultiphysics/implementation-committee

AlejandroCornejo · 2022-01-18T10:58:09Z

I'm pushing this issue now. I am working on the assessing-bounded-vector-in-cl branch. My idea is to initially modify the ElasticIsotropic3DLaw and see the performance change in order to build a case. I do this representing the @KratosMultiphysics/implementation-committee.

In the meantime, I kindly ask @RiccardoRossi, @philbucher and @loumalouomega to discuss which performance test would effectively show whether the change is worth it or not.

Thank you!

loumalouomega · 2022-01-18T15:47:10Z

> I'm pushing this issue now. I am working on the assessing-bounded-vector-in-cl branch. My idea is to initially modify the ElasticIsotropic3DLaw and see the performance change in order to build a case. I do this representing the @KratosMultiphysics/implementation-committee.
>
> In the meantime, I kindly ask @RiccardoRossi, @philbucher and @loumalouomega to discuss which performance test would effectively show whether the change is worth it or not.
>
> Thank you!

I am in favor of using the simplest possible test that is still complete. A patch test should be fine, with times measured using VTune (it is free now).

AlejandroCornejo · 2022-01-18T15:57:13Z

Do you remember any example that I can run? Like a test? I suppose it is better to assess performance in C++.

loumalouomega · 2022-01-18T18:00:10Z

> Do you remember any example that I can run? Like a test? I suppose it is better to assess performance in C++.

I think there is already a C++ patch test, and if not, it is relatively easy to create. The reason I suggested running a complete simulation is that, during the short time I worked on explicit schemes, I discovered that many things are called many times. In the implicit case the penalty is small, but in the explicit case it is large, and you can only see that by running a whole simulation.

Short answer: Python will be fine, as long as we use VTune to measure times and check the differences.

RiccardoRossi · 2022-01-19T09:09:22Z

The proposal I made to Alejandro is to generate a few million simple structural elements, and to call the function CalculateSystemContributions on them in parallel.

I expect a non-negligible performance uplift...

loumalouomega · 2022-01-19T13:20:41Z

> The proposal I made to Alejandro is to generate a few million simple structural elements, and to call the function CalculateSystemContributions on them in parallel.
>
> I expect a non-negligible performance uplift...

Yes, that will give us a hint. But for a real performance impact study we should run a complete simulation (there, all the calls that are supposed to be made will be made, and we will see the real performance effect). Again, this is from the perspective I acquired when working on explicit schemes.

AlejandroCornejo · 2022-01-19T14:54:30Z

This is the test:

```cpp
// KRATOS  ___|  |                   |                   |
//       \___ \  __|  __| |   |  __| __| |   |  __| _` | |
//             | |   |    |   | (    |   |   | |   (   | |
//       _____/ \__|_|   \__,_|\___|\__|\__,_|_|  \__,_|_| MECHANICS
//
//  License:         BSD License
//                   license: structural_mechanics_application/license.txt
//
//  Main authors:    Riccardo Rossi
//

// System includes
#include <cstddef>
#include <iostream>
#include <vector>

// Project includes
#include "containers/model.h"
#include "testing/testing.h"
#include "structural_mechanics_application_variables.h"
#include "custom_elements/small_displacement.h"
#include "utilities/parallel_utilities.h"
#include "utilities/builtin_timer.h"

namespace Kratos
{
namespace Testing
{

    KRATOS_TEST_CASE_IN_SUITE(TotalLagrangian2D3N, KratosStructuralMechanicsFastSuite)
    {
        Model current_model;
        auto &r_model_part = current_model.CreateModelPart("ModelPart",1);
        r_model_part.GetProcessInfo().SetValue(DOMAIN_SIZE, 2);

        r_model_part.AddNodalSolutionStepVariable(DISPLACEMENT);
        r_model_part.AddNodalSolutionStepVariable(VOLUME_ACCELERATION);

        // Set the element properties
        auto p_elem_prop = r_model_part.CreateNewProperties(0);
        p_elem_prop->SetValue(YOUNG_MODULUS, 2.0e+06);
        p_elem_prop->SetValue(POISSON_RATIO, 0.3);
        p_elem_prop->SetValue(THICKNESS, 0.01);
        const auto &r_clone_cl = KratosComponents<ConstitutiveLaw>::Get("LinearElastic3DLaw");
        p_elem_prop->SetValue(CONSTITUTIVE_LAW, r_clone_cl.Clone());

        // Create the eight nodes of a single hexahedron (shared by all elements)
        r_model_part.CreateNewNode(1, 0.0000000000, 0.0100000000, 0.0100000000);
        r_model_part.CreateNewNode(2, 0.0000000000, 0.0000000000, 0.0100000000);
        r_model_part.CreateNewNode(3, 0.0000000000, 0.0100000000, 0.0000000000);
        r_model_part.CreateNewNode(4, 0.0100000000, 0.0100000000, 0.0100000000);
        r_model_part.CreateNewNode(5, 0.0100000000, 0.0000000000, 0.0100000000);
        r_model_part.CreateNewNode(6, 0.0100000000, 0.0100000000, 0.0000000000);
        r_model_part.CreateNewNode(7, 0.0000000000, 0.0000000000, 0.0000000000);
        r_model_part.CreateNewNode(8, 0.0100000000, 0.0000000000, 0.0000000000);

        for (auto& r_node : r_model_part.Nodes()){
            r_node.AddDof(DISPLACEMENT_X);
            r_node.AddDof(DISPLACEMENT_Y);
            r_node.AddDof(DISPLACEMENT_Z);
        }

        // Create 1M elements on the same connectivity. The original loop bound
        // was 1e7 while its comment said 1M; 1M is used here to match the comment.
        std::vector<ModelPart::IndexType> element_nodes {4,1,3,6,5,2,7,8};
        const std::size_t number_of_elements = 1000000;
        for (std::size_t i = 1; i <= number_of_elements; ++i) {
            r_model_part.CreateNewElement("SmallDisplacementElement3D8N", i, element_nodes, p_elem_prop);
        }

        for (auto& r_elem : r_model_part.Elements()){
            r_elem.Initialize(r_model_part.GetProcessInfo());
        }

        // Thread-local storage, so each thread reuses its own LHS/RHS buffers
        struct my_tls {
            Vector mVec;
            Matrix mMat;
        };
        const auto& const_procinfo_ref = r_model_part.GetProcessInfo();

        // Time the parallel build of all local systems
        BuiltinTimer setup_system_time;
        block_for_each(r_model_part.Elements(), my_tls(),  [&const_procinfo_ref](Element& r_elem, my_tls & MyTls) {
            r_elem.CalculateLocalSystem(MyTls.mMat, MyTls.mVec, const_procinfo_ref);
        });
        std::cout << "Build Time: " << setup_system_time.ElapsedSeconds() << std::endl;
    }
}
}
```

loumalouomega · 2022-01-19T18:29:52Z

> This is the test:

As I said, in this test we don't fully see the influence. It would be nice to also measure the differences with a full simulation, considering numerous elements.

loumalouomega · 2022-01-19T18:31:07Z

> > This is the test:
>
> As I said, in this test we don't fully see the influence. It would be nice to also measure the differences with a full simulation, considering numerous elements.

BTW, this uses quite a simple CL; in more complex laws the difference may be more significant. (You mentioned 5% in a comment you removed.)

loumalouomega · 2022-11-16T14:42:45Z

@KratosMultiphysics/implementation-committee decided that, in order to properly track the performance of the changes, we should first implement a proper benchmark utility (@matekelemen has done some work in this regard). Then @AlejandroCornejo could test the performance changes in the simplest linear CL.

AlejandroCornejo · 2022-11-16T14:43:46Z

Just to add: I did run the test, but with the Visual Studio compiler... I should redo it on Linux.

loumalouomega · 2022-11-16T14:44:18Z

> @KratosMultiphysics/implementation-committee decided that, in order to properly track the performance of the changes, we should first implement a proper benchmark utility (@matekelemen has done some work in this regard). Then @AlejandroCornejo could test the performance changes in the simplest linear CL.

We also spoke about declaring some member variables as static in order to reduce allocation/deallocation time. Not 100% related, but interesting.
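
A minimal sketch of the buffer-reuse idea, assuming a `thread_local` scratch buffer rather than a plain static member (a static shared between threads would need synchronization inside parallel loops). `ScratchBuffer` is a hypothetical helper written for this example, not an existing Kratos function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns a zeroed per-thread buffer with `size` entries.
// The vector outlives the call, so after the first use on a given thread,
// requests that fit in the existing capacity do not allocate again.
inline std::vector<double>& ScratchBuffer(std::size_t size)
{
    thread_local std::vector<double> buffer;
    buffer.resize(size);                          // reallocates only if size > capacity
    std::fill(buffer.begin(), buffer.end(), 0.0); // reset values left by the previous user
    assert(buffer.size() == size);
    return buffer;
}
```

The returned reference is only valid until the next call on the same thread, so this pattern fits short-lived temporaries inside a single function, not data that has to persist.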

AlejandroCornejo · 2023-01-03T07:54:02Z

I have been working on this today (assessing-bounded-vector-in-cl branch) and I could draw the following conclusions:

Compiled on Linux by means of WSL Ubuntu.

My test consists of creating 1M linear hexahedra in C++ with the elastic 3D law, calling CalculateLocalSystem on each of them, and checking the build time.

* Using `Matrix` and `Vector`, the build time is on average **6.4 s** (`noalias( rLeftHandSideMatrix ) += IntegrationWeight * prod( trans( rB ), Matrix(prod(rD, rB)));`).
* Using `BoundedVector` and `BoundedMatrix` (predefined size for this particular case), it is **2.9 s**.

With the bounded types I did the multiplications as:

    BoundedMatrix<double, 6, 24> aux;
    BoundedMatrix<double, 24, 6> aux2;
    noalias(aux2) = trans(rB);
    noalias(aux) = prod(IntegrationWeight *rD, rB);
    noalias( rLeftHandSideMatrix ) += prod(aux2, aux);

This means that we can reduce build times by about 50%!!
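
For reference, the same accumulation (LHS += w * B^T * D * B) can be sketched with plain fixed-size arrays. `FixedMatrix`, `AddBtDB` and `ExampleUsage` are hypothetical names used only here; the sketch uses `std::array` and hand-written loops instead of uBLAS, so it illustrates the stack-only structure of the computation and says nothing about the timings above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Row-major fixed-size matrix whose storage is part of the object (no heap).
template <std::size_t Rows, std::size_t Cols>
using FixedMatrix = std::array<std::array<double, Cols>, Rows>;

// Accumulates rLhs += Weight * B^T * D * B using a single stack temporary.
template <std::size_t StrainSize, std::size_t NumDofs>
void AddBtDB(FixedMatrix<NumDofs, NumDofs>& rLhs,
             const FixedMatrix<StrainSize, NumDofs>& rB,
             const FixedMatrix<StrainSize, StrainSize>& rD,
             double Weight)
{
    // aux = Weight * D * B, of size StrainSize x NumDofs
    FixedMatrix<StrainSize, NumDofs> aux{};
    for (std::size_t i = 0; i < StrainSize; ++i)
        for (std::size_t k = 0; k < StrainSize; ++k)
            for (std::size_t j = 0; j < NumDofs; ++j)
                aux[i][j] += Weight * rD[i][k] * rB[k][j];

    // rLhs += B^T * aux; B is read transposed, so no explicit trans(B) copy
    for (std::size_t k = 0; k < StrainSize; ++k)
        for (std::size_t i = 0; i < NumDofs; ++i)
            for (std::size_t j = 0; j < NumDofs; ++j)
                rLhs[i][j] += rB[k][i] * aux[k][j];
}

// Usage example on a 2x2 case that is easy to check by hand:
// B = [[1, 2], [3, 4]], D = identity, Weight = 2  =>  LHS = 2 * B^T * B.
inline void ExampleUsage()
{
    FixedMatrix<2, 2> B{}, D{}, lhs{};
    B[0][0] = 1.0; B[0][1] = 2.0;
    B[1][0] = 3.0; B[1][1] = 4.0;
    D[0][0] = 1.0; D[1][1] = 1.0;

    AddBtDB(lhs, B, D, 2.0);

    assert(lhs[0][0] == 20.0 && lhs[0][1] == 28.0);
    assert(lhs[1][0] == 28.0 && lhs[1][1] == 40.0);
}
```

For the hexahedron case discussed here the sizes would be `StrainSize = 6` and `NumDofs = 24`, matching the `BoundedMatrix<double, 6, 24>` temporary in the snippet above.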

AlejandroCornejo · 2023-01-03T08:05:35Z

I have also seen that matrix.clear() is faster than noalias(matrix) = ZeroMatrix(); Can this be?

loumalouomega · 2023-01-03T08:07:20Z

> I have been working on this today (assessing-bounded-vector-in-cl branch) and I could draw the following conclusions:
>
> Compiled on Linux by means of WSL Ubuntu.
>
> My test consists of creating 1M linear hexahedra in C++ with the elastic 3D law, calling CalculateLocalSystem on each of them, and checking the build time.
>
> * Using `Matrix` and `Vector`, the build time is on average **6.4 s** (`noalias( rLeftHandSideMatrix ) += IntegrationWeight * prod( trans( rB ), Matrix(prod(rD, rB)));`).
> * Using `BoundedVector` and `BoundedMatrix` (predefined size for this particular case), it is **2.9 s**.
>
> With the bounded types I did the multiplications as:
>
>     BoundedMatrix<double, 6, 24> aux;
>     BoundedMatrix<double, 24, 6> aux2;
>     noalias(aux2) = trans(rB);
>     noalias(aux) = prod(IntegrationWeight *rD, rB);
>     noalias( rLeftHandSideMatrix ) += prod(aux2, aux);
>
> This means that we can reduce build times by about 50%!!

This would be even faster using the MathUtils operation

AlejandroCornejo · 2023-01-03T08:08:15Z

That was my idea too, but in reality it is by far slower! :S

loumalouomega · 2023-01-03T08:09:14Z

> I have also seen that matrix.clear() is faster than noalias(matrix) = ZeroMatrix(); Can this be?

Yes, because clear() is an internal operation, whereas in the other case you are creating a new matrix and copying its values.

loumalouomega · 2023-01-03T08:09:28Z

> That was my idea too, but in reality it is by far slower! :S

This is strange...

AlejandroCornejo · 2023-01-03T08:11:31Z

The sympy-generated code (BtDB) is also slightly slower than the bounded matrix multiplication.

loumalouomega · 2023-01-03T08:14:07Z

> The sympy-generated code (BtDB) is also slightly slower than the bounded matrix multiplication.

It may be related to some extension optimization in uBLAS...

pooyan-dadvand · 2023-01-13T08:39:59Z

Because in both cases you are not using vectorization, whereas with prod you may be using it, depending on the size of the matrices.

rubenzorrilla · 2023-02-24T11:17:24Z

@KratosMultiphysics/implementation-committee from the @KratosMultiphysics/technical-committee we would like to know how this is evolving. Thanks.

AlejandroCornejo · 2023-02-24T11:27:57Z

Well, I am mostly in charge of this task within the @KratosMultiphysics/implementation-committee. I drew some conclusions, and I would be happy to discuss them in a meeting to decide whether this is something to be done or not.

pooyan-dadvand · 2024-11-18T10:27:37Z

@KratosMultiphysics/technical-committee delegates this to @RiccardoRossi.

RiccardoRossi added Enhancement Discussion API Breaker labels Jan 28, 2021

RiccardoRossi assigned RiccardoRossi and AlejandroCornejo Jan 28, 2021

roigcarlo added this to Technical Commiittee Oct 30, 2023

roigcarlo moved this to 🆕 New in Technical Commiittee Oct 30, 2023
