Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use StackBased Vectors and Matrices in ConstitutiveLaw interface #8171

Open
RiccardoRossi opened this issue Jan 28, 2021 · 26 comments
Open

Use StackBased Vectors and Matrices in ConstitutiveLaw interface #8171

RiccardoRossi opened this issue Jan 28, 2021 · 26 comments

Comments

@RiccardoRossi
Copy link
Member

Description
In @KratosMultiphysics/technical-committee we are in principle favorable in changing the interface of ConstitutiveLaw to use StackBased vectors (of variable size) for Strain,Stress and F instead of heap allocated vector.

this would imply a API breaker change in the ConstitutiveLawParameters

to be discussed further for "details"

@loumalouomega
Copy link
Member

I am in favour. BTW I suggested something similar some time ago using bounding vectors and matrices using a template. Which class is StackBased vector of variable class?

@pooyan-dadvand
Copy link
Member

@KratosMultiphysics/technical-committee forwards this to the @KratosMultiphysics/implementation-committee

@AlejandroCornejo
Copy link
Member

AlejandroCornejo commented Jan 18, 2022

I'm pushing this issue now. I am working in the assessing-bounded-vector-in-cl branch. My idea is to initially modify the ElasticIsotropic3DLaw and see the performance change for building a case. I do this representing the @KratosMultiphysics/implementation-committee

In the meantime, I kindly ask @RiccardoRossi ,@philbucher and @loumalouomega to discuss a bit which should be the performance test to effectively see if it is worth the change or not.

Thank you!

@loumalouomega
Copy link
Member

I'm pushing this issue now. I am working in the assessing-bounded-vector-in-cl branch. My idea is to initially modify the ElasticIsotropic3DLaw and see the performance change for building a case. I do this representing the @KratosMultiphysics/implementation-committee

In the meantime, I kindly ask @RiccardoRossi ,@philbucher and @loumalouomega to discuss a bit which should be the performance test to effectively see if it is worth the change or not.

Thank you!

I am in favor of using the simplest possible, but yet complete. A patch test should be fine. And measure times with vtune (now it's free)

@AlejandroCornejo
Copy link
Member

Do you remember any exmaple that I can run? Like a test? I suppose that in c++ is better to asses performance

@loumalouomega
Copy link
Member

Do you remember any exmaple that I can run? Like a test? I suppose that in c++ is better to asses performance

I think there is already a c++ patch test, and if not it is relatively easy to create. The point what I said of doing a complete simulation is because during the short time I worked in explicit I discovered many things are called many times, that in the implicit case the penalty is small, but for explicit is a lot, and you can see that only if you do a whole simulation.

Short answer: Python will be fine, as long as we use Vtune to measure times and check differences

@RiccardoRossi
Copy link
Member Author

The proposal i made to Alejandro is to generate a few millinos of simple structural elements, and to call in parallel the function CalculateSystemContributions onto them.

i expect a nonnegligible performance uplift...

@loumalouomega
Copy link
Member

The proposal i made to Alejandro is to generate a few millinos of simple structural elements, and to call in parallel the function CalculateSystemContributions onto them.

i expect a nonnegligible performance uplift...

Yes, that will give us a hint. But for a real performance impact study we should do a complete simulation (there all the calls that are supposed to be done will be done, and we will see real performance effect). Again, this is from the perspective I acquired when working in explicit

@AlejandroCornejo
Copy link
Member

AlejandroCornejo commented Jan 19, 2022

This is the test:

// KRATOS  ___|  |                   |                   |
//       \___ \  __|  __| |   |  __| __| |   |  __| _` | |
//             | |   |    |   | (    |   |   | |   (   | |
//       _____/ \__|_|   \__,_|\___|\__|\__,_|_|  \__,_|_| MECHANICS
//
//  License:		 BSD License
//					 license: structural_mechanics_application/license.txt
//
//  Main authors:    Riccardo Rossi
//

// Project includes
#include "containers/model.h"
#include "testing/testing.h"
#include "structural_mechanics_application_variables.h"
#include "custom_elements/small_displacement.h"
#include "utilities/parallel_utilities.h"
#include "utilities/builtin_timer.h"

namespace Kratos
{
namespace Testing
{

    KRATOS_TEST_CASE_IN_SUITE(TotalLagrangian2D3N, KratosStructuralMechanicsFastSuite)
    {
        Model current_model;
        auto &r_model_part = current_model.CreateModelPart("ModelPart",1);
        r_model_part.GetProcessInfo().SetValue(DOMAIN_SIZE, 2);

        r_model_part.AddNodalSolutionStepVariable(DISPLACEMENT);
        r_model_part.AddNodalSolutionStepVariable(VOLUME_ACCELERATION);

        // Set the element properties
        auto p_elem_prop = r_model_part.CreateNewProperties(0);
        p_elem_prop->SetValue(YOUNG_MODULUS, 2.0e+06);
        p_elem_prop->SetValue(POISSON_RATIO, 0.3);
        p_elem_prop->SetValue(THICKNESS, 0.01);
        const auto &r_clone_cl = KratosComponents<ConstitutiveLaw>::Get("LinearElastic3DLaw");
        p_elem_prop->SetValue(CONSTITUTIVE_LAW, r_clone_cl.Clone());

        // Constants for the computation of the stress
        const double E = p_elem_prop->GetValue(YOUNG_MODULUS);
        const double NU = p_elem_prop->GetValue(POISSON_RATIO);

        // Create the test element
        auto p_node_1 = r_model_part.CreateNewNode(1, 0.0000000000, 0.0100000000, 0.0100000000);
        auto p_node_2 = r_model_part.CreateNewNode(2, 0.0000000000, 0.0000000000, 0.0100000000);
        auto p_node_3 = r_model_part.CreateNewNode(3, 0.0000000000, 0.0100000000, 0.0000000000);
        auto p_node_4 = r_model_part.CreateNewNode(4, 0.0100000000, 0.0100000000, 0.0100000000);
        auto p_node_5 = r_model_part.CreateNewNode(5, 0.0100000000, 0.0000000000, 0.0100000000);
        auto p_node_6 = r_model_part.CreateNewNode(6, 0.0100000000, 0.0100000000, 0.0000000000);
        auto p_node_7 = r_model_part.CreateNewNode(7, 0.0000000000, 0.0000000000, 0.0000000000);
        auto p_node_8 = r_model_part.CreateNewNode(8, 0.0100000000, 0.0000000000, 0.0000000000);

        for (auto& r_node : r_model_part.Nodes()){
            r_node.AddDof(DISPLACEMENT_X);
            r_node.AddDof(DISPLACEMENT_Y);
            r_node.AddDof(DISPLACEMENT_Z);
        }

        std::vector<ModelPart::IndexType> element_nodes {4,1,3,6,5,2,7,8};
        for (int i = 1; i < 1e7; i++) // we create 1M elements
            auto p_element = r_model_part.CreateNewElement("SmallDisplacementElement3D8N", i, element_nodes, p_elem_prop);

        for (auto& r_elem : r_model_part.Elements()){
            r_elem.Initialize(r_model_part.GetProcessInfo());
        }

        struct my_tls {
            Vector mVec;
            Matrix mMat;
        };
        const auto& const_procinfo_ref = r_model_part.GetProcessInfo();

        BuiltinTimer setup_system_time;
        block_for_each(r_model_part.Elements(), my_tls(),  [&const_procinfo_ref](Element& r_elem, my_tls & MyTls) {
            r_elem.CalculateLocalSystem(MyTls.mMat, MyTls.mVec, const_procinfo_ref);
        });
        std::cout << "Build Time: " << setup_system_time.ElapsedSeconds() << std::endl;
    }
}
}

@loumalouomega
Copy link
Member

This is the test:

As I said, in this test we don't fully see the influence. It will be nice to also measure differences with a full simulation, considering numerous elements

@loumalouomega
Copy link
Member

This is the test:

As I said, in this test we don't fully see the influence. It will be nice to also measure differences with a full simulation, considering numerous elements

BTW, this is using a quite simple CL, maybe in more complex laws the difference is more significant. (5% in a comment you removed)

@loumalouomega
Copy link
Member

@KratosMultiphysics/implementation-committee decided that in order to properly track the performance of the changes, we should in first place implement a proper benchmark utility (@matekelemen has some works in this regard). Then @AlejandroCornejo could test the performance changes in the simplest linear CL.

@AlejandroCornejo
Copy link
Member

Just to complement, I did the test but with Visual Studio compiler... I should redo in Linux

@loumalouomega
Copy link
Member

@KratosMultiphysics/implementation-committee decided that in order to properly track the performance of the changes, we should in first place implement a proper benchmark utility (@matekelemen has some works in this regard). Then @AlejandroCornejo could test the performance changes in the simplest linear CL.

We also spoke about declaring some member variables as static in order to reduce allocation/deallocation time. Not 100% related, but interesting.

@AlejandroCornejo
Copy link
Member

AlejandroCornejo commented Jan 3, 2023

I have been working on this today (assessing-bounded-vector-in-cl branch) and I could draw the following conclusions:

Compiling in linux by means of WSL Ubuntu

My test consists in creating in c++ 1M linear hexas with elastic 3D law and call the CalculateLocalSystem and check the build time.

  • Using Matrix and Vector, the build time is on average 6.4 s. (noalias( rLeftHandSideMatrix ) += IntegrationWeight * prod( trans( rB ), Matrix(prod(rD, rB)));)
  • Using Bounded Vector and Bounded Matrix (predefined size for this particular case) 2.9 s

With the Bounded I did the multiplications as:

    BoundedMatrix<double, 6, 24> aux;
    BoundedMatrix<double, 24, 6> aux2;
    noalias(aux2) = trans(rB);
    noalias(aux) = prod(IntegrationWeight *rD, rB);
    noalias( rLeftHandSideMatrix ) += prod(aux2, aux);

This means that we can reduce build times in about a 50%!!

@AlejandroCornejo
Copy link
Member

I have also seen that the matrix.clear() is faster than noalias(matrix) = ZeroMAtrix(); Can this be?

@loumalouomega
Copy link
Member

I have been working on this today (assessing-bounded-vector-in-cl branch) and I could draw the following conclusions:

Compiling in linux by means of WSL Ubuntu

My test consists in creating in c++ 1M linear hexas with elastic 3D law and call the CalculateLocalSystem and check the build time.

* Using `Matrix` and `Vector`, the build time is on average **6.4 s.** `(noalias( rLeftHandSideMatrix ) += IntegrationWeight * prod( trans( rB ), Matrix(prod(rD, rB)));)`

* Using `Bounded Vector` and `Bounded Matrix` (predefined size for this particular case) **2.9 s**

With the Bounded I did the multiplications as:

    BoundedMatrix<double, 6, 24> aux;
    BoundedMatrix<double, 24, 6> aux2;
    noalias(aux2) = trans(rB);
    noalias(aux) = prod(IntegrationWeight *rD, rB);
    noalias( rLeftHandSideMatrix ) += prod(aux2, aux);

This means that we can reduce build times in about a 50%!!

This would be even faster using the MathUtils operation

@AlejandroCornejo
Copy link
Member

This was my idea too, reality is that is by far slower! :S

@loumalouomega
Copy link
Member

I have also seen that the matrix.clear() is faster than noalias(matrix) = ZeroMAtrix(); Can this be?

Yes, because clear is an internal operation and in the other case you are creating a new matrix and copying values

@loumalouomega
Copy link
Member

This was my idea too, reality is that is by far slower! :S

This is strange...

@AlejandroCornejo
Copy link
Member

With sympy generation of the code (BtDB) is also slightly slower than the bounded matrix multiplication

@loumalouomega
Copy link
Member

With sympy generation of the code (BtDB) is also slightly slower than the bounded matrix multiplication

May be related with some extension optimization of Ublas...

@pooyan-dadvand
Copy link
Member

Because in both cases you are not using the vectorization but with prod you may use it depending on size of the matrices

@rubenzorrilla
Copy link
Member

@KratosMultiphysics/implementation-committee from the @KratosMultiphysics/technical-committee we would like to know how this is evolving. Thanks.

@AlejandroCornejo
Copy link
Member

AlejandroCornejo commented Feb 24, 2023

Well I am mostly in charge of this task within the @KratosMultiphysics/implementation-committee . I draw some conclusions and I would be happy to have a discussion regarding this in a meeting to know if this is something to be done or not.

@pooyan-dadvand
Copy link
Member

@KratosMultiphysics/technical-committee delegates this to @RiccardoRossi.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: 🏗 In progress
Development

No branches or pull requests

5 participants