-
Notifications
You must be signed in to change notification settings - Fork 248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use StackBased Vectors and Matrices in ConstitutiveLaw interface #8171
Comments
I am in favour. BTW I suggested something similar some time ago using bounding vectors and matrices using a template. Which class is StackBased vector of variable class? |
@KratosMultiphysics/technical-committee forwards this to the @KratosMultiphysics/implementation-committee |
I'm pushing this issue now. I am working in the In the meantime, I kindly ask @RiccardoRossi ,@philbucher and @loumalouomega to discuss a bit which should be the performance test to effectively see if it is worth the change or not. Thank you! |
I am in favor of using the simplest possible, but yet complete. A patch test should be fine. And measure times with vtune (now it's free) |
Do you remember any exmaple that I can run? Like a test? I suppose that in c++ is better to asses performance |
I think there is already a c++ patch test, and if not it is relatively easy to create. The point what I said of doing a complete simulation is because during the short time I worked in explicit I discovered many things are called many times, that in the implicit case the penalty is small, but for explicit is a lot, and you can see that only if you do a whole simulation. Short answer: Python will be fine, as long as we use Vtune to measure times and check differences |
The proposal i made to Alejandro is to generate a few millinos of simple structural elements, and to call in parallel the function CalculateSystemContributions onto them. i expect a nonnegligible performance uplift... |
Yes, that will give us a hint. But for a real performance impact study we should do a complete simulation (there all the calls that are supposed to be done will be done, and we will see real performance effect). Again, this is from the perspective I acquired when working in explicit |
This is the test: // KRATOS ___| | | |
// \___ \ __| __| | | __| __| | | __| _` | |
// | | | | | ( | | | | ( | |
// _____/ \__|_| \__,_|\___|\__|\__,_|_| \__,_|_| MECHANICS
//
// License: BSD License
// license: structural_mechanics_application/license.txt
//
// Main authors: Riccardo Rossi
//
// Project includes
#include "containers/model.h"
#include "testing/testing.h"
#include "structural_mechanics_application_variables.h"
#include "custom_elements/small_displacement.h"
#include "utilities/parallel_utilities.h"
#include "utilities/builtin_timer.h"
namespace Kratos
{
namespace Testing
{
KRATOS_TEST_CASE_IN_SUITE(TotalLagrangian2D3N, KratosStructuralMechanicsFastSuite)
{
Model current_model;
auto &r_model_part = current_model.CreateModelPart("ModelPart",1);
r_model_part.GetProcessInfo().SetValue(DOMAIN_SIZE, 2);
r_model_part.AddNodalSolutionStepVariable(DISPLACEMENT);
r_model_part.AddNodalSolutionStepVariable(VOLUME_ACCELERATION);
// Set the element properties
auto p_elem_prop = r_model_part.CreateNewProperties(0);
p_elem_prop->SetValue(YOUNG_MODULUS, 2.0e+06);
p_elem_prop->SetValue(POISSON_RATIO, 0.3);
p_elem_prop->SetValue(THICKNESS, 0.01);
const auto &r_clone_cl = KratosComponents<ConstitutiveLaw>::Get("LinearElastic3DLaw");
p_elem_prop->SetValue(CONSTITUTIVE_LAW, r_clone_cl.Clone());
// Constants for the computation of the stress
const double E = p_elem_prop->GetValue(YOUNG_MODULUS);
const double NU = p_elem_prop->GetValue(POISSON_RATIO);
// Create the test element
auto p_node_1 = r_model_part.CreateNewNode(1, 0.0000000000, 0.0100000000, 0.0100000000);
auto p_node_2 = r_model_part.CreateNewNode(2, 0.0000000000, 0.0000000000, 0.0100000000);
auto p_node_3 = r_model_part.CreateNewNode(3, 0.0000000000, 0.0100000000, 0.0000000000);
auto p_node_4 = r_model_part.CreateNewNode(4, 0.0100000000, 0.0100000000, 0.0100000000);
auto p_node_5 = r_model_part.CreateNewNode(5, 0.0100000000, 0.0000000000, 0.0100000000);
auto p_node_6 = r_model_part.CreateNewNode(6, 0.0100000000, 0.0100000000, 0.0000000000);
auto p_node_7 = r_model_part.CreateNewNode(7, 0.0000000000, 0.0000000000, 0.0000000000);
auto p_node_8 = r_model_part.CreateNewNode(8, 0.0100000000, 0.0000000000, 0.0000000000);
for (auto& r_node : r_model_part.Nodes()){
r_node.AddDof(DISPLACEMENT_X);
r_node.AddDof(DISPLACEMENT_Y);
r_node.AddDof(DISPLACEMENT_Z);
}
std::vector<ModelPart::IndexType> element_nodes {4,1,3,6,5,2,7,8};
for (int i = 1; i < 1e7; i++) // we create 1M elements
auto p_element = r_model_part.CreateNewElement("SmallDisplacementElement3D8N", i, element_nodes, p_elem_prop);
for (auto& r_elem : r_model_part.Elements()){
r_elem.Initialize(r_model_part.GetProcessInfo());
}
struct my_tls {
Vector mVec;
Matrix mMat;
};
const auto& const_procinfo_ref = r_model_part.GetProcessInfo();
BuiltinTimer setup_system_time;
block_for_each(r_model_part.Elements(), my_tls(), [&const_procinfo_ref](Element& r_elem, my_tls & MyTls) {
r_elem.CalculateLocalSystem(MyTls.mMat, MyTls.mVec, const_procinfo_ref);
});
std::cout << "Build Time: " << setup_system_time.ElapsedSeconds() << std::endl;
}
}
} |
As I said, in this test we don't fully see the influence. It will be nice to also measure differences with a full simulation, considering numerous elements |
BTW, this is using a quite simple CL, maybe in more complex laws the difference is more significant. (5% in a comment you removed) |
@KratosMultiphysics/implementation-committee decided that in order to properly track the performance of the changes, we should in first place implement a proper benchmark utility (@matekelemen has some works in this regard). Then @AlejandroCornejo could test the performance changes in the simplest linear CL. |
Just to complement, I did the test but with Visual Studio compiler... I should redo in Linux |
We also spoke about declaring some member variables as static in order to reduce allocation/deallocation time. Not 100% related, but interesting. |
I have been working on this today (assessing-bounded-vector-in-cl branch) and I could draw the following conclusions: Compiling in linux by means of WSL Ubuntu My test consists in creating in c++ 1M linear hexas with elastic 3D law and call the
With the Bounded I did the multiplications as: BoundedMatrix<double, 6, 24> aux;
BoundedMatrix<double, 24, 6> aux2;
noalias(aux2) = trans(rB);
noalias(aux) = prod(IntegrationWeight *rD, rB);
noalias( rLeftHandSideMatrix ) += prod(aux2, aux); This means that we can reduce build times in about a 50%!! |
I have also seen that the |
This would be even faster using the MathUtils operation |
This was my idea too, reality is that is by far slower! :S |
Yes, because clear is an internal operation and in the other case you are creating a new matrix and copying values |
This is strange... |
With sympy generation of the code (BtDB) is also slightly slower than the bounded matrix multiplication |
May be related with some extension optimization of Ublas... |
Because in both cases you are not using the vectorization but with prod you may use it depending on size of the matrices |
@KratosMultiphysics/implementation-committee from the @KratosMultiphysics/technical-committee we would like to know how this is evolving. Thanks. |
Well I am mostly in charge of this task within the @KratosMultiphysics/implementation-committee . I draw some conclusions and I would be happy to have a discussion regarding this in a meeting to know if this is something to be done or not. |
@KratosMultiphysics/technical-committee delegates this to @RiccardoRossi. |
Description
In @KratosMultiphysics/technical-committee we are in principle favorable in changing the interface of ConstitutiveLaw to use StackBased vectors (of variable size) for Strain,Stress and F instead of heap allocated vector.
this would imply a API breaker change in the ConstitutiveLawParameters
to be discussed further for "details"
The text was updated successfully, but these errors were encountered: