Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core] Adding debug check in CheckAndCorrectZeroDiagonalValues #12782

Conversation

loumalouomega
Copy link
Member

📝 Description

Initial work for #12768.

Basically checks in debug that the daigonal terms exist instead of directly asign them. This will not actually fix the issue in not initialzed diagonals, but this is a first step.

🆕 Changelog

}
}
KRATOS_ERROR << "Diagonal term (" << Index << ", " << Index << ") is not defined in the system matrix" << std::endl;
#else
rA(Index, Index) = scale_factor;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isnt this still some find operation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, what alternative there is?. Only executed in debug.

Copy link
Member

@RiccardoRossi RiccardoRossi Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, i would use the following (surely to be slightly adapted to make it to compile)

auto pos = std::lower_bound(&Acolndices[col_begin],&Acolndices[col_end]);
if pos != &Acolndices[col_end] //it was found
     rValues[pos - &Acolndices[col_begin]] = scale_factor;
#ifdef _DEBUG
else
    KRATOS_ERROR ... 
#endif

this code shall be a bit faster than what you have, and it will use a binary search

if you do not like this, there is a function "find_element(i,i)" which will do the same as calling the lower bound.
i think my version should be slightly faster because we already have col_begin and col_end.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anycase looks like we have tests failing in full debug, hurray

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you take a look to what are the cases failing? to know if it is a realistic scenario or just something we use in the tests (my bet is that there are constraints ... )

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With your code:

[  FAILED  ] KratosCoreFastSuite.BasicDisplacementBlockBuilderAndSolver
[  FAILED  ] KratosCoreFastSuite.BasicDisplacementBlockBuilderAndSolverWithZeroContribution
[  FAILED  ] KratosCoreFastSuite.BasicDisplacementBlockBuilderAndSolverWithConstraints
[  FAILED  ] KratosCoreFastSuite.BasicDisplacementBlockBuilderAndSolverWithInactiveConstraints
[  FAILED  ] KratosCoreFastSuite.BasicDisplacementBlockBuilderAndSolverWithConstraintsAuxiliarNode
[  FAILED  ] KratosCoreFastSuite.BasicDisplacementBlockBuilderAndSolverWithConstraintsAuxiliarNodeInverted
[  FAILED  ] KratosCoreFastSuite.BasicDisplacementBlockBuilderAndSolverAllDoFsMaster
[  FAILED  ] KratosCoreFastSuite.BasicDisplacementBlockBuilderAndSolverAllDoFsMasterFromStructureSide
[  FAILED  ] KratosCoreFastSuite.BasicDisplacementEliminationBuilderAndSolverWithZeroContribution

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my code was wrong. it should be

pos - &Acolndices[0]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, still failing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, still failing

Wait it was my fault, testing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay now it only fails in Debug (in my machine) and the failing tests are the same:

[  FAILED  ] KratosCoreFastSuite.BasicDisplacementBlockBuilderAndSolver
[  FAILED  ] KratosCoreFastSuite.BasicDisplacementBlockBuilderAndSolverWithZeroContribution
[  FAILED  ] KratosCoreFastSuite.BasicDisplacementBlockBuilderAndSolverWithConstraints
[  FAILED  ] KratosCoreFastSuite.BasicDisplacementBlockBuilderAndSolverWithInactiveConstraints
[  FAILED  ] KratosCoreFastSuite.BasicDisplacementBlockBuilderAndSolverAllDoFsMaster
[  FAILED  ] KratosCoreFastSuite.BasicDisplacementEliminationBuilderAndSolverWithZeroContribution

@RiccardoRossi
Copy link
Member

well, i am afraid we will need to debug then ... really calling A(i,i) would have an enormous performance impact (apart for threading issues)... so i really hope we are not doing it in real life

@loumalouomega
Copy link
Member Author

well, i am afraid we will need to debug then ... really calling A(i,i) would have an enormous performance impact (apart for threading issues)... so i really hope we are not doing it in real life

Looks like the iterators were giving issues, should work now.

Copy link
Contributor

@matekelemen matekelemen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Letting the user know what went wrong is the absolute minimum.

Ideally, an exception would be thrown even in release mode.

kratos/spaces/ublas_space.h Outdated Show resolved Hide resolved
kratos/spaces/ublas_space.h Outdated Show resolved Hide resolved
kratos/spaces/ublas_space.h Outdated Show resolved Hide resolved
kratos/spaces/ublas_space.h Show resolved Hide resolved
const auto it_Acol_indices_row_end = it_Acol_indices_begin + col_end;

const auto lower = std::lower_bound(it_Acol_indices_row_begin, it_Acol_indices_row_end, Index);
const auto upper = std::upper_bound(it_Acol_indices_row_begin, it_Acol_indices_row_end, Index);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const auto upper = std::upper_bound(it_Acol_indices_row_begin, it_Acol_indices_row_end, Index);
const auto upper = std::upper_bound(lower, it_Acol_indices_row_end, Index);

sry for the pickiness, but if we're doing it, we may as well do it properly.

No need to start searching from the range begin again. lower_bound <= upper_bound always

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think you need the upper (in line with the suggestion of @matekelemen):

if(empty) {
                const auto it_Acol_indices_row_begin = it_Acol_indices_begin + col_begin;
                const auto it_Acol_indices_row_end = it_Acol_indices_begin + col_end;

                const auto lower = std::lower_bound(it_Acol_indices_row_begin, it_Acol_indices_row_end, Index);
                if (lower != it_Acol_indices_row_end && *lower==j) { // Index was found
                    r_Avalues[std::distance(it_Acol_indices_begin, lower)] = scale_factor;
                } else {
            #ifdef KRATOS_DEBUG
                    KRATOS_ERROR << "Diagonal term (" << Index << ", " << Index << ") is not defined in the system matrix" << std::endl;
            #else

per regards the error i am not really sure.

This s a scaling operation. If the row is not there than i would say it makes no sense to scale it ... not that this is an error ...
What is actually wrong in my opinion is to put a nonzero diagonal where there was no diagonal (in a sense equivalent to having a zero diagonal)

@RiccardoRossi
Copy link
Member

ouch, we cannot use both lower_bound and upper_bound, the cost is the double ... at this point we could use rA.find_element(i,j)

@matekelemen
Copy link
Contributor

ouch, we cannot use both lower_bound and upper_bound, the cost is the double ... at this point we could use rA.find_element(i,j)

it's two O(log(n)) operations. A linear scan is ... linear, so think we can still go with this.

@matekelemen
Copy link
Contributor

alternatively, we could skip the upper_bound and check the value of what lower_bound returns. If it's the index we're looking for, the main diagonal entry is stored, otherwise it isn't.

just keep weird cases in mind like completely empty rows.

@loumalouomega
Copy link
Member Author

alternatively, we could skip the upper_bound and check the value of what lower_bound returns. If it's the index we're looking for, the main diagonal entry is stored, otherwise it isn't.

just keep weird cases in mind like completely empty rows.

That's what I was doing

Co-authored-by: Philipp Bucher <[email protected]>
@RiccardoRossi
Copy link
Member

@loumalouomega in release mode, does it run?

what i am asking if without throwing the error it does run. If it does, than it is an indication that my suggestion is correct: do nothing does not matter if the diagonal is not there

@loumalouomega
Copy link
Member Author

@loumalouomega in release mode, does it run?

what i am asking if without throwing the error it does run. If it does, than it is an indication that my suggestion is correct: do nothing does not matter if the diagonal is not there

Yes, in release yours worked

@rfaasse
Copy link
Contributor

rfaasse commented Oct 25, 2024

I know this PR is still under development, but just a heads up that in our pipelines (which run in release) we get a couple of tests failing, all with messages like:

   ERROR: test_k0_procedure_simple_dike (test_k0_procedure_process.KratosGeoMechanicsK0ProcedureProcessTests)
11:54:11   ----------------------------------------------------------------------
11:54:11   Traceback (most recent call last):
11:54:11     File "C:\BuildAgent\work\6b1b55f40e0ab639\bin\Release\applications\GeoMechanicsApplication\tests\test_k0_procedure_process.py", line 385, in test_k0_procedure_simple_dike
11:54:11       stage.Run()
11:54:11     File "C:\BuildAgent\work\6b1b55f40e0ab639\bin\Release\KratosMultiphysics\analysis_stage.py", line 49, in Run
11:54:11       self.RunSolutionLoop()
11:54:11     File "C:\BuildAgent\work\6b1b55f40e0ab639\bin\Release\KratosMultiphysics\GeoMechanicsApplication\geomechanics_analysis.py", line 178, in RunSolutionLoop
11:54:11       converged = self._GetSolver().SolveSolutionStep()
11:54:11     File "C:\BuildAgent\work\6b1b55f40e0ab639\bin\Release\KratosMultiphysics\GeoMechanicsApplication\geomechanics_solver.py", line 277, in SolveSolutionStep
11:54:11       is_converged = self.solver.SolveSolutionStep()
11:54:11   RuntimeError: Error: THE MATRIX IS STRUCTURALLY SINGULAR ... ZERO COLUMN AT 7139
11:54:11   
11:54:11   in applications/LinearSolversApplication/custom_solvers/eigen_sparse_lu_solver.h:53: EigenSparseLUSolver<double>::Compute
11:54:11      kratos/solving_strategies/builder_and_solvers/residualbased_block_builder_and_solver.h:467: ResidualBasedBlockBuilderAndSolver<class UblasSpace<double,class boost::numeric::ublas::compressed_matrix<...>,class boost::numeric::Vector >,...>::InternalSystemSolveWithPhysics
11:54:11      kratos/solving_strategies/builder_and_solvers/residualbased_block_builder_and_solver.h:517: ResidualBasedBlockBuilderAndSolver<class UblasSpace<double,class boost::numeric::ublas::compressed_matrix<...>,class boost::numeric::Vector >,...>::BuildAndSolve

So we actually end up in the situation where we have a completely zero column after these changes.

This was run with the last commit being @matekelemen suggestion, I just merged the latest changes to see if there is a difference (but I only see some simplifications, so it's probably the same situation)

Update: situation is unchanged after running with the latest changes

@loumalouomega
Copy link
Member Author

Now the tests failing are the ones froim Geo app

@rfaasse
Copy link
Contributor

rfaasse commented Oct 25, 2024

Now the tests failing are the ones froim Geo app

Indeed, for the 'Custom' builds the errors are the same as we find in our release builds, while the debug ones run into the newly added error:
Thread #0 caught exception: Error: Diagonal term (4000, 4000) is not defined in the system matrix

@loumalouomega
Copy link
Member Author

Now the tests failing are the ones froim Geo app

Indeed, for the 'Custom' builds the errors are the same as we find in our release builds, while the debug ones run into the newly added error: Thread #0 caught exception: Error: Diagonal term (4000, 4000) is not defined in the system matrix

yep

Copy link
Contributor

@matekelemen matekelemen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I guess what we need to decide then is whether we allow matrices with completely empty rows or not.

Personally, I don't think it would make any sense to allow them. If I recall correctly, matrices aren't manipulated after this point and are passed to linear solvers, who'll obviously fail in such cases.

Maybe the @KratosMultiphysics/geomechanics team can shed some light on how they're getting matrices with empty rows, and whether they are built with empty rows on purpose.

@RiccardoRossi
Copy link
Member

Probably the right approach is to add the diagonal item in the ConstructMatrixStructure

@loumalouomega
Copy link
Member Author

Probably the right approach is to add the diagonal item in the ConstructMatrixStructure

But that implies to modify the B&S. BTW, a priori if using the Block B&S the diagonal is guaranteed, if not in that case I will need to take a look.

@loumalouomega
Copy link
Member Author

What is the status of this, what should we do?

@rfaasse
Copy link
Contributor

rfaasse commented Oct 30, 2024

I had some discussions within the team and from the geomechanics side we don't directly expect to have empty rows in our matrix. We can have a look (Friday at the earliest) using this branch to see what's going on in the few specific failing test cases.

@rfaasse
Copy link
Contributor

rfaasse commented Oct 31, 2024

Hi @loumalouomega, we found some time and had a quick look at the failing test cases: the repeating pattern we see in the failing tests is our 'excavation process'. In short what it does is (de)activate elements/conditions in a certain model part (for more details see

void ApplyExcavationProcess::ExecuteInitialize()
{
KRATOS_TRY
VariableUtils{}.SetFlag(ACTIVE, !mDeactivateSoilPart, mrModelPart.Elements());
if (mDeactivateSoilPart) {
block_for_each(mrModelPart.Elements(),
[](Element& rElement) { rElement.ResetConstitutiveLaw(); });
} else {
VariableUtils{}.SetFlag(ACTIVE, true, mrModelPart.Nodes());
}
VariableUtils{}.SetFlag(ACTIVE, !mDeactivateSoilPart, mrModelPart.Conditions());
KRATOS_CATCH("")
}
)

When I remove this process from the test, it completes without the error (but of course the results are different, since we don't use the deactivation functionality).

Does the fix in this PR take into account what happens when certain elements are inactive? Could that for example result in certain diagonals not existing in the A matrix when calling this function?

FYI @mnabideltares

@loumalouomega
Copy link
Member Author

Hi @loumalouomega, we found some time and had a quick look at the failing test cases: the repeating pattern we see in the failing tests is our 'excavation process'. In short what it does is (de)activate elements/conditions in a certain model part (for more details see

void ApplyExcavationProcess::ExecuteInitialize()
{
KRATOS_TRY
VariableUtils{}.SetFlag(ACTIVE, !mDeactivateSoilPart, mrModelPart.Elements());
if (mDeactivateSoilPart) {
block_for_each(mrModelPart.Elements(),
[](Element& rElement) { rElement.ResetConstitutiveLaw(); });
} else {
VariableUtils{}.SetFlag(ACTIVE, true, mrModelPart.Nodes());
}
VariableUtils{}.SetFlag(ACTIVE, !mDeactivateSoilPart, mrModelPart.Conditions());
KRATOS_CATCH("")
}

)

When I remove this process from the test, it completes without the error (but of course the results are different, since we don't use the deactivation functionality).

Does the fix in this PR take into account what happens when certain elements are inactive? Could that for example result in certain diagonals not existing in the A matrix when calling this function?

FYI @mnabideltares

If you deactivate elements and the connectivity changes maybe you should update the dofs with the flag reform_dofs_each_step in the B&S. An alternative would be to add the dofs of inactive elements/conditions (but at least in the diagonal), but that would imply modify the B&S, and maybe is not the best option right now that the B&S is going to be gone. Opinion @RiccardoRossi ?

@loumalouomega
Copy link
Member Author

I forgot what is the status of this

@loumalouomega
Copy link
Member Author

I forgot what is the status of this

I think that people from @KratosMultiphysics/geomechanics needed to update tests?

@rfaasse
Copy link
Contributor

rfaasse commented Nov 18, 2024

@loumalouomega We are currently looking into the failing tests. The issue seems to be that in our workflow DoF are added for nodes which are only part of an inactive element. Their matrix entries remain uninitialized. We are looking into ways to fix this at our end. Ideally by not adding the DoF for inactive nodes in the first place (if this is possible depends a bit on the order of operations) and in the worst case add a diagonal value to the matrix to unblock this PR.

@loumalouomega
Copy link
Member Author

@loumalouomega We are currently looking into the failing tests. The issue seems to be that in our workflow DoF are added for nodes which are only part of an inactive element. Their matrix entries remain uninitialized. We are looking into ways to fix this at our end. Ideally by not adding the DoF for inactive nodes in the first place (if this is possible depends a bit on the order of operations) and in the worst case add a diagonal value to the matrix to unblock this PR.

Okidoki

@RiccardoRossi
Copy link
Member

@loumalouomega We are currently looking into the failing tests. The issue seems to be that in our workflow DoF are added for nodes which are only part of an inactive element. Their matrix entries remain uninitialized. We are looking into ways to fix this at our end. Ideally by not adding the DoF for inactive nodes in the first place (if this is possible depends a bit on the order of operations) and in the worst case add a diagonal value to the matrix to unblock this PR.

This is strange. Dovrete is constructed using all elements (not just the active ones) exactly to avoid this provlem. Same for construct matrix structure

@matekelemen
Copy link
Contributor

@loumalouomega We are currently looking into the failing tests. The issue seems to be that in our workflow DoF are added for nodes which are only part of an inactive element. Their matrix entries remain uninitialized. We are looking into ways to fix this at our end. Ideally by not adding the DoF for inactive nodes in the first place (if this is possible depends a bit on the order of operations) and in the worst case add a diagonal value to the matrix to unblock this PR.

Wait, so Dofs from inactive Nodes get added to the linear system anyway? Is the activation system unique to the geo app, or is it a core feature?

If it's a core feature and the system ignores it, then it must be fixed in core.

@RiccardoRossi
Copy link
Member

RiccardoRossi commented Nov 18, 2024 via email

@matekelemen
Copy link
Contributor

Got it @RiccardoRossi, thanks.

@rfaasse
Copy link
Contributor

rfaasse commented Nov 19, 2024

We are still looking into the failing tests, it seems that the issue is related to our time integration schemes performing some operations only on active elements (including GetDofList and EquationId, which seems to be incorrect considering the block builder and solver adds zeros for inactive elements). We'll do some further testing if that is really the only remaining issue and try to provide a fix soon, such that this PR can be merged.

@rfaasse
Copy link
Contributor

rfaasse commented Nov 21, 2024

Hi @loumalouomega, yesterday we merged a change on our end fixes the failing test cases in geo, at least locally (see #12864 for more details). Could you bring this branch up to date with master to see if it is also fixed in the github pipelines?

@loumalouomega loumalouomega merged commit 3ae2c5b into master Nov 21, 2024
11 checks passed
@loumalouomega loumalouomega deleted the core/adding-debug-check-in-CheckAndCorrectZeroDiagonalValues branch November 21, 2024 09:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants