fable: Fix excessive compilation duration #206

Merged
merged 7 commits into from
Mar 27, 2024

Contributor

@clsim clsim commented Dec 15, 2023

Users of fable have experienced compilation times of up to 12 hours. This PR delivers a mitigation that is already proven in use.

@clsim clsim force-pushed the clsim/bugfix-make-schema branch 3 times, most recently from fe5deca to d563066 Compare December 18, 2023 13:53
@cassava cassava self-assigned this Jan 8, 2024
@cassava cassava added this to the 0.21.0 milestone Jan 8, 2024
Contributor

cassava commented Jan 8, 2024

Looks great, thanks for the work!

Contributor

cassava commented Jan 8, 2024

I'd merge this to master, to make it part of 0.21, and if requested, we can also backport this to 0.20 and 0.19. What do you think?

Contributor Author

clsim commented Jan 8, 2024

Thank you! A backport is not needed, since the affected users already apply the same change as a workaround and want to upgrade to develop soon.

@cassava cassava force-pushed the clsim/bugfix-make-schema branch from d563066 to da6d42b Compare January 9, 2024 09:27
Contributor

cassava commented Jan 9, 2024

Force-push: Removed the code formatting changes so that the patch applies more cleanly to the master branch.

@cassava cassava force-pushed the clsim/bugfix-make-schema branch from da6d42b to 6ff5af8 Compare January 9, 2024 09:50
@clsim clsim requested a review from scpa1055 as a code owner January 9, 2024 09:50
@cassava cassava changed the base branch from develop to master January 9, 2024 09:50
@cassava cassava force-pushed the clsim/bugfix-make-schema branch 2 times, most recently from 07b1c22 to f0ee64d Compare January 10, 2024 15:30
Contributor

cassava commented Jan 15, 2024

Update: I've been attempting to get a positive result in before-after testing. So far I've been unsuccessful.

I've added a stress test that should be representative of problematic use-cases. With it, I've been able to reproduce compile times of over an hour for a single file. However, the patch doesn't seem to reduce this compile time.

@clsim Could you have a look at this and see if I've missed something or made a mistake somewhere?

Contributor

cassava commented Jan 15, 2024

Here are the steps that can be used to compile and run the stress test:

git clone https://github.com/eclipse/cloe
cd cloe
git switch clsim/bugfix-make-schema
make -f Makefile.docker build-ubuntu-20.04 run-ubuntu-20.04
cd fable/examples/stress
make all
# -> This probably fails because cmake is too old on ubuntu:20.04 image,
#    but Conan tells us how to call cmake.
cd build/Release
cmake ../.. -G "Unix Makefiles" -DCMAKE_TOOLCHAIN_FILE=/cloe/fable/examples/stress/build/Release/generators/conan_toolchain.cmake -DCMAKE_POLICY_DEFAULT_CMP0091=NEW -DCMAKE_BUILD_TYPE=Release
time make all

With the default settings, the last command (time make all) takes 1 min. You can increase this by passing a different value to the first make call in the container, for example: make all LARGE_STRUCT_SIZE=10000

Contributor Author

clsim commented Jan 15, 2024

> Update: I've been attempting to get a positive result in before-after testing. So far I've been unsuccessful.

I let the compilation run overnight. This is the output of -ftime-report and of the time program:
Time variable usr sys wall GGC
phase setup : 0.00 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 1476 kB ( 0%)
phase parsing : 8.44 ( 0%) 1.85 ( 50%) 10.30 ( 0%) 659404 kB ( 35%)
phase lang. deferred : 2.16 ( 0%) 0.35 ( 9%) 2.51 ( 0%) 253179 kB ( 14%)
phase opt and generate :2587.38 (100%) 1.50 ( 40%)2594.69 (100%) 959789 kB ( 51%)
phase finalize : 0.00 ( 0%) 0.01 ( 0%) 0.01 ( 0%) 0 kB ( 0%)
|name lookup : 3.50 ( 0%) 0.30 ( 8%) 4.03 ( 0%) 21774 kB ( 1%)
|overload resolution : 4.09 ( 0%) 1.03 ( 28%) 5.28 ( 0%) 474337 kB ( 25%)
garbage collection : 1.19 ( 0%) 0.02 ( 1%) 1.20 ( 0%) 0 kB ( 0%)
dump files : 0.23 ( 0%) 0.08 ( 2%) 0.22 ( 0%) 0 kB ( 0%)
callgraph construction : 0.46 ( 0%) 0.04 ( 1%) 0.49 ( 0%) 21890 kB ( 1%)
callgraph optimization : 0.17 ( 0%) 0.06 ( 2%) 0.23 ( 0%) 0 kB ( 0%)
ipa dead code removal : 0.05 ( 0%) 0.00 ( 0%) 0.06 ( 0%) 0 kB ( 0%)
ipa inheritance graph : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 30 kB ( 0%)
ipa inlining heuristics : 0.12 ( 0%) 0.00 ( 0%) 0.09 ( 0%) 1 kB ( 0%)
ipa comdats : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 kB ( 0%)
ipa various optimizations : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 kB ( 0%)
ipa HSA : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 kB ( 0%)
ipa free lang data : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 kB ( 0%)
cfg construction : 0.08 ( 0%) 0.00 ( 0%) 0.10 ( 0%) 7213 kB ( 0%)
cfg cleanup : 0.29 ( 0%) 0.01 ( 0%) 0.25 ( 0%) 60 kB ( 0%)
trivially dead code : 0.22 ( 0%) 0.01 ( 0%) 0.17 ( 0%) 0 kB ( 0%)
df scan insns : 0.89 ( 0%) 0.05 ( 1%) 0.89 ( 0%) 398 kB ( 0%)
df live regs : 0.40 ( 0%) 0.02 ( 1%) 0.41 ( 0%) 0 kB ( 0%)
df reg dead/unused notes : 0.37 ( 0%) 0.00 ( 0%) 0.34 ( 0%) 17154 kB ( 1%)
register information : 0.15 ( 0%) 0.00 ( 0%) 0.17 ( 0%) 0 kB ( 0%)
alias analysis : 0.10 ( 0%) 0.00 ( 0%) 0.12 ( 0%) 6339 kB ( 0%)
rebuild jump labels : 0.10 ( 0%) 0.00 ( 0%) 0.11 ( 0%) 0 kB ( 0%)
preprocessing : 0.25 ( 0%) 0.31 ( 8%) 0.56 ( 0%) 20483 kB ( 1%)
parser (global) : 0.35 ( 0%) 0.15 ( 4%) 0.43 ( 0%) 66490 kB ( 4%)
parser struct body : 3.28 ( 0%) 0.13 ( 4%) 3.38 ( 0%) 40800 kB ( 2%)
parser enumerator list : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 247 kB ( 0%)
parser function body : 0.14 ( 0%) 0.06 ( 2%) 0.24 ( 0%) 23784 kB ( 1%)
parser inl. func. body : 0.15 ( 0%) 0.12 ( 3%) 0.22 ( 0%) 17315 kB ( 1%)
parser inl. meth. body : 2.25 ( 0%) 0.43 ( 12%) 2.80 ( 0%) 261097 kB ( 14%)
template instantiation : 3.39 ( 0%) 0.89 ( 24%) 4.32 ( 0%) 481791 kB ( 26%)
constant expression evaluation : 0.22 ( 0%) 0.09 ( 2%) 0.26 ( 0%) 566 kB ( 0%)
inline parameters : 0.14 ( 0%) 0.00 ( 0%) 0.14 ( 0%) 16775 kB ( 1%)
integration : 0.02 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 714 kB ( 0%)
tree gimplify : 0.48 ( 0%) 0.02 ( 1%) 0.52 ( 0%) 70382 kB ( 4%)
tree eh : 303.00 ( 12%) 0.14 ( 4%) 303.20 ( 12%) 85346 kB ( 5%)
tree CFG construction : 0.15 ( 0%) 0.02 ( 1%) 0.21 ( 0%) 58560 kB ( 3%)
tree CFG cleanup : 0.26 ( 0%) 0.01 ( 0%) 0.33 ( 0%) 35 kB ( 0%)
tree PHI insertion : 0.09 ( 0%) 0.01 ( 0%) 0.10 ( 0%) 12201 kB ( 1%)
tree SSA rewrite : 0.07 ( 0%) 0.00 ( 0%) 0.11 ( 0%) 23511 kB ( 1%)
tree SSA other : 0.07 ( 0%) 0.06 ( 2%) 0.15 ( 0%) 1210 kB ( 0%)
tree SSA incremental : 0.07 ( 0%) 0.00 ( 0%) 0.08 ( 0%) 0 kB ( 0%)
tree operand scan : 0.09 ( 0%) 0.07 ( 2%) 0.12 ( 0%) 23485 kB ( 1%)
tree switch lowering : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 197 kB ( 0%)
dominance frontiers : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 kB ( 0%)
dominance computation : 0.20 ( 0%) 0.02 ( 1%) 0.22 ( 0%) 0 kB ( 0%)
out of ssa : 0.11 ( 0%) 0.03 ( 1%) 0.13 ( 0%) 1823 kB ( 0%)
expand vars :2260.54 ( 87%) 0.20 ( 5%)2266.48 ( 87%) 9529 kB ( 1%)
expand : 0.87 ( 0%) 0.09 ( 2%) 0.94 ( 0%) 174799 kB ( 9%)
post expand cleanups : 0.22 ( 0%) 0.02 ( 1%) 0.20 ( 0%) 31064 kB ( 2%)
varconst : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 5 kB ( 0%)
jump : 0.03 ( 0%) 0.01 ( 0%) 0.01 ( 0%) 0 kB ( 0%)
loop init : 0.06 ( 0%) 0.01 ( 0%) 0.09 ( 0%) 6844 kB ( 0%)
loop fini : 0.03 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 kB ( 0%)
mode switching : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 kB ( 0%)
integrated RA : 2.81 ( 0%) 0.13 ( 4%) 3.11 ( 0%) 265531 kB ( 14%)
LRA non-specific : 1.08 ( 0%) 0.11 ( 3%) 1.12 ( 0%) 1625 kB ( 0%)
LRA virtuals elimination : 0.29 ( 0%) 0.00 ( 0%) 0.28 ( 0%) 11836 kB ( 1%)
LRA reload inheritance : 0.06 ( 0%) 0.00 ( 0%) 0.06 ( 0%) 7 kB ( 0%)
LRA create live ranges : 0.23 ( 0%) 0.01 ( 0%) 0.29 ( 0%) 1861 kB ( 0%)
LRA hard reg assignment : 0.08 ( 0%) 0.01 ( 0%) 0.10 ( 0%) 0 kB ( 0%)
reload : 0.04 ( 0%) 0.01 ( 0%) 0.04 ( 0%) 0 kB ( 0%)
thread pro- & epilogue : 0.52 ( 0%) 0.02 ( 1%) 0.57 ( 0%) 14411 kB ( 1%)
machine dep reorg : 0.07 ( 0%) 0.00 ( 0%) 0.06 ( 0%) 4086 kB ( 0%)
shorten branches : 0.33 ( 0%) 0.00 ( 0%) 0.46 ( 0%) 0 kB ( 0%)
reg stack : 0.01 ( 0%) 0.01 ( 0%) 0.00 ( 0%) 1 kB ( 0%)
final : 0.88 ( 0%) 0.05 ( 1%) 0.92 ( 0%) 47447 kB ( 3%)
symout : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 kB ( 0%)
initialize rtl : 0.00 ( 0%) 0.01 ( 0%) 0.01 ( 0%) 12 kB ( 0%)
rest of compilation : 10.14 ( 0%) 0.16 ( 4%) 10.31 ( 0%) 43090 kB ( 2%)
unaccounted post reload : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 kB ( 0%)
repair loop structures : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 kB ( 0%)
TOTAL :2597.98 3.71 2607.51 1873863 kB

real 891m14,703s
user 43m21,327s
sys 0m4,233s

@cassava cassava modified the milestones: 0.21.0, 0.22.0 Jan 30, 2024
cassava and others added 4 commits February 19, 2024 09:48
BREAKING CHANGES:

- Removed all make_schema() functions that do not fall into one of the two
  signatures:

    make_schema(T*, std::string&&)
    make_schema(T*, P&&, std::string&&)

  So far, only the two forms above have been used, and the others aren't
  strictly required and don't contribute to readability.
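
To illustrate the two remaining signatures, here is a minimal sketch. It is not fable's actual implementation: BasicSchema and PrototypeSchema are hypothetical stand-ins for the real schema classes, and only the parameter lists follow the commit message above.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical, simplified stand-ins for schema types (assumption: these
// are NOT fable's real classes, they only make the sketch compile).
template <typename T>
struct BasicSchema {
  T* ptr;            // variable that the schema describes
  std::string desc;  // human-readable description
};

template <typename T, typename P>
struct PrototypeSchema {
  T* ptr;            // variable that the schema describes
  P prototype;       // schema describing the inner or element type
  std::string desc;  // human-readable description
};

// Form 1: make_schema(T*, std::string&&)
template <typename T>
BasicSchema<T> make_schema(T* ptr, std::string&& desc) {
  return BasicSchema<T>{ptr, std::move(desc)};
}

// Form 2: make_schema(T*, P&&, std::string&&)
// P is deduced, so P&& is a forwarding reference; std::string&& is a plain
// rvalue reference and only binds to temporaries or moved-from strings.
template <typename T, typename P>
PrototypeSchema<T, std::decay_t<P>> make_schema(T* ptr, P&& prototype,
                                                std::string&& desc) {
  return PrototypeSchema<T, std::decay_t<P>>{
      ptr, std::forward<P>(prototype), std::move(desc)};
}
```

In this sketch, a call such as make_schema(&value, "description") selects the first form, and make_schema(&container, make_schema(&element, "element"), "container") selects the second.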
@cassava cassava force-pushed the clsim/bugfix-make-schema branch from 7753ea5 to aa8828d Compare February 19, 2024 10:55
This is the optimal form, according to an analysis by Nicolai Josuttis in a CppCon18 talk:

    Nicolai Josuttis “The Nightmare of Initialization in C++”
    https://www.youtube.com/watch?v=7DTlWPgX6zs
Contributor

cassava commented Feb 20, 2024

After some more work with @hvolx we managed to discover the issue and fix it. Essentially, perfect forwarding the desc type resolves the issue o_O
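
The comment does not show the changed code, so the following is only a sketch of what perfect forwarding a description argument can look like in general. Described and make_described are hypothetical names, not fable's API, and the sketch illustrates the technique only; it does not reproduce the compile-time behaviour discussed in this PR.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for a schema object (assumption, not fable's class).
template <typename T>
struct Described {
  T* ptr;            // variable being described
  std::string desc;  // human-readable description
};

// The description type S is a deduced template parameter, so S&& is a
// forwarding reference: an lvalue std::string is copied, an rvalue
// std::string is moved, and a string literal is passed through as a
// reference to a char array and converted to std::string only here.
template <typename T, typename S>
Described<T> make_described(T* ptr, S&& desc) {
  return Described<T>{ptr, std::string(std::forward<S>(desc))};
}
```

With this shape, make_described(&x, "literal"), make_described(&x, name) and make_described(&x, std::move(name)) all compile, and the caller's string is left untouched unless it is explicitly moved.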

Contributor Author

@clsim clsim left a comment

Looks good to me

@cassava cassava merged commit 2084ca0 into master Mar 27, 2024
6 checks passed
@cassava cassava deleted the clsim/bugfix-make-schema branch July 11, 2024 08:13