========================
Chapel Performance Notes
========================
Though Chapel has been designed to ultimately yield high performance,
our focus to date has predominantly been on implementing its features
correctly and providing user-supported control of features like array
implementations, loop schedules, and architectural descriptions. As a
result, the current compiler lacks several key optimizations and is
often not competitive with hand-coded C, Fortran, MPI, and the like.
We are currently working on closing this gap.
Use the --fast flag!
--------------------
Once your program is correct and you are ready to do a performance
study, make sure to compile with the --fast flag. This is a compiler
meta-flag that turns off several execution-time correctness checks
(bounds checks, NULL pointer checks, etc.) and turns on C-level
optimizations. See the 'chpl' man page for details.
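To give a rough sense of what an execution-time check costs, here is a
minimal sketch in C. It is not Chapel's generated code: checked_get and
unchecked_get are hypothetical helpers that model an array read with
and without the kind of bounds check that --fast turns off. In C,
compiling with -DNDEBUG removes the assert, which is only loosely
analogous to what --fast does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a checked array read: every access pays for a
   compare-and-branch before it touches memory. */
double checked_get(const double *a, size_t n, size_t i) {
  assert(i < n); /* the execution-time bounds check */
  return a[i];
}

/* The same read with the check removed. */
double unchecked_get(const double *a, size_t i) {
  return a[i];
}

/* Sum an array using checked reads. */
double sum_checked(const double *a, size_t n) {
  double s = 0.0;
  for (size_t i = 0; i < n; i++) s += checked_get(a, n, i);
  return s;
}

/* Sum an array using unchecked reads. For a correct program the result
   is the same; only the per-access overhead differs. */
double sum_unchecked(const double *a, size_t n) {
  double s = 0.0;
  for (size_t i = 0; i < n; i++) s += unchecked_get(a, i);
  return s;
}
```

Both sums agree for any in-bounds program, which is why the checks
should stay on until the program is known to be correct.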
How is Chapel performance today?
--------------------------------
To characterize Chapel performance, generally speaking...
* single-locale (CHPL_COMM=none | --local) compilations perform better
than multi-locale (CHPL_COMM!=none | --no-local) compilations;
* 1D loops/arrays perform better than multidimensional cases;
* codes with structured communication (e.g., stencils) tend not to
perform competitively with hand-coded versions, whereas
embarrassingly parallel codes and codes with unstructured
communication tend to be more competitive. The reason is that Chapel
communication currently tends to be very fine-grain and demand-driven
unless array assignments are used to move chunks of data between
locales.
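The difference between fine-grain and bulk communication can be
sketched with a simple message count. The C model below is an
illustration only, not the Chapel runtime: remote_array, remote_get,
and remote_get_bulk are hypothetical names, and the "messages" counter
stands in for network round trips.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for memory owned by another locale.
   `messages` counts simulated network round trips. */
typedef struct {
  const double *data;
  size_t len;
  size_t messages;
} remote_array;

/* Fine-grain, demand-driven access: one message per element. */
double remote_get(remote_array *r, size_t i) {
  assert(i < r->len);
  r->messages++;
  return r->data[i];
}

/* Bulk transfer: one message moves a whole contiguous chunk. */
void remote_get_bulk(remote_array *r, size_t start, size_t count,
                     double *dst) {
  assert(start <= r->len && count <= r->len - start);
  r->messages++;
  memcpy(dst, r->data + start, count * sizeof(double));
}

/* Copy the first n elements one at a time; returns messages used. */
size_t copy_elementwise(remote_array *r, double *dst, size_t n) {
  size_t before = r->messages;
  for (size_t i = 0; i < n; i++) dst[i] = remote_get(r, i);
  return r->messages - before;
}

/* Copy the first n elements in one transfer; returns messages used. */
size_t copy_bulk(remote_array *r, double *dst, size_t n) {
  size_t before = r->messages;
  remote_get_bulk(r, 0, n, dst);
  return r->messages - before;
}
```

In this model, copying n elements costs n messages element by element
but a single message in bulk, which mirrors why array assignments
between locales help structured codes such as stencils.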
Experimental flags for improving performance
--------------------------------------------
Our current implementation supports the following config param-based
flags, which are intended to provide a preview of performance
improvements that we are working on delivering automatically in
upcoming releases. Both are available for use "at your own risk":
neither is guaranteed to preserve program correctness, and the
conditions under which each is safe are given after its description.
* chpl -sassertNoSlicing ...
At present, indexing into a Chapel array tends to require an extra
multiply compared to C/Fortran, in order to support Chapel's rich
array semantics. More specifically, Chapel's support for striding,
reindexing, and rank-change of arrays requires a multiplication to
index into an array's innermost dimension in the general case, and
we currently pay this cost for every array, whether or not it uses
any of these features. In contrast, C and Fortran do not
require such multiplications. For memory-bound programs, this
multiplication is rarely noticeable, but for programs that are
well-tuned for the memory hierarchy, this extra multiplication can
have a significant performance impact.

Work is currently underway to automatically distinguish between
arrays that require this multiplication and those that do not in
order to remove the overhead in the (common) cases where it is
unnecessary. In the meantime, one can request that this
multiplication never be used for a given Chapel program by compiling
with this flag. The flag should be safe for any program that does
not reindex a strided array or perform rank changes on an array to
remove the innermost dimension.
* chpl -snoRefCount ...
At present, Chapel reference counts arrays, domains, and domain maps
in a manner that is far too conservative. This can add unnecessary
overhead, particularly when passing such variables between functions.
This flag turns off such reference counting, but also results in
leaking all arrays, domains, and domain maps. For programs that
only use global arrays, domains, and domain maps, this is unlikely
to be an issue, but for programs with local arrays, domains, and
domain maps, the resulting memory leaks may prevent the program from
running correctly.

We are currently evaluating changes to the implementation and
language definition that would reduce (or eliminate) the amount of
reference counting required by Chapel programs without introducing
these memory leaks. Once these changes are complete, this flag will
be retired.
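The extra multiply discussed under -sassertNoSlicing can be pictured
with the following C sketch. It assumes a hypothetical one-dimensional
descriptor, array_view, whose blk field is the innermost-dimension
multiplier; Chapel's actual array descriptors are not shown here and
may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 1D array descriptor. `blk` is the innermost-dimension
   multiplier: 1 for an ordinary array, possibly larger for a view
   created by striding, reindexing, or rank change. */
typedef struct {
  double *data;
  ptrdiff_t lo;  /* lowest valid index */
  ptrdiff_t blk; /* multiplier applied to (i - lo) */
} array_view;

/* General case: correct for every view, but multiplies on each access. */
double *index_general(const array_view *a, ptrdiff_t i) {
  assert(a->blk >= 1);
  return a->data + (i - a->lo) * a->blk;
}

/* Multiply skipped, as a stand-in for what the flag requests.
   Correct only when blk == 1. */
double *index_no_multiply(const array_view *a, ptrdiff_t i) {
  return a->data + (i - a->lo);
}
```

For a view with blk == 1 the two functions return the same address.
For a view with blk == 2, index_no_multiply silently returns the wrong
element, which illustrates why the flag is "at your own risk".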
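The trade-off behind -snoRefCount can be sketched in C as well. The
rc_array type and its helpers below are hypothetical and only model the
idea: conservative counting adds a retain/release pair to each call,
while dropping the counting makes calls cheaper but leaves nothing to
free the array.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted array header. */
typedef struct {
  int refs;
  size_t len;
  double *data;
} rc_array;

static int live_arrays = 0; /* arrays allocated but not yet freed */

/* Allocate a zero-filled array holding one reference. */
rc_array *rc_new(size_t len) {
  rc_array *a = malloc(sizeof *a);
  if (a == NULL) abort();
  a->data = calloc(len ? len : 1, sizeof *a->data);
  if (a->data == NULL) abort();
  a->refs = 1;
  a->len = len;
  live_arrays++;
  return a;
}

void rc_retain(rc_array *a) { a->refs++; }

/* Drop one reference; free the array when the last one goes away. */
void rc_release(rc_array *a) {
  assert(a->refs > 0);
  if (--a->refs == 0) {
    free(a->data);
    free(a);
    live_arrays--;
  }
}

/* Conservative counting: the callee bumps and drops the count even
   though the caller's reference already keeps the array alive. */
double first_counted(rc_array *a) {
  rc_retain(a);
  double v = a->data[0];
  rc_release(a);
  return v;
}

/* No counting: a cheaper call, but nothing ever frees the array. */
double first_uncounted(const rc_array *a) { return a->data[0]; }

int live_array_count(void) { return live_arrays; }
```

An array that is only ever touched through first_uncounted and never
released stays allocated for the life of the program. That is harmless
for a handful of global arrays but adds up quickly for arrays created
inside loops or functions.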
Tracking Chapel Performance
---------------------------
We are currently working to improve Chapel performance with each
release and are making significant strides. To track our progress
over time, refer to:
http://chapel.sourceforge.net/perf/
From this page, you can track performance tests, either on a nightly
or release-over-release basis. Interested users are encouraged to
submit their own performance tests back to the project for tracking on
this page.