%-----------------------------------------------------------------------------------------------------------------------------------%
\chapter{Getting Started}\label{cha:Getting-Started}
%-----------------------------------------------------------------------------------------------------------------------------------%
%-----------------------------------------------------------------------------------------------------------------------------------%
\section{Configuring and compiling the source code}
%-----------------------------------------------------------------------------------------------------------------------------------%
To get the SPECFEM3D\_GLOBE software package, type this:
{\small
\begin{verbatim}
git clone --recursive --branch devel https://github.com/geodynamics/specfem3d_globe.git
\end{verbatim}
}
\noindent
Then, to configure the software for your system, run the
\texttt{configure} shell script. This script will attempt to guess
the appropriate configuration values for your system. However, at
a minimum, it is recommended that you explicitly specify the appropriate
command names for your compilers (another option is to define \texttt{FC}, \texttt{CC} and \texttt{MPIFC} in your \texttt{.bash\_profile}
or your \texttt{.cshrc} file):
{\small
\begin{verbatim}
./configure FC=gfortran CC=gcc MPIFC=mpif90
\end{verbatim}
}
\noindent
You can replace the GNU compilers above (gfortran and gcc) with other compilers if you wish; for instance, for Intel ifort and icc, use \texttt{FC=ifort CC=icc} instead.\\
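As an alternative, as mentioned above, you can define these variables once in your shell startup file so that \texttt{configure} picks them up from the environment; a minimal sketch, assuming a bash shell:
{\small
\begin{verbatim}
# e.g., in ~/.bash_profile; configure will pick these up from the environment
export FC=gfortran
export CC=gcc
export MPIFC=mpif90
\end{verbatim}
}
\noindent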
Before running the \texttt{configure} script, you should probably edit file \texttt{flags.guess} to make sure that it contains the best compiler options for your system. Known issues or things to check are:
\begin{description}
\item [{\texttt{GCC gfortran compiler}}] The code makes use of Fortran 2008 features, e.g., the \texttt{contiguous} array attribute. We thus recommend using gfortran version 4.6.0 or higher.
\item [{\texttt{Intel ifort compiler}}] See if you need to add \texttt{-assume byterecl} for your machine. For that compiler, we have noticed that initial release versions sometimes have bugs or issues that can lead to wrong results when running the code; we therefore \emph{strongly} recommend using a version for which at least one service pack or update has been installed.
In particular, for version 17 of that compiler, users have reported problems (making the code crash at run time) with the \texttt{-assume buffered\_io} option; if you notice problems,
remove that option from file \texttt{flags.guess} or change it to \texttt{-assume nobuffered\_io} and try again.
\item [{\texttt{IBM compiler}}] See if you need to add \texttt{-qsave} or \texttt{-qnosave} for your machine.
\item [{\texttt{Mac OS}}] You will probably need to install \texttt{Xcode}.
\end{description}
When compiling on an IBM machine with the \texttt{xlf} and \texttt{xlc} compilers, we suggest running the \texttt{configure} script
with the following options:
{\small
\begin{verbatim}
./configure FC=xlf90_r MPIFC=mpif90 CC=xlc_r CFLAGS="-O3 -q64" FCFLAGS="-O3 -q64"
\end{verbatim}
}
If you have problems configuring the code on a Cray machine (for instance, if you get an error message from the \texttt{configure} script), try exporting these two variables:
\texttt{MPI\_INC=\${CRAY\_MPICH2\_DIR}/include} and \texttt{FCLIBS=" "}; for more details, if needed, you can refer to the \texttt{utils/Cray\_compiler\_information} directory.
You can also have a look at the configure script called:\\
\texttt{utils/Cray\_compiler\_information/configure\_SPECFEM\_for\_Piz\_Daint.bash}.
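For example, before running \texttt{configure} (a sketch, assuming a bash shell):
{\small
\begin{verbatim}
export MPI_INC=${CRAY_MPICH2_DIR}/include
export FCLIBS=" "
./configure
\end{verbatim}
}
\noindent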
On SGI systems, \texttt{flags.guess} automatically informs \texttt{configure}
to insert ``\texttt{TRAP\_FPE=OFF}'' into the generated \texttt{Makefile}
in order to turn underflow trapping off.\\
If you run very large meshes on a relatively small number
of processors, the static memory size needed on each processor might become
greater than 2 gigabytes, which is the upper limit for 32-bit addressing
(dynamic memory allocation is always OK, even beyond the 2 GB limit; only static memory has a problem).
In this case, on some compilers you may need to add \texttt{-mcmodel=medium} (if you do not use the Intel ifort / icc compilers)
or \texttt{-mcmodel=medium -shared-intel} (if you do)
to the configure options of CFLAGS, FCFLAGS and LDFLAGS (see the sketch below); otherwise the compiler will display an error
message (for instance \texttt{relocation truncated to fit: R\_X86\_64\_PC32 against .bss} or something similar);
on an IBM machine with the \texttt{xlf} and \texttt{xlc} compilers, using \texttt{-q64} is usually sufficient.\\
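For instance, with the Intel compilers, the configure call might look like this (a sketch; the flag values are the ones discussed above):
{\small
\begin{verbatim}
./configure FC=ifort CC=icc MPIFC=mpif90 \
    CFLAGS="-mcmodel=medium -shared-intel" \
    FCFLAGS="-mcmodel=medium -shared-intel" \
    LDFLAGS="-mcmodel=medium -shared-intel"
\end{verbatim}
}
\noindent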
We recommend that you add \texttt{ulimit -S -s unlimited} to your \texttt{.bash\_profile} file and/or \texttt{limit stacksize unlimited} to your \texttt{.cshrc} file to suppress any potential limit on the size of the Unix stack.\\
%-----------------------------------------------------------------------------------------------------------------------------------%
\section{Using the GPU version of the code}
%-----------------------------------------------------------------------------------------------------------------------------------%
\noindent
SPECFEM3D\_GLOBE now supports CUDA, OpenCL and HIP GPU acceleration.
CUDA configuration can be enabled with the \texttt{-{}-with-cuda} flag and the
\texttt{CUDA\_FLAGS=}, \texttt{CUDA\_LIB=}, \texttt{CUDA\_INC=}
and \texttt{MPI\_INC=} variables, like:
{\small
\begin{verbatim}
./configure --with-cuda CUDA_FLAGS= CUDA_LIB= CUDA_INC= MPI_INC= ..
\end{verbatim}
}
\noindent
When compiling for specific GPU cards, you can enable the corresponding Nvidia GPU card architecture version with:
{\small
\begin{verbatim}
./configure --with-cuda=cuda9 ..
\end{verbatim}
}
\noindent
where for example \texttt{cuda4,cuda5,cuda6,cuda7,..} specifies the target GPU architecture of your card
(e.g., with CUDA 9 this refers to Volta V100 cards), rather than the installed version of the CUDA toolkit.
Before CUDA version 5, each toolkit version basically supported one new architecture and needed a different kind of compilation.
Since version 5, the compilation has stayed the same, but newer toolkit versions have added support for newer architectures.
At the moment, however, we still link each setting to one specific architecture:
{\small
\begin{verbatim}
- CUDA 4 for Tesla, cards like K10, GeForce GTX 650, ..
- CUDA 5 for Kepler, like K20
- CUDA 6 for Kepler, like K80
- CUDA 7 for Maxwell, like Quadro K2200
- CUDA 8 for Pascal, like P100
- CUDA 9 for Volta, like V100
- CUDA 10 for Turing, like GeForce RTX 2080
- CUDA 11 for Ampere, like A100
\end{verbatim}
}
\noindent
So even if you have the newer CUDA toolkit version 11 installed, but you want to run on, say, a K20 GPU, you would still configure with:
{\small
\begin{verbatim}
./configure --with-cuda=cuda5
\end{verbatim}
}
\noindent
Compiling with the cuda5 setting then chooses the right architecture (\texttt{-gencode=arch=compute\_35,code=sm\_35} for K20 cards).\\
SPECFEM3D\_GLOBE also supports CUDA-aware MPI. This code feature can be enabled by adding the flag \texttt{-{}-enable-cuda-aware-mpi} to
the configuration, like:
{\small
\begin{verbatim}
./configure --with-cuda=cuda9 --enable-cuda-aware-mpi ..
\end{verbatim}
}
\noindent
Please make sure beforehand that your MPI installation supports CUDA-aware MPI.
For example, with OpenMPI installed, check the output of the command
{\small
\begin{verbatim}
ompi_info --parsable --all | grep mpi_built_with_cuda_support:value
\end{verbatim}
}
\noindent
Without a CUDA-aware MPI installation, the code will fall back to its default handling, i.e., passing MPI buffers through the CPU.
If available, test whether this feature improves the overall performance of your simulation.\\
\noindent
The same applies to compilation for AMD cards with HIP:
{\small
\begin{verbatim}
./configure --with-hip ..
\end{verbatim}
}
or
{\small
\begin{verbatim}
./configure --with-hip=MI8 ..
\end{verbatim}
}
\noindent
where for example \texttt{MI8,MI25,MI50,MI100,MI250,..} specifies the target GPU architecture of your card.
Additional compilation flags can be added by specifying \texttt{HIP\_FLAGS}, for example:
{\small
\begin{verbatim}
./configure --with-hip=MI250 \
HIP_FLAGS="-fPIC -ftemplate-depth-2048 -fno-gpu-rdc -std=c++17 \
-O2 -fdenormal-fp-math=ieee -fcuda-flush-denormals-to-zero -munsafe-fp-atomics" \
..
\end{verbatim}
}
OpenCL can be enabled with the \texttt{-{}-with-opencl} flag, and the
compilation can be controlled through three variables: \texttt{OCL\_LIB=},
\texttt{OCL\_INC=} and \texttt{OCL\_GPU\_FLAGS=}.
{\small
\begin{verbatim}
./configure --with-opencl OCL_LIB= OCL_INC= OCL_GPU_FLAGS= ..
\end{verbatim}
}
Both CUDA and OpenCL environments can be compiled simultaneously by merging these two lines.
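For instance, a merged configure call could look like this (a sketch combining the two invocations above):
{\small
\begin{verbatim}
./configure --with-cuda=cuda9 --with-opencl OCL_LIB= OCL_INC= OCL_GPU_FLAGS= ..
\end{verbatim}
}
\noindent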
For the runtime configuration, the \texttt{GPU\_MODE} flag must be set
to \texttt{.true.}. In addition, we use three parameters to select the
environment and the GPU:
{\small
\begin{verbatim}
GPU_RUNTIME = 0|1|2|3
GPU_PLATFORM = filter|*
GPU_DEVICE = filter|*
\end{verbatim}
}
\begin{description}
\item[\texttt{GPU\_RUNTIME}] sets the runtime environment: $1$ for CUDA, $2$ for OpenCL, $3$ for HIP,
and $0$ for a compile-time decision (in which case SPECFEM should
have been compiled with only one of \texttt{-{}-with-cuda}, \texttt{-{}-with-opencl} or \texttt{-{}-with-hip}).
\item[\texttt{GPU\_PLATFORM} and \texttt{GPU\_DEVICE}] are both (case-insensitive)
filters on the platform and device names in OpenCL, and on the device name only in
CUDA. In multiprocess (MPI) runs, each process will pick a GPU from
this filtered subset in a round-robin fashion. The star filter (\texttt{*})
will match the first platform and all its devices.
\end{description}
\texttt{GPU\_RUNTIME}, \texttt{GPU\_PLATFORM} and \texttt{GPU\_DEVICE}
are not read if \texttt{GPU\_MODE} is not activated.
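For example, a \texttt{DATA/Par\_file} sketch selecting the CUDA runtime and matching any device (the values shown are illustrative):
{\small
\begin{verbatim}
GPU_MODE     = .true.
GPU_RUNTIME  = 1
GPU_PLATFORM = *
GPU_DEVICE   = *
\end{verbatim}
}
\noindent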
Regarding the code, \texttt{-{}-with-opencl} defines the
preprocessor flag \texttt{USE\_OPENCL}, \texttt{-{}-with-cuda}
defines \texttt{USE\_CUDA}, and \texttt{-{}-with-hip}
defines \texttt{USE\_HIP}; and \texttt{GPU\_RUNTIME} sets the global
variable \texttt{run\_cuda}, \texttt{run\_opencl} or \texttt{run\_hip}.
Texture support has not been validated in OpenCL, but works as
expected in CUDA.\\
A note about the CUDA/OpenCL/HIP kernel versions: the device kernels were
created using a software package called BOAST \citep{Videau2013} by Brice Videau and Kevin Pouget from Grenoble, France.
This source-to-source translation tool reads the kernel definitions (written in Ruby) in directory \texttt{src/gpu/boast}
and generates the corresponding device kernel files provided in directory \texttt{src/gpu/kernels.gen}.
%-----------------------------------------------------------------------------------------------------------------------------------%
\section{Adding OpenMP support in addition to MPI}
%-----------------------------------------------------------------------------------------------------------------------------------%
OpenMP support can be enabled in addition to MPI. However, in many
cases performance will not improve, because our pure MPI implementation
is already heavily optimized, and the resulting code may in fact
be slightly slower. A possible exception could be IBM BlueGene-type
architectures.\\
\noindent
To enable OpenMP, add the flag \texttt{-{}-enable-openmp} to the configuration:
{\small
\begin{verbatim}
./configure --enable-openmp ..
\end{verbatim}
}
\noindent
This will add the corresponding OpenMP flag for the chosen Fortran compiler.\\
The DO-loop parallelized with OpenMP threads has a SCHEDULE property, and the \texttt{OMP\_SCHEDULE}
environment variable can set the scheduling policy of that DO-loop.
Tests performed by Marcin Zielinski at SARA (The Netherlands) showed
that the best scheduling policy is often DYNAMIC with a chunk size
equal to the number of OpenMP threads, or preferably twice
the number of OpenMP threads (thus chunk size = 8 for
4 OpenMP threads, etc.). If \texttt{OMP\_SCHEDULE} is not set or is empty, the
DO-loop will assume the generic scheduling policy, which can slow down
the job considerably.
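For example (a sketch, assuming a bash shell; the MPI process count is illustrative):
{\small
\begin{verbatim}
# chunk size = 8, i.e., twice the number of OpenMP threads
export OMP_NUM_THREADS=4
export OMP_SCHEDULE="DYNAMIC,8"
mpirun -np 24 ./bin/xspecfem3D
\end{verbatim}
}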
%-----------------------------------------------------------------------------------------------------------------------------------%
\section{Configuration summary}
%-----------------------------------------------------------------------------------------------------------------------------------%
\noindent
A summary of the most important configuration variables follows.
\begin{description}
\item [{\texttt{FC}}] Fortran compiler command name. By default, \texttt{configure}
will execute the command names of various well-known Fortran compilers
in succession, picking the first one it finds that works.
\item [{\texttt{MPIFC}}] MPI Fortran command name. The default is \texttt{mpif90}.
This must correspond to the same underlying compiler specified by
\texttt{FC}; otherwise, you will encounter compilation or link errors
when you attempt to build the code. If you are unsure about this,
it is usually safe to set both \texttt{FC} and \texttt{MPIFC} to the
MPI compiler command for your system:
{\small
\begin{verbatim}
./configure FC=mpif90 MPIFC=mpif90
\end{verbatim}
}
\end{description}
\begin{description}
\item [{\texttt{FLAGS\_CHECK}}] Compiler flags.
\item [{\texttt{LOCAL\_PATH\_IS\_ALSO\_GLOBAL}}]
If you want the parallel mesher to write a parallel (i.e., split) database for the solver on the
local disks of each of the compute nodes, set this flag to \texttt{.false.}.
Some systems have no local disks
(e.g., BlueGene) and other systems have a fast
parallel file system (Lustre, GPFS) that is easy and reliable to use, in which case this variable should be set to
\texttt{.true.}. Note that this flag is used neither by the mesher nor by
the solver; it is only used for some of the (optional) post-processing.
If you do not know what is best on your system, setting it to \texttt{.true.} is usually fine; otherwise, ask your system administrator (see also the sketch after this list).
\end{description}
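For instance (a sketch; compare the BlueGene examples later in this chapter, which pass this variable on the configure line in the same way):
{\small
\begin{verbatim}
./configure FC=gfortran CC=gcc MPIFC=mpif90 LOCAL_PATH_IS_ALSO_GLOBAL=true
\end{verbatim}
}
\noindent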
In addition to reading configuration variables, \texttt{configure}
accepts the following options:
\begin{description}
\item [{\texttt{-{}-enable-double-precision}}] The package can run either
in single or in double precision mode. The default is single precision,
because for almost all calculations performed using the spectral-element method,
single precision is sufficient and gives the same results (i.e., the same seismograms);
the single precision code is also faster and requires exactly half as much memory. To specify
double precision mode, simply provide \texttt{-{}-enable-double-precision}
as a command-line argument to \texttt{configure} (see the sketch after this list).
On many current processors (e.g., Intel, AMD, IBM Power), single precision calculations
are significantly faster; the difference is typically 10\%
to 25\%. It is therefore better to use single precision.
What you can do, once, for the physical problem you want to study is run the same calculation in single precision
and in double precision on your system and compare the seismograms.
If they are identical (and in most cases they will be), you can select single precision for your future runs.
\item [{\texttt{-{}-help}}] Directs \texttt{configure} to print a usage
screen which provides a short description of all configuration variables
and options. Note that the options relating to installation directories
(e.g., \texttt{-{}-prefix}) do not apply to SPECFEM3D\_GLOBE.
\end{description}
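For example (a sketch, reusing the compiler choices from the beginning of this chapter):
{\small
\begin{verbatim}
./configure --enable-double-precision FC=gfortran CC=gcc MPIFC=mpif90
\end{verbatim}
}
\noindent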
The \texttt{configure} script runs a brief series of checks. Upon
successful completion, it generates the files \texttt{Makefile}, \texttt{constants.h},
and \texttt{precision.h} in the working directory.
\begin{description}
\item [{Note:}] If the \texttt{configure} script fails, and you don't know
what went wrong, examine the log file \texttt{config.log}. This file
contains a detailed transcript of all the checks \texttt{configure}
performed. Most importantly, it includes the error output (if any)
from your compiler.
\end{description}
The \texttt{configure} script automatically runs the script \texttt{flags.guess}.
This helper script contains a number of suggested flags for various
compilers; e.g., Portland, Intel, Absoft, NAG, Lahey, NEC, IBM and
SGI. The software has run on a wide variety of compute platforms,
e.g., various PC clusters and machines from Sun, SGI, IBM, Compaq,
and NEC. The \texttt{flags.guess} script attempts to guess which compiler
you are using (based upon the compiler command name) and choose the
related optimization flags. The \texttt{configure} script then automatically
inserts the suggested flags into \texttt{Makefile}. Note that \texttt{flags.guess}
may fail to identify your compiler; and in any event, the default
flags chosen by \texttt{flags.guess} are undoubtedly not optimal for
your system. So, we encourage you to experiment with these flags (by
editing the generated \texttt{Makefile} by hand) and to solicit advice
from your system administrator. Selecting the right compiler and compiler
flags can make a tremendous difference in terms of performance. We
welcome feedback on your experience with various compilers and flags.\\
When using a slow or underpowered shared disk system, or when running extremely large simulations
(on tens of thousands of processor cores), you can add \texttt{-DUSE\_SERIAL\_CASCADE\_FOR\_IOs} to the compiler flags
in file \texttt{flags.guess} before running \texttt{configure} (see the sketch below) to make the mesher output mesh data
to the disk for one MPI slice after the other, and to make the solver do the same thing when reading the files back from disk.
Do not use this option if you do not need it, because it will slow down the mesher and the beginning of the solver if your
shared file system is fast and reliable.
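For instance, appending the define to the Fortran flags in \texttt{flags.guess} might look like this (a sketch; the surrounding optimization flags are illustrative):
{\small
\begin{verbatim}
FLAGS_CHECK="-O3 -DUSE_SERIAL_CASCADE_FOR_IOs"
\end{verbatim}
}
\noindent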
If you run scaling benchmarks of the code, for instance to measure its performance on a new machine, and are not interested in the physical results
(the seismograms) for these runs, you can set \texttt{DO\_BENCHMARK\_RUN\_ONLY} to \texttt{.true.} in file \texttt{setup/constants.h.in} before running the \texttt{configure} script.
If your compiler has problems with the \texttt{use mpi} statements that are used in the code, use the script called
\texttt{replace\_use\_mpi\_with\_include\_mpif\_dot\_h.pl} in the root directory to replace all of them with \texttt{include 'mpif.h'} automatically.
We recommend that you ask for exclusive use of the compute nodes when running on a cluster or a supercomputer, i.e., make sure that no other users
are running on the same nodes at the same time. Otherwise your run could run out of memory if the memory of some nodes is used by other users, in particular
when undoing attenuation using the \texttt{UNDO\_ATTENUATION} option in \texttt{DATA/Par\_file}.
To do so, ask your system administrator for the option to add to your batch submission script; it is for instance
\texttt{\#BSUB -x} with LSF, \texttt{\#SBATCH -{}-exclusive} with SLURM, and \texttt{\#\$ -l exclusive=TRUE} with Sun Grid Engine (SGE).
%-----------------------------------------------------------------------------------------------------------------------------------%
\section{Compiling on an IBM BlueGene}
%-----------------------------------------------------------------------------------------------------------------------------------%
\noindent
Installation instructions for IBM BlueGene (from October 2012):\\
\noindent
Edit file \texttt{flags.guess} and put this for \texttt{FLAGS\_CHECK}:
{\small
\begin{verbatim}
-g -qfullpath -O2 -qsave -qstrict -qtune=qp -qarch=qp -qcache=auto -qhalt=w \
-qfree=f90 -qsuffix=f=f90 -qlanglvl=95pure -Q -Q+rank,swap_all -Wl,-relax
\end{verbatim}
}
\noindent
The most relevant are the -qarch and -qtune flags; if these flags are set to ``auto'' they are wrongly assigned
the architecture of the front-end node, which is different from that of the compute nodes.
You will need to set these flags to the right architecture for your BlueGene compute nodes, which is not necessarily ``qp'';
ask your system administrator.
On some machines it is necessary to use -O2 in these flags instead of -O3, due to a compiler bug in the XLF version installed.
We thus suggest first trying -O3, and then switching back to -O2 if the code does not compile or does not run correctly.
The debug flags (-g, -qfullpath) do not influence performance, but are useful to get at least some insight in case of problems.\\
\noindent
Before running \texttt{configure}, select the XL Fortran compiler by typing \texttt{module load bgq-xl/1.0}
or \texttt{module load bgq-xl} (another, less efficient option is to load the GNU compilers using \texttt{module load bgq-gnu/4.4.6} or similar).\\
\noindent
Then, to configure the code, type this:
{\small
\begin{verbatim}
./configure FC=bgxlf90_r MPIFC=mpixlf90_r CC=bgxlc_r LOCAL_PATH_IS_ALSO_GLOBAL=true
\end{verbatim}
}
\noindent
\underline{Older installation instructions for IBM BlueGene, from 2011:}\\
\noindent
To compile the code on an IBM BlueGene, Laurent L\'eger from IDRIS, France, suggests compiling with
{\small
\begin{verbatim}
FLAGS_CHECK="-O3 -qsave -qstrict -qtune=auto -qarch=450d -qcache=auto \
-qfree=f90 -qsuffix=f=f90 -g -qlanglvl=95pure -qhalt=w -Q -Q+rank,swap_all -Wl,-relax"
\end{verbatim}
}
\noindent
Option "-Wl,-relax" must be added on many (but not all) BlueGene systems to be able to link the binaries \texttt{xmeshfem3D}
and \texttt{xspecfem3D} because the final link step is done by the GNU \texttt{ld} linker even if
one uses \texttt{FC=bgxlf90\_r, MPIFC=mpixlf90\_r} and \texttt{CC=bgxlc\_r} to create all the object files.
On the contrary, on some BlueGene systems that use the native AIX linker option "-Wl,-relax" can lead to problems and must be suppressed from \texttt{flags.guess}.
\noindent
One then just needs to pass the right commands to the \texttt{configure} script:
{\small
\begin{verbatim}
./configure --prefix=/path/to/SPECFEM3DG_SP --host=Babel --build=BGP \
FC=bgxlf90_r MPIFC=mpixlf90_r CC=bgxlc_r \
LOCAL_PATH_IS_ALSO_GLOBAL=false
\end{verbatim}
}
\noindent
This trick can be useful for all hosts on which one needs to cross-compile.
\noindent
On BlueGene, one also needs to run the \texttt{xcreate\_header\_file} binary manually rather than letting the \texttt{Makefile} run it:
{\small
\begin{verbatim}
bgrun -np 1 -mode VN -exe ./bin/xcreate_header_file
\end{verbatim}
}
%-----------------------------------------------------------------------------------------------------------------------------------%
\section{Compiling on an Intel Xeon Phi (Knights Landing KNL)}
%-----------------------------------------------------------------------------------------------------------------------------------%
\noindent
If you want to run simulations on a KNL chip, the compilation does not require much more effort than on any other CPU system.
All you need to add is the flag \texttt{-xMIC-AVX512} to your Fortran flags in the \texttt{Makefile}, together with \texttt{-{}-enable-openmp} at configuration time.
Since there are different memory types available on a KNL, make sure to use fast memory allocations, i.e., MCDRAM, which has a higher memory bandwidth.
Assuming you use a flat-mode setup of the KNL chip, you can use the Linux tool \texttt{numactl} to specify which memory node to bind to.
For example, check with
For example, check with
{\small
\begin{verbatim}
numactl --hardware
\end{verbatim}
}
\noindent
which node contains CPU cores and which one only binds to MCDRAM ($\sim$16~GB). In a flat-mode setup, most likely node~1 does.
For a small example on a single KNL with 4~MPI processes and 16~OpenMP threads each, you would run the solver with a command like
{\small
\begin{verbatim}
OMP_NUM_THREADS=16 mpirun -np 4 numactl --membind=1 ./bin/xspecfem3D
\end{verbatim}
}
\noindent
The ideal setup of MPI processes and OpenMP threads per KNL depends on your specific hardware and simulation setup. We see good results when using a combination of both, with a total number of threads slightly less than the total count of cores on the chip.
As a side remark for developers, another possibility would be to add the following compiler directives in the source code
(in file \texttt{src/specfem3D/specfem3D\_par.F90}):
{\small
\begin{verbatim}
real(kind=CUSTOM_REAL), dimension(:,:), allocatable :: &
displ_crust_mantle,veloc_crust_mantle,accel_crust_mantle
! FASTMEM attribute: note this attribute needs compiler flag -lmemkind to work...
!DEC$ ATTRIBUTES FASTMEM :: displ_crust_mantle,veloc_crust_mantle,accel_crust_mantle
\end{verbatim}
}
\noindent
These directives work with the Intel ifort compiler and need the additional linker/compiler flag \texttt{-lmemkind} to work properly.
We omitted these directives for now to avoid confusion with other possible simulation setups.
%-----------------------------------------------------------------------------------------------------------------------------------%
\section{Using a cross compiler}
%-----------------------------------------------------------------------------------------------------------------------------------%
\noindent
The \texttt{configure} script assumes that you will compile the code on the same kind of hardware
as the machine on which you will run it. On some systems (for instance IBM BlueGene; see also the previous section) this might not be the case,
and you may compile the code using a cross compiler on a front-end computer that does not have the same
architecture. In such a case, typing \texttt{make all} on the front-end will fail, but you can use one of these two solutions:\\
\noindent
1/ create a script that runs \texttt{make all} on a compute node instead of on the front-end, if the compiler is also installed on the nodes;\\
\noindent
2/ in the case of static compilation, after running the \texttt{configure} script, create two copies of the Makefiles:
\\
\red{TODO: this has not been tested out yet, any feedback is welcome}
\\
\noindent
In \texttt{src/create\_header\_file/Makefile} put this instead of the current values:
{\small
\begin{verbatim}
FLAGS_CHECK = -O0
\end{verbatim}
}
\noindent
and replace
{\small
\begin{verbatim}
create_header_file: $O/create_header_file.o $(XCREATE_HEADER_OBJECTS)
${FCCOMPILE_CHECK} -o ${E}/xcreate_header_file $O/create_header_file.o $(XCREATE_HEADER_OBJECTS)
\end{verbatim}
}
\noindent
with
{\small
\begin{verbatim}
xcreate_header_file: $O/create_header_file.o $(XCREATE_HEADER_OBJECTS)
${MPIFCCOMPILE_CHECK} -o ${E}/xcreate_header_file $O/create_header_file.o $(XCREATE_HEADER_OBJECTS)
\end{verbatim}
}
\noindent
and comment out the line calling the executable:
{\small
\begin{verbatim}
${OUTPUT}/values_from_mesher.h: $E/xcreate_header_file $B/DATA/Par_file
# $E/xcreate_header_file
\end{verbatim}
}
\noindent
Then:
{\small
\begin{verbatim}
make clean
make xcreate_header_file
./bin/xcreate_header_file
make clean
make meshfem3D
make specfem3D
\end{verbatim}
}
\noindent
should work.
%-----------------------------------------------------------------------------------------------------------------------------------%
\section{Visualizing the subroutine calling tree of the source code}
%-----------------------------------------------------------------------------------------------------------------------------------%
Packages such as \texttt{doxywizard} can be used to visualize the subroutine calling tree of the source code.
\texttt{Doxywizard} is a GUI front-end for configuring and running \texttt{doxygen}.
To visualize the calling tree of the source code, you can use the Doxygen setup available in directory \texttt{doc/call\_trees\_of\_the\_source\_code}.
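For instance (a sketch; the name of the Doxygen configuration file in that directory is an assumption):
{\small
\begin{verbatim}
cd doc/call_trees_of_the_source_code
doxygen Doxyfile   # configuration file name assumed; adjust to the file provided
\end{verbatim}
}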
%-----------------------------------------------------------------------------------------------------------------------------------%
\section{Becoming a developer of the code, or making small modifications in the source code}
%-----------------------------------------------------------------------------------------------------------------------------------%
If you want to develop new features in the code, and/or if you want to make small changes, improvements, or bug fixes, you are very welcome to contribute. To do so, i.e., to get read/write access to the development branch of the source code (in a safe way; there is no need to worry too much about breaking the package, since CI tests based on BuildBot, Travis-CI and Jenkins are in place to check and validate all new contributions and changes), please visit this Web page:\\
\url{https://github.com/geodynamics/specfem3d_globe/wiki}