danielpeter
diff --git a/doc/USER_MANUAL/02_getting_started.tex b/doc/USER_MANUAL/02_getting_started.tex
Lines changed: 95 additions & 22 deletions
diff --git a/doc/USER_MANUAL/03_mesh_generation.tex b/doc/USER_MANUAL/03_mesh_generation.tex
Lines changed: 10 additions & 0 deletions
diff --git a/doc/USER_MANUAL/04_running_the_solver.tex b/doc/USER_MANUAL/04_running_the_solver.tex
Lines changed: 7 additions & 3 deletions
diff --git a/doc/USER_MANUAL/05_adjoint_simulations.tex b/doc/USER_MANUAL/05_adjoint_simulations.tex
Lines changed: 5 additions & 0 deletions
@@ -9,9 +9,6 @@ \chapter{Getting Started}\label{cha:Getting-Started}
 git clone --recursive --branch devel https://github.com/SPECFEM/specfem2d.git
 \end{verbatim}
 
-Note: for people who would like to run the package on Windows rather than on Unix machines, you can install Docker or VirtualBox (installing a Linux in VirtualBox in that latter case) and run it easily from inside that.
-
-We recommend that you add \texttt{ulimit -S -s unlimited} to your \texttt{.bash\_profile} file and/or \texttt{limit stacksize unlimited} to your \texttt{.cshrc} file to suppress any potential limit to the size of the Unix stack.
 
 Then, to configure the software for your system, run the
 \texttt{configure} shell script. This script will attempt to guess
@@ -32,22 +29,24 @@ \chapter{Getting Started}\label{cha:Getting-Started}
 
 You can replace the GNU compilers above (gfortran and gcc) with other compilers if you want to; for instance for Intel ifort and icc use FC=ifort CC=icc instead.
 
-Before running the \texttt{configure} script, you should probably edit file \texttt{flags.guess} to make sure that it contains the best compiler options for your system. Known issues or things to check are:
-
+Before running the \texttt{configure} script, you should probably edit file \texttt{flags.guess} to make sure that it contains the best
+compiler options for your system. Known issues or things to check are:
 \begin{description}
-\item [Intel ifort compiler] See if you need to add \texttt{-assume byterecl} for your machine.\newline
-
+\item [{\texttt{Intel ifort compiler}}] See if you need to add \texttt{-assume byterecl} for your machine.
 In the case of that compiler, we have noticed that initial release versions sometimes have bugs or issues that can lead to wrong results when running the code, thus we \emph{strongly} recommend using a version for which at least one service pack or update has been installed.
-In particular, for \red{version 17} of that compiler, users have reported problems (making the code crash at run time) with the \texttt{-assume buffered\_io} option; if you notice problems,
+In particular, for version 17 of that compiler, users have reported problems (making the code crash at run time) with the \texttt{-assume buffered\_io} option; if you notice problems,
 remove that option from file \texttt{flags.guess} or change it to \texttt{-assume nobuffered\_io} and try again.
-\item [IBM compiler] See if you need to add \texttt{-qsave} or \texttt{-qnosave} for your machine.
-\item [Mac OS] You will probably need to install \texttt{Xcode}.
-\item [IBM Blue Gene machines] Please refer to the manual of SPECFEM3D\_Cartesian, which contains detailed instructions on how to run on Blue Gene.
+
+\item [{\texttt{IBM compiler}}] See if you need to add \texttt{-qsave} or \texttt{-qnosave} for your machine.
+
+\item [{\texttt{Mac OS}}] You will probably need to install \texttt{Xcode}.
+
+%\item [{\texttt{IBM Blue Gene machines}}] Please refer to the manual of SPECFEM3D\_Cartesian, which contains detailed instructions on how to run on Blue Gene.
 \end{description}
 
 The SPECFEM2D software package relies on the SCOTCH library to partition meshes.
-The SCOTCH library \citep{PeRo96}
-provides efficient static mapping, graph and mesh partitioning routines. SCOTCH is a free software package developed by
+The SCOTCH library \citep{PeRo96} provides efficient static mapping,
+graph and mesh partitioning routines. SCOTCH is a free software package developed by
 Fran\c{c}ois Pellegrini et al. from LaBRI and Inria in Bordeaux, France, downloadable from the web page \url{https://gitlab.inria.fr/scotch/scotch}.
 In case no SCOTCH libraries can be found on the system, the configuration will bundle the version provided with the source code for compilation.
 The path to an existing SCOTCH installation can be set explicitly with the option \texttt{-{}-with-scotch-dir}.
@@ -68,7 +67,13 @@ \chapter{Getting Started}\label{cha:Getting-Started}
 
 When compiling the SCOTCH source code, if you get a message such as: "ld: cannot find -lz",
 the Zlib compression development library is probably missing on your machine and you will need to install it or ask your system administrator to
-do so. On Linux machines the package is often called "zlib1g-dev" or similar. (thus "sudo apt-get install zlib1g-dev" would install it)
+do so. On Linux machines the package is often called "zlib1g-dev" or similar (thus "sudo apt-get install zlib1g-dev" would install it).\newline
+
+You can add \texttt{-{}-enable-vectorization} to the configuration options to speed up the code in the fluid (acoustic) and elastic parts.
+This works fine if (and only if) your computer always allocates a contiguous memory block for each allocatable array;
+this is the case for most machines and most compilers, but not all.
+To check if that option works fine on your machine, run the code with and without it for an acoustic/elastic model and make sure the seismograms are identical.\newline
+
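+\noindent
+As a minimal sketch of this check, assuming a serial build with the GNU compilers, that your example is run with \texttt{./bin/xmeshfem2D} followed by \texttt{./bin/xspecfem2D}, and that the seismograms are written as \texttt{*.semd} files into \texttt{OUTPUT\_FILES/} (all of these are assumptions to adapt to your setup; the directory names \texttt{ref\_novec} and \texttt{ref\_vec} are arbitrary):
+\begin{verbatim}
+  # reference run without vectorization
+  ./configure FC=gfortran CC=gcc
+  make clean; make
+  ./bin/xmeshfem2D && ./bin/xspecfem2D
+  mkdir -p ref_novec && cp OUTPUT_FILES/*.semd ref_novec/
+
+  # same run with vectorization enabled
+  ./configure FC=gfortran CC=gcc --enable-vectorization
+  make clean; make
+  ./bin/xmeshfem2D && ./bin/xspecfem2D
+  mkdir -p ref_vec && cp OUTPUT_FILES/*.semd ref_vec/
+
+  # no output from diff means the seismograms are identical
+  diff -r ref_novec ref_vec
+\end{verbatim}
+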
 
 You may edit the \texttt{Makefile} for more specific modifications. Especially, there are several options available:
 %
@@ -83,7 +88,7 @@ \chapter{Getting Started}\label{cha:Getting-Started}
     make
 \end{verbatim}
 %
-to create all executables which will be placed into the folder \texttt{./bin/}.
+to create all executables which will be placed into the folder \texttt{./bin/}.\newline
 
 By default, the solver runs in single precision. This is fine for most applications, but if for some reason
 you want to run the solver in double precision, run the \texttt{configure} script with option ``\texttt{-{}-enable-double-precision}''.
@@ -94,23 +99,91 @@ \chapter{Getting Started}\label{cha:Getting-Started}
 \texttt{replace\_use\_mpi\_with\_include\_mpif\_dot\_h.pl} in the root directory to replace all of them with \texttt{include `mpif.h'} automatically.
 
 If you have problems configuring the code on a Cray machine, i.e. for instance if you get an error message from the \texttt{configure} script, try exporting these two variables:
-\texttt{MPI\_INC=\${CRAY\_MPICH2\_DIR}/include and FCLIBS=" "}, and for more details if needed you can refer to the \texttt{utils/Cray\_compiler\_information} directory.
+\texttt{MPI\_INC=\$\{CRAY\_MPICH2\_DIR\}/include} and \texttt{FCLIBS=" "}; for more details, if needed, you can refer to the \texttt{utils/infos/Cray\_compiler\_information} directory.
 You can also have a look at the configure script called\newline
-\texttt{utils/Cray\_compiler\_information/configure\_SPECFEM\_for\_Piz\_Daint.bash}.
+\texttt{utils/infos/Cray\_compiler\_information/configure\_SPECFEM\_for\_Piz\_Daint.bash}.\newline
+
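+\noindent
+For instance, in a bash shell these two variables could be set as follows before calling \texttt{configure} (a minimal sketch; the remaining options passed to \texttt{configure} depend on your system and are abbreviated as \texttt{..} here):
+\begin{verbatim}
+  export MPI_INC=${CRAY_MPICH2_DIR}/include
+  export FCLIBS=" "
+  ./configure ..
+\end{verbatim}
+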
+If you would like to run the package on Windows rather than on a Unix machine, you can install Docker or VirtualBox (in the latter case, installing Linux inside VirtualBox) and run the package from within it.\newline
 
+We recommend that you add \texttt{ulimit -S -s unlimited} to your \texttt{.bash\_profile} file and/or \texttt{limit stacksize unlimited} to your \texttt{.cshrc} file to suppress any potential limit to the size of the Unix stack.\newline
+
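+\noindent
+As a sketch for bash users, the line can be appended and then checked as follows (on some systems the hard limit prevents raising the soft limit, in which case the \texttt{ulimit} command reports an error):
+\begin{verbatim}
+  echo 'ulimit -S -s unlimited' >> ~/.bash_profile
+  source ~/.bash_profile
+  ulimit -S -s     # prints the current soft stack limit
+\end{verbatim}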
+
+%------------------------------------------------------------------------------------------------%
+\section{Using the GPU version of the code}
+%------------------------------------------------------------------------------------------------%
+
+\noindent
+SPECFEM2D supports GPU acceleration through CUDA.
+When compiling for GPU cards, you can enable the CUDA version with:
+\begin{verbatim}
+  ./configure --with-cuda ..
+\end{verbatim}
+or
+\begin{verbatim}
+  ./configure --with-cuda=cuda9 ..
+\end{verbatim}
+where the value (for example \texttt{cuda4}, \texttt{cuda5}, \texttt{cuda6}, \texttt{cuda7}, ..) specifies the target GPU architecture of your card
+(e.g., \texttt{cuda9} refers to Volta V100 cards) rather than the installed version of the CUDA toolkit.
+Before CUDA version 5, each toolkit version supported essentially one new architecture and needed a different kind of compilation.
+Since version 5, the compilation has stayed the same, and newer toolkit versions have added support for newer architectures.
+However, at the moment each setting is still linked to one specific architecture:
+\begin{verbatim}
+- CUDA 4 for Tesla,   cards like K10, Geforce GTX 650, ..
+- CUDA 5 for Kepler,  like K20
+- CUDA 6 for Kepler,  like K80
+- CUDA 7 for Maxwell, like Quadro K2200
+- CUDA 8 for Pascal,  like P100
+- CUDA 9 for Volta,   like V100
+- CUDA 10 for Turing, like GeForce RTX 2080
+- CUDA 11 for Ampere, like A100
+- CUDA 12 for Hopper, like H100
+\end{verbatim}
+So even if you have the newer CUDA toolkit version 11 installed but want to run on, say, a K20 GPU, you would still configure with:
+\begin{verbatim}
+  ./configure --with-cuda=cuda5
+\end{verbatim}
+The compilation with the \texttt{cuda5} setting then chooses the right architecture (\texttt{-gencode=arch=compute\_35,code=sm\_35} for K20 cards).\newline
+
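+\noindent
+As an illustration, assuming the NVIDIA driver utilities are installed, you could first query the name of your card and then pick the matching setting from the list above (shown here for a V100 card; other \texttt{configure} options are abbreviated as \texttt{..}):
+\begin{verbatim}
+  # prints the GPU model name(s) of the installed card(s)
+  nvidia-smi --query-gpu=name --format=csv,noheader
+  # Volta card, thus cuda9 regardless of the installed toolkit version
+  ./configure --with-cuda=cuda9 ..
+\end{verbatim}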
+
+%------------------------------------------------------------------------------------------------%
+\section{Adding OpenMP support in addition to MPI}
+%------------------------------------------------------------------------------------------------%
+
+OpenMP support can be enabled in addition to MPI. However, in many
+cases performance will not improve: our pure MPI implementation
+is already heavily optimized, and the resulting hybrid code will in fact
+be slightly slower. A possible exception could be IBM BlueGene-type
+architectures.\newline
+
+\noindent
+To enable OpenMP, add the flag \texttt{-{}-enable-openmp} to the configuration:
+\begin{verbatim}
+./configure --enable-openmp ..
+\end{verbatim}
+This will add the corresponding OpenMP flag for the chosen Fortran compiler.
+Please note that only the elastic domain solver has OpenMP support for now.\newline
+
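+\noindent
+At run time, the number of OpenMP threads per MPI process is set with the standard \texttt{OMP\_NUM\_THREADS} environment variable.
+A minimal sketch, assuming an MPI build launched with \texttt{mpirun} (the launcher name and the process and thread counts are examples only):
+\begin{verbatim}
+  export OMP_NUM_THREADS=2
+  mpirun -np 4 ./bin/xspecfem2D
+\end{verbatim}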
+
+%------------------------------------------------------------------------------------------------%
 \section{Visualizing the subroutine calling tree of the source code}
+%------------------------------------------------------------------------------------------------%
 
 Packages such as \texttt{doxywizard} can be used to visualize the subroutine calling tree of the source code.
 \texttt{Doxywizard} is a GUI front-end for configuring and running \texttt{doxygen}.
 
-\section{Becoming a developer of the code, or making small modifications in the source code}
+\noindent
+To visualize the call tree of the source code, you can use the Doxygen tool available in directory \texttt{doc/call\_trees\_of\_the\_source\_code}.
 
-If you want to develop new features in the code, and/or if you want to make small changes, improvements, or bug fixes, you are very welcome to contribute.\newline
 
-To do so, i.e. to access the development branch of the source code with read/write access (in a safe way, no need to worry too much about breaking the package, there is a robot called BuildBot that is in charge of checking and validating all new contributions and changes), please visit this Web page:\newline
+%------------------------------------------------------------------------------------------------%
+\section{Becoming a developer of the code, or making small modifications in the source code}
+%------------------------------------------------------------------------------------------------%
+
+If you want to develop new features in the code, and/or if you want to make small changes, improvements, or bug fixes, you are very welcome to contribute!
+There is no need to worry too much about breaking the package: CI tests based on BuildBot, Travis-CI and Jenkins are in place
+to check and validate all new contributions and changes.
+To access the development branch of the source code with read/write access, please visit this Web page:\newline
 \url{https://github.com/SPECFEM/specfem2d/wiki}\newline
 
-\noindent
-To visualize the call tree (calling tree) of the source code, you can see the Doxygen tool available in directory \texttt{doc/call\_trees\_of\_the\_source\_code}.
 
 
@@ -162,7 +162,9 @@ \section*{Notes about \texttt{DATA/Par\_file} parameters}
 \end{figure}
 
 
+%------------------------------------------------------------------------------------------------%
 \section{How to use Gmsh to generate an external mesh}
+%------------------------------------------------------------------------------------------------%
 
 Gmsh%
 \footnote{freely available at the following address: \url{http://gmsh.info/}}
@@ -267,7 +269,10 @@ \section{How to use Gmsh to generate an external mesh}
 In addition, four files like \texttt{free\_surface\_file} corresponding
 to the sides of the model are generated.
 
+
+%------------------------------------------------------------------------------------------------%
 \section{How to use Cubit/Trelis to generate an external mesh}
+%------------------------------------------------------------------------------------------------%
 Trelis (formerly known as Cubit)%
 \footnote{available at \url{http://www.csimsoft.com/}}
 is a 2D/3D finite element grid generator distributed by Csimsoft which can be
@@ -309,7 +314,9 @@ \section{How to use Cubit/Trelis to generate an external mesh}
 This tab will allow you to play the scripts one line after another directly in Cubit/Trelis.
 With this you should be able to understand how to create meshes and export them under SPECFEM2D format.
 
+%------------------------------------------------------------------------------------------------%
 \subsection{Note about Cubit/Trelis built-in Python}
+%------------------------------------------------------------------------------------------------%
 Beware, there are some (annoying) differences between the Cubit built-in Python and the actual Python language:
 \begin{itemize}
 \item \texttt{"aString" + 'anotherString'} can cause problems even after being stored: \newline
@@ -334,6 +341,7 @@ \subsection{Note about Cubit/Trelis built-in Python}
 \end{itemize}
 And probably many others! Think about that before getting mad.
 
+
 %------------------------------------------------------------------------------------------------%
 \section{Notes about absorbing PMLs}
 %------------------------------------------------------------------------------------------------%
@@ -449,6 +457,7 @@ \section{Notes about absorbing PMLs}
 \end{figure}
 %%
 
+
 %------------------------------------------------------------------------------------------------%
 \section{Controlling the quality of an external mesh}
 %------------------------------------------------------------------------------------------------%
@@ -471,6 +480,7 @@ \section{Controlling the quality of an external mesh}
 %
 This tool is useful to estimate the mesh quality and to see it evolve along the successive corrections.
 
+
 %------------------------------------------------------------------------------------------------%
 \section{Controlling how the mesh samples the wave field}
 %------------------------------------------------------------------------------------------------%
 
@@ -50,6 +50,7 @@ \chapter{Running the Solver xspecfem2D}
 
 \end{itemize}
 
+
 %------------------------------------------------------------------------------------------------%
 \section*{Notes about \texttt{DATA/Par\_file} parameters}
 %------------------------------------------------------------------------------------------------%
@@ -96,7 +97,6 @@ \section*{Notes about \texttt{DATA/Par\_file} parameters}
 \section*{Notes about \texttt{DATA/SOURCE} parameters}
 %------------------------------------------------------------------------------------------------%
 
-
 The \texttt{SOURCE} file located in the \texttt{DATA/} directory should be edited in the following way:
 %
 \begin{description}
@@ -230,7 +230,7 @@ \section{How to run elastic wave simulations}
 
 An optional useful Python script called \texttt{SEM\_save\_dir.py} is provided.
 It allows one to automatically save all the parameters and results of a given simulation.
-%------------------------------------------------------------------------------------------------%
+
 
 %------------------------------------------------------------------------------------------------%
 \section{How to run axisymmetric wave simulations}
@@ -312,7 +312,7 @@ \section{How to run axisymmetric wave simulations}
 This simple code is useful to learn how the spectral-element method works in both plane-strain and axisymmetric geometries.
 Have a look at it if interested. Once in its directory, type \texttt{./make\_Fortran\_2D\_axisymmetric.csh} and then \texttt{./xspecfem2D}
 to compile and run. The bug discussed above is not present in this small code.
-%------------------------------------------------------------------------------------------------%
+
 
 %------------------------------------------------------------------------------------------------%
 \section{How to run anisotropic wave simulations}
@@ -454,6 +454,7 @@ \section{How to run poroelastic wave simulations}
 several sources with different frequencies and if you consider anisotropic
 permeability.
 
+
 %------------------------------------------------------------------------------------------------%
 \section{How to run electromagnetic wave simulations}
 %------------------------------------------------------------------------------------------------%
@@ -610,6 +611,7 @@ \section{How to choose the time step}
 % is used to refer this table in the text
 \end{table}
 
+
 %------------------------------------------------------------------------------------------------%
 \section{How to set plane waves as initial conditions}
 %------------------------------------------------------------------------------------------------%
@@ -631,6 +633,7 @@ \section{How to set plane waves as initial conditions}
   \end{itemize}
 \end{description}
 
+
 %------------------------------------------------------------------------------------------------%
 \section{Note on the viscoelastic model used}
 %------------------------------------------------------------------------------------------------%
@@ -644,6 +647,7 @@ \section{Note on the viscoelastic model used}
 and thus the code outputs the band in which the approximation is very good; outside of that range it can be less accurate.
 The logarithmic center of that frequency band is the \texttt{f0} parameter defined (in Hz) in input file \texttt{DATA/SOURCE}.
 
+
 %------------------------------------------------------------------------------------------------%
 \section{Note on viscoelasticity in the 2D plane strain approximation}
 %------------------------------------------------------------------------------------------------%
 
@@ -62,7 +62,10 @@ \section{How to obtain finite sensitivity kernels}
 
 \end{enumerate}
 
+
+%------------------------------------------------------------------------------------------------%
 \section{Remarks about adjoint runs and solving inverse problems}
+%------------------------------------------------------------------------------------------------%
 
 % This from Carl Tape:
 SPECFEM2D can produce the gradient of the misfit function for a
@@ -88,7 +91,9 @@ \section{Remarks about adjoint runs and solving inverse problems}
 ``\texttt{ac}'' denotes acoustic (``\texttt{el}'' for elastic),
 ``\texttt{kl}'' means kernel (and you may find ``\texttt{k}'' as well, which is the interaction at each time step, i.e., before doing time integration).
 
+%------------------------------------------------------------------------------------------------%
 \section{Caution}
+%------------------------------------------------------------------------------------------------%
 
 Please note that:
 %