/*! \example IOserver
This is a very simple example showing the usage of the I/O server. It writes two very simple ASCII files, in which each compute core writes a single line. Such usage is clearly unrealistic; a speed gain only appears for much larger files.
The server works as follows. It needs to be started only on the I/O cores, while the compute cores proceed with their usual computation. When some I/O has to be performed, the compute cores open an "ostream" to the I/O cores; files can then be created. Currently only one file can be open at a time, and there is no method to reopen a file which has been closed (this feature will be added in the next update of the server). Once all data have been transferred to the I/O cores, the "ostream" can be closed. This operation launches the write-to-disk procedure, and the server stays in a busy state until all data have been written to disk.
\include IOserver.cpp
\section Compile_IOserver Compile and Run
Go to the LATfield2/examples folder and compile this example, e.g. with mpic++:
\verbatim
mpic++ -o ioserver_exec IOserver.cpp -I../ -DEXTERNAL_IO
\endverbatim
It can then be executed using (here "mpirun -np 24" to run with 24 processes):
\verbatim
mpirun -np 24 ./ioserver_exec -n 4 -m 4 -i 8 -g 4
\endverbatim
The -n and -m parameters are, as usual, the parameters used to initialize the parallel object. Two additional parameters are then passed: -i, the total number of MPI processes of the I/O server, and -g, the number of I/O processes which write data into a single file. Therefore n*m+i must equal the total number of MPI processes used by the job, and i/g must be an integer.
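For the command above these constraints read:
\verbatim
n*m + i = 4*4 + 8 = 24   (total MPI processes, matching -np 24)
i/g     = 8/4   = 2      (number of I/O groups; must be an integer)
\endverbatim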
\section code_IOserver Going through the code
\dontinclude IOserver.cpp
\subsection code_parallel_IOserver Parallel initialization and server launch
The initialization of the parallel object is performed as without the I/O server; the only difference is that two additional parameters are passed to the parallel object: the number of processes of the I/O server, and the size of one server group, i.e. the number of processes which write to the same disk. It is advised to set the group size to an integer multiple of the number of cores per node.
\until parallel
Once initialized, the parallel object contains a list of compute and I/O cores. The method isIO() returns true for I/O processes and false for compute ones. Basically, an I/O process has to perform only one operation: launching the server. This is done using the start() method.
\until ;
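As a minimal sketch of this overall structure (the instance name ioserver and the four-argument initialize call are assumptions based on the description above, not a verbatim copy of the example):
\verbatim
parallel.initialize(n, m, io_size, io_group_size); // two extra I/O parameters

if (parallel.isIO())
{
    ioserver.start(); // I/O processes only: serve requests until stopped
}
else
{
    // ... compute work and ostream/file operations, as described below ...
    ioserver.stop();  // issued by the compute processes at the very end
}
\endverbatim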
\subsection code_compute_IOServer Compute processes
The part of the code executed by the compute processes then needs to be in the else branch of the previous if, or in a block within
\verbatim
if(!parallel.isIO()){…}
\endverbatim
In this simple example, each process writes a short sentence containing its position in the process grid.
\line else
\until {
\until \n
In real applications, the data to be sent should be constructed only once the file is open, and sent to the server in several messages (around 5 is the most efficient). This allows a much better usage of the I/O server.
The first step to start a transfer to the I/O server is to open an ostream to the server on the compute processes. An ostream can be opened only if the server is in the ready state, and the server has no method to wait until it is ready. The server can be in the busy state for two reasons: either it has not finished launching, or it is currently writing a file.
\line openOstream
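Since there is no blocking wait, a common pattern is to poll until the stream opens. A sketch, assuming openOstream() returns a status code and that a failure code named OSTREAM_FAIL exists:
\verbatim
// retry while the server is still launching or still writing a file
while (ioserver.openOstream() == OSTREAM_FAIL);
\endverbatim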
Once a stream is open, one can open a file. Currently it is possible to open up to 5 files within one ostream, but not simultaneously (the next version of the server will be able to deal with multiple files simultaneously). There is also no method yet to open an existing file; the only method which opens a file is createFile, which creates the file or truncates it if it already exists.
\line create
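For illustration only (the handle type ioserver_file and the argument list are assumptions; the text above only fixes the method name):
\verbatim
// create "data.txt" on the server side, truncating it if it already exists
ioserver_file file = ioserver.createFile("data.txt");
\endverbatim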
Once a file is open, data can be transferred to the I/O server using the write method. The size of the send buffer is given in bytes, and the buffer is expected to be a pointer to a char array (so if it is not a char array, typecast it with (char*)).
\line write
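A sketch combining the byte-size convention with the multi-message pattern recommended above (the write signature shown, handle then buffer then size in bytes, is an assumption):
\verbatim
double data[1024];                        // some non-char payload
char * buffer = (char *) data;            // the server expects a char pointer
long bufferSize = 1024 * sizeof(double);  // sizes are given in bytes

const int nChunks = 5;                    // ~5 messages is most efficient
long chunkSize = bufferSize / nChunks;
for (int c = 0; c < nChunks; c++)
{
    long offset = c * chunkSize;
    long size = (c == nChunks - 1) ? bufferSize - offset : chunkSize;
    ioserver.write(file, buffer + offset, size);
}
\endverbatim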
When all data have been sent to the file, one can close it. This is very important: as long as a file is open, the server continuously looks for data sent to this file by the compute processes.
\line close
Once all data have been sent to the files where they should be written, the ostream must be closed, which launches the transfer from the server to the disks. At that moment the server turns its state to busy. Once the data are written to disk, a new ostream can be opened.
\line Ostream
Be aware that to terminate MPI correctly, the server has to be shut down; otherwise the I/O server processes will never stop listening for an instruction to open an ostream.
The call to stop the server is performed by the compute processes:
\line stop
\section issues_IOserver Known Issues
There is one known issue, which arises from the desire to maximise speed. The server can start to write a file before it has finished receiving all the messages from the compute processes; however, it cannot start the writing procedure before all messages have been sent by the compute processes. This means that some messages can still be in the network during the writing procedure, and some data which should be written into the file of snapshot n can end up written into the file of snapshot n+1. We regard this as an acceptable risk for the reward of an increased efficiency of the server. It can be a problem for the last snapshot, as there is no subsequent write procedure. To avoid losing data, one should open a file and call a write just before shutting down the server, without sending any message from the compute processes. This issue only arises when the compute executable is not well balanced, and should therefore be very rare.
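A sketch of that workaround, reusing the hypothetical names introduced above (all identifiers except openOstream, createFile and stop are assumptions):
\verbatim
// final flush: trigger one more write procedure without sending any data,
// so messages still in the network are committed to the last snapshot
while (ioserver.openOstream() == OSTREAM_FAIL);
ioserver_file flushFile = ioserver.createFile("flush.dummy");
ioserver.closeFile(flushFile);
ioserver.closeOstream(); // launches the write-to-disk procedure
ioserver.stop();
\endverbatim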
*/