MPI Split communicator. #318
* hotfix/1.6.2:
  - For CI lumi-g cce, set CMAKE_PARALLEL_BUILD_LEVEL=1 (already so in develop)
  - More resources for CI lumi-g cce
  - etrans benchmark: specify LINKER_LANGUAGE Fortran for static linking
  - transi: link against etrans if enabled
  - etrans: fix library target names in CPU build
  - Version bump to 1.6.2
(From discussion with @MarekWlasak)
| ! which can be obtained at http://www.apache.org/licenses/LICENSE-2.0. | ||
| ! | ||
|  | ||
| program test_example | 
| program test_example | |
| program test_split_mpi_comm | 
Also the name of the file then.
|  | ||
| integer(kind=JPIM) :: num_spectral_elements, num_grid_points | ||
| integer(kind=JPIM) :: g_num_spectral_elements, g_num_grid_points ! global | ||
| integer(kind=JPIM) :: mode_index | 
| integer(kind=JPIM) :: mode_index | |
| integer(kind=JPIM) :: local_spectral_coefficient_index | 
Should this use a type prefix (e.g. K), per the coding standards?
Perhaps not strictly needed.
Should the transi changes go in a separate PR?
For now I like it here.
| real(kind=JPRM), allocatable :: spectral_field(:,:) | ||
| real(kind=JPRM), allocatable :: grid_point_field(:,:,:) | ||
|  | ||
| real(kind=JPRM), allocatable :: g_grid_point_field(:) | 
Add a comment on the dimension difference? This is a local testing variable: the grid point field to output to file.
| call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierror) | ||
|  | ||
| split_colour = get_split_group() | ||
| split_key = rank | 
| split_key = rank | |
| split_key = rank ! Global rank | 
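For reference, a communicator split groups processes by colour and orders each group by key, with ties broken by the rank in the parent communicator. The sketch below models that ordering in plain C without calling MPI. The helper name `split_rank` and its array arguments are hypothetical, chosen only for illustration, and are not part of ecTrans or MPI.

```c
#include <stddef.h>

/* Hypothetical helper (not an MPI or ecTrans routine): returns the rank
 * that process `me` would hold in its new communicator after a split,
 * given the colour and key supplied by each of the n processes.
 * Processes sharing a colour form one group; within a group they are
 * ordered by key, and equal keys are ordered by rank in the parent. */
int split_rank(const int *colour, const int *key, size_t n, size_t me)
{
    int new_rank = 0;
    for (size_t r = 0; r < n; ++r) {
        if (r == me || colour[r] != colour[me]) {
            continue; /* different group, or the process itself */
        }
        /* Count group members that sort before `me`. */
        if (key[r] < key[me] || (key[r] == key[me] && r < me)) {
            ++new_rank;
        }
    }
    return new_rank;
}
```

With the key set to the global rank, as in the diff above, each group keeps the relative order its members had in the parent communicator.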
| ! Get function space sizes | ||
| call trans_inq(KSPEC2=num_spectral_elements, KGPTOT=num_grid_points) | ||
| call trans_inq(KSPEC2G=g_num_spectral_elements, KGPTOTG=g_num_grid_points) | ||
| print*,"Num spec = ", num_spectral_elements, "| Num grid points = ", num_grid_points, g_num_grid_points | 
| print*,"Num spec = ", num_spectral_elements, "| Num grid points = ", num_grid_points, g_num_grid_points | |
| print*,"R", world_rank, "C", split_rank, "Num spec = ", num_spectral_elements, "| Num grid points = ", num_grid_points, g_num_grid_points | 
| ! Allocate a global field. | ||
| allocate(g_grid_point_field(g_num_grid_points)) | ||
|  | ||
| ! Make displacement arrays | ||
| allocate(displs(num_ranks)) | ||
| displs = 0 | ||
| do i=2, num_ranks | ||
| displs(i) = displs(i - 1) + grid_partition_sizes(i - 1) | ||
| end do | ||
| print*,"displs => ", displs(:) | ||
|  | 
| ! Allocate a global field. | |
| allocate(g_grid_point_field(g_num_grid_points)) | |
| ! Make displacement arrays | |
| allocate(displs(num_ranks)) | |
| displs = 0 | |
| do i=2, num_ranks | |
| displs(i) = displs(i - 1) + grid_partition_sizes(i - 1) | |
| end do | |
| print*,"displs => ", displs(:) | |
| if (rank == 1) then | |
| ! Allocate a global field. | |
| allocate(g_grid_point_field(g_num_grid_points)) | |
| ! Make displacement arrays | |
| allocate(displs(num_ranks)) | |
| displs = 0 | |
| do i=2, num_ranks | |
| displs(i) = displs(i - 1) + grid_partition_sizes(i - 1) | |
| end do | |
| print*,"displs => ", displs(:) | |
| end if | 
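The displacement loop in this snippet is an exclusive prefix sum of the per-rank partition sizes: the entry for a rank is the total size of all partitions before it. A minimal sketch of the same computation in C follows, using 0-based indexing rather than the 1-based Fortran loop. The function name `compute_displs` is an assumption made for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: fills displs[i] with the sum of sizes[0..i-1],
 * i.e. the offset at which partition i starts in a packed global array.
 * Mirrors the Fortran loop above, shifted to 0-based indices. */
void compute_displs(const int *sizes, int *displs, size_t n)
{
    if (n == 0) {
        return; /* nothing to compute */
    }
    displs[0] = 0; /* the first partition starts at offset 0 */
    for (size_t i = 1; i < n; ++i) {
        displs[i] = displs[i - 1] + sizes[i - 1];
    }
}
```

For partition sizes 4, 2 and 5 this gives offsets 0, 4 and 6.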
| #include "dir_trans.h" | ||
| #include "trans_inq.h" | ||
|  | ||
| integer(kind=JPIM), parameter, dimension(2) :: truncations = [79, 188] | 
Check if memory usage increases with same resolution on both.
Remove (?) once understood.
Possibly it can be merged with the split_comm case later, where you would also have more than one handle available.
Rename from example
Looks like a good start Josh, thanks. I should have mentioned before, but all PRs must be from branches started from develop (as well as targeting develop). It looks like you've started from main, hence the apparent conflicts with CMakeLists.txt. I'd recommend rebasing all of your commits on top of develop before continuing to save yourself some headaches later.

Thanks @samhatfield - we are working off of the tagged release we are using currently, to avoid any potential complications with using develop in our codebase... (Happy to deal with the headaches at a later date... :) ) Just while we get things working. I don't intend for this to get merged, just for sharing at the moment. We are pleased so far that the changes needed seem to be quite minimal!

That's fair enough! Okay then, let's discuss what you've done so far in our next call.
Nice progress on building up some test cases.
You can keep the transi developments here for now. I find it useful as a discussion point. We can always split the PR in pieces later.
| */ | ||
| int trans_init(void); | ||
|  | ||
| int trans_set_mpi_comm(const MPI_Fint mpi_user_comm); | 
Please change this to use a standard int so as not to #include <mpi.h>. We don't want to expose MPI in transi, especially as ectrans can be compiled entirely without MPI.
| int trans_set_mpi_comm(const MPI_Fint mpi_user_comm); | |
| int trans_set_mpi_comm(int); | 
There is no need to link transi with MPI; changes in this file can be reverted.
| call MPI_Gather(grid_partition_size_local, 1, MPI_INT, & | ||
| grid_partition_sizes, 1, MPI_INT, & | ||
| 0, MPI_COMM_WORLD, ierror) | 
I think it should be possible to turn this into an MPL_GATHERV call, and I don't think you need:
    use mpl_data_module ! you can probably add what is needed to the ONLY part of MPL_MODULE
    use mpl_mpif
| call MPI_Gatherv(grid_point_field(:,1,1), num_grid_points, MPI_FLOAT, & | ||
| g_grid_point_field, grid_partition_sizes, displs, MPI_FLOAT, & | ||
| 0, MPI_COMM_WORLD, ierror) | 
Here too, MPL_GATHERV should be used instead.
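To make clear what the gather has to produce on the root, here is a serial model in plain C of how a gatherv-style collective places data: the chunk from rank r, of length counts[r], is written into the receive buffer starting at offset displs[r]. This is only a sketch of the data layout. It does not call MPI and does not reflect the MPL_GATHERV interface, and the name `place_chunks` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical serial model of the root side of a gatherv:
 * chunks[r] points to the local data of rank r (counts[r] floats),
 * and that data is copied to global[displs[r]] onwards.
 * The caller must size `global` to hold every chunk. */
void place_chunks(const float *const *chunks, const int *counts,
                  const int *displs, size_t n, float *global)
{
    for (size_t r = 0; r < n; ++r) {
        memcpy(global + displs[r], chunks[r],
               (size_t)counts[r] * sizeof(float));
    }
}
```

The counts here play the role of grid_partition_sizes and the offsets the role of displs in the call above.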
| LIBS ectrans_test | ||
| MPI 1 | ||
| LINKER_LANGUAGE C | ||
| ENVIRONMENT TRANS_USE_MPI=1 ) | 
Because MPI is used within this file, we need to add MPI::MPI_C to the LIBS argument here.
I would only search for the MPI C library here:
find_package( MPI COMPONENTS C )
if (MPI_C_FOUND)
  ecbuild_add_test( TARGET ectrans_test_transi_example
      ...
      LIBS ectrans_test MPI::MPI_C
      ...
  )
endif()
| ecbuild_add_option( FEATURE MPI | ||
| DESCRIPTION "Support for MPI distributed memory parallelism" | ||
| REQUIRED_PACKAGES "MPI COMPONENTS Fortran CXX" | ||
| REQUIRED_PACKAGES "MPI COMPONENTS Fortran C CXX" | 
This could be reverted with find_package( MPI COMPONENTS C ) further down for the tests only. (see another comment)