-
Notifications
You must be signed in to change notification settings - Fork 353
Initial OpenACC port of mpas_atm_update_bdy_tend #1301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Initial OpenACC port of mpas_atm_update_bdy_tend #1301
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes make sense and GPU runs of this PR and a reference before these changes match.
I'll hold off on full approval for any changes Michael requests or for any changes to the commit history that may come.
!$acc end kernels | ||
|
||
!$acc parallel | ||
! Compute lbc_rho_zz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could either remove this comment, or remove the same comment on line 217. I'd suggest just removing this line.
!$acc loop gang vector collapse(2) | ||
do iEdge=1,nEdges+1 | ||
do k=1,nVertLevels | ||
lbc_tend_rho_edge(k,iEdge) = (rho_edge(k,iEdge) - lbc_tend_rho_edge(k,iEdge)) * dt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was there a specific reason for computing lbc_tend_u
and lbc_tend_ru
in the same loop, but using a separate loop for lbc_tend_rho_edge
?
do iCell=1,nCells+1 | ||
do k=1,nVertLevels | ||
lbc_tend_rho_zz(k,iCell) = (rho_zz(k,iCell) - lbc_tend_rho_zz(k,iCell)) * dt | ||
lbc_tend_rho(k,iCell) = (rho(k,iCell) - lbc_tend_rho(k,iCell)) * dt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the comment above, would there be an advantage, or is there a disadvantage, to combining the computation of lbc_tend_rho_zz
and lbc_tend_rho
in the same loops where we compute lbc_tend_theta
and lbc_tend_rtheta_m
?
!$acc loop vector collapse(2) | ||
do k=1,nVertLevels | ||
do j = 1,nScalars | ||
lbc_tend_scalars(j,k,iCell) = (scalars(j,k,iCell) - lbc_tend_scalars(j,k,iCell)) * dt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's indent the body of this loop.
end do | ||
!$acc end parallel | ||
|
||
!$acc parallel default(present) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this parallel
region be combined with the parallel
region above it?
call mpas_pool_get_array(lbc, 'lbc_rtheta_m', lbc_tend_rtheta_m, 1) | ||
call mpas_pool_get_array(lbc, 'lbc_rho_zz', lbc_tend_rho_zz, 1) | ||
call mpas_pool_get_array(lbc, 'lbc_rho', lbc_tend_rho, 1) | ||
call mpas_pool_get_array(lbc, 'lbc_scalars', lbc_tend_scalars, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we now get pointers to arrays here, I think it would be fine to include in this PR the removal of redundant calls to mpas_pool_get_array
below (beginning around line 270).
@@ -167,6 +170,7 @@ subroutine mpas_atm_update_bdy_tend(clock, streamManager, block, firstCall, ierr | |||
! Compute any derived fields from those that were read from the lbc_in stream | |||
! | |||
call mpas_pool_get_array(lbc, 'lbc_u', u, 2) | |||
call mpas_pool_get_array(lbc, 'lbc_w', w, 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a minor detail, but can we move this below the call to mpas_pool_get_array
for lbc_rho_edge
so that the order of variables is consistent in this block and in the block beginning at line 192?
Thanks for the review! I'll address the comments, but just wanted to note that this PR may not be a priority for merging to |
Initial OpenACC port of
mpas_atm_update_bdy_tend
.This port is required to keep
state
andtend
variables from LBCs eventually resident on GPUs.