Skip to content

Initial OpenACC port of mpas_atm_update_bdy_tend #1301

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

abishekg7
Copy link
Collaborator

Initial OpenACC port of mpas_atm_update_bdy_tend.

This port is required to keep state and tend variables from LBCs eventually resident on GPUs.

@mgduda mgduda requested review from mgduda and gdicker1 May 6, 2025 23:43
@mgduda mgduda added Atmosphere OpenACC Work related to OpenACC acceleration of code labels May 6, 2025
Copy link
Collaborator

@gdicker1 gdicker1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes make sense and GPU runs of this PR and a reference before these changes match.

I'll hold off on full approval for any changes Michael requests or for any changes to the commit history that may come.

!$acc end kernels

!$acc parallel
! Compute lbc_rho_zz
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could either remove this comment, or remove the same comment on line 217. I'd suggest just removing this line.

!$acc loop gang vector collapse(2)
do iEdge=1,nEdges+1
do k=1,nVertLevels
lbc_tend_rho_edge(k,iEdge) = (rho_edge(k,iEdge) - lbc_tend_rho_edge(k,iEdge)) * dt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was there a specific reason for computing lbc_tend_u and lbc_tend_ru in the same loop, but using a separate loop for lbc_tend_rho_edge?

do iCell=1,nCells+1
do k=1,nVertLevels
lbc_tend_rho_zz(k,iCell) = (rho_zz(k,iCell) - lbc_tend_rho_zz(k,iCell)) * dt
lbc_tend_rho(k,iCell) = (rho(k,iCell) - lbc_tend_rho(k,iCell)) * dt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the comment above, would there be an advantage, or is there a disadvantage, to combining the computation of lbc_tend_rho_zz and lbc_tend_rho in the same loops where we compute lbc_tend_theta and lbc_tend_rtheta_m?

!$acc loop vector collapse(2)
do k=1,nVertLevels
do j = 1,nScalars
lbc_tend_scalars(j,k,iCell) = (scalars(j,k,iCell) - lbc_tend_scalars(j,k,iCell)) * dt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's indent the body of this loop.

end do
!$acc end parallel

!$acc parallel default(present)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this parallel region be combined with the parallel region above it?

call mpas_pool_get_array(lbc, 'lbc_rtheta_m', lbc_tend_rtheta_m, 1)
call mpas_pool_get_array(lbc, 'lbc_rho_zz', lbc_tend_rho_zz, 1)
call mpas_pool_get_array(lbc, 'lbc_rho', lbc_tend_rho, 1)
call mpas_pool_get_array(lbc, 'lbc_scalars', lbc_tend_scalars, 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we now get pointers to arrays here, I think it would be fine to include in this PR the removal of redundant calls to mpas_pool_get_array below (beginning around line 270).

@@ -167,6 +170,7 @@ subroutine mpas_atm_update_bdy_tend(clock, streamManager, block, firstCall, ierr
! Compute any derived fields from those that were read from the lbc_in stream
!
call mpas_pool_get_array(lbc, 'lbc_u', u, 2)
call mpas_pool_get_array(lbc, 'lbc_w', w, 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a minor detail, but can we move this below the call to mpas_pool_get_array for lbc_rho_edge so that the order of variables is consistent in this block and in the block beginning at line 192?

@abishekg7
Copy link
Collaborator Author

Thanks for the review! I'll address the comments, but just wanted to note that this PR may not be a priority for merging to develop with host-device memory transfers consolidated around every call to dynamics. But it will be needed eventually.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Atmosphere OpenACC Work related to OpenACC acceleration of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants