
Conversation



@for_each_kernel
def _single_grid_work_group_transform(kernel, cl_device):
Collaborator Author


@inducer said in a personal meeting that this logic should also handle reduction kernels.

Owner

@inducer left a comment


Thanks! A few quick drive-by thoughts below.

Comment on lines +338 to +368
kernel = lp.split_iname(kernel, iname,
                        ngroups * l_zero_size * l_one_size)
kernel = lp.split_iname(kernel, f"{iname}_inner",
                        l_zero_size, inner_tag="l.0")
kernel = lp.split_iname(kernel, f"{iname}_inner_outer",
                        l_one_size, inner_tag="l.1",
                        outer_tag="g.0")
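
For readers following the excerpt, the index arithmetic implied by these three splits can be sketched in plain Python. This is only an illustration under the assumption that each split rewrites an index as `outer * size + inner`; the function names `decompose` and `recompose` are hypothetical and not part of loopy or this PR.

```python
def decompose(i, ngroups, l_zero_size, l_one_size):
    """Split a flat index the way the three nested splits would.

    Returns (outer, g0, l1, l0). All names are illustrative only.
    """
    chunk = ngroups * l_zero_size * l_one_size
    # First split: a sequential outer loop and an inner index in [0, chunk).
    outer, inner = divmod(i, chunk)
    # Second split: the innermost part would carry the "l.0" tag.
    inner_outer, l0 = divmod(inner, l_zero_size)
    # Third split: inner part tagged "l.1", outer part tagged "g.0".
    g0, l1 = divmod(inner_outer, l_one_size)
    return outer, g0, l1, l0


def recompose(outer, g0, l1, l0, ngroups, l_zero_size, l_one_size):
    """Inverse of decompose: rebuild the flat index."""
    chunk = ngroups * l_zero_size * l_one_size
    return outer * chunk + (g0 * l_one_size + l1) * l_zero_size + l0
```

With `ngroups=4`, `l_zero_size=16`, and `l_one_size=2`, index 1000 maps to `(7, 3, 0, 8)`: the group index stays below `ngroups`, and whatever does not fit into one grid launch ends up in the sequential `outer` component.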
Owner


Doesn't this assume that each iname is only used by exactly one statement?
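
One way to check the assumption raised here is to look for inames that appear in the `within_inames` of more than one statement. The sketch below works on a plain dictionary rather than a loopy kernel; the helper name `inames_shared_across_statements` and the input layout are assumptions made for illustration.

```python
from collections import defaultdict


def inames_shared_across_statements(within_inames_by_stmt):
    """Return {iname: set of statement ids} for inames used by 2+ statements.

    within_inames_by_stmt maps a statement id to the collection of inames
    that statement is nested in (a hypothetical stand-in for
    insn.within_inames).
    """
    users = defaultdict(set)
    for stmt_id, inames in within_inames_by_stmt.items():
        for iname in inames:
            users[iname].add(stmt_id)
    return {iname: ids for iname, ids in users.items() if len(ids) > 1}


example = {
    "s0": frozenset({"i"}),
    "s1": frozenset({"i", "j"}),
    "s2": frozenset(),
}
shared = inames_shared_across_statements(example)
```

For the example input, `shared` contains only `"i"`, used by `s0` and `s1`. A transformation that picks split sizes per statement would have to decide what to do with such inames.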

if len(insn.within_inames) == 0:
    continue

if len(insn.within_inames) == 1:
Owner


  • Which kinds of kernels are those?
  • I would like it if this were more guided by metadata, along the lines of what the pyopencl actx does.

Collaborator Author

@kaushikcfd commented on Mar 12, 2022


We could borrow the unifier from FusionContractorArrayContext, but that would add a dependency on #284 and inducer/pytato#224. If those are merged before this PR, I will use it here.

t_unit = _single_grid_work_group_transform(t_unit, self.queue.device)
t_unit = lp.set_options(t_unit, "insert_gbarriers")
t_unit = lp.linearize(lp.preprocess_kernel(t_unit))
t_unit = _alias_global_temporaries(t_unit)
Owner


My recollection of our discussion was that we'd do this without aliasing... am I remembering wrong? If not, what made you change your mind?

Collaborator Author


I thought we wouldn't alias at pytato's CodeGenMapper stage, but identifying dead temporaries after linearization would be trivial. That is, I had thought we would alias the global temporaries not at the pytato level but downstream, as a loopy transformation.
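
To illustrate why dead temporaries are easy to find once the statement order is fixed, here is a minimal sketch in plain Python. It is not the `_alias_global_temporaries` routine from this PR and not a loopy API: the function name `alias_dead_temporaries`, the schedule layout, and the greedy buffer-reuse policy are all assumptions, and the sketch ignores array sizes and dtypes.

```python
def alias_dead_temporaries(schedule, temporaries):
    """Assign temporaries to shared buffers when their lifetimes do not overlap.

    schedule: list of (stmt_id, reads, writes) in linearized order.
    temporaries: names that are eligible for aliasing.
    Returns {temporary name: buffer index}.
    """
    first_use, last_use = {}, {}
    for pos, (_, reads, writes) in enumerate(schedule):
        for name in (*reads, *writes):
            if name in temporaries:
                first_use.setdefault(name, pos)
                last_use[name] = pos

    busy_until = []  # busy_until[b]: last position at which buffer b is live
    assignment = {}
    for name in sorted(first_use, key=first_use.get):
        for buf, end in enumerate(busy_until):
            if end < first_use[name]:  # previous occupant is already dead
                busy_until[buf] = last_use[name]
                assignment[name] = buf
                break
        else:
            busy_until.append(last_use[name])
            assignment[name] = len(busy_until) - 1
    return assignment


schedule = [
    ("s0", (), ("t0",)),
    ("s1", ("t0",), ("t1",)),
    ("s2", ("t1",), ("t2",)),
    ("s3", ("t2",), ("out",)),
]
assignment = alias_dead_temporaries(schedule, {"t0", "t1", "t2"})
```

In this example `t0` is dead by the time `t2` is first written, so both land in buffer 0, while `t1` overlaps with each of them and gets its own buffer.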

Collaborator Author


@matthiasdiener
Collaborator

When trying this on the mirgecom examples, I'm getting this type of error:

  File "/Users/mdiener/Work/emirge/grudge/grudge/discretization.py", line 623, in nodes
    return self.discr_from_dd(dd).nodes()
  File "/Users/mdiener/Work/emirge/meshmode/meshmode/discretization/__init__.py", line 610, in nodes
    result = make_obj_array([
  File "/Users/mdiener/Work/emirge/meshmode/meshmode/discretization/__init__.py", line 611, in <listcomp>
    _DOFArray(None, tuple([
  File "/Users/mdiener/Work/emirge/meshmode/meshmode/discretization/__init__.py", line 612, in <listcomp>
    actx.freeze(resample_mesh_nodes(grp, iaxis)) for grp in self.groups
  File "/Users/mdiener/Work/emirge/arraycontext/arraycontext/impl/pytato/__init__.py", line 133, in freeze
    pt_prg = pt_prg.with_transformed_program(self.transform_loopy_program)
  File "/Users/mdiener/Work/emirge/pytato/pytato/target/loopy/__init__.py", line 124, in with_transformed_program
    return self.copy(program=f(self.program))
  File "/Users/mdiener/Work/emirge/meshmode/meshmode/array_context.py", line 482, in transform_loopy_program
    t_unit = lp.linearize(lp.preprocess_kernel(t_unit))
  File "/Users/mdiener/Work/emirge/loopy/loopy/schedule/__init__.py", line 2201, in linearize
    knl = get_one_linearized_kernel(knl,
  File "/Users/mdiener/Work/emirge/loopy/loopy/schedule/__init__.py", line 2178, in get_one_linearized_kernel
    result = _get_one_scheduled_kernel_inner(kernel,
  File "/Users/mdiener/Work/emirge/loopy/loopy/schedule/__init__.py", line 2146, in _get_one_scheduled_kernel_inner
    return next(iter(generate_loop_schedules(kernel, callables_table)))
  File "/Users/mdiener/Work/emirge/loopy/loopy/schedule/__init__.py", line 1953, in generate_loop_schedules
    yield from generate_loop_schedules_inner(kernel,
  File "/Users/mdiener/Work/emirge/loopy/loopy/schedule/__init__.py", line 2084, in generate_loop_schedules_inner
    gen_sched = insert_barriers(kernel, gen_sched,
  File "/Users/mdiener/Work/emirge/loopy/loopy/schedule/__init__.py", line 1919, in insert_barriers
    result = insert_barriers_at_outer_level(result)
  File "/Users/mdiener/Work/emirge/loopy/loopy/schedule/__init__.py", line 1838, in insert_barriers_at_outer_level
    append_barrier_or_raise_error(
  File "/Users/mdiener/Work/emirge/loopy/loopy/schedule/__init__.py", line 1760, in append_barrier_or_raise_error
    raise MissingBarrierError(
loopy.diagnostic.MissingBarrierError: _pt_kernel: Dependency '_pt_out_store depends on call_lp_nodes' (for variable '_pt_temp') requires synchronization by a global barrier (add a 'no_sync_with' instruction option to state that no synchronization is needed)

@kaushikcfd
Collaborator Author

@matthiasdiener: Sorry, the transformations in this branch depend on some PRs on the pytato/loopy/arraycontext end. I've edited the description to record that.
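
For context on the `MissingBarrierError` above: it reports that `_pt_out_store` reads `_pt_temp` after `call_lp_nodes` writes it, with no global barrier in between. A toy model of that rule is sketched below in plain Python. It is not loopy's scheduler; the function name `insert_global_barriers` and the schedule layout are hypothetical, and only read-after-write hazards are handled.

```python
def insert_global_barriers(schedule, global_temporaries):
    """Place a barrier marker before any read of a not-yet-synchronized write.

    schedule: list of (stmt_id, reads, writes) in execution order.
    global_temporaries: names stored in global memory.
    Returns the statement ids with "<gbarrier>" markers interleaved.
    """
    pending_writes = set()  # global temporaries written since the last barrier
    result = []
    for stmt_id, reads, writes in schedule:
        if pending_writes & set(reads):
            result.append("<gbarrier>")
            pending_writes.clear()
        result.append(stmt_id)
        pending_writes |= set(writes) & set(global_temporaries)
    return result


schedule = [
    ("call_lp_nodes", (), ("_pt_temp",)),
    ("_pt_out_store", ("_pt_temp",), ("out",)),
]
with_barriers = insert_global_barriers(schedule, {"_pt_temp"})
```

Here `with_barriers` is `["call_lp_nodes", "<gbarrier>", "_pt_out_store"]`, which mirrors the dependency named in the error message.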

@kaushikcfd force-pushed the pytato-array-context-transforms branch from 32a7d59 to bd964da on August 6, 2021 22:44
@kaushikcfd force-pushed the pytato-array-context-transforms branch from c166b2f to 0c55627 on August 17, 2021 20:10
@kaushikcfd force-pushed the pytato-array-context-transforms branch 2 times, most recently from 9fb8370 to 8c65687, on September 4, 2021 16:10
@thomasgibson changed the title from "Pytato Array Context with transformations" to "[Lazy evaluation] Pytato Array Context with transformations" on Oct 18, 2021
@thomasgibson added the "lazy evaluation" label on Oct 19, 2021
@kaushikcfd force-pushed the pytato-array-context-transforms branch from 824caee to 405bb8b on January 3, 2022 18:42
@kaushikcfd force-pushed the pytato-array-context-transforms branch 4 times, most recently from 32b0a78 to 0dcc37c, on March 12, 2022 17:19
@kaushikcfd
Collaborator Author

Closing this as inducer/arraycontext#216 provides a cleaner implementation.

@kaushikcfd closed this on Jan 24, 2023

Labels

lazy evaluation: Anything related to lazy evaluation


4 participants