It would be awesome for the following code to work (written in 3.12 syntax, but it would be great in <=3.11 syntax too):
import dataclasses
from typing import Any, Callable, TypeVar

import apache_beam as beam

T = TypeVar("T")

@dataclasses.dataclass
class NeatOutputClass[T]:
    data: T

class ReusableTransform[T](beam.PTransform):
    def __init__(self, value_extractor: Callable[[Any], T]):
        # This is a bit contrived, but not unreasonable for a transform acting
        # on slightly different data. Could be done as a generic DoFn too.
        self.value_extractor = value_extractor

    def expand(self, pcoll: beam.PCollection) -> NeatOutputClass[T]:
        # Overly simplified for simple reproducibility.
        return pcoll | beam.Map(lambda x: self.value_extractor(x))
But unfortunately this yields TypeError: Subscripted generics cannot be used with class and instance checks
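For reference, the error message itself comes from the standard `typing` module and can be reproduced without Beam. The sketch below is written in <=3.11 syntax with `Generic[T]`; where exactly Beam performs such a check is not shown in this thread, so treat the link to Beam's internals as an assumption.

```python
import dataclasses
from typing import Generic, TypeVar, get_origin

T = TypeVar("T")


@dataclasses.dataclass
class NeatOutputClass(Generic[T]):  # <=3.11 spelling of `class NeatOutputClass[T]`
    data: T


value = NeatOutputClass(data=1)

# Checking against the bare class is fine.
assert isinstance(value, NeatOutputClass)

# Checking against the subscripted alias raises the TypeError from the report.
message = None
try:
    isinstance(value, NeatOutputClass[int])
except TypeError as exc:
    message = str(exc)
print(message)

# get_origin() strips the type arguments and returns the plain class,
# which isinstance() accepts.
assert isinstance(value, get_origin(NeatOutputClass[int]))
```

Any code path that passes a subscripted generic such as `NeatOutputClass[T]` to `isinstance()` or `issubclass()` will fail this way, regardless of which syntax declared the class.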
I believe this runs into a similar problem as #33189: the NeatOutputClass[T] output annotation doesn't play nicely with our type hinting infrastructure. I'm not sure I've ever seen someone try to write a fully generic PTransform definition before. You may be able to get away with a solution similar to the one outlined in that issue - #33189 (comment)
Using a .with_output_types() call to parameterize NeatOutputClass in some way might work, although I'm not 100% sure the type checker would accept it.
From my experimentation, it seems like the workaround in #33189 requires the DoFn or PTransform to only operate on a concrete instance of a generic type, but doesn't work for a fully generic DoFn or PTransform (e.g. if a DoFn needs to accept a fully generic type).
It would be cool to support fully generic PTransforms. FWIW, here's my use case:
I'm building out a pipeline that processes data from various historical tables and basically groups historical values by key and computes intervals of values across time for each key.
The logic there to handle the intervalization is complex enough that I don't want to repeat it, so I have my PTransform that computes the intervals accept Callables to go from each history row to a key+value pair. Then the output is:
tuple[K, list[Interval[V]]]
Without support for generics, I need to throw a ton of Anys in here and slowly but surely chip away at the value of the typechecking. With generics I can say:
import dataclasses
import datetime
from typing import Callable, TypeVar

import apache_beam as beam

K = TypeVar("K")
V = TypeVar("V")
R = TypeVar("R")

@dataclasses.dataclass
class Interval[V]:
    start: datetime.datetime
    end: datetime.datetime
    value: V | None

class Intervalizer[K, V, R](beam.PTransform):
    def __init__(self, key_fn: Callable[[R], K], value_fn: Callable[[R], V]):
        self._key_fn = key_fn
        self._value_fn = value_fn

    # PTransforms are applied through expand(); process() belongs to DoFn.
    def expand(self, pcoll) -> list[tuple[K, Interval[V]]]:
        # Fancy implementation.
        return pcoll

class FooIntervalizer(beam.PTransform):
    def expand(self, history_rows_pcoll) -> list[tuple[int, Interval[str]]]:
        return history_rows_pcoll | Intervalizer[int, str](
            key_fn=lambda x: x.id,
            value_fn=lambda x: x.val,
        )
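To make the generic shape concrete outside of Beam, here is a minimal plain-Python sketch in <=3.11 syntax. It is not the implementation from this issue (that part is elided as "Fancy implementation"). The `intervalize` function, the `time_fn` callable, and the `horizon` argument are hypothetical names introduced only for illustration, and the interval rule (each value holds until the next row for the same key) is an assumption.

```python
import dataclasses
import datetime
from typing import Callable, Generic, Iterable, Optional, TypeVar

K = TypeVar("K")
V = TypeVar("V")
R = TypeVar("R")


@dataclasses.dataclass
class Interval(Generic[V]):  # <=3.11 spelling of `class Interval[V]`
    start: datetime.datetime
    end: datetime.datetime
    value: Optional[V]


def intervalize(
    rows: Iterable[R],
    key_fn: Callable[[R], K],
    value_fn: Callable[[R], V],
    time_fn: Callable[[R], datetime.datetime],  # hypothetical, not in the issue
    horizon: datetime.datetime,  # hypothetical end of each key's last interval
) -> "list[tuple[K, list[Interval[V]]]]":
    """Group rows by key and turn consecutive rows into intervals."""
    by_key: "dict[K, list[R]]" = {}
    for row in rows:
        by_key.setdefault(key_fn(row), []).append(row)

    result = []
    for key, group in by_key.items():
        group.sort(key=time_fn)
        # Each interval ends where the next row for the same key starts.
        ends = [time_fn(r) for r in group[1:]] + [horizon]
        intervals = [
            Interval(time_fn(r), end, value_fn(r)) for r, end in zip(group, ends)
        ]
        result.append((key, intervals))
    return result
```

With this signature a type checker can infer `K` and `V` from the callables passed in, which is the property the request asks Beam's own type hinting to preserve instead of collapsing to `Any`.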
Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)