-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Persist emitTime from IngestionTrackingContext to the new entity tables and use emitTime during backfill #329
Conversation
…e emitTime field when during backfill logic if available
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall lgtm, small comments
* | ||
* This will be null in the following scenarios: | ||
* - The record is still on the old schema | ||
* - The record inserted before we started persisting emitTime to the new schema |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: was?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"was still on the old schema"?
or "record was inserted"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the latter makes more sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
* | ||
* This will be null in the following scenarios: | ||
* - The record is still on the old schema | ||
* - The record inserted before we started persisting emitTime to the new schema |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so I guess we won't backfill these values and just fall back on the naive approach if this val doesn't exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that's the plan since most rows will be missing emitTime
anyway. Based on the last discussion, we'll use lastmodifiedtime
in place of emitTime
for comparison.
@Nonnull Class<ASPECT> aspectClass, | ||
@Nonnull AuditStamp auditStamp, | ||
@Nullable Timestamp oldTimestamp, | ||
@Nullable IngestionTrackingContext ingestionTrackingContext) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
messageId was never used before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope. Intellij didn't have much trouble removing that param.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it mean we'll have a new messageId during the backfill? or the backfill's messageId is always null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the param was before I added the backfill feature. It was an unused param.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
talked offline with @yangyangv2 . For the method add
, I will switch param messageId
with the ingestionTrackingContext
@jsdonn should we also reset |
@derekpham oh yeah that's a good idea. Even though it should be default to false (right?), definitely wouldn't hurt to add it explicitly there during regular MAE emission |
@jsdonn It's default is false but for backfill MCE we should explicitly set it back to |
@@ -555,6 +559,10 @@ private <ASPECT extends RecordTemplate> ASPECT_UNION unwrapAddResultToUnion(URN | |||
|
|||
private <ASPECT extends RecordTemplate> ASPECT unwrapAddResult(URN urn, AddResult<ASPECT> result, @Nonnull AuditStamp auditStamp, | |||
@Nullable IngestionTrackingContext trackingContext) { | |||
if (trackingContext != null) { | |||
trackingContext.setBackfill(false); // reset backfill since MAE won't be a backfill event |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in what scenario trackingContext will be null and we want to enforce it's setBackfill to be false?
Can we log such as enforcement activity to capture unexpected cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's null when someone calls ingest
instead of ingestWithTracking
. I can't think of a case where we don't want to set backfill
to false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^^ this isBackfill field in the ingestionTrackingContext is ONLY used for MCE backfills right? I believe Jinxin added a separate field for MAE backfill. If that's the case then yeah we always want to set it to false during MAE emission. But I hope that this is clear enough for ourselves and future devs...might be a little confusing to have 2 backfill related fields in an MAE.
@@ -37,6 +37,11 @@ record AuditedAspect { | |||
* This value is different from lastmodifiedon / the timestamp in AuditStamp since auditStamp | |||
* is created when the restli resource receives the ingestion request. | |||
* This is set by the MCE producers (or MCE consumers if not set by producers) | |||
* | |||
* This will be null in the following scenarios: | |||
* - The record is still on the old schema |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: do you mean "The record is from the old schema"? to how understand "still on"?
@Nonnull Class<ASPECT> aspectClass, | ||
@Nonnull AuditStamp auditStamp, | ||
@Nullable Timestamp oldTimestamp, | ||
@Nullable IngestionTrackingContext ingestionTrackingContext) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it mean we'll have a new messageId during the backfill? or the backfill's messageId is always null?
@@ -151,6 +156,9 @@ public <ASPECT extends RecordTemplate> int addWithOptimisticLocking(@Nonnull URN | |||
.setLastmodifiedby(actor) | |||
.setLastmodifiedon(new Timestamp(timestamp).toString()) | |||
.setCreatedfor(impersonator, SetMode.IGNORE_NULL); | |||
if (ingestionTrackingContext != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be safer and more clear to still create IngestionTrackingContext in the input and here we check
} ```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your comment is missing some code?
&& trackingContext.getEmitTime() > oldAuditStamp.getTime()); | ||
|
||
log.info("Encounter backfill event. Tracking context: {}. Urn: {}. Aspect class: {}. Old audit stamp: {}. " | ||
// ingestionTrackingContext if not null should always have emitTime. If emitTime doesn't exist within |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the expectation in IngestionTrackingContext? the model definition is optional for emitTime.
Can we make the logic/expectation to be consistent with the model definition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated the comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall. One comment on having 2 backfill-related fields in the MAE but I guess we're kinda past the modeling phase.
Please get @yangyangv2 's approval for ship-it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
last minor comment. The rest LGTM. Thanks for addressing all the comments!
@@ -93,8 +92,8 @@ public void ensureSchemaUpToDate() { | |||
@Override | |||
@Transactional | |||
public <ASPECT extends RecordTemplate> int add(@Nonnull URN urn, @Nullable ASPECT newValue, @Nonnull Class<ASPECT> aspectClass, | |||
@Nonnull AuditStamp auditStamp, @Nullable UUID messageId) { | |||
return addWithOptimisticLocking(urn, newValue, aspectClass, auditStamp, null, null); | |||
@Nonnull AuditStamp auditStamp, IngestionTrackingContext ingestionTrackingContext) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please put a @nullable tag on ingestionTrackingContext.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Context
This is continuation of #328
This PR makes 2 changes:
emitTime
to the new entity table if IngestionTrackingContext is providedemitTime
when checking whether we should backfill the entityIf
oldEmitTime
is available and smaller than new event's emit time -> backfillIf
oldEmitTime
is not available and last modified time smaller than new event's emit time -> backfillChecklist