Step Metadata Update on Index Rollover Timeout #1174
src/main/kotlin/org/opensearch/indexmanagement/indexstatemanagement/ManagedIndexRunner.kt

@@ -91,6 +91,7 @@
 import org.opensearch.indexmanagement.spi.indexstatemanagement.model.PolicyRetryInfoMetaData
 import org.opensearch.indexmanagement.spi.indexstatemanagement.model.StateMetaData
 import org.opensearch.indexmanagement.spi.indexstatemanagement.model.StepContext
+import org.opensearch.indexmanagement.spi.indexstatemanagement.model.StepMetaData
 import org.opensearch.jobscheduler.spi.JobExecutionContext
 import org.opensearch.jobscheduler.spi.LockModel
 import org.opensearch.jobscheduler.spi.ScheduledJobParameter
@@ -330,14 +331,18 @@
         if (action?.hasTimedOut(currentActionMetaData) == true) {
             val info = mapOf("message" to "Action timed out")
             logger.error("Action=${action.type} has timed out")
-            val updated =
-                updateManagedIndexMetaData(
-                    managedIndexMetaData
-                        .copy(actionMetaData = currentActionMetaData?.copy(failed = true), info = info),
-                )
+
+            val updatedMetaData = managedIndexMetaData.copy(
+                actionMetaData = currentActionMetaData?.copy(failed = true),
+                stepMetaData = step?.let { StepMetaData(it.name, System.currentTimeMillis(), Step.StepStatus.TIMED_OUT) },
+                info = info,
+            )
+
+            val updated = updateManagedIndexMetaData(updatedMetaData)
+
             if (updated.metadataSaved) {
                 disableManagedIndexConfig(managedIndexConfig)
-                publishErrorNotification(policy, managedIndexMetaData)
+                publishErrorNotification(policy, updatedMetaData)
             }
             return
         }

Check warning on lines 335, 338, 341, and 345 in src/main/kotlin/org/opensearch/indexmanagement/indexstatemanagement/ManagedIndexRunner.kt

Review comments on line 335 (val updatedMetaData):
nit: we can rename this to
Yeah, I've made the change and updated the PR.

I think a new field could cause problems when the cluster is running both old and new versions of the code during an upgrade. Maybe just using the "failed" state is good enough.

Thanks @bowenlan-amzn for taking a look. I'm not sure I fully understand this. In what scenario could adding a new step state fail? I'm asking in case we need to add a new step state in the future.
PS: I'm okay with "failed" as well for this case. The failure message would still say that the action timed out.

Here's my thinking: this new Step status value is saved in the metadata document of the ISM system index. When a user calls the explain API on an old node running the old software, that node cannot understand the new field value, so the call will probably fail.
So the impact is not big; it only shows up during an upgrade, while the cluster is running a mix of old and new software.
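
To make the concern concrete, here is a rough sketch of how a node that predates the new status could fail on a stored value it does not know. `OldStepStatus`, `parseStrict`, and `parseOrNull` are hypothetical names, and the plugin's real parsing code may behave differently. The only library behavior relied on is that Kotlin's `enumValueOf` throws `IllegalArgumentException` for an unknown constant name.

```kotlin
// Hypothetical enum as an older node might define it, without TIMED_OUT.
enum class OldStepStatus { STARTING, COMPLETED, FAILED }

// Strict parsing: throws IllegalArgumentException when the stored name is unknown.
fun parseStrict(stored: String): OldStepStatus = enumValueOf<OldStepStatus>(stored)

// Tolerant parsing: returns null for unknown names so the caller can pick a fallback.
fun parseOrNull(stored: String): OldStepStatus? =
    OldStepStatus.values().firstOrNull { it.name == stored }
```

With these definitions, `parseStrict("TIMED_OUT")` throws, while `parseOrNull("TIMED_OUT")` returns null and leaves the fallback choice to the caller.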
I just found that this potential problem is already guarded against: the new software won't be used while old software still exists in the cluster.
index-management/src/main/kotlin/org/opensearch/indexmanagement/indexstatemanagement/ManagedIndexRunner.kt
Lines 229 to 232 in 4d8ef69
So no problem now 😅
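
The referenced lines are not reproduced in this thread, so the following is only a sketch of the general idea of such a guard, not the plugin's implementation. It assumes each node reports a plugin version string; `hasMixedVersions` and `runIfUniform` are hypothetical helpers, and the version strings in the usage note are made-up example values.

```kotlin
// Hypothetical guard: the cluster is "mixed" when nodes report more than one version.
fun hasMixedVersions(nodeVersions: Collection<String>): Boolean =
    nodeVersions.toSet().size > 1

// Runs `job` only when every node reports the same version; returns whether it ran.
fun runIfUniform(nodeVersions: Collection<String>, job: () -> Unit): Boolean {
    if (hasMixedVersions(nodeVersions)) {
        // Mixed cluster: do nothing, so no new-format metadata is written yet.
        return false
    }
    job()
    return true
}
```

For example, `runIfUniform(listOf("2.13.0", "2.14.0")) { ... }` skips the job, while the same call with two identical versions runs it. Under a guard like this, a new status value is only written once no older node remains to read it.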

This is awesome, @bowenlan-amzn. Thanks for checking.