-
Notifications
You must be signed in to change notification settings - Fork 27
Add button to set failed job to pending #8377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
📝 WalkthroughWalkthroughThis pull request introduces a new feature that allows super-users to retry failed jobs. The update spans across backend, frontend, and database layers. A new method is added in the job controller along with its corresponding route, and the frontend UI is updated to show a retry option for failed jobs. Furthermore, the job model and DAO have been enhanced with a new boolean field to track retries, and the job service logic is modified to conditionally suppress email notifications. Database schema and migration files have been updated to support these changes. Changes
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
CHANGELOG.unreleased.md (1)
19-19
: New Feature: Resume Failed Jobs for Super-UsersThe changelog correctly reflects the new feature allowing super-users to resume failed jobs. Please verify that the backend and frontend behavior (i.e., whether the job state transitions to "PENDING" or "RUNNING") is clearly defined in the documentation and aligns with the intended functionality as discussed in PR [#8377].
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
CHANGELOG.unreleased.md
(1 hunks)app/controllers/JobController.scala
(3 hunks)conf/webknossos.latest.routes
(1 hunks)frontend/javascripts/admin/api/jobs.ts
(1 hunks)frontend/javascripts/admin/job/job_list_view.tsx
(4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: circleci_build
🔇 Additional comments (8)
frontend/javascripts/admin/api/jobs.ts (1)
77-81
: LGTM! The implementation follows the established pattern.The
resumeJob
function is well-implemented, following the same pattern as other job-related functions in the file, making it consistent and maintainable.frontend/javascripts/admin/job/job_list_view.tsx (3)
10-10
: LGTM! Icon import follows the established pattern.The PlayCircleOutlined icon import is correctly added, maintaining consistency with other icon imports.
139-139
: LGTM! Super user check is properly implemented.The Redux selector is correctly used to determine if the current user is a super user, ensuring proper access control.
305-316
: LGTM! Resume action follows the same pattern as cancel action.The resume functionality is well-implemented:
- Correctly checks for failed job state and super user status
- Uses AsyncLink for handling the action
- Follows the same pattern as the cancel action
- Updates the job list after resuming
app/controllers/JobController.scala (2)
53-67
: LGTM! UserService is properly injected.The UserService is correctly added to the constructor using dependency injection, following the established pattern.
115-123
: LGTM! Resume method is secure and follows established patterns.The implementation:
- Properly checks if jobs feature is enabled
- Verifies super user permissions with proper error handling
- Updates job state using existing DAO methods
- Returns consistent response format
conf/webknossos.latest.routes (1)
271-271
: LGTM! Route follows RESTful conventions.The new route is well-defined:
- Uses appropriate HTTP method (PATCH) for state transition
- Follows the same pattern as other job routes
- Is correctly placed in the jobs section
CHANGELOG.unreleased.md (1)
18-18
: Accessibility Update: Measurement Tools EnhancementThis changelog entry now notes that measurement tools are accessible when viewing datasets outside an annotation. Ensure that this entry remains consistent with the UI changes and accessibility guidelines applied in [#8334].
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hej,
thanks for implementing this feature 🎉. The code looked fine to me except one potential minor improvement.
But could you please check the bug I found during testing? A restarted job is not actually restarted by the worker :/
onClick={async () => { | ||
resumeJob(job.id).then(() => fetchData()); | ||
}} | ||
icon={<PlayCircleOutlined key="resume" className="icon-margin-right" />} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The key
prop here shouldn't be needed (just as above). Could you please remove them before merging and give this change a quick check?
app/controllers/JobController.scala
Outdated
_ <- bool2Fox(wkconf.Features.jobsEnabled) ?~> "job.disabled" | ||
_ <- userService.assertIsSuperUser(request.identity) ?~> "notAllowed" ~> FORBIDDEN | ||
job <- jobDAO.findOne(id) | ||
_ <- jobDAO.updateManualState(id, JobState.PENDING) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While testing I noticed that updating the manual state isn't enough (using the current wk worker master code). Could you please investigate this? I think that also updating the job's state
should also work, but somehow the manual state is ignored by the backend / worker. I think is more likely the behaviour that needs to be fixed. Not sure whether this needs to be fixed in the backend or worker
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I think the problem was that the field manual state was used, which only has an effect on the displayed state.
Thanks for implementing! When a superuser retries/resumes a task by another user, ideally the user wouldn't get the "Job Failed" email if the job fails another time, because that is irritating and unnecessary email noise for him/her. Once the job is successful, the respective mail should go through. Could the two of you coordinate on how to best implement this as part of this PR or a followup? /cc @normanrz |
Oh that's a good idea 🍌 👯. But maybe I am also overseeing things. Moreover: I think I like the approach saving the retry count more, as it provides more information for e.g. later analysis on how many attempts were needed instead of saving a simple boolean which provides less information. 🤔 |
These are good ideas, but I think the differentiating factor should rather be whether another user resumed the workflow run. From a users perspective, when resuming my own workflow, I'd like to be notified of failure and success. However, if a super user resumed a run, the run owner should only be notified of success. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
CHANGELOG.unreleased.md
(1 hunks)app/controllers/JobController.scala
(3 hunks)app/models/job/Job.scala
(3 hunks)app/models/job/JobService.scala
(1 hunks)conf/application.conf
(1 hunks)conf/evolutions/126-job-resumed-by-super-user.sql
(1 hunks)conf/evolutions/reversions/126-job-resumed-by-super-user.sql
(1 hunks)frontend/javascripts/admin/job/job_list_view.tsx
(4 hunks)tools/postgres/schema.sql
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- CHANGELOG.unreleased.md
- frontend/javascripts/admin/job/job_list_view.tsx
🔇 Additional comments (7)
conf/evolutions/126-job-resumed-by-super-user.sql (1)
1-8
: LGTM! The SQL migration is well-structured.The migration correctly:
- Uses a transaction for atomicity
- Validates schema version
- Adds the column with appropriate default value
- Updates schema version
conf/evolutions/reversions/126-job-resumed-by-super-user.sql (1)
1-8
: LGTM! The SQL reversion is properly implemented.The reversion correctly:
- Uses a transaction for atomicity
- Validates schema version
- Drops the column
- Updates schema version
conf/application.conf (1)
154-155
: Verify if both features need to be enabled.While enabling
jobsEnabled
is necessary for the new job management functionality, please verify ifvoxelyticsEnabled
is also required for this feature.app/models/job/Job.scala (1)
32-32
: LGTM! The field is properly defined.The
resumedBySuperUser
field is correctly added with appropriate type and default value.app/models/job/JobService.scala (1)
66-66
: LGTM! The notification logic aligns with the requirements.The change ensures that failed email notifications are only sent if the job was not resumed by a super user, which aligns with the discussion in the PR comments about preventing unnecessary email notifications.
app/controllers/JobController.scala (1)
115-123
: LGTM! The implementation is secure and follows controller patterns.The new
resume
method correctly:
- Validates that jobs are enabled
- Verifies super user permissions
- Handles job resumption
tools/postgres/schema.sql (1)
23-23
: LGTM! The schema changes are well-structured.The changes correctly:
- Increment the schema version
- Add the
resumedBySuperUser
column with appropriate constraintsAlso applies to: 475-475
app/models/job/Job.scala
Outdated
def resumeOne(id: ObjectId)(implicit ctx: DBAccessContext): Fox[Unit] = | ||
for { | ||
_ <- assertUpdateAccess(id) | ||
_ <- run(q"""UPDATE webknossos.jobs SET state = 'PENDING', resumedBySuperUser = True WHERE _id = $id""".asUpdate) | ||
} yield () | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix SQL syntax and add state validation.
Two issues need to be addressed:
- Use lowercase
true
instead ofTrue
in SQL to follow PostgreSQL boolean literal syntax - Add validation to ensure the job is in FAILURE state before resuming
Here's the suggested implementation:
def resumeOne(id: ObjectId)(implicit ctx: DBAccessContext): Fox[Unit] =
for {
_ <- assertUpdateAccess(id)
- _ <- run(q"""UPDATE webknossos.jobs SET state = 'PENDING', resumedBySuperUser = True WHERE _id = $id""".asUpdate)
+ _ <- run(q"""UPDATE webknossos.jobs
+ SET state = 'PENDING', resumedBySuperUser = true
+ WHERE _id = $id
+ AND state = 'FAILURE'""".asUpdate)
} yield ()
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def resumeOne(id: ObjectId)(implicit ctx: DBAccessContext): Fox[Unit] = | |
for { | |
_ <- assertUpdateAccess(id) | |
_ <- run(q"""UPDATE webknossos.jobs SET state = 'PENDING', resumedBySuperUser = True WHERE _id = $id""".asUpdate) | |
} yield () | |
def resumeOne(id: ObjectId)(implicit ctx: DBAccessContext): Fox[Unit] = | |
for { | |
_ <- assertUpdateAccess(id) | |
_ <- run(q"""UPDATE webknosos.jobs | |
SET state = 'PENDING', resumedBySuperUser = true | |
WHERE _id = $id | |
AND state = 'FAILURE'""".asUpdate) | |
} yield () |
@daniel-wer @MichaelBuessemeyer I added the column "resumedBySuperUser" to the jobs table, which is checked when sending a failure email. |
|
||
DROP VIEW webknossos.jobs_; | ||
|
||
ALTER TABLE webknossos.jobs ADD column resumedBySuperUser boolean NOT NULL DEFAULT false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know, very nitpicky but boolean
should be in caps here as well, correct?
ALTER TABLE webknossos.jobs ADD column resumedBySuperUser boolean NOT NULL DEFAULT false; | |
ALTER TABLE webknossos.jobs ADD column resumedBySuperUser BOOLEAN NOT NULL DEFAULT false; |
conf/application.conf
Outdated
jobsEnabled = true | ||
voxelyticsEnabled = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
undo me -> should also make the ci green
jobsEnabled = true | |
voxelyticsEnabled = true | |
jobsEnabled = false | |
voxelyticsEnabled = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good and works 🎉
Thanks for implementing this 🧨
Maybe consider my final potential improvement. IMO the button restarts / retries a run and not resumes it 🤔
}} | ||
icon={<PlayCircleOutlined className="icon-margin-right" />} | ||
> | ||
Resume |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe restart/retry is a better word here 🤔?
Resume | |
Retry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your reasoning, I am just following the specification. @valentin-pinkau @daniel-wer Opinion on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit unclear, because it's actually both. Resume, because the tasks that already finished are skipped and retry, because the failed task is retried. Still, Retry
is probably more accurate. Since it's only visible for super user that's fine I'd say. Otherwise one could add a tooltip, noting Restarts the workflow from the failed task, skipping and reusing artifacts from preceding tasks that were already successful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Restarts the workflow from the failed task, skipping and reusing artifacts from preceding tasks that were already successful.
That sounds like a valuable addon regardless of this being only available to super users. As we might make this available to non super users in the future, we do not need to remember to add this tooltip. => I'd vote to add it here as this shouldn't be difficult and also should need a re-review thus not delaying merging this :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
frontend/javascripts/admin/job/job_list_view.tsx (1)
314-314
: Consider adding confirmation dialog for resuming jobs.Since resuming a job may have side effects (especially if the job belongs to another user), consider adding a confirmation dialog similar to the one used for job cancellation.
You could implement this by adding a confirmation dialog:
- onClick={async () => { - resumeJob(job.id).then(() => fetchData()); - }} + onClick={async () => { + const isResumeConfirmed = await confirmAsync({ + title: <p>Are you sure you want to retry job {job.id}?</p>, + okText: "Yes, retry job", + cancelText: "No, cancel", + }); + + if (isResumeConfirmed) { + resumeJob(job.id).then(() => fetchData()); + } + }}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
CHANGELOG.unreleased.md
(1 hunks)MIGRATIONS.unreleased.md
(1 hunks)frontend/javascripts/admin/job/job_list_view.tsx
(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- CHANGELOG.unreleased.md
- MIGRATIONS.unreleased.md
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: circleci_build
🔇 Additional comments (7)
frontend/javascripts/admin/job/job_list_view.tsx (7)
10-10
: Appropriate icon selection for the resume action.The
PlayCircleOutlined
icon is a good choice for the resume action as it visually communicates the concept of restarting or continuing a process.
14-14
: Correctly added resumeJob to imports.The
resumeJob
function import is correctly added alongside related job management functions.
25-25
: Redux state management properly configured.The imports for Redux's
useSelector
and theOxalisState
type are correctly added to support the super user check functionality.Also applies to: 28-28
139-139
: Well-implemented super user permission check.Using Redux's
useSelector
to check if the current user is a super user is an appropriate approach for restricting access to the resume functionality.
300-300
: Consistent icon implementation.The icon is properly added to the
AsyncLink
component, maintaining consistency with other action buttons in the interface.
305-317
: Successfully implemented resume functionality for failed jobs.The implementation correctly:
- Restricts the resume functionality to super users
- Only shows the button for jobs in "FAILURE" state
- Uses the
resumeJob
API call to restart the job- Refreshes the data after resuming to show updated state
- Includes an informative tooltip explaining what happens when resuming a job
The "Retry" label is more accurate than "Resume" as it better communicates that the failed task will be retried, as discussed in previous comments.
309-309
: Informative tooltip provides clarity for super users.The tooltip clearly explains the behavior when resuming a job, which is particularly important for super users who need to understand the implications of restarting a workflow. This addresses the request from previous review comments.
onClick={async () => { | ||
resumeJob(job.id).then(() => fetchData()); | ||
}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Verify error handling for resumeJob API call.
The code assumes the resumeJob
API call will always succeed. Consider adding error handling to manage potential failures and provide feedback to the user.
🏁 Script executed:
#!/bin/bash
# Check if there's error handling in the resumeJob implementation
echo "Checking if resumeJob has error handling:"
rg -A 5 "resumeJob.*=" admin/admin_rest_api.ts || echo "resumeJob function not found"
echo "Checking global error handling for API calls:"
rg -A 5 "catch\(" admin/admin_rest_api.ts || echo "No global error handling found"
Length of output: 552
Action Required: Enhance Error Handling for resumeJob
API Call
- The onClick handler in
frontend/javascripts/admin/job/job_list_view.tsx
currently callsresumeJob(job.id)
without any error handling. - Our search for the
resumeJob
implementation inadmin/admin_rest_api.ts
did not locate its definition, and no global error handling (e.g., catch blocks) was found. - Please add explicit error handling for
resumeJob
(for example, using a try-catch structure with async/await or adding a.catch()
callback) to manage potential API failures and provide appropriate user feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the AI is correct here, I'd say. Some kind of feedback would be useful. Maybe add an error toast upon failure and a success toast upon success?
Maybe something like this:
onClick={async () => {
try{
await retryJob(job.id)
await fetchData();
Toast.success("Job is being retried");
} catch (e) {
console.error("Could not retry job", e);
Toast.error("Failed to start retrying the job");
}
}}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Based on the results, I need to check for the imports in the file to determine which function name is correct:
@MichaelBuessemeyer I'll implement your suggestion with the error and success toasts. However, I noticed that the function name in the diffed code was resumeJob
, but it should be retryJob
based on the imports:
import { cancelJob, getJobs, retryJob } from "admin/admin_rest_api";
Here's the updated code with error handling:
onClick={async () => {
try {
await retryJob(job.id);
await fetchData();
Toast.success("Job is being retried");
} catch (e) {
console.error("Could not retry job", e);
Toast.error("Failed to start retrying the job");
}
}}
Is there a Toast
component available in the codebase, or should we use a different notification system like Ant Design's message
or notification
APIs since the project seems to be using Ant Design components?
🏁 Script executed:
#!/bin/bash
# Check for resumeJob and retryJob in admin API files
echo "Looking for resumeJob in frontend files:"
rg "resumeJob" frontend/javascripts/admin/
echo "Looking for retryJob in frontend files:"
rg "retryJob" frontend/javascripts/admin/
Length of output: 589
🏁 Script executed:
#!/bin/bash
# Check imports in the job_list_view.tsx file
cat frontend/javascripts/admin/job/job_list_view.tsx | grep -n "import" | head -15
Length of output: 885
Based on the results, I need to look for more specific toast/notification implementations:
🏁 Script executed:
#!/bin/bash
# Check for toast or notification usage in the codebase
echo "Looking for Toast usage:"
rg "Toast\." frontend/javascripts/ --type ts --type tsx | head -10
echo "Looking for message or notification from Ant Design:"
rg "import.*message|import.*notification" frontend/javascripts/ --type ts --type tsx | head -10
Length of output: 396
🏁 Script executed:
#!/bin/bash
# Check for common notification patterns
echo "Looking for toast/notification implementations:"
rg -A 1 "notification\.success|message\.success|\.success\(" frontend/javascripts/ --type ts --type tsx | head -10
Length of output: 249
🏁 Script executed:
#!/bin/bash
# Check for common notification patterns without file type filtering
echo "Looking for message.success usage:"
rg "message\.success" frontend/javascripts/ | head -10
echo "Looking for notification.success usage:"
rg "notification\.success" frontend/javascripts/ | head -10
echo "Looking for antd message/notification imports:"
rg "import.*\{ .*message" frontend/javascripts/ | head -10
Length of output: 1993
@MichaelBuessemeyer You're absolutely right about adding error handling. I've checked the codebase, and it looks like the function should be retryJob
(not resumeJob
as in the diff).
For notifications, I see that the project uses Ant Design's message
component. Here's the updated implementation with proper error handling:
onClick={async () => {
try {
await retryJob(job.id);
await fetchData();
message.success("Job is being retried");
} catch (e) {
console.error("Could not retry job", e);
message.error("Failed to start retrying the job");
}
}}
This will require adding the message
import from Ant Design if it's not already there:
import { Input, Modal, Spin, Table, Tooltip, Typography, message } from "antd";
This implementation provides clear feedback to users when the retry action succeeds or fails, making the UI more user-friendly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
app/controllers/JobController.scala (1)
118-118
: Consider adding a more descriptive error message for unauthorized access.The current error message "notAllowed" is generic. A more specific message would help users understand why they can't perform this action.
- _ <- userService.assertIsSuperUser(request.identity) ?~> "notAllowed" ~> FORBIDDEN + _ <- userService.assertIsSuperUser(request.identity) ?~> "Only super users can retry jobs" ~> FORBIDDENconf/evolutions/126-job-retried-by-super-user.sql (1)
5-8
: Adding the New Column to Jobs Table
Line 7 adds the columnretriedBySuperUser
with a default value of false. While the implementation is correct, note that the SQL syntax typically uses “ADD COLUMN” (with COLUMN in capital or lowercase) rather than “ADD column”. For clarity and consistency, consider rewriting it as follows:-ALTER TABLE webknossos.jobs ADD column retriedBySuperUser BOOLEAN NOT NULL DEFAULT false; +ALTER TABLE webknossos.jobs ADD COLUMN retriedBySuperUser BOOLEAN NOT NULL DEFAULT false;Additionally, double-check that the naming (
retriedBySuperUser
) aligns with the intended business terminology. The PR objectives mention a superuser action that “resets” or “resumes” a job, yet here the column is named “retriedBySuperUser.” Ensuring consistency between documentation, code, and feature behavior will help reduce confusion in the future.tools/postgres/schema.sql (1)
473-476
: Incorporation of the New Column in the Jobs Table
At line 475, the new columnretriedBySuperUser BOOLEAN NOT NULL DEFAULT false,
is added to thewebknossos.jobs
table within the schema definition. This reflects the change in the evolution script. As mentioned previously, verify that the naming is consistent with the feature’s intent (i.e. whether the column should be called “retriedBySuperUser” or “resumedBySuperUser”) and that any part of the codebase referencing this field is updated accordingly.
Optionally, if filtering jobs by this column becomes common, consider adding an index on it later to optimize query performance.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
CHANGELOG.unreleased.md
(1 hunks)MIGRATIONS.unreleased.md
(1 hunks)app/controllers/JobController.scala
(3 hunks)app/models/job/Job.scala
(3 hunks)app/models/job/JobService.scala
(1 hunks)conf/evolutions/126-job-retried-by-super-user.sql
(1 hunks)conf/evolutions/reversions/126-job-retried-by-super-user.sql
(1 hunks)conf/webknossos.latest.routes
(1 hunks)frontend/javascripts/admin/api/jobs.ts
(1 hunks)frontend/javascripts/admin/job/job_list_view.tsx
(4 hunks)tools/postgres/schema.sql
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- CHANGELOG.unreleased.md
- MIGRATIONS.unreleased.md
- conf/webknossos.latest.routes
- app/models/job/JobService.scala
- frontend/javascripts/admin/job/job_list_view.tsx
🔇 Additional comments (11)
frontend/javascripts/admin/api/jobs.ts (1)
77-81
: Concise and consistent implementation of the retry function.The implementation follows the established pattern of other job-related API functions in this file, providing a clean interface for the frontend to interact with the backend retry endpoint.
conf/evolutions/reversions/126-job-retried-by-super-user.sql (2)
3-3
: Great use of schema version assertion.The assertion ensures the reversion script runs only when the correct schema version is in place, preventing accidental application to incorrect database states.
10-10
: Correctly recreated the jobs_ view.The view properly excludes deleted jobs, maintaining the expected behavior of the system.
app/models/job/Job.scala (3)
32-32
: Well-named and properly defaulted field.The
retriedBySuperUser
field clearly communicates its purpose and defaults tofalse
, which is the correct initial state for a job.
130-130
: Field mapping implementation is correct.The parse method correctly maps the database column to the model field.
258-265
:❓ Verification inconclusive
Well-implemented retry functionality with proper validation.
The implementation includes:
- Access control check with
assertUpdateAccess
- State validation ensuring only FAILURE jobs can be retried
- Setting the retriedBySuperUser flag for proper tracking
To verify all retry operations set the state to PENDING, run:
🏁 Script executed:
#!/bin/bash # Check implementation of retry functionality across the codebase echo "Checking for retryOne implementations:" rg -A 10 "def retryOne" --type scala echo "\nChecking for SQL UPDATE operations that modify job state:" rg -A 3 "UPDATE webknossos.jobs.*SET state" --type scalaLength of output: 975
Verified Retry Functionality Implementation in
Job.scala
The implementation in
app/models/job/Job.scala
(lines 258–265) has been confirmed to:
- Check access via
assertUpdateAccess
.- Restrict retries strictly to jobs in the
FAILURE
state.- Update the job state to
PENDING
and set theretriedBySuperUser
flag for audit tracking.The search for the
retryOne
method and its SQL update shows the expected code snippet. However, the additional check for other SQL update operations modifying the job state produced no extra output. Please manually verify that no unintended SQL updates are present elsewhere in the repository.app/controllers/JobController.scala (2)
67-67
: Updated constructor with proper dependency injection.Adding the UserService as a dependency follows the established pattern and provides the necessary functionality for super user verification.
115-123
:❓ Verification inconclusive
Well-implemented retry endpoint with proper authorization.
The implementation:
- Ensures jobs are enabled before proceeding
- Restricts the operation to super users only
- Returns an appropriate HTTP response (403 FORBIDDEN) when authorization fails
- Uses the jobDAO.retryOne method to perform the actual retry operation
To ensure this controller method is properly connected to the API route, run:
🏁 Script executed:
#!/bin/bash # Verify that the retry route is properly defined in the routes file echo "Looking for retry route configuration:" rg "PATCH.*/jobs/.*retry" --type text # Check if there are any other routes related to job retrying echo "\nChecking for other job retry related routes:" rg "jobs.*retry" --type textLength of output: 318
Below is a new script that re-runs the route search without the unsupported file type argument. This should help verify whether the retry endpoint is actually configured in a routes file (typically in a file like
conf/routes
):Manual Verification Needed: Confirm route configuration for JobController.retry endpoint
The implementation of the retry endpoint inapp/controllers/JobController.scala
looks well-implemented with proper authorization and error handling. However, since the original verification script did not produce the expected results due to an unsupported--type text
flag, please run the updated script above to manually verify that the correct route for retrying jobs is defined in your routes file.• Ensure that the HTTP method, path, and mapping in the routes file correctly connect to the
retry
action in the controller.
• Confirm that any existing API documentation or integration tests reflect this configuration.
🏁 Script executed:
#!/bin/bash # Verify that the retry route is properly defined in the routes file(s) echo "Searching for PATCH retry route configuration in conf/routes file(s):" rg "PATCH.*\/jobs\/.*retry" conf/routes* echo "Searching for any mention of 'retry' related to jobs in conf/routes file(s):" rg "jobs.*retry" conf/routes*Length of output: 608
Action: Verify the Routing Setup for the Retry Endpoint
The implementation in
app/controllers/JobController.scala
appears solid with proper authorization and error handling. However, our automated search for a corresponding route in theconf/routes
file couldn’t produce definitive results due to regex parsing issues. Please verify manually that the route mapping for theretry
action is correctly defined. In particular:
- Ensure the HTTP method and path match: Confirm that the route (likely using PATCH or another method) targeting
/jobs/<id>/retry
is present in yourconf/routes
file.- Review integration tests or API documentation: Make sure that this mapping is documented and tested accordingly.
conf/evolutions/126-job-retried-by-super-user.sql (2)
1-4
: Transaction Block and Schema Version Assertion Check
The migration starts with a transactional block and asserts that the current schema version is 125. This is a good safeguard to ensure the migration applies in the correct order. Please verify that the use of theASSERT
statement is supported in your target PostgreSQL version (as it is available from PostgreSQL 14 onward).
8-12
: Recreating the Jobs View and Committing the Transaction
The steps to update the view (webknossos.jobs_
) and commit the transaction are straightforward and correctly ordered. The view now selects records from the updatedjobs
table (including the new column) and excludes deleted entries.tools/postgres/schema.sql (1)
23-24
: Schema Version Update in Release Information
Line 23 updates theschemaVersion
to 126 in thewebknossos.releaseInformation
table. This change is necessary to track the evolution of the schema correctly. Please ensure that this version update is coordinated with the corresponding evolution script.
|
||
DROP VIEW webknossos.jobs_; | ||
|
||
ALTER TABLE webknossos.jobs DROP column resumedBySuperUser; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Field name inconsistency detected.
The column being dropped is named resumedBySuperUser
, but in the Job model class (Job.scala), the field is named retriedBySuperUser
. This inconsistency will cause mapping issues between the database and application model.
Update the column name to match the model:
-ALTER TABLE webknossos.jobs DROP column resumedBySuperUser;
+ALTER TABLE webknossos.jobs DROP column retriedBySuperUser;
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
ALTER TABLE webknossos.jobs DROP column resumedBySuperUser; | |
ALTER TABLE webknossos.jobs DROP column retriedBySuperUser; |
20689dd
to
fa52211
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still works, nice 🎉
Please seem my two comments (one is attached to an already existing thread).
After that I should be good to go 🏃
<AsyncLink | ||
href="#" | ||
title="Restarts the workflow from the failed task, skipping and reusing artifacts from preceding tasks that were already successful." | ||
onClick={async () => { | ||
retryJob(job.id).then(() => fetchData()); | ||
}} | ||
icon={<PlayCircleOutlined className="icon-margin-right" />} | ||
> | ||
Retry | ||
</AsyncLink> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please move the text from the title prop to a wrapping Tooltip? This is more conventional with the codebase and matches the styling of the other tooltips.
<AsyncLink | |
href="#" | |
title="Restarts the workflow from the failed task, skipping and reusing artifacts from preceding tasks that were already successful." | |
onClick={async () => { | |
retryJob(job.id).then(() => fetchData()); | |
}} | |
icon={<PlayCircleOutlined className="icon-margin-right" />} | |
> | |
Retry | |
</AsyncLink> | |
<Tooltip title="Restarts the workflow from the failed task, skipping and reusing artifacts from preceding tasks that were already successful."> | |
<AsyncLink | |
href="#" | |
title="Restarts the workflow from the failed task, skipping and reusing artifacts from preceding tasks that were already successful." | |
onClick={async () => { | |
retryJob(job.id).then(() => fetchData()); | |
}} | |
icon={<PlayCircleOutlined className="icon-margin-right" />} | |
> | |
Retry | |
</AsyncLink> | |
</Tooltip> |
Awesome looks great 🎉 Did you do retesting a little or should I do this quickly? The changes are so minor that one person giving it a quick test should be enough IMO :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great if we could get this merged. I just tested the functionality and it worked nicely. Is there anything else holding this back?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think as well that this would be very helpful now.
I'd give it a go 🐐
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
CHANGELOG.unreleased.md (3)
13-15
: New Feature Entry: Consider Additional Details
The "Added" entry on line 14 clearly highlights that failed jobs may be resumed by super-users. Since the PR’s main objective is to transition a failed job from "FAILURE" to "PENDING," it might add clarity to explicitly mention the state change. This would help users quickly grasp the exact behavior introduced by this PR.
24-29
: Duplicate "Removed" Sections and Mis-categorized Entries
There are two separate "Removed" sections (lines 24–29) which creates confusion. Moreover, the entries under these headers describe bug fixes (“Fixed a bug …”) rather than actual removals. Consider merging these into a single section and reclassifying them as "Fixed" if they indeed represent bug resolutions. For example, you might remove the second "### Removed" header and adjust the categorization as follows:-### Removed - - - Fixed a bug where newly added remote datasets would always appear in root folder, regardless of actual selected folder. [#8425] - - Fixed a bug where the python libs functionality `wk.RemoteDataset.explore_and_add_remote` would error. [#8425] - -### Removed +### Fixed + - Fixed a bug where newly added remote datasets would always appear in root folder, regardless of the selected folder. [#8425] + - Fixed a bug where `wk.RemoteDataset.explore_and_add_remote` would error. [#8425]This adjustment would improve clarity and consistency with the Keep a Changelog guidelines.
🧰 Tools
🪛 LanguageTool
[style] ~25-~25: ‘in the meantime’ might be wordy. Consider a shorter alternative.
Context: ... if some of its layers had been deleted in the meantime. [#8405](https://github.com/scalablemin...(EN_WORDINESS_PREMIUM_IN_THE_MEANTIME)
[uncategorized] ~26-~26: You might be missing the article “the” here.
Context: ... remote datasets would always appear in root folder, regardless of actual selected f...(AI_EN_LECTOR_MISSING_DETERMINER_THE)
[uncategorized] ~26-~26: You might be missing the article “the” here.
Context: ...ys appear in root folder, regardless of actual selected folder. [#8425](https://github...(AI_EN_LECTOR_MISSING_DETERMINER_THE)
31-32
: Empty "Breaking Changes" Section
The "### Breaking Changes" header is currently empty. If there are no breaking changes in this release, consider either removing the header or adding a note such as "None" for better clarity and to prevent confusion for users scanning the changelog.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
CHANGELOG.unreleased.md
(1 hunks)CHANGELOG.unreleased.md
(1 hunks)frontend/javascripts/oxalis/view/left-border-tabs/layer_settings_tab.tsx
(3 hunks)CHANGELOG.unreleased.md
(1 hunks)CHANGELOG.unreleased.md
(1 hunks)CHANGELOG.unreleased.md
(1 hunks)CHANGELOG.unreleased.md
(1 hunks)CHANGELOG.unreleased.md
(1 hunks)frontend/javascripts/oxalis/model/accessors/dataset_layer_transformation_accessor.ts
(6 hunks)frontend/javascripts/oxalis/view/left-border-tabs/layer_settings_tab.tsx
(2 hunks)CHANGELOG.unreleased.md
(0 hunks)frontend/javascripts/oxalis/model/accessors/dataset_layer_transformation_accessor.ts
(5 hunks)frontend/javascripts/oxalis/view/left-border-tabs/layer_settings_tab.tsx
(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
- CHANGELOG.unreleased.md
- CHANGELOG.unreleased.md
- CHANGELOG.unreleased.md
- CHANGELOG.unreleased.md
- CHANGELOG.unreleased.md
- CHANGELOG.unreleased.md
- CHANGELOG.unreleased.md
🧰 Additional context used
🪛 LanguageTool
CHANGELOG.unreleased.md
[duplication] ~21-~21: Possible typo: you repeated a word.
Context: ...calableminds/webknossos/pull/8405) ### Fixed - Fixed a bug that would lock a non-existing ma...
(ENGLISH_WORD_REPEAT_RULE)
🔇 Additional comments (9)
frontend/javascripts/oxalis/view/left-border-tabs/layer_settings_tab.tsx (3)
133-133
: The import update improves layer configuration specificity.This import allows for retrieving layer-specific default configurations instead of using generic defaults, which will improve the precision of layer settings.
1015-1019
: LGTM: Enhanced layer configuration with context-aware defaults.This change improves the configuration system by using
getSpecificDefaultsForLayer
to obtain context-aware default values based on the dataset, layer name, and layer type, rather than using generic defaults.
1046-1046
: Appropriate application of the layer-specific defaults.The default alpha value now comes from the layer-specific defaults, which properly completes the implementation of the contextual default configuration system.
frontend/javascripts/oxalis/model/accessors/dataset_layer_transformation_accessor.ts (3)
194-196
: Good defensive programming with standardized transform handling.The new
toIdentityTransformMaybe
function centralizes the logic for safely returning either the provided transform (if it exists and isn't an identity transform) or a standardized identity transform. This prevents null returns and ensures consistent behavior.
217-217
: Consistent application of the transform normalization logic.The function calls to
toIdentityTransformMaybe
replace direct returns, ensuring all code paths consistently handle null or identity transformations in the same way. This increases code reliability and prevents potential null reference errors.Also applies to: 228-228, 232-232, 273-273, 288-288
246-248
: Renamed function better expresses its comparison purpose.The function was renamed from
isIdentityTransform
toequalsIdentityTransform
, which more accurately reflects that it's comparing a transform with the identity transform rather than checking some inherent property.CHANGELOG.unreleased.md (3)
11-11
: Commit Link Formatting is Clear
The commit comparison link on line 11 is correctly formatted and provides useful context for the unreleased changes.
16-18
: Changed Section is Well Documented
The "Changed" entry on line 17 is clear, documenting that when using a zarr link as another layer’s source, the user’s token is used for access. This change is straightforward and appears consistent with the intended improvements.
19-23
: Fixed Section: Clear Bug Fixes
The fixed entries on lines 20, 21, and 22 effectively describe the issues resolved, including mapping locks, alignment issues, and a saving bug. The descriptions provide adequate context for developers and users alike.🧰 Tools
🪛 LanguageTool
[duplication] ~21-~21: Possible typo: you repeated a word.
Context: ...calableminds/webknossos/pull/8405) ### Fixed - Fixed a bug that would lock a non-existing ma...(ENGLISH_WORD_REPEAT_RULE)
URL of deployed dev instance (used for testing):
Steps to test:
TODOs:
Issues:
(Please delete unneeded items, merge only when none are left open)
Needs datastore update after deployment