Support old-style TensorFlow events (tensorboard) #2467

Merged
merged 6 commits into from
Feb 15, 2025

garymm
Contributor

@garymm garymm commented Dec 18, 2024

Fixes: #2466
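
For context, here is a minimal sketch of the distinction the title appears to refer to. It assumes that old-style writers (such as torch and tensorboardX, named in the linked issue) store a scalar in a plain float field of each summary value, while newer TensorFlow writers store it in a tensor field. `FakeValue` and `read_scalar` are hypothetical stand-ins for illustration only. They are not Katib or TensorBoard APIs, and this is not the code from this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeValue:
    """Hypothetical stand-in for one summary value inside an event record."""
    tag: str
    simple_value: Optional[float] = None  # assumed: populated by old-style writers
    tensor_value: Optional[float] = None  # assumed: populated by new-style writers


def read_scalar(value: FakeValue) -> Optional[float]:
    """Return the scalar from whichever field the writer populated."""
    if value.simple_value is not None:
        return value.simple_value
    if value.tensor_value is not None:
        return value.tensor_value
    return None  # neither field set: not a scalar this sketch understands


old_style = FakeValue(tag="accuracy", simple_value=0.91)
new_style = FakeValue(tag="accuracy", tensor_value=0.93)
print(read_scalar(old_style), read_scalar(new_style))
```

A reader that checks only one of the two fields would silently skip events from the other kind of writer, which matches the symptom described in the linked issue.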

@garymm
Contributor Author

garymm commented Dec 18, 2024

I didn't see a place to add tests for this code. Is there a good place to add tests? If so, I'm happy to do it.

Unrelated to this bug: while testing, I was surprised to find that an event is considered a match as long as its tag name starts with the metric name (e.g. a TF event with tag "foobar" matches the metric name "foo"). Is this intended? If not, I can open another PR to fix it.
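
To make the concern concrete, here is a small sketch contrasting the two matching rules. The function names are hypothetical and this is not the collector's actual code. It only assumes the prefix behavior described above.

```python
def matches_by_prefix(tag: str, metric_name: str) -> bool:
    """Behavior described above: any tag that starts with the metric name matches."""
    return tag.startswith(metric_name)


def matches_exactly(tag: str, metric_name: str) -> bool:
    """Stricter alternative: the tag must equal the metric name."""
    return tag == metric_name


for tag in ("foo", "foobar"):
    print(tag, matches_by_prefix(tag, "foo"), matches_exactly(tag, "foo"))
# foo True True
# foobar True False
```

With prefix matching, values logged under "foobar" would be reported for the metric "foo". Exact matching avoids that, at the cost of no longer matching tags that carry a suffix after the metric name.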

@garymm garymm force-pushed the garymm/tfevent-scalar branch from d1c626e to 2942637 Compare February 4, 2025 22:23
@garymm
Contributor Author

garymm commented Feb 4, 2025

@Electronic-Waste @andreyvelich could someone please review?

Member

@Electronic-Waste Electronic-Waste left a comment

Basically LGTM. Thanks for this @garymm! Just a few comments.

/assign @andreyvelich @helenxie-bit @mahdikhashan


@Electronic-Waste: GitHub didn't allow me to assign the following users: mahdikhashan.

Note that only kubeflow members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

Member

@mahdikhashan mahdikhashan left a comment

Would you please make sure that this functionality is also tested? There is an existing unit test at test/unit/metricscollector/test_tfevent_metricscollector.py.

Thanks for your time @garymm

@google-oss-prow google-oss-prow bot added size/L and removed size/M labels Feb 6, 2025
@garymm garymm force-pushed the garymm/tfevent-scalar branch 2 times, most recently from 800be03 to e6b5888 Compare February 6, 2025 17:51
@garymm
Contributor Author

garymm commented Feb 6, 2025

@mahdikhashan @Electronic-Waste done

@mahdikhashan
Member

> @mahdikhashan @Electronic-Waste done

Thanks for your contribution, @garymm. @andreyvelich @Electronic-Waste, could you please trigger the CI so that we can make sure everything is fine?

@Electronic-Waste
Member

/rerun-all

@garymm garymm force-pushed the garymm/tfevent-scalar branch from ed698f1 to a230fd5 Compare February 12, 2025 01:01
@google-oss-prow google-oss-prow bot added size/XXL and removed size/L labels Feb 12, 2025
Signed-off-by: Gary Miguel <[email protected]>
@garymm garymm force-pushed the garymm/tfevent-scalar branch from a230fd5 to 7b4ae33 Compare February 12, 2025 01:04
@google-oss-prow google-oss-prow bot added size/L and removed size/XXL labels Feb 12, 2025
@garymm
Contributor Author

garymm commented Feb 12, 2025

@Electronic-Waste I fixed the pre-commit issues. I doubt the e2e test failure is my fault.

@Electronic-Waste
Member

/rerun-all

@Electronic-Waste
Member

@garymm could you please resolve these conflicts?

@Electronic-Waste
Member

/rerun-all

@garymm
Contributor Author

garymm commented Feb 13, 2025

@Electronic-Waste done

@andreyvelich
Member

@Electronic-Waste @mahdikhashan @garymm @kubeflow/wg-training-leads Do we want to cherry-pick this PR to the release-0.18 branch?

@garymm
Contributor Author

garymm commented Feb 13, 2025

Given how rare releases are, it would be a shame not to include this in 0.18, but it's up to you.

@mahdikhashan
Member

> @Electronic-Waste @mahdikhashan @garymm @kubeflow/wg-training-leads Do we want to cherry-pick this PR to the release-0.18 branch?

Yes, I agree with you. Do we need to add any documentation for it?

cc: @garymm

@andreyvelich
Member

Yes, I would appreciate it if @garymm or @mahdikhashan could update the docs for the TensorFlowEvent Metrics Collector here:
https://www.kubeflow.org/docs/components/katib/user-guides/metrics-collector/

garymm added a commit to garymm/website-1 that referenced this pull request Feb 13, 2025
@garymm
Contributor Author

garymm commented Feb 13, 2025

kubeflow/website#3999

Member

@andreyvelich andreyvelich left a comment

/lgtm
/approve

@google-oss-prow google-oss-prow bot added the lgtm label Feb 15, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich

@google-oss-prow google-oss-prow bot merged commit c18035e into kubeflow:master Feb 15, 2025
66 checks passed
google-oss-prow bot pushed a commit to kubeflow/website that referenced this pull request Feb 15, 2025
* katib metrics-collector: mention supported writers

See kubeflow/katib#2467

Signed-off-by: Gary Miguel <[email protected]>

* add 'metrics' word

Signed-off-by: Gary Miguel <[email protected]>

---------

Signed-off-by: Gary Miguel <[email protected]>
@garymm garymm deleted the garymm/tfevent-scalar branch February 15, 2025 01:14
@andreyvelich
Member

/cherry-pick release-0.18

@andreyvelich andreyvelich added this to the v0.18 milestone Feb 15, 2025
@google-oss-robot

@andreyvelich: #2467 failed to apply on top of branch "release-0.18":

Applying: Support old-style TensorFlow events (tensorboard)
Applying: format
Applying: test
Using index info to reconstruct a base tree...
M	test/unit/v1beta1/requirements.txt
Falling back to patching base and 3-way merge...
Auto-merging test/unit/v1beta1/requirements.txt
CONFLICT (content): Merge conflict in test/unit/v1beta1/requirements.txt
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0003 test
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

@tenzen-y
Member

> @Electronic-Waste @mahdikhashan @garymm @kubeflow/wg-training-leads Do we want to cherry-pick this PR to the release-0.18 branch?

I'm ok

@andreyvelich
Member

@garymm @mahdikhashan @Electronic-Waste Could you please help us resolve the conflicts and manually cherry-pick this commit to the release branch?

@mahdikhashan
Member

> @garymm @mahdikhashan @Electronic-Waste Could you please help us resolve the conflicts and manually cherry-pick this commit to the release branch?

I'll do so.

mahdikhashan pushed a commit to mahdikhashan/katib that referenced this pull request Feb 17, 2025
* Support old-style TensorFlow events (tensorboard)

Fixes: kubeflow#2466
Signed-off-by: Gary Miguel <[email protected]>

* format

Signed-off-by: Gary Miguel <[email protected]>

* test

Signed-off-by: Gary Miguel <[email protected]>

* don't continue loops

Signed-off-by: Gary Miguel <[email protected]>

* format

Signed-off-by: Gary Miguel <[email protected]>

---------

Signed-off-by: Gary Miguel <[email protected]>
@mahdikhashan
Member

> @garymm @mahdikhashan @Electronic-Waste Could you please help us resolve the conflicts and manually cherry-pick this commit to the release branch?

PTAL: #2517

Successfully merging this pull request may close these issues.

TensorFlowEvent metrics collector doesn't find events written by torch / tensorboardX
7 participants