Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adapt to new (v2) VFIO migration interface #741
Adapt to new (v2) VFIO migration interface #741
Changes from 1 commit
e31b36a
f753886
2d38f2f
4bf29bc
ddf1f0a
3228e38
45e4246
dcc535f
7c68798
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is probing for one or more of these three things:
right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the idea is if you don't set GET/SET, it is successful if the feature is supported at all, but if you set GET/SET it is only successful if whatever commands you probed for are also supported. How would you suggest I word this more clearly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add a whole section on probe, saying something like:
Probing
If just
VFIO_DEVICE_FEATURE_PROBE
is set, the server should return the corresponding flags that are supported.If
VFIO_DEVICE_FEATURE_PROBE
is set along withVFIO_DEVICE_FEATURE_GET
,VFIO_DEVICE_FEATURE_SET
, or both, the server should return whether the corresponding operation(s) are supported for the given flags.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bit of the spec is pretty much just lifted from VFIO, I don't believe you can call SET on this feature...?
https://elixir.bootlin.com/linux/v6.3.8/source/include/uapi/linux/vfio.h#L814
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as this is always going to be completely meaningless, we should discuss whether we should diverge here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, the only reason why we could keep it is to slightly simplify QEMU implementation, though I'm not sure that's a strong reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, we haven't discussed whether we're going to support memory mapped data transfers for migration (this FD is something we could use), @w-henderson have you given any thought to this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't thought about this - I quite like the idea of following VFIO as closely as we can so my preference is probably to leave it in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I said doens't make sense, the FD needs to be passed via ancillary data. I don't have a strong opinion about keeping it, perhaps qemu-devel may help us decide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this look in vfio? Is a stream "safe", or would we ever want to support sparse read/writes in some way?
I'm not clear how this would even work if we wanted to do device state dirty tracking in PRE_COPY state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe in VFIO we never "seek", only read/write until no more reading/writing can be done. I'm also unsure about how we'd do device state dirty tracking in PRE_COPY state - I think the idea is we track the dirty pages in PRE_COPY so we can only send the necessary migration data once we get to STOP_COPY but I'm not too sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct.
I'm not sure I understand the rest of the discussion though, especially how dirty page tracking is related to device state tracking.
I have the following question (perhaps we're talking about the same thing?), regarding PRE_COPY, from VFIO:
IIUC PRE_COPY returns an FD as this starts a new copy session, so in vfio-user this means that the client can start sending
VFIO_USER_MIG_DATA_READ
messages until the server returns 0 length read (equivalnt to thedata_fd
returning no more data in VFIO). While in PRE_COPY state, the device migth generate more device data, how does the VFIO client know this? Does reading from thedata_fd
return data (it would previously return EOF).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VFIO defines
PRE_COPY
to meanwhich maybe implies that we don't actually transfer any data in that state, rather just do whatever we can in the background to prepare for transfer and track the changes to keep this up to date? I'm not sure if this makes sense though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This suggests otherwise, but I agree it's not clear at all.
Can you comment on this? Does the device stay in
RUNNING
state or does it transition toRUNNING | PRE_COPY
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By state do you mean
RUNNING
etc?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes,
RUNNING
andPRE_COPY
are distinct states that have the same behaviour from the client's perspective but do things differently behind the scenes.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused, the documentation says:
while you say:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused too, I had been thinking of them as distinct states but I guess it's supposed to be more like different behaviour in the same state? That sounds a bit dodgy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does, and I was hoping you'd look at the kernel code to figure it out 😄.