[super ugly maybe working code] use shim.h instead of Tensor #1548
base: main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1548
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV. If your PR is affected, please view it on the hud.
❌ 9 New Failures as of commit 5e2c2d0 with merge base b2fb664.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
torchao/csrc/cuda/tensor_core_tiled_layout/tensor_core_tiled_layout.cu (outdated review thread, resolved)
Force-pushed from e1c39be to 2017d7b
    // note: heap-allocate the opaque impl; taking the address of a
    // temporary (as before) is ill-formed C++
    : lib_(new TorchLibraryOpaque(Library::Kind::IMPL, ns, k, file, line)) {}

StableLibrary& StableLibrary::impl(std::string name, void (*fn)(void **, int64_t, int64_t)) {
  // bridge the stable void** calling convention onto the dispatcher's
  // IValue-based boxed stack
  auto boxed_function = [fn](const c10::OperatorHandle &op, torch::jit::Stack *stack) {
Are these, torch::jit::* and IValue, considered stable?
No! I haven't gotten a chance to document what I'm trying to do, but everything in libtorch.cpp should eventually make its way back into libtorch (not stable!!).
Everything in libtorch.h is supposed to be stable.
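To make that split concrete, here is a minimal sketch of the contract being described; the alias name StableBoxedFn and the comments are illustrative assumptions, not code from this PR:

```cpp
#include <cstdint>

// libtorch.h -- the intended stable surface: plain C/C++ types only;
// no c10::IValue, no torch::jit::Stack, no at::Tensor.
// (Assumed alias matching the fn pointer taken by StableLibrary::impl.)
using StableBoxedFn = void (*)(void **stack, int64_t num_args, int64_t num_outputs);

// libtorch.cpp -- the unstable side, free to use c10/torch::jit types.
// The boxed lambda in the diff above adapts a StableBoxedFn onto the
// dispatcher's IValue stack, so unstable types never leak into the header.
```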
//       boxed_dequantize_tensor_core_tiled_layout>());
// }

class StableLibrary::TorchLibraryOpaque { |
I am guessing that, from a design perspective, this is the interface layer with the registration API that is shipped with different versions of libtorch? But the user code, for custom ops, never relies on at::Tensor?
Yea, this is to allow registering libtorch-agnostic custom ops, which cannot use at::Tensor or IValue, etc. in their schemas.
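A rough sketch of the opaque-wrapper pattern being described here, assuming a pimpl layout; this is illustrative, not the PR's exact code:

```cpp
#include <cstdint>
#include <string>

// Stable header: TorchLibraryOpaque is only forward-declared, so user
// binaries compiled against this header never see torch::Library or any
// other ABI-unstable libtorch type.
class StableLibrary {
 public:
  class TorchLibraryOpaque;  // defined in the .cpp, invisible to users
  StableLibrary& impl(std::string name, void (*fn)(void **, int64_t, int64_t));
 private:
  TorchLibraryOpaque *lib_;  // pimpl: hides torch::Library behind the ABI wall
};
```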
TORCH_LIBRARY_IMPL(torchao, CUDA, m) {
  m.impl("torchao::unpack_tensor_core_tiled_layout", &_unpack_tensor_core_tiled_layout);
  m.impl("torchao::dequantize_tensor_core_tiled_layout", &_dequantize_tensor_core_tiled_layout);

void voidyvoid_boxed_ATH_unpack_tensor_core_tiled_layout(void **stack,
voidyvoid.. lol
On a serious note though, will the user have to write this function with void** stack? It would actually be good if the user could just write against the stable API, like AtenTensorHandle.
Yea, to be clear, there are two portions we expect the user to provide for now (see the sketch below):
- their custom op, which should directly use AtenTensorHandle (we can make that easier by wrapping AtenTensorHandle with a header-only C++ API layer)
- how their custom op should be registered within our dispatcher stack (this is the point of the voidyvoid function). There may be a way to automatically generate this for the user given the schema of their custom op from (1), but that is a next step. Currently, we expect users to provide this.
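A minimal sketch of those two pieces, using the AOTI shim header and a hypothetical op my_unpack_op; the stack layout and all names are assumptions for illustration, not the PR's exact code:

```cpp
#include <cstdint>
#include <torch/csrc/inductor/aoti_torch/c/shim.h>

// (1) The custom op itself, written against the stable C shim: it only
// ever sees AtenTensorHandle, never at::Tensor.
AtenTensorHandle my_unpack_op(AtenTensorHandle packed_w, int64_t inner_k_tiles) {
  AtenTensorHandle out = nullptr;
  // ... allocate `out` via shim factory calls and launch the kernel ...
  return out;
}

// (2) The hand-written "voidyvoid" boxed wrapper: it reads type-erased
// slots off the stack per the op's schema, calls the op, and writes the
// result back. A later step could codegen this from the schema in (1).
void voidyvoid_boxed_my_unpack_op(void **stack, int64_t num_args,
                                  int64_t num_outputs) {
  AtenTensorHandle packed_w = reinterpret_cast<AtenTensorHandle>(stack[0]);
  int64_t inner_k_tiles = reinterpret_cast<int64_t>(stack[1]);
  AtenTensorHandle out = my_unpack_op(packed_w, inner_k_tiles);
  stack[0] = reinterpret_cast<void *>(out);  // assumed convention: outputs overwrite the stack front
}
```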
Force-pushed from 0e979be to 5e2c2d0
This is a PoC change to see what making custom ops use AOTI's shim.h would look like.
What you should expect by the end of this exercise: