-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#2403: Updated default system descriptor attributes in compiler to handle multi-device meshes #2411
base: main
Are you sure you want to change the base?
Conversation
Upon adding in this feature, I noticed we are not getting chip coordinates for multi-device meshes inserted correctly into the system descriptor. I opened a follow-up ticket for this since it is a more involved change: #2410. |
…ndle multi-device meshes
13deec4
to
a43fb0a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor organization thing, but overall looks great!
systemDescOptions.meshShape = implicitDeviceOptions.meshShape; | ||
} | ||
|
||
pm.addPass(mlir::tt::ttir::createTTIRLoadSystemDesc(systemDescOptions)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to keep the pass options programming just before the call to the pass. I think it's not a big deal to construct a SmallVector
multiple times to keep the options and the pass next to each other. Or create the vector once at the top and copy it to each pass.
ttir::TTIRImplicitDeviceOptions implicitDeviceOptions; | ||
implicitDeviceOptions.meshShape = ::llvm::SmallVector<int64_t>( | ||
options.meshShape.begin(), options.meshShape.end()); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
@@ -206,7 +206,7 @@ def TT_SystemDescAttr : TT_Attr<"SystemDesc", "system_desc"> { | |||
let assemblyFormat = "`<` `[` $cpuDescs `]` `,` `[` $chipDescs `]` `,` `[` $chipDescIndices `]` `,` `[` $chipCapabilities `]` `,` `[` $chipCoords `]` (`,` `[` $chipChannels^ `]`)? `>`"; | |||
|
|||
let extraClassDeclaration = [{ | |||
static tt::SystemDescAttr getDefault(MLIRContext *context); | |||
static tt::SystemDescAttr getDefault(MLIRContext *context, llvm::SmallVector<int64_t> meshShape = {1}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: const & meshShape
.
mlir::tt::SystemDescAttr::getDefault(MLIRContext *context, | ||
::llvm::SmallVector<int64_t> meshShape) { | ||
// Get number of chips in mesh. | ||
int64_t numberOfChips = std::accumulate(meshShape.begin(), meshShape.end(), 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't be a problem here, but this is one of the biggest pitfalls of std::accumulate
. If you look at https://en.cppreference.com/w/cpp/algorithm/accumulate, you will see that the result is of type T
which is the accumulator's type. In this case it's an int
(which will default to int32_t
on most systems), so even though you wanted int64_t
multiplication you pretty much end up with int32_t
multiplication with some casts along the way. You should use int64_t{1}
or 1ll
(first is preferable as it's more explicit and has guaranteed semantics).
@@ -95,39 +100,74 @@ mlir::tt::SystemDescAttr::getDefault(MLIRContext *context) { | |||
dramCores.push_back(CoreCoordAttr::get(context, y + gridShape[0], x)); | |||
} | |||
} | |||
|
|||
// Get number of chips indices. | |||
std::vector<uint32_t> chip_indices_list(numberOfChips); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see it's already a mixed bag of cases in this function, but we should at least prefer camelCase for new code.
Also, prefer LLVM's ADT and algos to STL one's, i.e. you can write this as llvm::SmallVector<uint32_t> chipIndicesList = llvm::to_vector(llvm::seq<uint32_t>(numberOfChips));
for (auto i = 0; i < numberOfChips; i++) { | ||
chip_descs.push_back(tt::ChipDescAttr::get( | ||
context, tt::ArchAttr::get(context, tt::Arch::WormholeB0), gridShape, | ||
1499136, 12, (1 << 30), 16, 32, 32, 1024, 1024, 1024, (1 << 30), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have these values in some header as symbolic values and read them from there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for naming them, but we should keep them contained to this function / this cpp file. Many of these values are just reasonable placeholders and not necessarily reflective of any real system so we wouldn't want them in a header and risking other places in the code start using them.
|
||
// Handle wrap around. Assume a default ring topology where final chip | ||
// connects with initial chip. | ||
if ((i + 1) == numberOfChips) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: you don't need branch here, device0 = i
and device1 = (i + 1) % numberOfChips
covers all cases.
if (numberOfChips != 1) { | ||
for (auto i = 0; i < numberOfChips; i++) { | ||
std::vector<int64_t> ethernet_core_coord0_vec = {0, 0}; | ||
std::vector<int64_t> ethernet_core_coord1_vec = {0, 0}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit (because of the range of numberOfChips
and size of ethernet_core_coord1_vec
): we should probably hoist this out of the loop.
tt::SystemDescAttr::name, | ||
tt::SystemDescAttr::getDefault( | ||
&getContext(), | ||
llvm::SmallVector<int64_t>(meshShape.begin(), meshShape.end()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: llvm::to_vector(meshShape)
should also do the trick.
This change adds support for multi-device default values for the system descriptor in the compiler. It assumes a linear mesh connected in ring topology connections. This does not map to a physical layout but allows the user to run compiler passes on a multi-device configuration without having to provide a multi-device system descriptor.
With change, you can now do this