Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#2403: Updated default system descriptor attributes in compiler to handle multi-device meshes #2411

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

tapspatel
Copy link
Collaborator

@tapspatel tapspatel commented Mar 9, 2025

This change adds support for multi-device default values for the system descriptor in the compiler. It assumes a linear mesh connected in ring topology connections. This does not map to a physical layout but allows the user to run compiler passes on a multi-device configuration without having to provide a multi-device system descriptor.

With change, you can now do this

ttmlir-opt --ttir-to-ttnn-backend-pipeline="mesh-shape=2,4" test.mlir

@tapspatel
Copy link
Collaborator Author

Upon adding in this feature, I noticed we are not getting chip coordinates for multi-device meshes inserted correctly into the system descriptor. I opened a follow-up ticket for this since it is a more involved change: #2410.

@tapspatel tapspatel linked an issue Mar 9, 2025 that may be closed by this pull request
@tapspatel tapspatel force-pushed the tpatel/issue-2403 branch from 13deec4 to a43fb0a Compare March 9, 2025 14:57
@tapspatel tapspatel requested a review from wooseokTT March 9, 2025 15:01
Copy link
Contributor

@nsmithtt nsmithtt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor organization thing, but overall looks great!

systemDescOptions.meshShape = implicitDeviceOptions.meshShape;
}

pm.addPass(mlir::tt::ttir::createTTIRLoadSystemDesc(systemDescOptions));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to keep the pass options programming just before the call to the pass. I think it's not a big deal to construct a SmallVector multiple times to keep the options and the pass next to each other. Or create the vector once at the top and copy it to each pass.

ttir::TTIRImplicitDeviceOptions implicitDeviceOptions;
implicitDeviceOptions.meshShape = ::llvm::SmallVector<int64_t>(
options.meshShape.begin(), options.meshShape.end());

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

@@ -206,7 +206,7 @@ def TT_SystemDescAttr : TT_Attr<"SystemDesc", "system_desc"> {
let assemblyFormat = "`<` `[` $cpuDescs `]` `,` `[` $chipDescs `]` `,` `[` $chipDescIndices `]` `,` `[` $chipCapabilities `]` `,` `[` $chipCoords `]` (`,` `[` $chipChannels^ `]`)? `>`";

let extraClassDeclaration = [{
static tt::SystemDescAttr getDefault(MLIRContext *context);
static tt::SystemDescAttr getDefault(MLIRContext *context, llvm::SmallVector<int64_t> meshShape = {1});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: const & meshShape.

mlir::tt::SystemDescAttr::getDefault(MLIRContext *context,
::llvm::SmallVector<int64_t> meshShape) {
// Get number of chips in mesh.
int64_t numberOfChips = std::accumulate(meshShape.begin(), meshShape.end(), 1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't be a problem here, but this is one of the biggest pitfalls of std::accumulate. If you look at https://en.cppreference.com/w/cpp/algorithm/accumulate, you will see that the result is of type T which is the accumulator's type. In this case it's an int (which will default to int32_t on most systems), so even though you wanted int64_t multiplication you pretty much end up with int32_t multiplication with some casts along the way. You should use int64_t{1} or 1ll (first is preferable as it's more explicit and has guaranteed semantics).

@@ -95,39 +100,74 @@ mlir::tt::SystemDescAttr::getDefault(MLIRContext *context) {
dramCores.push_back(CoreCoordAttr::get(context, y + gridShape[0], x));
}
}

// Get number of chips indices.
std::vector<uint32_t> chip_indices_list(numberOfChips);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see it's already a mixed bag of cases in this function, but we should at least prefer camelCase for new code.

Also, prefer LLVM's ADT and algos to STL one's, i.e. you can write this as llvm::SmallVector<uint32_t> chipIndicesList = llvm::to_vector(llvm::seq<uint32_t>(numberOfChips));

for (auto i = 0; i < numberOfChips; i++) {
chip_descs.push_back(tt::ChipDescAttr::get(
context, tt::ArchAttr::get(context, tt::Arch::WormholeB0), gridShape,
1499136, 12, (1 << 30), 16, 32, 32, 1024, 1024, 1024, (1 << 30),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have these values in some header as symbolic values and read them from there.

Copy link
Contributor

@nsmithtt nsmithtt Mar 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for naming them, but we should keep them contained to this function / this cpp file. Many of these values are just reasonable placeholders and not necessarily reflective of any real system so we wouldn't want them in a header and risking other places in the code start using them.


// Handle wrap around. Assume a default ring topology where final chip
// connects with initial chip.
if ((i + 1) == numberOfChips) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: you don't need branch here, device0 = i and device1 = (i + 1) % numberOfChips covers all cases.

if (numberOfChips != 1) {
for (auto i = 0; i < numberOfChips; i++) {
std::vector<int64_t> ethernet_core_coord0_vec = {0, 0};
std::vector<int64_t> ethernet_core_coord1_vec = {0, 0};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit (because of the range of numberOfChips and size of ethernet_core_coord1_vec): we should probably hoist this out of the loop.

tt::SystemDescAttr::name,
tt::SystemDescAttr::getDefault(
&getContext(),
llvm::SmallVector<int64_t>(meshShape.begin(), meshShape.end())));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: llvm::to_vector(meshShape) should also do the trick.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Scalable Multi-Device Default System Descriptor Generation
3 participants