v0.56.0-rc2
Pre-release
Note
If you are installing from a release, please refer to the README, the INSTALLATION instructions, and any other documentation packaged with that release, rather than the versions on the main branch. There may be differences between the latest main and the previous release.
The changelog follows, showing the changes since the last release.
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/13043899749
📦 Uncategorized
- #0: Add a way to specify custom dispatch topology
- PR: #17102
- Initialize work_executor_ and set WorkExecutorMode::SYNCHRONOUS in MeshDevice constructor
- PR: #17120
- [UMD] Switching to new coord API
- PR: #17003
- #0: Increase create heads test coverage for Llama shapes
- PR: #16980
- #16503: Optimize semaphore and CB writes
- PR: #16944
- Quiet down the CMake output from dependencies
- PR: #17008
- #16847: update to address the unaligned noc_async_copy from DRAM to L1
- PR: #17125
- remove ND failing big shape in transpose failures that is already tracked and disabled in test_transpose_2d
- PR: #17145
- #14898: pass in pad value to transpose in reduce
- PR: #17142
- rm -rf build.yaml
- PR: #17150
- #16982: Fixing program cache issues with reshape
- PR: #17140
- Delete convd_host_weights and update all tests using conv2d
- PR: #16264
- Update CODEOWNERS for the public API
- PR: #17149
- Aliu/bug fix
- PR: #17151
- #0: Move distributed headers into the public API directory
- PR: #17161
- [Llama3.2-11b-vision] Add support for text-only inference through generator api
- PR: #17105
- Remove references to ARCH_NAME in programming example
- PR: #17182
- Enable PR Gate
- PR: #17098
- #16806: Fixed watcher assert on reshape in debug mode
- PR: #17152
- #0: Fix doc links and make them point to the new location
- PR: #17181
- Remove ARCH_NAME references in prog_examples
- PR: #17185
- Prefer MOLD over LLD over LD
- PR: #17154
- Restore build-wrapper.yaml with updated method
- PR: #17197
- [TT-Train] Fix text generation
- PR: #17195
- LightMetal - Add Flatbuffers into cmake infra/build as cpm package (#17039)
- PR: #17157
- Format broken Kernel APIs Tables
- PR: #17000
- #0: Use MeshBuffer to store MeshWorkload kernel binaries
- PR: #17113
- Increase rms_norm and layernorm coverage for Llama shapes
- PR: #17180
- #17213: update fused and matmul trace sweep tests
- PR: #17214
- Add support for reading from / writing to partial buffer regions that are page size aligned for sharded buffers
- PR: #17089
- #0: Fix clang-format for dataflow_api.h
- PR: #17234
- Kkabilar tt single card perf
- PR: #17231
- [FABRIC] ASYNC_WR_ATOMIC_INC
- PR: #17072
- #9945: Enable and fix SD device perf test
- PR: #17025
- Check context switch pointer for eth cores before resetting
- PR: #17212
- Pull llrt.hpp out of public interface
- PR: #17196
- Update perf and latest features for llm models (Jan 27)
- PR: #17188
- #0: (MINOR) Bump to generate RCs for v0.57.0
- PR: #17252
- Remove dead includes of host_api.hpp from ttnn
- PR: #17220
- Prevent UNet Shallow perf report entry from being overwritten
- PR: #17235
- Fix setup.py for Anaconda
- PR: #17111
- Do not run PR Gate on Draft PRs
- PR: #17272
- Add a timeout for docker image building
- PR: #17285
- LightMetal - New APIs LightMetalBeginCapture() and LightMetalEndCapture() and docs (#17039)
- PR: #17262
- #0: Update distributed tests build to account for arch
- PR: #17287
- #17227: Make dispatch core order match for single chip 2 CQ and multichip 2 CQ topologies
- PR: #17274
- #17215: Add explicit dealloc for mesh buffer
- PR: #17265
- #0: Add validation test for dispatched remote circular buffer config to device
- PR: #17233
- Remove `get_completion_queue_reader_core()` API from Device
- PR: #17263
- Add resharding to post all gather layernorm/ rms norm op
- PR: #17156
- #0: Fix ttnn shared libs build
- PR: #17127
- #0: Schedule runs for single card new models tests
- PR: #17141
- Implement JointAttention
- PR: #17079
- Revert "Add resharding to post all gather layernorm/ rms norm op (#17156)"
- PR: #17304
- Update memory config when using `view` op with height sharded tensors
- PR: #17266
- #16812: Reordering cbs in reduce_init_delta
- PR: #16981
- #17083: Add support for watcher printing phys coords
- PR: #17244
- #16945: Add auto retries to post commit on branches
- PR: #16946
- Remove CommandQueue redirecting usages straight to HWCQ
- PR: #17219
- 1D support for tilize/reshape ops
- PR: #17238
- #16138: W-broadcasting for sharded tensors
- PR: #17101
- #0: Add PR Gate to data pipeline
- PR: #17325
- #15174: Re-enable mistral7b demo test after fw upgrade
- PR: #17305
- LightMetal - Add LoadTrace() API and move TraceDescriptor out of detail namespace (#17039)
- PR: #17313
- #15974: Create device tensors table in report database
- PR: #17293
- Privatize dprint_server.hpp
- PR: #17298
- Uplift Allocator to be its own class + migrate calls to Allocator APIs
- PR: #17268
- Bump CMake in the Docker image
- PR: #17273