Technical Preview
Pre-release
luluman released this 01 Apr 09:16
Features
- Added support for LLM decoding by utilizing multiple cores to enhance processing efficiency.
- Introduced `fx2mlir`, a new functionality for enhanced MLIR conversion.
- Implemented `nnvlc2.0` and `nnvlc1.0` local activation and weight operations, respectively, for improved neural network performance.
- Enabled `TPULANG` support for operations like sort, argsort, and additional ops, enhancing the language's functionality and flexibility.
- Added `cv186x` support in `run_sensitive_layer.py` and for the TDB, expanding compatibility and debugging capabilities.
- Introduced new ops and features like `Watchpoint` in TDB and `activation ops` support for scale & zero_point, broadening the range of functionalities available in the `tpu-mlir` project.
- Supports `BM1690`.
- L2mem performs intermediate data exchange for active tensor.
Bug Fixes
- Resolved a variety of bugs affecting backend processes, including issues with the `1684x` backend, `permutefuse2`, `permutemulconstswap`, and more, improving overall stability and performance.
- Fixed several critical issues across `tpulang`, including errors in the `sort_by_key` operation, `reshape` operations, the `where` operation, and more, enhancing the language's reliability for developers.
- Addressed bugs in model processing, including fixes for `concat` logic, `scale2conv`, `scale2conv3d`, `instance norm`, and several more, ensuring smoother model optimization and execution.
- Corrected errors in the documentation, providing clearer and more accurate information for users and developers.
Documentation Updates
- Updated `tpulang` documentation to include new functionalities and optimizations, making it easier for users to understand and utilize the language effectively.
Performance Improvements
- Optimized TDB and `bmodel_checker` for `1684x pcie` mode, significantly reducing processing times and enhancing efficiency for model analysis.
- Improved the efficiency of DMA in flash attention operations, ensuring faster data handling and processing.
- Enabled IO tag mode and refined address mode for better memory management and operational flexibility.