
DrRyanHuang commented Nov 29, 2023

PR types

Others

PR changes

APIs

Description

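  • New IR Python API adaptation upgrade #58067
  • test_sgn: all unit tests pass (static-graph PIR adaptation added)
  • test_glu: all unit tests pass
  • test_take: all unit tests pass
  • rank: no unit tests found for now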
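
For reference, the APIs listed above have short definitions. Below is a minimal pure-Python sketch of what `paddle.sgn`, `paddle.nn.functional.glu`, and `paddle.take` compute, assuming 1-D inputs given as plain lists. The helper names `sgn_ref`, `glu_ref`, and `take_ref` are hypothetical. This is not Paddle's implementation or the unit-test code. The per-mode handling of negative indices in `take_ref` reflects my reading of the `paddle.take` documentation and should be checked against it.

```python
import math


def sgn_ref(values):
    """Sketch of paddle.sgn on a flat list.

    Complex inputs map to x / |x| (same angle, magnitude 1), with 0 -> 0.
    Real inputs map to their sign: 1.0, -1.0, or 0.0.
    """
    out = []
    for v in values:
        if isinstance(v, complex):
            mag = abs(v)
            out.append(v / mag if mag != 0 else 0j)
        else:
            out.append(float((v > 0) - (v < 0)))
    return out


def _sigmoid(y):
    # Numerically stable logistic function (avoids overflow in exp).
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    z = math.exp(y)
    return z / (1.0 + z)


def glu_ref(row):
    """Sketch of GLU along a 1-D axis: split in half into a, b; return a * sigmoid(b)."""
    if len(row) % 2 != 0:
        raise ValueError("GLU requires the split axis to have even length")
    half = len(row) // 2
    a, b = row[:half], row[half:]
    return [x * _sigmoid(y) for x, y in zip(a, b)]


def take_ref(flat, index, mode="raise"):
    """Sketch of paddle.take on an input already flattened to 1-D.

    'raise': indices must lie in [-n, n); negative indices count from the end.
    'wrap':  indices wrap around modulo n.
    'clip':  indices are clamped to [0, n - 1], so negatives become 0.
    """
    n = len(flat)
    out = []
    for i in index:
        if mode == "wrap":
            i %= n
        elif mode == "clip":
            i = min(max(i, 0), n - 1)
        elif mode == "raise":
            if not -n <= i < n:
                raise IndexError(f"index {i} out of range for size {n}")
        else:
            raise ValueError(f"unknown mode: {mode}")
        out.append(flat[i])
    return out
```

For example, `sgn_ref([3.0, -2.0, 0.0])` returns `[1.0, -1.0, 0.0]`, `glu_ref([1.0, 2.0, 0.0, 0.0])` returns `[0.5, 1.0]`, and `take_ref([10, 20, 30], [5, -2], mode="clip")` returns `[30, 10]`. `paddle.rank` is omitted because it only reports the number of dimensions of its input.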

DrRyanHuang and others added 2 commits December 4, 2023 11:22
Co-authored-by: WangZhen <23097963+0x45f@users.noreply.github.com>
DrRyanHuang requested a review from 0x45f on December 5, 2023 04:59

0x45f commented Dec 5, 2023

Does this PR adapt the sgn API?

DrRyanHuang and others added 7 commits December 7, 2023 05:28
* [Inference] New executor support input hook and fix shape file collection in trt (#59466)

* [Inference] new executor support input hook

* update

* update

* ci(cinn): update cinn ci to support dynamic shape (#58996)

* test=cinnunit

* test=cinnunit

* test=cinnunit

* test=cinnunit

* update seed in top_p_sampling (#59494)

* refine pir interpreter nccl op check (#59515)

* fix compile bug (#59487)

* [auto parallel] add softmax backward spmd rule (#59039)

* [auto parallel] add softmax backward spmd rule

* update test to new eager parallel api

* [PIR+CINN]Part-2 Pybind IrParser.ParseProgram and Polish UT into check_run (#59449)

* [PIR+CINN]Support SubGraph Exporter for Unittest Platform

add unittest

fix UT not take effect

[PIR+CINN]Pybind IrParser.ParseProgram and Polish UT into check_run

fix cmake flags

remove VLOG

fix code comment

* fix conflict

* remove print

* fix UT

* add list.sort to fix random

* [Docathon][Fix System Message No.3、9、14、15]  (#58664)

* [PIR] support converting pd_op.expand to cinn_op.broadcast_to (#59437)

* pir cinn support multi group

* update

* update

* fix pir cinn pow op bug

* remove useless code

* update

* update

* [api.cc] Fix kernel_backend to actual_kernel_backend to enable CPU-fallback (#59499)

* [AutoParallel] Support view mechanism in auto parallel dygraph mode. (#59401)

* [AutoParallel] Support view mechanism in auto parallel dygraph mode.

* Polish code.

* Trans dist_tensor to contiguous.

* Add reshape backward code gen.

* Polish reshape implementation.

* Add yaml.

* Polish code.

* Fix reshape backward problems and add testcase.

* Fix some problems.

* Fix testcase.

* [PIR / Dy2static] Fix mnist - part 1 (#59447)


---------

Co-authored-by: chenzhiyang <1792266893@qq.com>
Co-authored-by: SigureMo <sigure.qaq@gmail.com>

* fit auto_parallel amp for llama (#59497)

* [PIR] Support for If grad execution of ControlFlow ops (#59200)

* support lower to kernel for if_grad op


* fix bugs and warnings

---------

Co-authored-by: zhangbo9674 <zhangbo54@baidu.com>

* [XPU] update communication context (2) (#59482)

* [XPU] update communication context (2)

this is a follow up to #59418

* bugfix

* typo

* 【Program/Backward】fix order of static backward (#59304)

* fix order of static backward

* fix some error in topo order

* remove useless breakpoint

* fix

* fix

* fix

* fix

* fix

* [XPU][PHI Kernels] support fused_rotary_position_embedding for xpu (#59480)

* add solve op into TRT GenericPlugin (#59424)

* 【PIR API adaptor No.174】 Migrate paddle.randint_like into pir (#58953)

* [CI improve] remove useless output for some unittest (#59436)

* Revert "[auto parallel] add softmax backward spmd rule (#59039)" (#59542)

This reverts commit d86f686.

* 【Hackathon 5th No.13】【Related PR】Added uint8 support for sign kernel -part (#59514)

* ✨ Feature: added uint8 support for sign

* ♻️ Refactor: updated docs and type support

* 🎨 Refactor: updated code style

* Fix compiling error when setting WITH_MKL=OFF. (#59283)

* [CINN] remove_fake_test_of_args_parse (#59504)

* add check_grad && refine code (#59539)

* 【pir】modify ir_backward to build If grad (#59520)

* add if_grad_op

* add if_grad_op

* modify

* [Semi-Auto] Support parallel cross entropy in static semi-auto training (#59187)

* adapt cross_entropy_with_softmax rule to phi

* support parallel cross_entropy in auto parallel

* small fix

* temporary save

* add unit test for parallel_cross_entropy

* resolve conflicts

* small fix

* Add random op to no check list (#59483)

* add no check

* add no check

* Update pir_op_test_no_check_list

* fix syncbn nan (#59089)

* [Paddle-TRT] Enforce use new executor for trt engine memory sharing (#59495)

* enforce use new executor for trt engine memory sharing

* update

* add ut

* fix bug

* [auto parallel]Open matmul auto parallel test in OpTest (#59503)

* test framework supports to_static and prim

* open check_auto_parallel in matmul op test

---------

Co-authored-by: cyber-pioneer <chenzhuo@tju.edu.cn>

* [AutoParallel] fix converter for 0-dim tensor (#59523)

* 【Hackathon 5th No.37】Add householder_product API to Paddle -part (#58214)

* add householder_product api

* fix codestyle

* fix bug, detail describe for tests

* codestyle

* fix type error desc

* codestyle, modify atol

* assert when k > n, support complex, add more test units

* codestyle

* remove unused norm func

* codestyle

* modify api param:A to x

* restore noqa

* remove unused func

* Update python/paddle/tensor/linalg.py

Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com>

* Update python/paddle/tensor/linalg.py

Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com>

---------

Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com>

* [AutoParallel] set comm dist_attr for dist_matmul (#59524)

* [AutoParallel] rm infershape for dist_embedding (#59526)

* [AutoParallel] rm infershape for dist_embedding

* [AutoParallel] rm infershape for dist_embedding

* Update dist_embedding.py

* Add static graph support for "scaled_dot_product_attention" (#59498)

* Added static graph support for 'scaled_dot_product_attention'

* Add static graph support for "scaled_dot_product_attention"

* [AutoParallel] Fix optimizer InferMeta. (#59246)

* Fix optimizer infermeta.

* Add testcase.

* [XPU] Supports the different types of post dynamic quantization for conv and fc (#59307)

* fix bug (#59516)

* add linux compile requirements (#59443)

* add linux compile requirements

* update

* update

* polish code as PR 59200 review comments (#59549)

* [PIR]Fix call InterpolateInferMeta in PIR (#59550)

* [auto parallel] Shard optimizer API (#59342)

* [auto parallel] add squeeze/unsqueeze backward spmd rules (#59547)

* [PIR & Inference] Fix cf pass and mkldnn log (#59555)

* instance norm passed (#59541)

* 【Hackathon 5th No.14】Add combinations API to Paddle (#57792)

* [PIR] Support while grad exe (#59496)

* support lower to kernel for if_grad op

* add PD_DECLARE_KERNEL

* fix

* fix

* fix

* resolve conflict

* update

* update

* update

* update

* update

* update

* fix

* update

* update

* update

* update

* update

* update

* update

* update

* fix bugs and warnings

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: chen2016013 <cx2016013@163.com>

* [CINN] Translate pir::Tensor to ir::Tensor with Symbolic shape (#59196)

* [CINN] Replace fake SymbolicDimOp

* [CINN] Translate pir::Tensor to ir::Tensor with Symbolic shape

* fix

* fix

* fix compile

* fix compile

* fix

* fix bug in static shape

* runtime(cinn): update cinn jit instruction to support dynamic shape (#59470)

* runtime(cinn): update cinn jit instruction to support dynamic shape

* runtime(cinn): update cinn jit instruction to solve conflict

* [PIR] add python api for while op (#59565)

* [CINN] Move strong constraint branch unittest directory (#59501)

* Move strong constraint branch unittest directory

* Remove CINN_ONLY

* Remove add_subdirectory

* update variable_length_mem_eff_attn's unittest (#59568)

* add comments (#59372)

* add comments

* fix bugs

* fix bugs

* add cuda place test and precision test for if_op_test (#59564)

* Solve the problem of scale saving in PTQ (#59441)

* [XPU] add some bf16 ops (#59505)

* [PIR] Refine code for while_grad execute (#59566)

* support lower to kernel for if_grad op

* add PD_DECLARE_KERNEL

* fix

* fix

* fix

* resolve conflict

* update

* update

* update

* update

* update

* update

* fix

* update

* update

* update

* update

* update

* update

* update

* update

* fix bugs and warnings

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* add debug

---------

Co-authored-by: chen2016013 <cx2016013@163.com>

* [oneDNN] optimize elementwise_add/sub for swin_transformer (#59421)

* [PIR] adjust the member function name in if_op (#59567)

* fix multi encoder adaptive seqlen wrong (#59548)

* sharding stage 1 check diff lr and use param decay fn (#59537)

* [PIR] fix ci conflict, test=document_fix. (#59595)

* [auto parallel]Add matmul auto parallel test (#59507)

* test framework supports to_static and prim

* add matmul auto parallel test in op test

---------

Co-authored-by: cyber-pioneer <chenzhuo@tju.edu.cn>

* Polish bfloat16 main_grad unittest for data parallel and sharding stage1. (#58842)

* Polish bfloat16 main_grad unittest for data parallel.

* Optimize unittest of sharding stage1.

* Polish codes and add check of weights.

* Polish unittest for sharding stage1.

* Revert some minor changes.

* Polish the compare of parameters.

* Compute loss in float32.

* [auto parallel] shard optimizer enhance (#59575)

* 【PIR API adaptor No.36】check_numerics (#58879)

* fix docs bugs (#59285)

* fix docs bugs

* modify as suggested

* feat(new-ir): support nan_to_num (#59469)

* [PIR]Remove refresh_stopgradient in backward (#59579)

* remove refresh_stopgradient()

* remove test

* 【PIR/Dy2static】Fix pir test ---- PART II (#59532)



---------

Co-authored-by: chenzhiyang <1792266893@qq.com>

* [Dy2St] lower time > 100 in dy2st unittests (#59506)

* 【Op Profiling】Add operator run time profiling feature (#58809)

* [op] operator profiling

* [op] operator profiling

* fix ci

* remove redundant code

* code cleaning

* minor fix

* minor fix

* minor fix

* major code cleaning

* minor fix

* minor fix

* minor fix

* fix ci

* minor fix

* minor code style fixes

* minor code style fixes

* minor code style fixes

* minor fix

* fix ci

* minor fix

* minor fix

* minor code style fix

* minor code style fix

* fix compile err

* [auto parallel] add softmax backward spmd rule  (#59545)

* [auto parallel] add softmax backward spmd rule

* update test to new eager parallel api

* Revert "[auto parallel] add softmax backward spmd rule (#59039)"

This reverts commit d86f686.

* [auto parallel] add softmax backward spmd rule

* update test to new eager parallel api

* [Prim][PIR] full_like forward sink (#59534)

* prim full_like sink

* merge code

* update full_like

* remove code in rules.py

* adjust softmax code

* [OneDNN] Fix accuracy for matmul+binary_add fusion (#59527)

* [Reshard] Support r to p on cross mesh (#59367)

* wip: r2p reshard

* wip: fix suitable

* feat: r2p cross mesh

* fix: strategy registry

* fix: align with new api

* [Reshard] Support p to r on cross mesh (#59621)

* fix: typo

* fix: typo

* feat: reshard p2r

* 【PIR API adaptor No.314】 Migrate vander into pir (#59573)

* [Docathon][Fix System Message No.2] (#59295)

* fix system message in website

* fix

* fix

* fix

* [xpu] Register fast_where and forbid pass if remove cast bool (#59594)

Co-authored-by: newway <liuwei345@gmail.com>

* fix behavior of put_along_axis and take_along_axis (Usability Improvement No.43) (#59163)

* fix behavior of put_along_axis and take_along_axis

* fix error

* fix take_along_axis used in stat

* update

* fix build error

* add test for error

* add param broadcast

* use origin example

* add param include_self

* update param name

* modify ut

* update test case

* add error UT

* update

* [PIR & Inference] Add fused_weight_only_linear_pass (#59366)

* [Inference]Add matmul_to_weight_only_linear_pass

* fix test and rename pass

* fix the comment of test

* fix ci

* fix: fix test

* refactor: refactor pass and test

* refactor: refactor pass

* refactor: add fp16 test

* refactor: refactor pass

* refactor: refactor the opt_level

* fix: fix typo

* fix: fix ci compile error when without gpu

* refactor: refactor pass and test

* fix: fix conflict

* fix: fix conflict

* refactor: refactor opt_level in pass_test to 4

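* docs: add docstring content to enrich the Chinese and English documentation (#59271)

* docs(paddle.lr): enrich docstring content

Add the 17 strategies described in the Chinese documentation of class LRScheduler to the docstring

* docs(paddle.vision.transforms): add more specific examples to the docstring

Modify the example code in the RandomHorizontalFlip and RandomVerticalFlip docstrings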

* [Paddle Inference] modify a check statement in memory_optimize_pass.cc (#59638)

[Paddle Inference] modify a check statement in memory_optimize_pass.cc

* [auto parallel] fix pp reshard (#59598)

* [Paddle-Inference] GQA support fix mmha bug (#59351)

[Paddle-Inference] GQA support fix mmha bug

* 【pir】deal with  if build stop gradient  (#59585)

* merge

* add stop gradient

* comment

* [PIR & Inference] Add conv2dAddPass and conv2dAddActPass and conv2dAdd2ActPass (#59391)

* add conv2d_add_fuse_pass

* add all conv2d_fuse_pass and Modify passtest

* bug fix

* code style

* code style

* code style

* bug fix

* code style

* code style

* add test for new PassPattern

* [Auto Parallel]Fix coverage in distributed mode (#59560)

* test framework supports to_static and prim

* test coverage

* fix coverage

* support distribute coverage

---------

Co-authored-by: cyber-pioneer <chenzhuo@tju.edu.cn>

* rewrite master weight for amp training (#59052)

* rewrite master weight for amp training

* some optimizers do not support master weight

* cinn(dynamic): support run exp sub subgraph with dynamic shape graph (#59640)

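 Modify broadcast's compute so that computations whose output shape equals the input shape support dynamic shapes
 Integrate and debug the bucket mechanism; the pipeline now runs end to end without op schedule or group schedule
 Add a subgraph unit test for exp-sub with dynamic shapes.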

* fix (#59589)

* 【PIR API adaptor No.253、310】Migrate cumulative_trapezoid,trapezoid into pir (#59481)

* 【PIR API adaptor No.238、239、240、241】 Migrate nn.initializer.XavierInitializer, nn.initializer.MSRAInitializer into pir (#59419)

* 【PIR API adaptor No.261、273、283、285、286、313、315】 Migrate is_tensor/median/nanmean/nansum/neg/Unflatten/var into pir (#59509)

* Add a pass to insert QDQ nodes before skip connection (#59009)

* [PIR]  Translate TensorArray Related Ops (#59633)

* translate tensor array related ops and adapt their executions

* fix

* fix

* fix

* to trigger CI

* fix

* fix for Windows bug

* fix jit_setitem

* test

* update dygraph auto_parallel en API docs. (#59557)

* 【auto parallel】llama attention subgraph verification (#59491)

* auto parallel: llama attention and mlp

* llama mlp, attention dp + mp

* remove log

* skip test

* polish

* polish

* polish

* [CMake cleanup] Move DDim etc. to common (#59105)

* fix conflict

* exception

* kunlun ci

* WIN_CI

* setup.py

* bug_fix

* hash

* auto_code_gen_WIN_CI

* inference_CI

* use_common_enforce

* delete pir_enforce

* delete_error

* change_cmake

* conflict

* cmake

* mac_CI

* inference_copy

* delete_pybind_common

* paddle_test

* split ddim constructor

* cc_test

* use cinn::common

* copy_infer

* delete_layer_test_new

* bug_fix

* infer

* fix inference bug

* conflict

---------

Co-authored-by: winter-wang <1030748926@qq.com>

* [Fix UT] fused_weight_only_linear_pass unittest modify (#59651)

* unittest fix

* code style

* code style

* [PIR] Add check for If grad test (#59590)

* support lower to kernel for if_grad op

* add PD_DECLARE_KERNEL

* add debug

* add precision test for if_op_test

---------

Co-authored-by: zhangbo9674 <zhangbo54@baidu.com>

* [PIR]Gen check DataType (#59354)

* [Auto Parallel] Update Gradient Synchronization in Static Mode (#59057)

* completion bw partial

* debug

* bugfix

* insert param grad allreduce by partial

* reorder allreduce for opt

* fix typos

* add grad sync unittest

* sp unittest

* fixed unittest

* [Paddle-TRT] custom operator support generating plugin automatically (#58976)

* [Paddle-TRT] custom operator support generating plugin automatically

* [AutoParallel][PIR] Support new ir for the visualize tool (#59195)

* merge from openvino master

* add InterpreterRunTime() to record interpreter's run time

* add profiler helper static to produce json file

* add color map and support perfetto format

* recover codes

* control include env for gpu_timer.h

* fix logic for profiler_helper_static.py

* fix build error

* fix build error

* recover thirdparty

* add flag control: not support new ir now

* set auto_parallel_profiler flag to false

* fix

* add auto_parallel_profiler as command parameter

* fix value name

* support gettimeofday for win env

* fix win build error

* fix win build error

* use job_type_to_id

* Fixed repeatedly timing the same stream

* add step line for timeline

* add step timeline and fix logic when job overlap

* update time record logic

* fix bug when profiling starts from a non-zero step

* fix note

* remove FLAGS_auto_parallel_profiler

* use run config instead of FLAGS_auto_parallelxx

* fix color map logic

* fix color map logic

* fix bug when log step does not start from 0

* fix

* fix

* don't use set_enable_auto_parallel_profiler

* fix bug

* disable auto_parallel_profiler when not open flag by command line

* fix bug

* remove resettime

* fix build bug

* fix

* remove set enable

* fix build error

* fix build error

* fix build error

* fix ci error

* fix

* fix run error

* fix

* fix

* fix calculate_stream_timer logic

* remove fluid head

* fix build error

* set default value for enable_job_schedule_profiler

* support new ir

* fix is_communication_op logic

* fix

* fix build error

* recover IsCommunicationOp

* fix code_style

* [CINN]Refine StaticShapeGroupScheduler code while learning logic (#59540)

* [CINN]Refine StaticShapeGroupScheduler code while learning logic

* fix comment

* [Dy2St] Run PT in SOT mode only (#59658)

* fix clang-tidy modernize-use-nullptr error (#59626)

* [CodeStyle][ruff] clean some F401 step: 5 (#59576)

* Enhanced RNG State Management with Index-Based Control for Graph-Safe Tensor Parallelism (#58859)

* allow multiple rng state in generator

* fix get_rng_state

* Disable test for coverage cuda12 (#59556)

* Disable test for coverage cuda12

* Fix

* fix cmake

* fix cmake

* fix dist test

* fix

* fix

* [Paddle-TRT] Add size op convert (#59563)

* [Paddle-TRT] Add size op convert

* [PIR]Open more PIR UTs (#59657)

* [CINN] Make Resize Buffer Safer (#59014)

Make Resize Buffer safer: the old buffer resize didn't take loads into account; this change adds support for them.

This PR also contains some code from the safer UpdateBufferAxis in #59209.

We will also clean it up in PR #59209.

* [PIR]Fix nansum fp16 ut (#59666)

* Polish the error message and check for flash_attn. (#58345)

* Polish codes of flash_attn.

* Add more log for debugging.

* Allow dq, dk, or dv to be nullptr in flash_attn_grad.

* Use temporary tensor when k or v does not have gradient and add unittest.

* Add skipIf in unittest.

* add md5sum for tensor (#59606)

* change_cc_test_old_f (#59619)

* [Dy2St] Add `enable_to_static_guard` for dy2st uts (#59670)

* Fix block_idx bug for auto parallel (#59596)

* Fix block_idx bug for auto parallel

* Fix typos

* fix (#59645)

* support_windows_cuda12 (#59665)

* fix assign kernel (#59609)

* add backward infer log (#59543)

* fix bug (#58400)

* [PIR] Add Three OPs with ReifyReturnTypeShapes (#58368)

* Add ReifyReturnTypeShapes

* Fix UT & fix op output & DimOfShapedTypeOpInterfacePattern

* Add some to do

* Alias DDim in phi (#59671)

* [3/4] CUDNNv8 ResNet Fusion: Add fused_dconv_drelu_dbn OP (#58986)

* Rename output

* Add fused_dconv_drelu_dbn_op

* Add to CI test

* Review changes

* fix typos (#59679)

* fix typos, test=document_fix

* fix typos, test=document_fix

* [PIR]Choose op by value type in PIR apis (#59605)

* [Add] test atleast_xd pir backward  (#59365)

* [Change] keep tensor from input

* [Change] atleast input for pri

* [Change] test for pir

* [Change] pir grad from z to x

* fix test_decayed_adagrad_op (#59486)

* fix sharding stage3 main_grad bug (#59611)

* [Dy2St] pir dy2st unittest verification - Part 13 (#59517)


---------

Co-authored-by: SigureMo <sigure.qaq@gmail.com>

* [CodeStyle][ruff] clean some F401 step: 6 (#59584)

* clean F401

* fix

* clean

* RollBACK `python/paddle/base/__init__.py`

* RollBACK `python/paddle/__init__.py`

* rollback

* [XPU] add some bf16 ops and update xdnn (#59653)

* [PIR]support set value attribute by value. (#59656)

* [AutoParallel] complete chunk_id attr in backward&update phase (#59522)

* [AutoParallel] complete chunk_id attr in backward&update phase

* Update backward.py

* update fill_constant complete

* update complete chunk_id

* complete loss_grad_op

* fix complete first grad op

* [Dy2St] Remove duplicate dy2st resnet test (#59492)

* [auto parallel] stack support 0d tensor (#59655)

* Wint8 gemm and gemv opt (#59291)

* fpAintB split-k

* workspace

* fix error

* just_for_llama13b_bsz64-128

* llama13 opt

* fix scale type of weight only quant

* draft gemv batched

* accuracy fix

* m size dispatch for gemv and gemm

* fit dispatch

* refine gemv

* remove useless kernel

* refine

* fix bug for split-k-limit

* fix bug for half scale

* weight quant kernel fit for half scale

* fix bf16 compile

* fix sm70 autogen

* fix sm70 compile error

* fix code style

* update

* update

* code-style

* code-style

* windows compile fix

* code-style

* fix merge bug

---------

Co-authored-by: wwbitejotunn <wwbitejotunn@outlook.com>

* support reduce_min, reduce_prod, mod, and floor_div ops (#59650)

* [Dy2St] Roll out `enable_to_static_guard` to all tests, 6-15 (#59691)

* [Dy2St] pir dy2st unittest verification - Part 14 (#59546)



---------

Co-authored-by: SigureMo <sigure.qaq@gmail.com>

* [compiler opt]change_cc_test_old (#59620)

* update

* update

* update

* Update CMakeLists.txt

* Update CMakeLists.txt

* [CINN] Strong constraint branch support dynamic shape (#59309)

* Strong Constraint Branch

* NoInlineTranslator (#84)

* Adapt adt to pir

* Move FLAGS_cinn_enable_map_expr_schedule location

* Apply new group schedule

* Remove useless log

* Remove adt unittest

* Solve merge conflicts

* Fix typo

* Fix merge conflicts

* Add unit test

* Fix cmake

* Add test_cinn_sub_graph_map_expr

* transfer origin code to develop

* Fully support ShapeDialect

* TranslateDimExpr

* Fix dim_expr_simplifier

* Generate ir with symbolic

* Solve compile error

* SymbolicDim to SymbolicDimOp when Translate ir

* Solve some conflict

* Solve conflict

* ShapeAnalysis

* fix compile error

* Disable dynamic shape

* Add scale generate_equation

* Add cpp unittest

* Add cpp unittest

* Change VLOG priority

* Clean IndexDot

* Unittest

* Cancel SimplifyDotBI

* UniqueId ResetSeqNumber

* input_spec and kDynamic

* map_expr_test

* [Auto Parallel] Fix run scripts for hybrid unittests (#59701)

* Fix program_translator bug for subblock (#59724)

* 【AutoParallel】Promote fuselinear pass in auto-parallel (#59188)

* add fused_linear_promotion pass

* add promote_fusedlinear pass

* support sp without dp

* delete some log

* fix bug in process_mesh

* add sp+dp support

* fix bug when dp_group is None

* modify code according to review

* add unit_test

* add unit_test

* fix the test

* 【PIR / Dy2static】Fix pir test 3 (#59696)


---------

Co-authored-by: chenzhiyang <1792266893@qq.com>

* [SOT] Add `paddle.metric` to paddle API (#59698)

* [Dy2St] Run original partial program call to avoid CUDA error 700 (#59687)

* fix test_activation_op (#59618)

* Fix sot eval and test len (#59408)


---------

Co-authored-by: SigureMo <sigure.qaq@gmail.com>
Co-authored-by: zhangbo9674 <zhangbo54@baidu.com>

* Add introduction about Open Source Community to Readme (#59704)

* Don't Merge

* make conflict

* reset

* add community

* Update communication section in README.md

---------

Co-authored-by: jzhang533 <jzhang533@gmail.com>

* 【auto parallel】Llama decoder subgraph verification (#59580)

* auto parallel: llama attention and mlp

* llama mlp, attention dp + mp

* remove log

* remove log

* polish

* polish

* polish

* polish time out

* polish time out

* 【PIR API adaptor No.161、162】Migrate `paddle.vision.ops.nms` `paddle.nn.functional.one_hot`  into pir (#58735)

* 【PIR API adaptor No.28】Migrate `paddle.vision.ops.box_coder` into pir (#59616)

* [PHI]Open PHI shared Lib by default (#59345)

* open phi shared default

* format code

* update code

* fix bugs

* open phi shared

* fix test_lstm for pir (#59608)

* [Semi-auto]Add srp in dist_tensor (#59683)

* add srp in disttensor

* add srp in disttensor

* add srp in disttensor

* add srp in disttensor

* add srp in disttensor

* [OneDNN] Optimize fused elementwise kernel (#59663)

* [PIR] Relax the restrictions of IF Verify Region (#59689)

* fix

* Fix program_translator bug for subblock

---------

Co-authored-by: chenruibiao <chenruibiao@baidu.com>

* Merge into develop part-5 (#59644)

* part-3 cherry from: add check for cembedding (#55621)

* part-3 fix cherry from: add check for cembedding

* part-3 fix c_embedding

* fix test_gpt_with_pir caused by pir

* part-3 cherry from: [Distributed] Support dp/sharding overlap in  virtual pp (#55651)

* Add virtual pp and dp overlap

* add sharding/dp overlap

* add dp/vpp overlap

* fix code

* fix log

* part-3 cherry from: [cherry-pick] Integration flash attention 2 (#56015)

* [FlashAttn] add flash randomness control (#52902)

* add flash randomness control

* fix VLOG undefined

* [WIP] Integration flash attention 2 (#55758)

* Work for fa-2 padded fwd. Code to be cleaned.

* Work for fa2 unpadded fwd.

* Work for padded-bwd, dk gets a small diff on np.random.seed(0)

* Anyway, Paddle's unit tests pass, except for returning softmax without dropout.

* Clean code.

* Modify interface.

* Clean code and add some check.

* Easy compile for dev.

* Fix ci.

* Fix ci-build.

* Add std c++17 option again.

* Limit max job when compiling fa2.

* Remove const_cast

* Add fwd params, to be cleaned.

* Clean code.

* Add bwd params.

* Clean code.

* Add enforce.

* Use v2.0.4

* Pass RNG state to fa2 capi

* Fix review.

* Add assert

* Skip compile for sm less than 80.

---------

Co-authored-by: Chitsing KUI <kuizhiqing@msn.com>

* part-4 cherry from: fix codestyle (#56066)

* part-4 cherry from(no change): Add assert for static and other plateform (#56044)

* part-4 cherry-pick from: dp and sharding coexist (#56096)

* dp and sharding coexist

* dp

* part-4 cherry from: [Distributed] Add debug information for processgroupnccl (#56441)

* add debug information

* fix log

* fix log

* add detach for pp

* part-4 cherry from: [BugFix]Fix bug in paddle.device.cuda.synchronize() (#56451)

* fix bug in synchronize

* fix bug in synchronize

* part-4 cherry from: add fused gradient (#57048)

* part-4 cherry from: [Distributed] add eager_communication_connection for eager mode in nccl (#57517)

* add eager_nccl_connection

* add eager_connection

* add eager_connection

* part-4 cherry from: Add auto growth allocator for CUDA pinned allocator (#57625)

* fix h2d bandwidth

* remove useless flags

* fix cherry pick #56066

* part-5 cherry from: Add allocation debug FLAGS (#57797)

* Add allocation debug FLAGS

* add sync after value set

* refine flags

* part-5 cherry from: fix softmax backward (#57971)

* part-5 cherry from: [Distributed]Optimize memory in processgroup (#58299)

* optimize memory in processgroupnccl

* optimize memory in processgroupnccl

* optimize memory in processgroupnccl

* optimize memory in processgroupnccl

* part-5 cherry from: [Distributed]Add unbalance batch for virtual pp (#58383)

* add unbalanced batch for vpp

* add unbalanced batch for vpp

* add unbalanced batch for vpp

* fix

* fix comments

* fix kunlun compatibility issues

* fix test_fused_rotary_position_embedding.py

* fix allocator.h

* tinyfix

* fix conflicts

* fix new ir translator c_embedding failure

---------

Co-authored-by: ShenLiang <1422485404@qq.com>
Co-authored-by: umiswing <umiswing@foxmail.com>
Co-authored-by: Chitsing KUI <kuizhiqing@msn.com>
Co-authored-by: niuliling123 <51102941+niuliling123@users.noreply.github.com>
Co-authored-by: liuzhenhai93 <liuzhenhai93@outlook.com>
Co-authored-by: sneaxiy <32832641+sneaxiy@users.noreply.github.com>

* 【Hackathon 5th No.112】move read_file to phi - part (#59359)

* move read_file to phi, but since it runs in dygraph, it may cause some bugs

* remove the static gen

* fix bug

* fix the code style

* move the file to cpu

* remove the #include

* [PIR] Add artificial instruction: builtin_combine (#59669)

* fix

* fix

* fix

* [auto parallel] Dist tensor set value (#59706)

* fix reshape and reshard (#59688)

* [Docathon][Fix System Message No.12] test to fix (#59445)

* Fix pir compiler name id bug (#59642)

* fix pir compiler name id bug

* remove useless code

* remove code

* fix bug

* [Dy2St] pir dy2st unittest verification - Part 12 (#59378)

* add `test_legacy_and_pir_exe_and_pir_api`

* update

* add `test_tensor_memcpy_on_cpu` and gpu

* add debug info to yolov3

* fix test_declarative.TestInputSpec

* update yolov3

* judge params by name

* update test_declarative

* restore test_yolov3

* fix place test

* `assertTrue` -> `assertIn`

* revert test_tensor_memcpy_on_cpu

* skip api check gen for `assign_out_`

---------

Co-authored-by: SigureMo <sigure.qaq@gmail.com>

* Remove `:=` and update classifiers (#59733)

* add cross entropy test case (#59693)

* [Dy2static] Fix save problem in dy2static (#59709)

* rename conv2d_fusion op to fused_conv2d_add_act (#59431)

* [PIR] add operand_index api for Operation and fix cf pass (#59738)

* [PIR] add operand_index api for Operation and fix cf pass

* update

* [PIR] fix python cond api error in test_ifelse. (#59708)

* cinn(cmake): fix cmake error in dynamic (#59711)

* cinn(cmake): fix cmake error in dynamic

* cinn(cmake): move symbolic subgraph to a subdirectory

* fix typos, test=document_fix (#59754)

* 【Hackathon 5th No.27】Add select_scatter API to Paddle -part (#59343)

* support select_scatter op

* fix example code

* fix sc

* update example

* remove unused files

* add name

* fix conflict

* update

* remove

* update

* add type

* update type

* [SOT]Fix Train/Eval Switch BUG in SOT (#59747)

* [SOT]Fix Train/Eval Switch BUG in SOT

* rm useless code

* add pp bug report (#59762)

* add profiler_range (#59634)

* add profiler_range

* add test cases and fix logic

* Update test_job_schedule_profiler_range.py

* Update CMakeLists.txt

* Update CMakeLists.txt

* add test case

* [PIR+CINN]Support Adaptive Parse and Check Feed/Fetch in SubGraph Exporter (#59749)

* [Prim][PIR] stack prim sink (#59713)

* stack sink

* prim stack sink

* stack sink

* [CINN] Fix inline_translator_test compile error (#59737)

* Fix inline_translator_test

* cinncore -> absl

* 【PIR API adaptor No.266、269】 Migrate ldexp, logaddexp into pir (#59582)

* 【PIR API adaptor No.80、81】 Migrate fused_layer_norm and FusedDropoutAdd into pir (#59420)

* [SOT] fix sot call locals (#59710)

* [PIR] restore AST+PT test and refine code (#59668)

* [Dy2St] Run PT in SOT mode only

* restore legacy ir test and refine code

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: SigureMo <sigure.qaq@gmail.com>

* [Auto_Parallel] update path lists for pir (#59757)

* [auto parallel] CrossMeshReshard for: p2r, p2s, r2p, r2s, s2p, s2r, s2s. (#59758)

* Remove skip_transform of index_put (#59664)

* add type promotion logic for eager between tensor and tensor (#59518)

* add eager T+T logic.

* remove useless file.

* remove useless line.

* fix

* update

* fix note.

* mv common logic to common dir.

* fix

* remove deal for int.

* remove int.

* only for compile

* ignore other type promotion for now.

* new eager logic.

* fix bug, add where.

* fix

* add dtype check, warning, rename .h

* add warning

* add more log.

* bug fix

* fix by comment, make logic of eager_gen more readable.

* [auto parallel] embedding subgraph test (#59681)

* fix bug in xpu pp (#59753)

* fix elementwise inferspmd (#59707)

* 【Dy2static / PIR】fix apply pass + bn accuracy problem + test_resnet.py (#59774)

* fix order of static backward

* fix some error in topo order

* remove useless breakpoint

* fix

* fix

* fix

* fix

* fix

* adjust ir backward prune routine.

* fix

* fix cross_entropy_with_softmax vjp bug

* fix pre-commit!

* fix

* fix 3 unittest

* fix code format

* fix test_tensor_memcpy_on_gpu.py

* fix test_partial_program.py

* fix test_ptb_lm_v2

* [PIR]Using inplace batch norm in PIR

* fix apply pass error

* fix bn problem.

* fix test_resnet.py unittest

---------

Co-authored-by: chenzhiyang <1792266893@qq.com>
Co-authored-by: 0x45f <wangzhen45@baidu.com>

* optimize set_value (#59425)

* optimize set_value

* fix none shape

* Distributed SaveLoad implementation for semi-auto strategy (#59659)

* exclude xpu

* demo of running dygraph distributed save load

* support save cross mesh state_dict

* polish

* fix compute overlap bug

* test save load in dp_mp unittest

* fix get local file bug and test

* delete useless files, and rename var

* polish

* format codes

* test use_dist

* fix test

* info to debug

* fix test

* fix

* fix coverage ci

* fix docstring codes

* rename and codestyle

* get rid of use_dist argument

* fix copyright

* polish doc

* polish

* polish

* use tmp file path

* [AutoParallel] add chunk_id attr for dist_op (#59719)

* [AutoParallel] add chunk_id attr for dist_op

* update utils funcs

* update dist ops

* fix dist_ctx

* fix dist_default

* add silu as dist_elemwise

* 【pir】 modify 5/6 cases of test_cond.py with append_backward (#59732)

* first modify

* clear modify

* modify if_grad2

* append_full_like

* add new test

* modify add_n

---------

Co-authored-by: Yuanle Liu <yuanlehome@163.com>
Co-authored-by: 6clc <chaoliu.lc@foxmail.com>
Co-authored-by: lzy <569782149@qq.com>
Co-authored-by: wanghuancoder <wanghuan29@baidu.com>
Co-authored-by: risemeup1 <62429225+risemeup1@users.noreply.github.com>
Co-authored-by: Xiaoxu Chen <chenxx_id@163.com>
Co-authored-by: Aurelius84 <zhangliujie@baidu.com>
Co-authored-by: Jinyuan Huang <88757735+BernieHuang2008@users.noreply.github.com>
Co-authored-by: hong <43953930+phlrain@users.noreply.github.com>
Co-authored-by: RuohengMa <120699764+RuohengMa@users.noreply.github.com>
Co-authored-by: Ghost Screaming <mofengshenjieII@163.com>
Co-authored-by: xiongkun <xiongkun03@baidu.com>
Co-authored-by: chenzhiyang <1792266893@qq.com>
Co-authored-by: SigureMo <sigure.qaq@gmail.com>
Co-authored-by: Leo Chen <chenqiuliang@baidu.com>
Co-authored-by: chen2016013 <111894720+chen2016013@users.noreply.github.com>
Co-authored-by: zhangbo9674 <zhangbo54@baidu.com>
Co-authored-by: XiaociZhang <zhangxiaoci@baidu.com>
Co-authored-by: lijin23 <41257772+lj970926@users.noreply.github.com>
Co-authored-by: zhink <33270771+zhink@users.noreply.github.com>
Co-authored-by: cyberslack_lee <luhputu0815@gmail.com>
Co-authored-by: PommesPeter <54879512+PommesPeter@users.noreply.github.com>
Co-authored-by: Yiqun Liu <Xreki@users.noreply.github.com>
Co-authored-by: BiynXu <62832681+BiynXu@users.noreply.github.com>
Co-authored-by: xiaoguoguo626807 <100397923+xiaoguoguo626807@users.noreply.github.com>
Co-authored-by: Yichen Zhang <32740647+pkuzyc@users.noreply.github.com>
Co-authored-by: xingmingyyj <135400902+xingmingyyj@users.noreply.github.com>
Co-authored-by: ceci3 <ceci3@users.noreply.github.com>
Co-authored-by: Charles-hit <56987902+Charles-hit@users.noreply.github.com>
Co-authored-by: cyber-pioneer <chenzhuo@tju.edu.cn>
Co-authored-by: zhaoyingli <86812880+zhaoyinglia@users.noreply.github.com>
Co-authored-by: coco <69197635+cocoshe@users.noreply.github.com>
Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com>
Co-authored-by: Chenghao Liu <chenghao1652@126.com>
Co-authored-by: Travis-Lee <lixiang.fr@hotmail.com>
Co-authored-by: Liujie0926 <44688141+Liujie0926@users.noreply.github.com>
Co-authored-by: YUNSHEN XIE <1084314248@qq.com>
Co-authored-by: WangZhen <23097963+0x45f@users.noreply.github.com>
Co-authored-by: Yuang Liu <liuyuang@baidu.com>
Co-authored-by: NetPunk <69072522+Patrick-Star125@users.noreply.github.com>
Co-authored-by: zhangbo9674 <82555433+zhangbo9674@users.noreply.github.com>
Co-authored-by: chen2016013 <cx2016013@163.com>
Co-authored-by: Zhang Zheng <32410583+ZzSean@users.noreply.github.com>
Co-authored-by: winter-wang <78149749+winter-wang@users.noreply.github.com>
Co-authored-by: HongyuJia <jiahongyu@baidu.com>
Co-authored-by: Winters Montagne <118546135+WintersMontagne10335@users.noreply.github.com>
Co-authored-by: zhouzj <41366441+zzjjay@users.noreply.github.com>
Co-authored-by: houj04 <35131887+houj04@users.noreply.github.com>
Co-authored-by: Xinyi_LI <xinyi1.li@intel.com>
Co-authored-by: csy0225 <78470701+csy0225@users.noreply.github.com>
Co-authored-by: 张春乔 <83450930+Liyulingyue@users.noreply.github.com>
Co-authored-by: zbt78 <1095497213@qq.com>
Co-authored-by: xiaoye <50870160+xiaoyewww@users.noreply.github.com>
Co-authored-by: ooo oo <106524776+ooooo-create@users.noreply.github.com>
Co-authored-by: kevin <chengyf112@gmail.com>
Co-authored-by: Zhang,Lirong <56445728+zhanglirong1999@users.noreply.github.com>
Co-authored-by: Wen Sun <35923278+HermitSun@users.noreply.github.com>
Co-authored-by: Liuyinfeng <30849840+gitliuyf@users.noreply.github.com>
Co-authored-by: newway <liuwei345@gmail.com>
Co-authored-by: YibLiu <68105073+YibinLiu666@users.noreply.github.com>
Co-authored-by: Longzhi Wang <583087864@qq.com>
Co-authored-by: HankYang <97599656+Hhankyangg@users.noreply.github.com>
Co-authored-by: 周周周 <39978853+zhoutianzi666@users.noreply.github.com>
Co-authored-by: bukejiyu <52310069+bukejiyu@users.noreply.github.com>
Co-authored-by: Zhang Ting <zhangting_2017@163.com>
Co-authored-by: Lu Qi <61354321+MarioLulab@users.noreply.github.com>
Co-authored-by: Leo Chen <39020268+leo0519@users.noreply.github.com>
Co-authored-by: kangguangli <kangguangli@hotmail.com>
Co-authored-by: wuhuachaocoding <77733235+wuhuachaocoding@users.noreply.github.com>
Co-authored-by: liuzhenhai93 <liuzhenhai93@outlook.com>
Co-authored-by: Bo Zhang <105368690+zhangbopd@users.noreply.github.com>
Co-authored-by: winter-wang <1030748926@qq.com>
Co-authored-by: Zhan Rongrui <46243324+zrr1999@users.noreply.github.com>
Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>
Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com>
Co-authored-by: Sonder <55493212+AndSonder@users.noreply.github.com>
Co-authored-by: Zhenghai Zhang <65210872+ccsuzzh@users.noreply.github.com>
Co-authored-by: gouzil <66515297+gouzil@users.noreply.github.com>
Co-authored-by: Frank Lin <eee4017@gmail.com>
Co-authored-by: tianshuo78520a <707759223@qq.com>
Co-authored-by: lizexu123 <39205361+lizexu123@users.noreply.github.com>
Co-authored-by: Huihuang Zheng <zhhsplendid@163.com>
Co-authored-by: ShenLiang <1422485404@qq.com>
Co-authored-by: Galaxy1458 <55453380+Galaxy1458@users.noreply.github.com>
Co-authored-by: Ruibiao Chen <chenruibiao@baidu.com>
Co-authored-by: xuxinyi389 <104957571+xuxinyi389@users.noreply.github.com>
Co-authored-by: LiYuRio <63526175+LiYuRio@users.noreply.github.com>
Co-authored-by: Tian Zheng <tizheng@nvidia.com>
Co-authored-by: Wang Xin <xinwang614@gmail.com>
Co-authored-by: megemini <megemini@outlook.com>
Co-authored-by: tianhaodongbd <137985359+tianhaodongbd@users.noreply.github.com>
Co-authored-by: Wang Bojun <105858416+wwbitejotunn@users.noreply.github.com>
Co-authored-by: wwbitejotunn <wwbitejotunn@outlook.com>
Co-authored-by: Haohongxiang <86215757+haohongxiang@users.noreply.github.com>
Co-authored-by: lzydev <lizhiyu02@baidu.com>
Co-authored-by: Ligoml <39876205+Ligoml@users.noreply.github.com>
Co-authored-by: jzhang533 <jzhang533@gmail.com>
Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
Co-authored-by: wentao yu <yuwentao126@126.com>
Co-authored-by: umiswing <umiswing@foxmail.com>
Co-authored-by: Chitsing KUI <kuizhiqing@msn.com>
Co-authored-by: niuliling123 <51102941+niuliling123@users.noreply.github.com>
Co-authored-by: sneaxiy <32832641+sneaxiy@users.noreply.github.com>
Co-authored-by: Zero Rains <linjunlu@zerorains.top>
Co-authored-by: feifei-111 <2364819892@qq.com>
Co-authored-by: zxcd <228587199@qq.com>
Co-authored-by: 0x45f <wangzhen45@baidu.com>
Co-authored-by: Difer <707065510@qq.com>
Co-authored-by: pangengzheng <117730991+pangengzheng@users.noreply.github.com>

0x45f merged commit 217cc54 into PaddlePaddle:develop Dec 11, 2023
DrRyanHuang deleted the DDD branch December 11, 2023 03:02

3 participants