[GPU] LSTMSequence and LSTMCell optimization #26767

Open — michal-miotk wants to merge 117 commits into master

Changes from 104 commits.
Commits (117, all authored by michal-miotk):

9ce143a  compiles lstm_seq (Jul 18, 2024)
027f991  more kernel args (Jul 18, 2024)
c191c58  bigger proper run chances (Jul 18, 2024)
d461e66  19jul (Jul 19, 2024)
01fa2ac  inference works (Jul 19, 2024)
1f017fd  in middle of implementation (Jul 21, 2024)
5787c7d  problems with inputs get element in kernel (Jul 22, 2024)
837db22  not compile (Jul 22, 2024)
d4ce531  wipx (Jul 23, 2024)
19c268e  wip (Jul 23, 2024)
f5273bc  solved problem with too much inputs kernel (Jul 23, 2024)
d50b3be  wip (Jul 24, 2024)
63a8dfd  more changes (Jul 24, 2024)
f54ecc1  wip (Jul 24, 2024)
3748a11  wip (Jul 25, 2024)
fae772a  wip (Jul 25, 2024)
c00ff8a  proper shape for 2 outputs (Jul 25, 2024)
1c08b14  Squashed commit of the following: (Jul 29, 2024)
6968881  Squashed commit of the following: (Aug 6, 2024)
31fcb79  cleaning (Aug 6, 2024)
4b16eef  Merge branch 'master' into lstm2 (Aug 6, 2024)
dcad182  updated to new primitive_base api, disabled lstm to tensor transforma… (Aug 6, 2024)
d6aeb54  now it should compile on windows, changed kernel name (Aug 6, 2024)
9688f63  deleted cell, deleted input_forget (Aug 6, 2024)
5003d47  generic primitive (Aug 7, 2024)
5937b14  fix compilation problem, smaller lws (Aug 7, 2024)
8b31a91  wip (Aug 8, 2024)
2ff5a7c  wip, not resolved fail on dynamic (Aug 8, 2024)
2d9e5c6  fixed failing dynamic test (Aug 9, 2024)
702e941  change name cldnn::rnn -> cldnn::lstm_seq (Aug 9, 2024)
f4d3b71  fix bad order of inputs in lstm_elt constructor (Aug 12, 2024)
0c7103c  changed input order in kernel (Aug 12, 2024)
f37482a  Squashed commit of the following: (Aug 13, 2024)
0058c57  Merge branch 'master' into lstm2 (Aug 13, 2024)
1ac26d3  fix bad initialization in kernel (Aug 13, 2024)
31040bf  generic kernel (Aug 13, 2024)
83aa74f  deleted unnecessary cancelled buffer fusing for cell (Aug 14, 2024)
0cce00c  Merge branch 'master' into lstm2 (Aug 14, 2024)
0e37c8a  bigger local workgroup, turned off buffer fusing for lstm cell (Aug 14, 2024)
72b48d1  speedup 1.5x after unrolling loop (Aug 14, 2024)
7a747c5  barrier in better place (Aug 14, 2024)
9b99f04  direction condition on macro, more macro (Aug 14, 2024)
5052e26  reducing temp_cell_state (Aug 14, 2024)
aa5d906  Revert "reducing temp_cell_state" (Aug 15, 2024)
4b524fd  reducing temp cell state (Aug 15, 2024)
c47c943  minor kernel speedup (1fps) (Aug 15, 2024)
e486376  deleted unnecessary tab for input and hidden result (Aug 16, 2024)
fe72cc8  fix windows compilation (Aug 17, 2024)
d62f223  more clear kernel algorithm (Aug 19, 2024)
0b1fa3d  wip (Aug 19, 2024)
3e1fe20  wip vectorized (Aug 19, 2024)
cac921c  more vector (Aug 20, 2024)
a165f30  fix for vec size, deleted MAX_SEQ_LENGTH (Aug 20, 2024)
8f74962  Revert "fix for vec size, deleted MAX_SEQ_LENGTH" (Aug 20, 2024)
732eb52  fix vec_size (Aug 20, 2024)
165dd9b  optimizations for bigger gpus (Aug 20, 2024)
1b9cc98  fix for windows (Aug 20, 2024)
37ab01b  fix conversion error (Aug 20, 2024)
c99ddc0  Merge branch 'master' into lstm2 (Aug 20, 2024)
60a0675  merge most important from lstm23 (Sep 24, 2024)
1b23648  deleted cout (Sep 24, 2024)
7c1bf37  Merge branch 'master' into lstm_with_onednn (Sep 24, 2024)
40abc31  mainly changes from code review (Sep 25, 2024)
56031d9  merged some_wip (Oct 1, 2024)
d954fe8  Merge branch 'master' into lstm_with_onednn (Oct 1, 2024)
78cc4fc  correct in registry (Oct 1, 2024)
81ca2ed  Merge branch 'master' into lstm_with_onednn (Oct 2, 2024)
431d937  deleted level zero, undo changes in visualize_tree (Oct 2, 2024)
6b6800f  fix bad name in OV_GPU_PRIMITIVE_IMPL (Oct 2, 2024)
db8d75b  returning on conversion to tensor iterator (Oct 3, 2024)
a9cd3cf  Squashed commit of the following: (Oct 7, 2024)
bfb80ba  Merge branch 'master' into lstm_with_onednn (Oct 7, 2024)
57faed2  wip (Oct 7, 2024)
7f097ba  wip (Oct 8, 2024)
a79eca5  Merge branch 'master' into lstm_with_onednn (Oct 8, 2024)
8d4e46b  should work, turned off forcing immad (Oct 8, 2024)
00c6237  added lstm_seq and lstm_cell in implementation manager (Oct 9, 2024)
31b8ef0  Merge branch 'master' into lstm_with_onednn (Oct 9, 2024)
07c1ac2  little cleaning (Oct 9, 2024)
a78ef3a  turnedoff immad check for onednn (Oct 9, 2024)
5bcab62  deleted unused var (Oct 9, 2024)
d564228  redo level_zero_ext to cdb761 (Oct 9, 2024)
b16bdac  redo mistake change to ov_subgraph (Oct 10, 2024)
173b5b2  enabled tests for bfyx kernel (Oct 10, 2024)
c8eb682  set to turn on onednn (Oct 10, 2024)
43acd2b  turned of impl selection for childs and grandchilds of node, cleaning (Oct 10, 2024)
0002e54  added cl_cache extension for *.onednn.cl_cache files (Oct 11, 2024)
7741a46  renamed post_optimize_lstm_weights, deleted unused function select_im… (Oct 11, 2024)
ac352ea  repair cache tests (Oct 14, 2024)
d0fb8b4  Merge branch 'master' into lstm_with_onednn (Oct 14, 2024)
a1497c4  initialized memory in infer_request_dynamic tests (Oct 14, 2024)
f12aebd  fix for failing caching tests (Oct 14, 2024)
6170710  deleted event handling as in case in in_order que it is not used (Oct 14, 2024)
01dc7dc  preventing duplicates (Oct 14, 2024)
e9bf370  repairs in infer_request set and get tensor (Oct 15, 2024)
7158776  fused test repair (Oct 15, 2024)
5e21106  set in order queue as default test config (Oct 15, 2024)
daa83b5  only bfyx format for lstm_seq (Oct 15, 2024)
6af1f3f  skipping conv fusion tests (Oct 16, 2024)
02942e5  skipping f16 deconv gpu tests (Oct 16, 2024)
f8dbec3  conv_fp32_multi_eltquant skip in conv_fusion_test (Oct 17, 2024)
2abe8f8  Merge branch 'master' into lstm_with_onednn (Oct 17, 2024)
00826ad  update hash as input format of weights is custom after post_optimize_… (Oct 17, 2024)
36b4853  change format in conv_fp32_multi_eltwise_concat basic test (Oct 17, 2024)
c358ab3  fix shape calc for onednn, only bfyx supported for lstmocl (Oct 18, 2024)
19b1d93  Revert "optimizations for bigger gpus" (Oct 18, 2024)
4da2df6  deleted all get_index safe in lstm bfyx kernel (Oct 18, 2024)
303bf7d  applying review part1 (Oct 18, 2024)
bf9f13f  fix check of dimensions (Oct 19, 2024)
459e1ad  fix check of input dim lstm cell (Oct 20, 2024)
14e53f4  enable onednn for tests ON, LSTMSeq accept bfyx and fbyx format (Oct 20, 2024)
063ac02  dot op, vec_size=4 (Oct 20, 2024)
892131b  Revert "skipping conv fusion tests" (Oct 20, 2024)
b539a3f  Revert "conv_fp32_multi_eltquant skip in conv_fusion_test" (Oct 20, 2024)
dc8ac73  lstm_weights optimization is part of post_optimize_weights (Oct 20, 2024)
a5165a8  fix forbiddnen size_t->int conversion (Oct 20, 2024)
cc6b4b5  Revert "update hash as input format of weights is custom after post_o… (Oct 20, 2024)
1 change: 1 addition & 0 deletions src/plugins/intel_gpu/include/intel_gpu/graph/program.hpp
@@ -45,6 +45,7 @@ struct program {
     friend class reorder_inputs; // to be removed when possible
     friend class remove_redundant_reorders; // to be removed when possible
     friend class post_optimize_weights; // to be removed when possible
+    friend class post_optimize_lstm_weights_and_output; // to be removed when possible
Review comment (Contributor): nit: alignment

Reply (michal-miotk, Author): done
     friend class prepare_primitive_fusing_through; // to be removed when possible
     friend class reorder_transfer; // to be removed when possible
     friend class fuse_constant_transposes; // to be removed when possible
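The friend declaration above suggests the LSTM weight/output reorganization was implemented as a dedicated graph pass (later folded into post_optimize_weights, per commit dc8ac73). For orientation only: intel_gpu passes conventionally derive from base_pass and override run(program&). The declaration below is a hypothetical sketch under that assumption — only the class name comes from this diff; the constructor parameter and member are guesses modeled on post_optimize_weights, not the PR's actual code.

// Hypothetical sketch of the pass named in the friend declaration above.
// Only the class name is taken from the diff; everything else is assumed.
#include "pass_manager.h"

namespace cldnn {

class post_optimize_lstm_weights_and_output : public base_pass {
public:
    explicit post_optimize_lstm_weights_and_output(reorder_factory& rf_ref)
        : base_pass("post_optimize_lstm_weights_and_output"), _rf(rf_ref) {}

private:
    // Reorders LSTM weights (and output layout) into the form the optimized
    // kernels expect; friend access lets it touch program internals directly.
    void run(program& p) override;

    reorder_factory& _rf;
};

}  // namespace cldnn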
117 changes: 23 additions & 94 deletions src/plugins/intel_gpu/include/intel_gpu/primitives/lstm.hpp
@@ -8,141 +8,70 @@
 #include <vector>
 #include <algorithm>
 #include "intel_gpu/graph/serialization/activation_serializer.hpp"
+#include "rnn.hpp"
 
 namespace cldnn {
 
-/// @brief Weights orders
-/// @details Specifies the order in which the weights are concatenated.
-/// e.g. [i, o, f, z] : [input, output, forget, block]
-/// ONNX order: iofz
-/// Caffe order: ifoz
-/// pyTorch order: izof
-/// OV order: fizo
-enum class lstm_weights_order {
-    iofz,
-    ifoz,
-    izof,
-    fizo
-};
-
 struct lstm_elt : public primitive_base<lstm_elt> {
     CLDNN_DECLARE_PRIMITIVE(lstm_elt)
 
-    lstm_elt() : primitive_base("", {}), clip(0), input_forget(0), offset_order(lstm_weights_order::iofz), direction(0) {}
+    lstm_elt() : primitive_base("", {}), input_forget(0) {
+        params.clip = 0;
+        params.offset_order = lstm_weights_order::iofz;
+        params.direction = 0;
+    }
 
     using vec_activation = std::vector<activation_func>;
     using vec_activation_param = std::vector<activation_additional_params>;
 
     /// @brief Constructs lstm layer.
-    /// @param id This primitive id.
-    /// @param input input primitive id.
-    /// @param cell Primitive id containing cell data. Provide empty string if using lstm without cell values.
-    /// @param clip Clip threshold. Provide 0 if using lstm without activations clip threshold.
+    /// @param RNNParam common params for rnns
     /// @param input_forget Provide 0 if using lstm without coupled input-forget gates.
-    /// @param offset_order Order of the concatenated weights, recurrent, and bias. ONNX default is iofz [input, output, forget, block].
-    /// @param direction default = 0, bidirectional = 1.
-    lstm_elt(const primitive_id& id,
-             const input_info& input,
-             const primitive_id& cell = "",
-             const float clip = 0,
-             const bool input_forget = 0,
-             const std::vector<activation_func> activations = {activation_func::logistic,
-                                                               activation_func::hyperbolic_tan,
-                                                               activation_func::hyperbolic_tan},
-             const std::vector<activation_additional_params> activation_params = {},
-             const lstm_weights_order offset_order = lstm_weights_order::iofz,
-             const uint32_t direction = 0)
-        : primitive_base(id, {input}),
-          cell(cell),
-          clip(clip),
-          input_forget(input_forget),
-          activations(activations),
-          activation_params(activation_params),
-          offset_order(offset_order),
-          direction(direction) {}
-
-    /// @brief Primitive id containing the initial value of the cell state data.
-    primitive_id cell;
-    /// @brief Cell clip threshold T. It is applied to the input of activations [-T, T]. No clip is applied if it is not specified.
-    float clip;
-    /// @brief Couple the input and forget gates if input_forget is 1. Default is 0.
+    lstm_elt(const RNNParams& p,
+             bool input_forget)
+        : primitive_base(p.id, p.get_inputs(), p.num_outputs, \
+          {optional_data_type()}, {p.output_padding}),
+          params(p),
+          input_forget(input_forget) {}
+
+    RNNParams params;
     bool input_forget;
-    /// @brief A list of 3 activation functions for the input, output, forget, cell, and hidden.
-    std::vector<activation_func> activations;
-    /// @brief Optional scaling values used by some activation functions. The values are consumed in the order of activation functions.
-    std::vector<activation_additional_params> activation_params;
-    /// @brief Weights, recurrent weights, and biases order. [iofz] : ONNX, [ifoz] : Caffe
-    lstm_weights_order offset_order;
-    /// @brief direction default = 0, bidirectional = 1.
-    uint32_t direction;
 
     size_t hash() const override {
         size_t seed = primitive::hash();
-        seed = hash_combine(seed, clip);
+        seed = hash_combine(seed, params.hash());
         seed = hash_combine(seed, input_forget);
-        seed = hash_range(seed, activations.begin(), activations.end());
-        for (auto& act_param : activation_params) {
-            seed = hash_combine(seed, act_param.a);
-            seed = hash_combine(seed, act_param.b);
-        }
-        seed = hash_combine(seed, offset_order);
-        seed = hash_combine(seed, direction);
-        seed = hash_combine(seed, cell.empty());
         return seed;
     }
 
     bool operator==(const primitive& rhs) const override {
         if (!compare_common_params(rhs))
             return false;
 
         auto rhs_casted = downcast<const lstm_elt>(rhs);
 
-        bool act_params_eq = activation_params.size() == rhs_casted.activation_params.size();
-        for (size_t i = 0; i < activation_params.size(); ++i) {
-            act_params_eq &= activation_params[i].a == rhs_casted.activation_params[i].a &&
-                             activation_params[i].b == rhs_casted.activation_params[i].b;
-        }
-
-#define cmp_fields(name) name == rhs_casted.name
-        return act_params_eq &&
-               cmp_fields(clip) &&
-               cmp_fields(input_forget) &&
-               cmp_fields(activations) &&
-               cmp_fields(offset_order) &&
-               cmp_fields(direction) &&
-               cmp_fields(cell.empty());
-#undef cmp_fields
+        return params == rhs_casted.params && input_forget == rhs_casted.input_forget;
     }
 
     void save(BinaryOutputBuffer& ob) const override {
         primitive_base<lstm_elt>::save(ob);
-        ob << cell;
-        ob << clip;
+        params.save(ob);
         ob << input_forget;
-        ob << activations;
-        ob << activation_params;
-        ob << make_data(&offset_order, sizeof(lstm_weights_order));
-        ob << direction;
     }
 
     void load(BinaryInputBuffer& ib) override {
         primitive_base<lstm_elt>::load(ib);
-        ib >> cell;
-        ib >> clip;
+        params.load(ib);
         ib >> input_forget;
-        ib >> activations;
-        ib >> activation_params;
-        ib >> make_data(&offset_order, sizeof(lstm_weights_order));
-        ib >> direction;
     }
 
 protected:
     std::vector<input_info> get_dependencies() const override {
         std::vector<input_info> ret;
-        if (!cell.empty())
-            ret.push_back(cell);
+        if (!params.initial_cell_state.pid.empty())
+            ret.push_back(params.initial_cell_state);
         return ret;
     }
 };
 
 } // namespace cldnn
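The shared RNNParams aggregate lives in rnn.hpp, which is not part of this excerpt; the next hunk's new lstm_cell primitive uses the same aggregate. The following is a minimal sketch of what RNNParams plausibly contains, inferred solely from the usages visible in this diff (p.id, p.get_inputs(), p.num_outputs, p.output_padding, params.clip, params.offset_order, params.direction, params.initial_cell_state, params.hash(), params.save()/load(), operator==). Any field or method beyond those usages is an assumption, not the PR's actual definition.

// Hypothetical sketch only -- the real definition is in rnn.hpp, which this
// diff does not show. Members are inferred from the usages above.
#pragma once
#include <vector>
#include "primitive.hpp"
#include "activation.hpp"

namespace cldnn {

// The weights-order enum deleted from lstm.hpp above presumably moved here.
enum class lstm_weights_order { iofz, ifoz, izof, fizo };

struct RNNParams {
    primitive_id id;                           // used as p.id in primitive_base(...)
    input_info x;                              // main input (assumed name)
    input_info initial_hidden_state;           // assumed; standard LSTM input
    input_info initial_cell_state;             // referenced by lstm_elt::get_dependencies()
    float clip = 0.f;                          // set by the default constructors above
    std::vector<activation_func> activations;
    std::vector<activation_additional_params> activation_params;
    lstm_weights_order offset_order = lstm_weights_order::iofz;
    uint32_t direction = 0;
    size_t num_outputs = 1;                    // used as p.num_outputs
    padding output_padding;                    // used as {p.output_padding}

    std::vector<input_info> get_inputs() const;    // gathers the non-empty inputs
    size_t hash() const;                           // hash_combine over all fields
    bool operator==(const RNNParams& rhs) const;
    void save(BinaryOutputBuffer& ob) const;
    void load(BinaryInputBuffer& ib);
};

}  // namespace cldnn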
@@ -0,0 +1,74 @@
+// Copyright (C) 2018-2024 Intel Corporation
+// SPDX-License-Identifier: Apache-2.0
+//
+
+#pragma once
+#include "primitive.hpp"
+#include "activation.hpp"
+#include <vector>
+#include <algorithm>
+#include "intel_gpu/graph/serialization/activation_serializer.hpp"
+#include "rnn.hpp"
+
+
+namespace cldnn {
+
+struct lstm_cell : public primitive_base<lstm_cell> {
+    CLDNN_DECLARE_PRIMITIVE(lstm_cell)
+
+    lstm_cell() : primitive_base("", {}), input_forget(0) {
+        params.clip = 0;
+        params.offset_order = lstm_weights_order::iofz;
+        params.direction = 0;
+    }
+
+    using vec_activation = std::vector<activation_func>;
+    using vec_activation_param = std::vector<activation_additional_params>;
+
+    /// @brief Constructs lstm layer.
+    /// @param RNNParam common params for rnns
+    /// @param input_forget Provide 0 if using lstm without coupled input-forget gates.
+    lstm_cell(const RNNParams& p,
+              bool input_forget)
+        : primitive_base(p.id, p.get_inputs(), p.num_outputs, \
+          {optional_data_type()}, {p.output_padding}), \
+          params(p),
+          input_forget(input_forget) {}
+
+    RNNParams params;
+    bool input_forget;
+
+    size_t hash() const override {
+        size_t seed = primitive::hash();
+        seed = hash_combine(seed, params.hash());
+        seed = hash_combine(seed, input_forget);
+        return seed;
+    }
+
+    bool operator==(const primitive& rhs) const override {
+        if (!compare_common_params(rhs))
+            return false;
+        auto rhs_casted = downcast<const lstm_cell>(rhs);
+        return params == rhs_casted.params && input_forget == rhs_casted.input_forget;
+    }
+
+    void save(BinaryOutputBuffer& ob) const override {
+        primitive_base<lstm_cell>::save(ob);
+        params.save(ob);
+        ob << input_forget;
+    }
+
+    void load(BinaryInputBuffer& ib) override {
+        primitive_base<lstm_cell>::load(ib);
+        params.load(ib);
+        ib >> input_forget;
+    }
+
+protected:
+    std::vector<input_info> get_dependencies() const override {
+        return {};
+    }
+};
+
+
+} // namespace cldnn
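Both primitives now delegate identity to RNNParams: two instances hash and compare equal exactly when their shared RNN parameters and input_forget match, which is what implementation caching keys on. A hypothetical usage sketch follows, assuming the RNNParams fields sketched earlier and that the aggregate is default-constructible (the diff does not show its constructors, and the new header's path is omitted in this capture).

// Hypothetical usage sketch; RNNParams construction details are assumptions,
// since rnn.hpp is not shown in this diff.
#include <cassert>
#include "lstm_cell.hpp"  // path assumed; the capture omits the new file's location

void lstm_cell_identity_example() {
    cldnn::RNNParams p;
    p.id = "lstm_cell0";
    p.offset_order = cldnn::lstm_weights_order::fizo;  // OV gate order
    p.direction = 0;

    cldnn::lstm_cell cell_a(p, /*input_forget=*/false);
    cldnn::lstm_cell cell_b(p, /*input_forget=*/false);

    // Equal params and equal input_forget imply equal hash and equality.
    assert(cell_a.hash() == cell_b.hash());
    assert(cell_a == cell_b);
}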