Closed
I appreciate your excellent work, especially the example at https://juliamltools.github.io/shakespeare-gpt.
There are some existing implementations of MultiHeadAttention and Transformer:
FluxML/Flux.jl#2146
https://github.com/chengchingwen/NeuralAttentionlib.jl
https://github.com/chengchingwen/Transformers.jl
Can you give a comparison with these existing implementations?
Why did you want to implement this again?
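For context, the Flux layer from the PR linked above already covers the basic use case. A minimal sketch of its usage, assuming a recent Flux release that exports `MultiHeadAttention` (shapes and keyword names follow the Flux docs and may differ across versions):

```julia
using Flux  # MultiHeadAttention was added to Flux in FluxML/Flux.jl#2146

# Attention over batches laid out as (embed_dim, seq_len, batch).
mha = MultiHeadAttention(64; nheads = 8)
q = rand(Float32, 64, 10, 32)

# A single argument means self-attention, i.e. mha(q, q, q).
# The layer returns the projected output and the attention scores.
y, α = mha(q)
@assert size(y) == (64, 10, 32)
@assert size(α) == (10, 10, 8, 32)  # (kv_len, q_len, nheads, batch)
```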