0.2 #126

Merged (101 commits) on Feb 15, 2023
Commits
34f247f
rewrite wordpiece with DAT
chengchingwen Nov 27, 2022
7c8437f
show func for wordpiece
chengchingwen Nov 27, 2022
f01f9b9
update test
chengchingwen Nov 27, 2022
341b614
correct way to get unk
chengchingwen Nov 27, 2022
a096de7
Merge branch 'wp_v2'
chengchingwen Nov 28, 2022
0787086
refactor hgf config
chengchingwen Nov 30, 2022
4f980e7
remove old codes
chengchingwen Dec 2, 2022
9e1ca4d
update to pre-0.2.0
chengchingwen Dec 3, 2022
b51d5fd
rewrite loss; borrow api from Flux and add rrule
chengchingwen Dec 3, 2022
bae559c
refine loss
chengchingwen Dec 3, 2022
5698c66
Merge remote-tracking branch 'origin/master' into 0.2
chengchingwen Dec 13, 2022
eab1087
remove include of removed files
chengchingwen Dec 13, 2022
1ce4ad9
update t5 impl with new design
chengchingwen Dec 13, 2022
6b52994
add new layer design
chengchingwen Dec 13, 2022
71c892b
small refine
chengchingwen Dec 13, 2022
faf7ff4
update bert default config
chengchingwen Dec 15, 2022
c10eab0
update bert impl with new design
chengchingwen Dec 15, 2022
1a9875f
fix new t5 impl
chengchingwen Dec 16, 2022
76859bb
new load model api
chengchingwen Dec 16, 2022
04494aa
add bias to embed decoder
chengchingwen Dec 16, 2022
8c097ee
fix embed method
chengchingwen Dec 16, 2022
e64630b
update huggingface based model validate code with new api
chengchingwen Dec 16, 2022
c9656ed
load gpt2 with new design
chengchingwen Dec 23, 2022
86b3575
load gptj
chengchingwen Dec 23, 2022
db261af
update validate code with confing overwrite
chengchingwen Dec 23, 2022
0981564
fix cfg overwrite
chengchingwen Dec 24, 2022
c25eb47
load gpt_neo
chengchingwen Dec 24, 2022
711aaf8
load model utils; force load as float32
chengchingwen Dec 24, 2022
dacaba5
move pe to NAlib
chengchingwen Dec 25, 2022
a75de7c
move atten ops to Layers
chengchingwen Dec 25, 2022
a4f8287
refine code
chengchingwen Dec 27, 2022
36f58df
remove old model code
chengchingwen Dec 27, 2022
8ebdb44
clip tokenizer w/ new cfg
chengchingwen Dec 27, 2022
4c2fb39
remove old test
chengchingwen Dec 27, 2022
f2b30d0
update huggingface task validate code
chengchingwen Dec 27, 2022
68ab3ea
update tokenizer
chengchingwen Dec 30, 2022
7633cd5
test and fix huggingface load
chengchingwen Dec 31, 2022
7f6277f
self attention construct with op
chengchingwen Dec 31, 2022
046a553
allow fast tkr to load without text encoder registed
chengchingwen Jan 4, 2023
f25378e
update huggingface validate tokenizer code
chengchingwen Jan 4, 2023
44de7cf
update test
chengchingwen Jan 4, 2023
09ccfd6
update env
chengchingwen Jan 4, 2023
4eb6149
fix test
chengchingwen Jan 4, 2023
8146e95
move all text encoders to new module TextEncoders; destruct Basic/Gen…
chengchingwen Jan 9, 2023
497f804
organize code
chengchingwen Jan 9, 2023
2fdd272
update test
chengchingwen Jan 9, 2023
dd69a59
Merge branch 'master' into 0.2
chengchingwen Jan 9, 2023
c53bf5b
refine code; remove JSON
chengchingwen Jan 10, 2023
fa4aaaa
organize code
chengchingwen Jan 10, 2023
acf6614
add set_dropout
chengchingwen Jan 10, 2023
0eab94a
zip the vocab file in test
chengchingwen Jan 10, 2023
0a334e7
remove unsed packages
chengchingwen Jan 10, 2023
0af78e8
update hgf_str and export
chengchingwen Jan 10, 2023
0c09e1f
small refine
chengchingwen Jan 10, 2023
3ba74a7
update copy example
chengchingwen Jan 10, 2023
ea8d2ac
make testmode work with Flux Dropout
chengchingwen Jan 10, 2023
1d69d62
update readme
chengchingwen Jan 10, 2023
60e9440
add missing namespace
chengchingwen Jan 10, 2023
dce90a1
improve type stability of todevice
chengchingwen Jan 11, 2023
575131c
update AIAYN examples
chengchingwen Jan 11, 2023
4392b16
update bert examples
chengchingwen Jan 11, 2023
7f0239d
remove/update some docs
chengchingwen Jan 13, 2023
753bf8f
update export
chengchingwen Jan 13, 2023
e9dd168
update changelogs
chengchingwen Jan 17, 2023
c7eaa08
update docs
chengchingwen Jan 18, 2023
d2915f0
encode for seq2seq setting
chengchingwen Jan 18, 2023
6d5baf3
Layers.WithScore for fmap NAlib.WithScore
chengchingwen Jan 18, 2023
70b6b86
refine set_dropout
chengchingwen Jan 18, 2023
cd89f05
fix & test loss
chengchingwen Jan 23, 2023
3d3518d
refine export
chengchingwen Jan 23, 2023
ca3fcee
refine example
chengchingwen Jan 23, 2023
4c71573
update env
chengchingwen Jan 23, 2023
59d254c
some docs
chengchingwen Jan 23, 2023
afbb401
argument non_differentiable
chengchingwen Jan 24, 2023
6a059c5
fix loss on 1.6
chengchingwen Jan 24, 2023
2f43283
refine & docs for textencoders
chengchingwen Jan 29, 2023
e055e13
refine tutorial
chengchingwen Jan 29, 2023
2fb89b5
docstring for HuggingFace
chengchingwen Jan 30, 2023
c7656c8
test decode
chengchingwen Jan 31, 2023
69b9490
relax Functors compat
chengchingwen Jan 31, 2023
f57e3a7
update example
chengchingwen Jan 31, 2023
e3d87f3
remove example
chengchingwen Jan 31, 2023
e27d750
refine docstring
chengchingwen Jan 31, 2023
704889e
fix hgf tokenizer
chengchingwen Jan 31, 2023
0777550
more docs
chengchingwen Jan 31, 2023
f880e39
refine Layers: organize code and better printing
chengchingwen Feb 8, 2023
e5233b7
refine huggingface show
chengchingwen Feb 8, 2023
55d0371
some docstring
chengchingwen Feb 8, 2023
d3e80df
dense without bias
chengchingwen Feb 9, 2023
8499ce2
fix sequence mask type stability with todevice
chengchingwen Feb 11, 2023
fe2caec
remove InternedString
chengchingwen Feb 11, 2023
307bb6e
use N(0, 1) for embedding init
chengchingwen Feb 14, 2023
b0ddec7
use zeros for embed decoder bias init
chengchingwen Feb 14, 2023
71330c0
make model global var
chengchingwen Feb 15, 2023
754c706
ignore indices func; allow ce passing ApplyEmbed
chengchingwen Feb 15, 2023
aaf2188
fix default text encoder construct func
chengchingwen Feb 15, 2023
f6524ae
add robeta
chengchingwen Feb 15, 2023
24b0b1b
replacce view with selectdim
chengchingwen Feb 15, 2023
a9fde55
roberta use gpt2 tokenizer
chengchingwen Feb 15, 2023
0620870
update readme
chengchingwen Feb 15, 2023
838f2a6
update tutorial
chengchingwen Feb 15, 2023
179 changes: 179 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,179 @@
# ChangeLogs (from 0.1.x to 0.2.0)

v0.2 is a rewrite of the whole package. Most layers and APIs in 0.1 are removed or changed, and some of them are
replaced with new ones. The basic policy is: if a functionality is easily achievable with a well-maintained package,
or there isn't much gain from self-hosting/maintaining it, then we remove that functionality from Transformers.jl.


Here is a list of the changes with brief explanations:

## Transformers.Pretrain

The `Pretrain` module is entirely removed due to the duplication of functionality with `Transformers.HuggingFace`.
We no longer host the small list of the originally released official pretrained weights. Any use case that requires a
pretrained weight should refer to the `HuggingFace` module instead. This table maps the old pretrain names to the
corresponding huggingface model names:

| old pretrain name | corresponding huggingface model name |
|--------------------------------|-----------------------------------------|
| `cased_L-12_H-768_A-12` | `bert-base-cased` |
| `uncased_L-12_H-768_A-12` | `bert-base-uncased` |
| `chinese_L-12_H-768_A-12` | `bert-base-chinese` |
| `multi_cased_L-12_H-768_A-12` | `bert-base-multilingual-cased` |
| `multilingual_L-12_H-768_A-12` | `bert-base-multilingual-uncased` |
| `cased_L-24_H-1024_A-16` | `bert-large-cased` |
| `uncased_L-24_H-1024_A-16` | `bert-large-uncased` |
| `wwm_cased_L-24_H-1024_A-16` | `bert-large-cased-whole-word-masking` |
| `wwm_uncased_L-24_H-1024_A-16` | `bert-large-uncased-whole-word-masking` |
| `scibert_scivocab_cased` | `allenai/scibert_scivocab_cased` |
| `scibert_scivocab_uncased` | `allenai/scibert_scivocab_uncased` |
| `scibert_basevocab_cased` | N/A |
| `scibert_basevocab_uncased` | N/A |
| `OpenAIftlm` | `openai-gpt` |
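
For example, weights that were previously loaded with `pretrain"..."` are now loaded through the `HuggingFace`
module. A minimal migration sketch (the `hgf""` macro returns the text encoder and the model, as shown in the README
example further below):

```julia
using Transformers
using Transformers.HuggingFace

# 0.1: bert_model, wordpiece, tokenizer = pretrain"bert-uncased_L-12_H-768_A-12"
# 0.2: look up the corresponding name in the table above and use the hgf"" macro
textencoder, bert_model = hgf"bert-base-uncased"
```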


## Transformers.Stacks

The `Stacks` module is entirely removed. `Stacks` provided a small DSL for creating nontrivial `Chain`s of layers.
However, the DSL wasn't intuitive enough, and it doesn't seem worth maintaining one. We don't provide a direct
replacement, but for the specific use case of building transformer models, there are a few new constructors/layers
in `Transformers.Layers`.


## Transformers.Basic

The `Basic` module is dismantled and most of its elements are moved to other modules/packages.

1. `Transformer` and `TransformerDecoder`: The `Transformer`/`TransformerDecoder` layers are replaced with the new
   implementation in `Layers` (`Layers.TransformerBlock`, `Layers.TransformerDecoderBlock`, and friends).
2. `MultiheadAttention`: The implementation of the attention operations is moved out to
   [NeuralAttentionlib](https://github.com/chengchingwen/NeuralAttentionlib.jl). In NeuralAttentionlib, you can use
   `multihead_qkv_attention` to do the same computation. Since most transformer variants only use a modified version
   of self or cross attention, we do not provide a `MultiheadAttention` layer type. One should be able to redefine
   the `MultiheadAttention` layer type with Flux and NeuralAttentionlib easily. For example:

```julia
using Flux, Functors
using NeuralAttentionlib: multihead_qkv_attention, CausalMask

struct MultiheadAttention{Q,K,V,O}
    head::Int
    future::Bool
    iqproj::Q
    ikproj::K
    ivproj::V
    oproj::O
end
# only the projection layers carry trainable parameters
@functor MultiheadAttention (iqproj, ikproj, ivproj, oproj)

MultiheadAttention(head, hidden_size, head_size; future = true) =
    MultiheadAttention(head, future,
        Dense(hidden_size, head_size * head),
        Dense(hidden_size, head_size * head),
        Dense(hidden_size, head_size * head),
        Dense(head_size * head, hidden_size),
    )

# project the inputs, run multi-head attention (with a causal mask when
# attending to the future is disallowed), then apply the output projection
(mha::MultiheadAttention)(q, k, v) =
    mha.oproj(multihead_qkv_attention(mha.head,
        mha.iqproj(q), mha.ikproj(k), mha.ivproj(v),
        mha.future ? nothing : CausalMask()))
```

3. `TransformerModel`: This was just a Flux layer bundling an embedding layer, a transformer layer, and a classifier
   layer together. One can define this easily with the Flux/Functors API, so it is removed.
4. `Positionwise`, `PwFFN`, and `@toNd`: These were originally designed for applying `Flux.Dense` to 3-dimensional
   arrays, but since `Flux.Dense` now supports multi-dimensional input, they aren't needed and are removed.
5. `EmbeddingDecoder`: Replaced with `Layers.EmbedDecoder`. Besides the name change, it supports an extra trainable
   `bias` parameter.
6. `PositionEmbedding`: Replaced with `Layers.SinCosPositionEmbed` and `Layers.FixedLenPositionEmbed`, corresponding
   to the two settings of the old `trainable` keyword argument.
7. `crossentropy` with masking: We extend `Flux.logitcrossentropy` and `Flux.crossentropy` with a 3-argument form
   (prediction, label, and mask) and a 4-argument form (`sum` or `mean`, prediction, label, and mask).
8. `kldivergence`: In our use case (i.e. training language models), this is equivalent to cross-entropy, thus removed.
9. `logcrossentropy`/`logkldivergence`: This was a faulty design. Originally a `logsoftmax` was placed on top of the
   prediction head. However, that is not only unnecessary but also increases the amount of memory needed. One should
   use `Flux.logitcrossentropy` directly, without the `logsoftmax`.
10. `Vocabulary`: Replaced with `TextEncodeBase.Vocab`.
11. `with_firsthead_tail`/`segment_and_concat`/`concat`: These can be implemented with `TextEncodeBase.SequenceTemplate`
    and friends, thus removed.
12. `getmask`: The attention mask functionality is moved to NeuralAttentionlib. To manually construct an attention
    mask, use the constructors in `NeuralAttentionlib.Masks`, as sketched below.
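
As mentioned in item 12, attention masks are now constructed from `NeuralAttentionlib.Masks`. A minimal sketch,
assuming the `CausalMask` and `LengthMask` constructors from NeuralAttentionlib v0.2 (see its documentation for the
full set of mask types):

```julia
using NeuralAttentionlib.Masks

causal  = CausalMask()        # disallow attending to future positions
lengths = LengthMask([3, 5])  # valid length of each sequence in a padded batch

# mask types compose with the usual boolean operators
mask = causal & lengths
```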


## Transformers.Layers (new)

The `Layers` module is a new module introduced in v0.2.0. It provides a set of layer types for constructing
transformer model variants.


## Transformers.TextEncoders (new)

The `TextEncoders` module is a new module introduced in v0.2.0. Basically all of the old text-preprocessing
functionality is moved to this module, including `WordPiece`, `Unigram`, `BertTextEncoder`, `GPT2TextEncoder`, etc.
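
A minimal usage sketch (assuming the `bert-base-uncased` text encoder; the workflow mirrors the README example
further below):

```julia
using Transformers.HuggingFace
using Transformers.TextEncoders

textencoder, _ = hgf"bert-base-uncased"

sample = encode(textencoder, [["Peter Piper picked a peck", "Fuzzy Wuzzy was a bear"]])  # one batch of two sentences
tokens = decode(textencoder, sample.token)  # map the one-hot token indices back to token strings
```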

## Transformers.BidirectionalEncoder / Transformers.GenerativePreTrain

These modules are removed since we are switching to `Transformers.HuggingFace` for pretrained models. The text
encoders are moved to `Transformers.TextEncoders`. The weight loading and conversion functionality is removed. If you
need that, use the tools provided by the huggingface transformers python package and make sure the model can be loaded
with pytorch; then the weights can be used in pytorch format.


## Transformers.HuggingFace

The changes in `Transformers.HuggingFace` are mainly about the configurations and models. The tokenizer/textencoder
part is mostly the same, except for the process functions.

### Configuration

For the configuration, the loading mechanism is changed. In the previous version, each model type needed to define a
specific `HGF<XXModelType>Config` struct, where `XXModelType` is the model type name. The reason is that huggingface
transformers doesn't serialize all the configuration values into the file, but relies on its constructors with
pre-defined default values instead. As a result, some models only need the configuration file, while others need the
python code for the defaults as well. The hgf config struct was more like an internal data carrier: you usually
won't (and actually can't) manipulate the model with it.


In v0.2, we tried to make the process of adding models more automatic, and to enable building models with different
configurations. The struct holding the configuration is now a parametric struct with a `Symbol` parameter specifying
the model type (e.g. `HGFConfig{:bert}`). With this, the specific `HGF<XXModelType>Config` can be constructed on the
fly. `HGFConfig` has 2 fields: one storing the read-only deserialized object loaded from the configuration file, and
another storing the overwritten values. This should turn the config struct into a user-level interface.
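
A sketch of the intended usage. The `load_config` helper and the keyword-overwrite constructor shown here are
assumptions about the API; check the `HuggingFace` module docstrings for the exact names:

```julia
using Transformers.HuggingFace

# assumed helper: download/deserialize config.json into an HGFConfig{:bert}
cfg = load_config("bert-base-uncased")
cfg.num_attention_heads  # loaded values are read through property access

# assumed overwrite constructor: the loaded values stay read-only,
# the overrides are stored in the second field
cfg2 = HGFConfig(cfg; num_labels = 3)
```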


### Model

For the model part, the main change is that we no longer make a 1-1 mapping between the python model/layer classes and
our julia layer structs. When one wants to add a new model type, there are actually 2 things that need to be done. One
is defining a model forward method that does the same computation as the python model, and the other is defining a
mapping between the python model and the julia model (so that the model parameters/weights can be transferred between
the 2 languages). In the previous version, we chose to make a 1-1 mapping between the models, so that the
parameter/weight loading process could be fully automatic. However, huggingface transformers does not reuse its
attention or transformer implementation across model types. This means that for different model types, even if they
are actually doing the same computation (i.e. the computation graph is the same), the model layout can be different
(e.g. consider the differences between `Chain(Chain(dense1, dense2), dense3)` and `Chain(dense1, dense2, dense3)`).
As a result, this made implementing the model forward methods a real pain, and also made it hard to apply optimizations.


We noticed that the model forward method is more important and more difficult than the model mapping. On the other
hand, though manually defining the model mapping is tedious, it's less prone to errors. So instead of making a 1-1
mapping for fully automatic model loading, we chose to reduce the work needed for the forward method. In v0.2, the
attention implementation is switched to NeuralAttentionlib's modular implementation, and we build all internal layers
with layers from `Transformers.Layers`. As a result, layers like `FakeTH<XXLayer>` or `HGF<XXModelType>Attention/MLP/...`
are removed; only the outer-most types remain (e.g. `HGFBertModel`, `HGFGPT2LMHeadModel`, ...).


Since we want to make it easy to finetune a pretrained model on a new dataset/task, model loading is a combination of
initialization and parameter/weight loading. In the normal Flux workflow, you would build a completely new model and
then load the parameter/weight values in place into the specific layers/arrays of the model. In v0.2, we combine these
2 steps into one `load_model` function, which takes the model type, configuration, and a state dictionary (the term
comes from PyTorch, where it is an `OrderedDict` mapping variable names to weights). `load_model` either looks up a
variable in the state dictionary or initializes it from the configuration, recursively. As a result,
`load_model!` is removed.
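
A sketch of the combined loading step described above. `load_model` is described here as taking the model type,
configuration, and state dictionary; the `load_config`/`load_state_dict` helper names are assumptions about the API:

```julia
using Transformers.HuggingFace

name = "bert-base-uncased"
cfg = load_config(name)             # assumed helper for the configuration
state_dict = load_state_dict(name)  # assumed helper; PyTorch-style mapping of variable names to weights

# look up weights in the state dict and initialize anything missing from the config
model = load_model(HGFBertModel, cfg, state_dict)
```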


## Behavior Changes

* The process functions of all text encoders (including the `HuggingFace` ones) return a `NamedTuple`. Some field
  names changed: `tok` => `token`, `mask` => `attention_mask`.
* Most layers/models from Transformers.jl take and return `NamedTuple`s.
* For `HuggingFace` models: all input is basically a `NamedTuple`. The field names of the `NamedTuple` returned by the
  forward method are also changed.
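
For instance, reusing the names from the README example below (illustrative; field availability depends on the
encoder/model):

```julia
sample = encode(textencoder, [[text1, text2]])
sample.token           # was `sample.tok` in 0.1
sample.attention_mask  # was `sample.mask` in 0.1

nt = bert_model(sample)  # HuggingFace models take and return NamedTuples
nt.hidden_state
```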
36 changes: 13 additions & 23 deletions Project.toml
@@ -1,12 +1,9 @@
name = "Transformers"
uuid = "21ca0261-441d-5938-ace7-c90938fde4d4"
authors = ["chengchingwen <adgjl5645@hotmail.com>"]
version = "0.1.25"
version = "0.2.0"

[deps]
AbstractTrees = "1520ce14-60c1-5f80-bbc7-55ef81b5835c"
Adapt = "79e6a3ab-5dfb-504d-930d-738a2a938a0e"
BSON = "fbb218c0-5317-5bc6-957e-2ee96dd4b1f0"
Base64 = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f"
BytePairEncoding = "a4280ba5-8788-555a-8ca8-4a8c3d966a71"
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
@@ -17,25 +14,23 @@ Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
DelimitedFiles = "8bb1440f-4735-579b-a4ab-409b98df4dab"
DoubleArrayTries = "abbaa0e5-f788-499c-92af-c35ff4258c82"
Fetch = "bb354801-46f6-40b6-9c3d-d42d7a74c775"
FillArrays = "1a297f60-69ca-5386-bcde-b61e274b549b"
Flux = "587475ba-b771-5e3f-ad9e-33799f191a9c"
FuncPipelines = "9ed96fbb-10b6-44d4-99a6-7e2a3dc8861b"
Functors = "d9f16b24-f501-4c13-a1f2-28368ffc5196"
HTTP = "cd3eb016-35fb-5094-929b-558a96fad6f3"
HuggingFaceApi = "3cc741c3-0c9d-4fbe-84fa-cdec264173de"
InternedStrings = "7d512f48-7fb1-5a58-b986-67e6dc259f01"
JSON = "682c06a0-de6a-54ab-a142-c8b1cf79cde6"
JSON3 = "0f8b85d8-7281-11e9-16c2-39a750bddbf1"
LightXML = "9c8b4983-aa76-5018-a973-4c85ecc9e179"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
MacroTools = "1914dd2f-81c6-5fcd-8719-6d5c9610ff09"
Markdown = "d6f4376e-aef5-505a-96c1-9c027394607a"
Mmap = "a63ad114-7e13-5084-954f-fe012c677804"
NNlib = "872c559c-99b0-510c-b3b7-b6c96a88d5cd"
NNlibCUDA = "a00861dc-f156-4864-bf3c-e6376f28a68d"
NeuralAttentionlib = "12afc1b8-fad6-47e1-9132-84abc478905f"
Pickle = "fbb45041-c46e-462f-888f-7c521cafbc2c"
Pkg = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"
PrimitiveOneHot = "13d12f88-f12b-451e-9b9f-13b97e01cc85"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
Requires = "ae029012-a4dd-5104-9daa-d747884805df"
SHA = "ea8e919c-243c-51af-8825-aaa63cd721ce"
Static = "aedffcd0-7271-4cad-89d0-dc628f76c6d3"
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
@@ -45,45 +40,40 @@ TextEncodeBase = "f92c20c0-9f2a-4705-8116-881385faba05"
Unicode = "4ec0a83e-493e-50e2-b9ac-8f72acf5a8f5"
ValSplit = "0625e100-946b-11ec-09cd-6328dd093154"
WordTokenizers = "796a5d58-b03d-544a-977e-18100b691f6e"
ZipFile = "a5390f91-8eb1-5f08-bee0-b1d1ffed6cea"

[compat]
AbstractTrees = "0.3, 0.4.3"
Adapt = "3.3"
BSON = "0.3.4"
BytePairEncoding = "0.3"
CUDA = "3.10"
ChainRulesCore = "1.15"
DataDeps = "0.7"
DataStructures = "0.18"
DoubleArrayTries = "0.0.3"
Fetch = "0.1.3"
FillArrays = "0.13"
Flux = "0.13.4"
FuncPipelines = "0.2.3"
Functors = "0.2, 0.3"
Functors = "0.2, 0.3, 0.4"
HTTP = "0.9, 1"
HuggingFaceApi = "0.1"
InternedStrings = "0.7"
JSON = "0.21"
LightXML = "0.9"
MacroTools = "0.5"
NNlib = "0.8"
NNlibCUDA = "0.2"
NeuralAttentionlib = "0.1"
NeuralAttentionlib = "0.2.4"
Pickle = "0.3"
PrimitiveOneHot = "0.1"
Requires = "1"
Static = "0.7"
Static = "0.7, 0.8"
StringViews = "1"
StructWalk = "0.2"
TextEncodeBase = "0.5.11"
TextEncodeBase = "0.6"
ValSplit = "0.1"
WordTokenizers = "0.5.6"
ZipFile = "0.9"
julia = "1.6"

[extras]
ChainRulesTestUtils = "cdddcdb0-9152-4a09-a978-84456f9df70a"
Logging = "56ddb016-857b-54e1-b83d-db4d58db5568"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
ZipFile = "a5390f91-8eb1-5f08-bee0-b1d1ffed6cea"

[targets]
test = ["Test"]
test = ["Test", "Logging", "ZipFile", "ChainRulesTestUtils"]
88 changes: 15 additions & 73 deletions README.md
@@ -6,26 +6,13 @@

Julia implementation of [transformer](https://arxiv.org/abs/1706.03762)-based models, with [Flux.jl](https://github.com/FluxML/Flux.jl).

*Notice: the current version is almost completely different from the 0.1.x version. If you are using the old version, make sure to update your code for the changes or stick to the old version.*

# Installation

In the Julia REPL:

]add Transformers

For using GPU, install & build:

]add CUDA

]build

julia> using CUDA

julia> using Transformers

#run the model below
.
.
.


# Example
@@ -34,75 +21,30 @@ Using pretrained Bert with `Transformers.jl`.

```julia
using Transformers
using Transformers.Basic
using Transformers.Pretrain
using Transformers.TextEncoders
using Transformers.HuggingFace

ENV["DATADEPS_ALWAYS_ACCEPT"] = true
textencoder, bert_model = hgf"bert-base-cased"

bert_model, wordpiece, tokenizer = pretrain"bert-uncased_L-12_H-768_A-12"
vocab = Vocabulary(wordpiece)
text1 = "Peter Piper picked a peck of pickled peppers"
text2 = "Fuzzy Wuzzy was a bear"

text1 = "Peter Piper picked a peck of pickled peppers" |> tokenizer |> wordpiece
text2 = "Fuzzy Wuzzy was a bear" |> tokenizer |> wordpiece
text = [[ text1, text2 ]] # 1 batch of contiguous sentences
sample = encode(textencoder, text) # tokenize + pre-process (add special tokens + truncate / padding + one-hot encode)

text = ["[CLS]"; text1; "[SEP]"; text2; "[SEP]"]
@assert text == [
"[CLS]", "peter", "piper", "picked", "a", "peck", "of", "pick", "##led", "peppers", "[SEP]",
@assert reshape(decode(textencoder, sample.token), :) == [
"[CLS]", "peter", "piper", "picked", "a", "peck", "of", "pick", "##led", "peppers", "[SEP]",
"fuzzy", "wu", "##zzy", "was", "a", "bear", "[SEP]"
]

token_indices = vocab(text)
segment_indices = [fill(1, length(text1)+2); fill(2, length(text2)+1)]

sample = (tok = token_indices, segment = segment_indices)

bert_embedding = sample |> bert_model.embed
feature_tensors = bert_embedding |> bert_model.transformers
bert_features = bert_model(sample).hidden_state
```

See `example` folder for the complete example.


# Huggingface

We have some support for the models from [`huggingface/transformers`](https://github.com/huggingface/transformers).

```julia
using Transformers.HuggingFace

# loading a model from huggingface model hub
julia> model = hgf"bert-base-cased:forquestionanswering";
┌ Warning: Transformers.HuggingFace.HGFBertForQuestionAnswering doesn't have field cls.
└ @ Transformers.HuggingFace ~/peter/repo/gsoc2020/src/huggingface/models/models.jl:46
┌ Warning: Some fields of Transformers.HuggingFace.HGFBertForQuestionAnswering aren't initialized with loaded state: qa_outputs
└ @ Transformers.HuggingFace ~/peter/repo/gsoc2020/src/huggingface/models/models.jl:52

```

Current we only support a few model and the tokenizer part is not finished yet.


# For more information

If you want to know more about this package, see the [document](https://chengchingwen.github.io/Transformers.jl/dev/)
and the series of [blog posts](https://nextjournal.com/chengchingwen) I wrote for JSoC and GSoC. You can also
tag me (@chengchingwen) on Julia's slack or discourse if you have any questions, or just create a new Issue on GitHub.


# Roadmap

## What we have before v0.2

- `Transformer` and `TransformerDecoder` support for both 2d & 3d data.
- `PositionEmbedding` implementation.
- `Positionwise` for handling 2d & 3d input.
- docstring for most of the functions.
- runable examples (see `example` folder)
- `Transformers.HuggingFace` for handling pretrains from `huggingface/transformers`

## What we will have in v0.2.0

- Complete tokenizer APIs
- tutorials
- benchmarks
- more examples
If you want to know more about this package, see the [document](https://chengchingwen.github.io/Transformers.jl/dev/)
and read code in the `example` folder. You can also tag me (@chengchingwen) on Julia's slack or discourse if
you have any questions, or just create a new Issue on GitHub.