Take 2: use Hydra to build xformer model #93
Conversation
reversible: False  # Optionally make these layers reversible to save memory
num_layers: 3  # Optional: this means that this config will repeat N times
block_type: decoder
dim_model: ${emb}
I'm new to Hydra, but this means that it will be inferred from some broader context ?
yes, the interpolation here is absolute - meaning it will interpolate the `emb` defined in the primary config file, examples/build_model/conf/config.yaml. Maybe it is not the best/most obvious way to config this. This will be better supported once Hydra supports partial instantiation, which should be coming in an upcoming release.
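For readers new to Hydra, here is a minimal sketch of that absolute interpolation using OmegaConf (the config library underneath Hydra); the keys are illustrative, not the PR's actual files. `${emb}` is looked up from the root of the composed config, so every block picks up the same embedding size:

```python
# Minimal sketch of absolute interpolation; keys are illustrative only.
from omegaconf import OmegaConf

cfg = OmegaConf.create(
    """
    emb: 384                 # defined once in the primary config
    stack:
      block_type: decoder
      dim_model: ${emb}      # absolute interpolation, resolved against the root
    """
)

assert cfg.stack.dim_model == 384
```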
LGTM, provided CI is happy ! Looks cleaner to me, thank you @jieru-hu !
@@ -100,6 +100,8 @@ class xFormerBlockConfig:
    layer_norm_style: LayerNormStyle
    layer_position: LayerPosition
    use_triton: bool
    reversible: bool
    num_layers: int
agree, feels like this is the right place
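To make the discussion concrete, here is a rough, hypothetical sketch of how a per-block `num_layers` field could be consumed when the stack is built. The fields mirror the diff above, but the expansion loop is an assumption for illustration, not the actual xformers factory code:

```python
from dataclasses import dataclass

# Hypothetical stand-in for the block config discussed above;
# NOT the real xFormerBlockConfig.
@dataclass
class BlockConfig:
    block_type: str
    dim_model: int
    reversible: bool = False
    num_layers: int = 1

def expand(block_configs):
    # Repeat each block config `num_layers` times to get the flat list of
    # layers a stack would be built from.
    return [c for c in block_configs for _ in range(c.num_layers)]

stack = expand([BlockConfig("encoder", 384, num_layers=3),
                BlockConfig("decoder", 384, num_layers=2)])
assert len(stack) == 5  # 3 encoder entries + 2 decoder entries
```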
thinking about it, it also means that some of the tests with respect to reversibility can be removed, since this will mean that all the layers in the same block will have the same characteristic
    xFormerDecoderBlock,
    xFormerDecoderConfig,
    xFormerEncoderBlock,
    xFormerEncoderConfig,
)


@dataclass(init=False)
agree
# Check that the reversible setting is not alternating, which
# - makes little sense, since you lose all the reversible benefits
# - may break
# Reversible is only allowed on the encoder side

reversible = [
ah perfect, exactly what I had in mind in a comment above
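For context, a hedged sketch of the kind of consistency check being discussed; the function and attribute names are assumptions for illustration and need not match the factory's actual code:

```python
def check_reversible(block_configs):
    # Collect the reversible flag of all encoder blocks; mixed settings make
    # little sense (the memory benefit is lost) and may break.
    encoder_flags = [
        c.reversible for c in block_configs if c.block_type == "encoder"
    ]
    if any(encoder_flags) and not all(encoder_flags):
        raise ValueError("Alternating reversible settings are not supported")

    # Reversible is only allowed on the encoder side.
    if any(c.reversible and c.block_type == "decoder" for c in block_configs):
        raise ValueError("Reversible layers are only allowed on the encoder side")
```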
Codecov Report
@@ Coverage Diff @@
## main #93 +/- ##
==========================================
+ Coverage 87.61% 87.74% +0.13%
==========================================
Files 50 51 +1
Lines 2567 2587 +20
==========================================
+ Hits 2249 2270 +21
+ Misses 318 317 -1
Flags with carried forward coverage won't be shown.
Last minute demand @jieru-hu, would it be possible to have a look at the unit test coverage and make sure that it does not regress ? It's probably just a matter of a missing test case within the config space. Looks like the regression is in the model factory.
@@ -98,7 +98,7 @@ Building full models
====================

-This is the last example in the series, and goes one level up again, so that we consider building a whole Tranformer/xFormer model. Please note that this is just an example, because building the whole model from explicit parts is always an option, from pure PyTorch building blocks or adding some xFormers primitives.
+Now let's build a full Tranformer/xFormer model. Please note that this is just an example, because building the whole model from explicit parts is always an option, from pure PyTorch building blocks or adding some xFormers primitives.
Suggested change:
-Now let's build a full Tranformer/xFormer model. Please note that this is just an example, because building the whole model from explicit parts is always an option, from pure PyTorch building blocks or adding some xFormers primitives.
+Now let's build a full Transformer/xFormer model. Please note that this is just an example, because building the whole model from explicit parts is always an option, from pure PyTorch building blocks or adding some xFormers primitives.
.. code-block:: yaml

  defaults:
    - /stack@xformer.stack_configs:
Would it be out of scope to add some more information as to what these do in this doc page for people who are unfamiliar? This is a pretty advanced usage of Hydra which may be alright for users of Hydra, but may require some onboarding.
Or maybe I'm completely wrong cc @omry
thanks for taking a look @SeanNaren - yea, you are right, this is "advanced" Hydra :) I've put more context in the example config files. I can certainly add a bit more context in the doc.
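For the curious, here is a small sketch of what that defaults-list entry does conceptually: `/stack@xformer.stack_configs: ...` tells Hydra to load a config from the `stack` group and graft it under the `xformer.stack_configs` node of the composed config. The file layout and key names below are assumptions for illustration and may not match the PR exactly:

```python
from hydra import compose, initialize
from omegaconf import OmegaConf

# Compose the example config from Python and inspect where the "@" package
# override placed the stack group's content.  config_path / keys are assumed.
with initialize(config_path="conf"):
    cfg = compose(config_name="config")

print(OmegaConf.to_yaml(cfg.xformer.stack_configs))
```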
hey @jieru-hu how about a more components-based conf? I.e. define the attention in a folder like conf/attention/local.yaml:

    num_heads: 4
    residual_dropout: 0
    attention: local

Set the default in the higher level defaults list, then use interpolation to bring it into the larger stack, i.e. (not sure if this works, but with finnicking I think it will):

    # base encoder settings that can be extended and overriden
    # we leave out the attention part for other config to override
    _target_: xformers.factory.block_factory.xFormerEncoderConfig
    reversible: False
    num_layers: 4
    use_triton: True
    dim_model: ${emb}
    layer_norm_style: pre
    position_encoding_config:
      name: vocab
      seq_len: 1024
      vocab_size: ${vocab}
      dropout: 0
    multi_head_config: ${attention}

You're then able to set the attention in a nicer way I think, like
is there a reason why you can't just do this btw:
Thanks for the suggestion Sean. We can certainly do something like this, but, if I understand your example correctly, we won't be able to add multiple stacks with different attentions to the same model, right?
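One side note on the "not sure if this works" part of the suggestion above: whole-node interpolation does work in OmegaConf, so `multi_head_config: ${attention}` would pull in the entire attention node. A tiny sketch, with keys invented purely for illustration:

```python
from omegaconf import OmegaConf

cfg = OmegaConf.create(
    """
    attention:
      name: local
      num_heads: 4
    encoder:
      multi_head_config: ${attention}   # pulls in the whole attention node
    """
)

assert cfg.encoder.multi_head_config.num_heads == 4
```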
Force-pushed from a6963e9 to d3894dc
I was hoping to see the codecov coverage report with this new commit, but somehow that didn't work. So I ended up finding the link to the report in the
Force-pushed from 184a23b to 4420710
@@ -16,3 +16,6 @@ pytest-cov == 2.10.0
pytest-mpi == 0.4
pytest-timeout == 1.4.2
timm >= 0.4.5

# Dependency for factory
hydra-core >= 1.1
nit @jieru-hu, but duplicate right ?
hmm I don't think this is a duplicate - since I only added the hydra-core dependency to the examples.
see, right ?
woop you're right @jieru-hu , I missed the fact that the other requirements file was under example.. all good !
Nice, thanks for the updates related to the unit tests @jieru-hu ! I think that it would be good to iterate with @SeanNaren on the interface, I especially like the
Yep, definitely! These examples are really V0 and I'm definitely planning on iterating on them. Will work with @SeanNaren to gather more feedback :)
@@ -1 +1,2 @@
hydra-core>1.1
@jieru-hu hydra-core is also required here, and requirements is used as part of requirements-test, so looks like a duplicate to me ?
…research#104)
* Adding some helpers on SparseCS + small unit testing
* unit test fix
* adding a device test, checking for facebookresearch#93
* catching the padding bug, fixing
address the feedback in #59