Tags: linkedin/detext

v2.0.8

Use matrix for multiple py versions: python-app-py3.yml (#1)

v2.0.6

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
fix text preprocessing for inference model (#40)

* fix text preprocessing for inference model

* add filter_window_sizes override for non-cnn models

v2.0.5-alpha

Verified

Update python-publish.yml

update username and secret name

2.0.4

Verified

Create python-publish.yml

v1.2.0

Verified

Add embedding and MLP support for sparse wide features (#24)

# Description

Currently, DeText has limited modeling power for sparse features:
1. Only a linear model is applied to sparse features.
2. There is no interaction between sparse features and dense features (model_score = dense_score + sparse_score).

This PR addresses both limitations by
1. computing a dense representation of the sparse features
2. allowing interactions between sparse features and dense features

More specifically, the model architecture changes from
```
dense_score = dense_ftrs -> MLP
sparse_score = sparse_ftrs -> Linear
final_score = dense_score + sparse_score
```
to
```
sparse_emb_ftrs = sparse_ftrs -> Dense(sp_emb_size)
all_ftrs = (dense_ftrs, sparse_emb_ftrs) -> Concatenate
final_score = all_ftrs -> MLP
```
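
The two data flows above can be sketched in plain Python to show where the interaction comes from. This is only an illustration, not the DeText implementation: the helper names (`dense`, `sparse_dense`, `mlp`, `old_score`, `new_score`), the `(indices, values)` sparse format, the ReLU hidden activation, and the absence of an activation on the sparse projection are all assumptions made for this sketch.

```python
import random


def dense(x, weights, bias):
    """Affine layer: y[j] = sum_i x[i] * weights[i][j] + bias[j]."""
    return [sum(xi * row[j] for xi, row in zip(x, weights)) + bias[j]
            for j in range(len(bias))]


def sparse_dense(indices, values, weights, bias):
    """Project a sparse vector given as (indices, values) without densifying it."""
    out = list(bias)
    for idx, val in zip(indices, values):
        for j, w in enumerate(weights[idx]):
            out[j] += val * w
    return out


def mlp(x, layers):
    """Apply affine layers; ReLU on hidden layers (assumed), last layer linear."""
    for k, (w, b) in enumerate(layers):
        x = dense(x, w, b)
        if k < len(layers) - 1:
            x = [max(0.0, a) for a in x]
    return x


def init_layer(n_in, n_out, rng):
    """Small random weights and zero bias for an n_in -> n_out layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out


def old_score(dense_ftrs, sp_indices, sp_values, params):
    """Before: two independent scores that are only added together."""
    dense_score = mlp(dense_ftrs, params["mlp"])[0]
    sparse_score = sparse_dense(sp_indices, sp_values, *params["sp_linear"])[0]
    return dense_score + sparse_score


def new_score(dense_ftrs, sp_indices, sp_values, params):
    """After: sparse features are embedded, concatenated, then fed to one MLP."""
    sparse_emb_ftrs = sparse_dense(sp_indices, sp_values, *params["sp_emb"])
    all_ftrs = list(dense_ftrs) + sparse_emb_ftrs  # Concatenate
    return mlp(all_ftrs, params["mlp"])[0]


# Hypothetical sizes, chosen only for the demo.
rng = random.Random(0)
num_dense, sp_vocab, sp_emb_size, hidden = 3, 100, 10, 8
new_params = {
    "sp_emb": init_layer(sp_vocab, sp_emb_size, rng),
    "mlp": [init_layer(num_dense + sp_emb_size, hidden, rng),
            init_layer(hidden, 1, rng)],
}
score = new_score([0.5, 1.0, -0.2], [4, 17], [1.0, 2.0], new_params)
print(score)
```

With `sp_emb_size` set to 1 the sparse projection still produces a single value per example, but that value now passes through the shared MLP together with the dense features instead of being added to the final score.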
## Type of change

- [ ] New feature (non-breaking change which adds functionality)

## List all changes 
Please list all changes in the commit.
* Rename sp_linear_model to sp_emb_model and add an option sp_emb_size so that the sparse feature projection can have output dimension > 1
* Change the structure of the dense & sparse feature interaction as described above
* Add and restructure unit tests for the sparse embedding model
* Add new data for testing
* Add a sample tfrecord generation helper function in misc_utils.py
* Add instructions in TRAINING.md

# Testing
- Successfully ran run_detext.sh on data including wide_sp_val with sp_emb_size=10
- Successfully ran run_detext_multitask.sh for data
- Unit test for sparse_emb_model when sp_emb_size is 1 and > 1

# Checklist

- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my own code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged and published in downstream modules

v1.1.0

update optimization test

v1.0.12

Verified

expose tfrecord dataset transformation function for LinkedIn usage (#10)

Co-authored-by: Leon Gao <legao@linkedin.com>