Fastext upstream final #2

sburman · 2024-05-22T01:26:51Z

The upstream repo is now read-only, so this updates us to its final commit.

Summary: Replace outdated url in the scripts Reviewed By: piotr-bojanowski Differential Revision: D43464784 fbshipit-source-id: 51a98a9ad5a0939acd0d578126290909a613938b

Summary: [Word vectors](https://huggingface.co/facebook/fasttext-en-vectors) for 157 languages are now hosted on the Hugging Face Hub, as is the [language identification model](https://huggingface.co/facebook/fasttext-language-identification). (cc ajoulin) A newer language model [referred to in the NLLB project](https://github.com/facebookresearch/fairseq/blob/nllb/README.md#lid-model) is not mentioned on the official website, so I updated the doc accordingly. Pull Request resolved: facebookresearch#1335 Reviewed By: Celebio Differential Revision: D46507563 Pulled By: jmp84 fbshipit-source-id: 64883a6829c68b968acd980ba77a712b8e7a1365

Summary: fbcode is migrating to LLVM-15 for safer and more up-to-date code and new compiler features. All contbuilds in your directory have passed our build test with LLVM-15, and your directory does not host any packages. This diff will migrate it to LLVM-15. If you approve of this diff, please use the "Accept & Ship" button. If you have a reason for why it should not build with LLVM 15, please make a comment and send it back to author. Otherwise we will land this on Thursday 06/15/2023. See the [FAQ post](https://fb.workplace.com/groups/llvm15platform010/posts/749154386769776/)! Please also direct any questions to [this group](https://fb.workplace.com/groups/llvm15platform010). - If you approve of this diff, please use the "Accept & Ship" button :-) Reviewed By: meyering Differential Revision: D46661531 fbshipit-source-id: 7278fbfcadec2392c94efd6deb710bdd5e9280f8

…cs.py Summary: Python3 makes the use of `(object)` in class inheritance unnecessary. Let's modernize our code by eliminating this. Reviewed By: itamaro Differential Revision: D48673901 fbshipit-source-id: 3e0ef05efe886b32a07bb58bd0725fa2ec934c14
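To illustrate the change this commit describes, here is a minimal sketch showing that both spellings produce the same class hierarchy in Python 3. The class names `LegacyStyle` and `ModernStyle` are hypothetical and do not come from the fastText sources.

```python
# Hypothetical classes for illustration only; not taken from fastText.


class LegacyStyle(object):
    """Python 2 era spelling: explicit inheritance from object."""


class ModernStyle:
    """Python 3 spelling: object is the implicit base class."""


def base_names(cls: type) -> list:
    """Return the names of the direct base classes of cls."""
    return [base.__name__ for base in cls.__bases__]


if __name__ == "__main__":
    # Both classes report object as their only direct base.
    print(base_names(LegacyStyle))
    print(base_names(ModernStyle))
```

Because the two forms are equivalent in Python 3, dropping `(object)` is a purely cosmetic cleanup with no behavioral effect.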

Reviewed By: r-barnes Differential Revision: D49677606 fbshipit-source-id: ec5b375177586c76ecccb83a29b562bc6e9961f6

Summary: Adds pyproject.toml to comply with PEP-518, which fixes the building of the library by poetry - See python-poetry/poetry#6113 . This is a copy of facebookresearch#1270 , but I have signed the CLA. Pull Request resolved: facebookresearch#1292 Differential Revision: D51601444 Pulled By: alexkosau fbshipit-source-id: 357d702281ca3519c3640483eba04d124d0744b4

…1340) Summary: Due to [header dependency changes](https://gcc.gnu.org/gcc-13/porting_to.html#header-dep-changes) in GCC 13, we need to include the `<cstdint>` header. Pull Request resolved: facebookresearch#1340 Reviewed By: jmp84 Differential Revision: D51602433 Pulled By: alexkosau fbshipit-source-id: cc9bffb276cb00f1db8ec97a36784c484ae4563a

Summary: I made prediction 1.9x to 4.2x faster than before.

# Motivation

I want to use https://tinyurl.com/nllblid218e and similarly parametrized models to run language classification on petabytes of web data.

# Methodology

The costliest operation is summing the rows for each model input. I've optimized this in three ways:

1. `addRowToVector` was a virtual function call for each row. I've replaced this with one virtual function call per prediction by adding `averageRowsToVector` to `Matrix` calls.
2. `Vector` and `DenseMatrix` were not 64-byte aligned so the CPU was doing a lot of unaligned memory access. I've brought in my own `vector` replacement that does 64-byte alignment.
3. Write the `averageRowsToVector` in intrinsics for common vector sizes. This works on SSE, AVX, and AVX512F.

See the commit history for a breakdown of speed improvement from each change.

# Experiments

Test set: [docs1000.txt.gz](https://github.com/facebookresearch/fastText/files/11832996/docs1000.txt.gz), which is a bunch of random documents: https://data.statmt.org/heafield/classified-fasttext/

CPU: AMD Ryzen 9 7950X 16-Core

Model https://tinyurl.com/nllblid218e with 256-dimensional vectors

- Before: real 0m8.757s, user 0m8.434s, sys 0m0.327s
- After: real 0m2.046s, user 0m1.717s, sys 0m0.334s

Model https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin with 16-dimensional vectors

- Before: real 0m0.926s, user 0m0.889s, sys 0m0.037s
- After: real 0m0.477s, user 0m0.436s, sys 0m0.040s

Pull Request resolved: facebookresearch#1341 Reviewed By: graemenail Differential Revision: D52134736 Pulled By: kpuatfb fbshipit-source-id: 42067161f4c968c34612934b48a562399a267f3b
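The row-averaging operation that this commit message identifies as the hot path can be sketched in pure Python as follows. This is only an illustration of the arithmetic and of the "one call per prediction" shape, not the C++ implementation: the `DenseMatrix` class here is a toy stand-in, the method names are Python-style renderings of `addRowToVector` and `averageRowsToVector`, and `average_per_row` is a hypothetical helper. The alignment and SIMD intrinsics work has no equivalent in this sketch.

```python
from typing import List, Sequence


class DenseMatrix:
    """Toy stand-in for an embedding matrix stored as a list of rows."""

    def __init__(self, rows: List[List[float]]) -> None:
        self.rows = rows
        self.dim = len(rows[0])

    def add_row_to_vector(self, out: List[float], row_id: int) -> None:
        # Per-row call: invoked once for every input row.
        row = self.rows[row_id]
        for j in range(self.dim):
            out[j] += row[j]

    def average_rows_to_vector(self, out: List[float], row_ids: Sequence[int]) -> None:
        # Single call per prediction: zero the output, sum rows, then scale.
        for j in range(self.dim):
            out[j] = 0.0
        if not row_ids:
            return
        for row_id in row_ids:
            row = self.rows[row_id]
            for j in range(self.dim):
                out[j] += row[j]
        scale = 1.0 / len(row_ids)
        for j in range(self.dim):
            out[j] *= scale


def average_per_row(matrix: DenseMatrix, row_ids: Sequence[int]) -> List[float]:
    """Hypothetical helper: the older pattern of one matrix call per row."""
    out = [0.0] * matrix.dim
    for row_id in row_ids:
        matrix.add_row_to_vector(out, row_id)
    if row_ids:
        out = [x / len(row_ids) for x in out]
    return out


if __name__ == "__main__":
    matrix = DenseMatrix([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    hidden = [0.0, 0.0]
    matrix.average_rows_to_vector(hidden, [0, 2])
    print(hidden)
```

Both paths compute the same mean of the selected rows; the batched method simply moves the loop over rows inside a single call, which is the structural change item 1 of the methodology describes.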

Reviewed By: azad-meta Differential Revision: D53908330 fbshipit-source-id: b2215f0522c32a82cd876633210befefe9317d76

Summary: Pull Request resolved: facebookresearch#1366 Reviewed By: jailby Differential Revision: D54850920 Pulled By: bigfootjon fbshipit-source-id: 9a3eec7b7cb42335a786fb247cb16be9ed3c2d59

Celebio and others added 11 commits April 17, 2023 03:23

Replace outdated url in the scripts

0622aad

Summary: Replace outdated url in the scripts Reviewed By: piotr-bojanowski Differential Revision: D43464784 fbshipit-source-id: 51a98a9ad5a0939acd0d578126290909a613938b

deeplearning, dcp (2972240286315620591)

789e328

Reviewed By: r-barnes Differential Revision: D49677606 fbshipit-source-id: ec5b375177586c76ecccb83a29b562bc6e9961f6

deeplearning/fastText 2/2

ae1fe80

Reviewed By: azad-meta Differential Revision: D53908330 fbshipit-source-id: b2215f0522c32a82cd876633210befefe9317d76

Delete .circleci directory (facebookresearch#1366)

1142dc4

Summary: Pull Request resolved: facebookresearch#1366 Reviewed By: jailby Differential Revision: D54850920 Pulled By: bigfootjon fbshipit-source-id: 9a3eec7b7cb42335a786fb247cb16be9ed3c2d59

this page intentionally left blank

9cc7687

sburman enabled auto-merge (squash) May 22, 2024 01:27

sburman requested a review from markryd May 22, 2024 01:30

markryd approved these changes May 22, 2024

sburman merged commit 04fbfbd into main May 22, 2024
4 checks passed

sburman deleted the fastext_upstream_final branch May 22, 2024 01:31
