[Automating Model Tracing and Uploading] PR2 Model Listing & Uploading (#210)

* Initiate PR #2 Model Listing & Uploading
* Add comment
* Correct linting
* Update CHANGELOG.md
* Update CHANGELOG.md
* Revert "Correct demo_ml_commons_integration.ipynb (#208)"
  This reverts commit c67f969.
* Removed a modified file from pull request
* Removed a modified file from pull request
* Add steps.checkout_pr_branch.outcome
* Remove old config json
* Minor change
* Allow non-st
* Add tests for PR2
* Create __init__.py
* Create __init__.py
* Update noxfile.py
* Update noxfile.py
* Update setup.cfg
* Update noxfile.py
* Add more tests to improve coverage
* Improve test cov
* Make it extensible
* Update update_model_listing.yml
* Avoid crashing if folder does not exist
* Debug update_model_listing.yml
* Update update_model_listing.yml
* Wrap with update_pretrained_model_listing_main
* Update update_model_listing.yml
* Remove unused variables
* Remove unnecessary variable

---------

Signed-off-by: Thanawan Atchariyachanvanit <latchari@amazon.com>
opensearch-project · Aug 22, 2023 · 205a0fd
1 parent ef2a01e

Showing 18 changed files with 552 additions and 2 deletions.
diff --git a/.github/workflows/model_listing_uploader.yml b/.github/workflows/model_listing_uploader.yml
@@ -0,0 +1,49 @@
+name: Model Listing Uploading
+on:
+  push:
+    branches:
+     - main
+    paths: 
+     - utils/model_uploader/model_listing/pretrained_model_listing.json
+  workflow_dispatch:
+
+jobs:
+  upload-model-listing:
+    runs-on: 'ubuntu-latest'
+    permissions:
+      id-token: write
+      contents: read
+    environment: opensearch-py-ml-cicd-env
+    env:
+      bucket_model_listing_file_path: ml-models/model_listing/pre_trained_models.json
+      repo_model_listing_path: ./utils/model_uploader/model_listing/pretrained_model_listing.json
+    steps:
+    - name: Fail if branch is not main
+      if: github.ref != 'refs/heads/main'
+      run: |
+         echo "This workflow should only be triggered on 'main' branch"
+         exit 1
+    - name: Checkout Repository
+      uses: actions/checkout@v3
+    - name: Configure AWS Credentials
+      uses: aws-actions/configure-aws-credentials@v2
+      with:
+        aws-region: ${{ secrets.MODEL_UPLOADER_AWS_REGION }}
+        role-to-assume: ${{ secrets.MODEL_UPLOADER_ROLE }}
+        role-session-name: upload-model-listing
+    - name: Update pre_trained_models.json in S3
+      run: aws s3 cp ${{ env.repo_model_listing_path }} s3://${{ secrets.MODEL_BUCKET }}/${{ env.bucket_model_listing_file_path }}
+
+  trigger-ml-models-release-workflow:
+    needs: upload-model-listing
+    runs-on: 'ubuntu-latest'
+    permissions:
+      contents: read
+    steps:
+      - name: Checkout Repository
+        uses: actions/checkout@v3
+      - name: Trigger Jenkins Workflow with Generic Webhook
+        run: |
+          jenkins_trigger_token=${{ secrets.JENKINS_ML_MODELS_RELEASE_GENERIC_WEBHOOK_TOKEN }}
+          jenkins_params="{\"BASE_DOWNLOAD_PATH\":\"ml-models/model_listing\"}"
+          sh utils/model_uploader/trigger_ml_models_release.sh $jenkins_trigger_token $jenkins_params
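Both jobs in this workflow point at the same S3 location: the listing is copied to `ml-models/model_listing/pre_trained_models.json` in the model bucket, and Jenkins is told to download from its parent folder `ml-models/model_listing`. The following is a minimal Python sketch of that relationship, not part of the repository. It assumes a placeholder bucket name (`example-model-bucket` is hypothetical; the workflow reads the real name from the `MODEL_BUCKET` secret), and the helper names are hypothetical.

```python
import json
import posixpath

# Mirrors bucket_model_listing_file_path in the workflow env.
BUCKET_MODEL_LISTING_FILE_PATH = "ml-models/model_listing/pre_trained_models.json"


def upload_destination(bucket: str, key: str = BUCKET_MODEL_LISTING_FILE_PATH) -> str:
    """Build the S3 URI used as the `aws s3 cp` destination in the upload step."""
    return f"s3://{bucket}/{key}"


def jenkins_params(key: str = BUCKET_MODEL_LISTING_FILE_PATH) -> str:
    """Build the compact JSON payload the trigger job passes to the Jenkins script.

    BASE_DOWNLOAD_PATH is the S3 folder that contains the uploaded listing file.
    """
    return json.dumps({"BASE_DOWNLOAD_PATH": posixpath.dirname(key)}, separators=(",", ":"))


if __name__ == "__main__":
    # "example-model-bucket" is a hypothetical stand-in for secrets.MODEL_BUCKET.
    print(upload_destination("example-model-bucket"))
    print(jenkins_params())
```

The workflow hardcodes the two strings separately; deriving `BASE_DOWNLOAD_PATH` from the listing key, as sketched here, is one way to keep them from drifting apart if the key ever moves.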
diff --git a/.github/workflows/update_model_listing.yml b/.github/workflows/update_model_listing.yml
@@ -0,0 +1,145 @@
+name: Update Pretrained Model Listing
+on:
+  workflow_dispatch:
+
+jobs:
+  update-model-listing:
+    runs-on: 'ubuntu-latest'
+    permissions:
+      id-token: write
+      contents: write
+      pull-requests: write
+    environment: opensearch-py-ml-cicd-env
+    env:
+      bucket_model_listing_file_path: ml-models/model_listing/pre_trained_models.json
+      repo_model_listing_path: ./utils/model_uploader/model_listing/pretrained_model_listing.json
+      path_prefixes: "ml-models/huggingface/"
+      # To include additional folders in the model listing, list their prefixes in path_prefixes, separated by spaces, e.g.:
+      # "ml-models/first_folder/ ml-models/second_folder/ ml-models/third_folder/"
+    steps:
+    - name: Fail if branch is not main
+      if: github.ref != 'refs/heads/main'
+      run: |
+         echo "This workflow should only be triggered on 'main' branch"
+         exit 1
+    - name: Checkout Main Branch
+      uses: actions/checkout@v3
+    - name: Configure AWS Credentials
+      uses: aws-actions/configure-aws-credentials@v2
+      with:
+        aws-region: ${{ secrets.MODEL_UPLOADER_AWS_REGION }}
+        role-to-assume: ${{ secrets.MODEL_UPLOADER_ROLE }}
+        role-session-name: update-model-listing
+    - name: List Models
+      run: |
+        path_prefixes="${{ env.path_prefixes }}"
+        for prefix in $path_prefixes
+        do
+          if aws s3 ls s3://${{ secrets.MODEL_BUCKET }}/$prefix > /dev/null
+          then
+            aws s3api list-objects --bucket ${{ secrets.MODEL_BUCKET }} --prefix $prefix --query "Contents[].{Key: Key}" --output text | grep "/config.json$" >> config_paths.txt
+          else
+            echo "Folder with prefix $prefix does not exist."
+          fi
+        done
+        echo $(cat config_paths.txt)
+    - name: Download config files
+      run: |
+        mkdir config_folder
+        path_prefixes="${{ env.path_prefixes }}"
+        for prefix in $path_prefixes
+        do
+          aws s3 cp s3://${{ secrets.MODEL_BUCKET }}/$prefix config_folder/$prefix --recursive --exclude "*" --include "*/config.json"
+        done
+        echo $(ls config_folder)
+    - name: Set Up Python
+      uses: actions/setup-python@v2
+      with:
+        python-version: '3.x'
+    - name: Update pre_trained_models.json
+      run: |
+        python utils/model_uploader/update_pretrained_model_listing.py "config_paths.txt" "config_folder"
+    - name: Create PR Body
+      id: create_pr_body
+      run: |
+        update_time=$(TZ='America/Los_Angeles' date "+%Y-%m-%d %T")
+        echo "update_time=$update_time" >> $GITHUB_OUTPUT
+        pr_body="
+        - [ ] This PR commits changes to only two files: pretrained_model_listing.json and CHANGELOG.md.
+        - [ ] CHANGELOG.md has been updated, either by the workflow or manually if the workflow failed to do so.
+        - [ ] Merge conflicts have been resolved.
+          
+        ========= Workflow Details ==========
+        - Workflow Name: ${{ github.workflow }}
+        - Workflow Run ID: ${{ github.run_id }}
+        - Workflow Initiator: @${{ github.actor }}
+        - File Update Time: $update_time"
+          
+        echo "pr_body<<EOF" >> $GITHUB_OUTPUT
+        echo "${pr_body@E}" >> $GITHUB_OUTPUT
+        echo "EOF" >> $GITHUB_OUTPUT
+        echo "${pr_body@E}"
+    - name: Create a Branch & Raise a PR
+      uses: peter-evans/create-pull-request@v5
+      id: create_pr
+      with:
+        committer: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
+        commit-message: 'GitHub Actions Workflow: Update Pretrained Model Listing'
+        signoff: true
+        title: 'Update Pretrained Model Listing - ${{ steps.create_pr_body.outputs.update_time }}'
+        body: ${{ steps.create_pr_body.outputs.pr_body }}
+        labels: ModelListingUploading
+        branch: model-listing-uploader/${{ github.run_id }}
+        delete-branch: true
+        add-paths: ${{ env.repo_model_listing_path }}
+    - name: Checkout PR Branch
+      id: checkout_pr_branch
+      continue-on-error: true
+      uses: actions/checkout@v3
+      with:
+        ref: model-listing-uploader/${{ github.run_id }}
+    - name: Create a line for updating CHANGELOG.md
+      id: create_changelog_line
+      if: steps.checkout_pr_branch.outcome == 'success'
+      continue-on-error: true
+      run: |
+        pr_ref="([#${{ steps.create_pr.outputs.pull-request-number }}](${{ steps.create_pr.outputs.pull-request-url }}))"
+        changelog_line="Update pretrained_model_listing.json (${{ steps.create_pr_body.outputs.update_time }}) by @${{ github.actor }} $pr_ref"
+        echo "changelog_line=$changelog_line" >> $GITHUB_OUTPUT
+    - name: Warning Comment on PR if create_changelog_line fails
+      if: steps.checkout_pr_branch.outcome == 'success' && steps.create_changelog_line.outcome == 'failure'
+      uses: thollander/actions-comment-pull-request@v2
+      with:
+        pr_number: ${{ steps.create_pr.outputs.pull-request-number }}
+        message: "Warning:exclamation:: The workflow failed to update CHANGELOG.md. Please update CHANGELOG.md manually."
+    - name: Update CHANGELOG.md
+      if: steps.checkout_pr_branch.outcome == 'success' && steps.create_changelog_line.outcome == 'success'
+      id: update_changelog
+      continue-on-error: true
+      run: |
+        python -m pip install mdutils
+        python utils/model_uploader/update_changelog_md.py "${{ steps.create_changelog_line.outputs.changelog_line }}"
+    - name: Commit Updates
+      if: steps.checkout_pr_branch.outcome == 'success' && steps.create_changelog_line.outcome == 'success' && steps.update_changelog.outcome == 'success'
+      uses: stefanzweifel/git-auto-commit-action@v4
+      id: commit
+      with:
+        branch: model-listing-uploader/${{ github.run_id }}
+        commit_user_email: "github-actions[bot]@users.noreply.github.com"
+        commit_message: 'GitHub Actions Workflow: Update CHANGELOG.md - ${{ steps.create_pr_body.outputs.update_time }}'
+        commit_options: '--signoff'
+        file_pattern: CHANGELOG.md
+    - name: Warning Comment on PR if update_changelog fails
+      if: steps.checkout_pr_branch.outcome == 'success' && steps.create_changelog_line.outcome == 'success' && steps.update_changelog.outcome == 'failure'
+      uses: thollander/actions-comment-pull-request@v2
+      with:
+        pr_number: ${{ steps.create_pr.outputs.pull-request-number }}
+        message: |
+          Warning:exclamation:: The workflow failed to update CHANGELOG.md. Please add the following line manually.
+          >>>
+          ${{ steps.create_changelog_line.outputs.changelog_line }}
+    - name: No Change in Model Listing
+      if: steps.checkout_pr_branch.outcome == 'failure'
+      run: |
+        echo "There is no change in model listing."
+        echo "Exiting the workflow"
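The `List Models` and `Download config files` steps above keep only the `config.json` objects under each configured prefix and mirror them into `config_folder/`, preserving the full S3 key as the relative path. The following stdlib-only Python sketch shows that selection. It works on an in-memory list of keys instead of calling S3, and the function names are hypothetical, not part of the repository.

```python
from typing import Iterable, List


def select_config_keys(keys: Iterable[str], path_prefixes: str) -> List[str]:
    """Keep keys under any space-separated prefix that end in /config.json.

    Mirrors running `aws s3api list-objects --prefix $prefix ... | grep "/config.json$"`
    once per prefix, so results are grouped by prefix in the order given.
    """
    keys = list(keys)
    selected: List[str] = []
    for prefix in path_prefixes.split():
        selected.extend(
            key for key in keys if key.startswith(prefix) and key.endswith("/config.json")
        )
    return selected


def local_config_path(key: str, config_folder: str = "config_folder") -> str:
    """Local path where `aws s3 cp s3://bucket/$prefix config_folder/$prefix --recursive`
    places an object: the folder name followed by the full S3 key."""
    return f"{config_folder}/{key}"
```

Because the local layout reuses the S3 key, the paths written to `config_paths.txt` can be joined directly onto `config_folder`, which is the layout the sample `config_folder/ml-models/...` test fixtures follow.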
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -5,7 +5,9 @@ Inspired from [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 
 ### Added
 - Add workflows and scripts for automating model tracing and uploading process by @thanawan-atc in ([#209](https://github.com/opensearch-project/opensearch-py-ml/pull/209))
+- Add workflow and scripts for automating model listing updating process by @thanawan-atc in ([#210](https://github.com/opensearch-project/opensearch-py-ml/pull/210))
 
+
 ### Changed
 
 

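The workflow builds its changelog entry in the `create_changelog_line` step and then hands it to `update_changelog_md.py`, which is not part of this diff. The following Python sketch reproduces the entry format from the workflow and shows one plausible way to append it as the last bullet under a CHANGELOG heading. The helper names are hypothetical, and the target section is passed in as a parameter because the diff does not show which heading the real script uses.

```python
from typing import List


def make_changelog_line(update_time: str, actor: str, pr_number: int, pr_url: str) -> str:
    """Format the entry the same way as the `create_changelog_line` step."""
    pr_ref = f"([#{pr_number}]({pr_url}))"
    return f"Update pretrained_model_listing.json ({update_time}) by @{actor} {pr_ref}"


def insert_under_heading(changelog: str, heading: str, line: str) -> str:
    """Insert `- line` after the existing bullets directly under the first `heading`.

    Raises ValueError if the heading does not appear as its own line.
    """
    lines: List[str] = changelog.split("\n")
    try:
        start = lines.index(heading)
    except ValueError:
        raise ValueError(f"heading {heading!r} not found") from None
    # Skip past the bullets already listed in this section.
    pos = start + 1
    while pos < len(lines) and lines[pos].startswith("- "):
        pos += 1
    lines.insert(pos, f"- {line}")
    return "\n".join(lines)
```

Appending after the existing bullets keeps entries in chronological order, matching how the #209 and #210 entries are stacked under `### Added` above.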
diff --git a/noxfile.py b/noxfile.py
@@ -125,7 +125,7 @@ def test(session, pandas_version: str):
         "-m",
         "pytest",
         "--cov-report=term-missing",
-        "--cov=opensearch_py_ml/",
+        "--cov",
         "--cov-config=setup.cfg",
         "--doctest-modules",
         "--nbval",

diff --git a/setup.cfg b/setup.cfg
@@ -7,3 +7,7 @@ exclude_lines=
     @abstractmethod
     if TYPE_CHECKING:
     raise NotImplementedError*
+[coverage:run]
+include=
+    opensearch_py_ml/*
+    utils/model_uploader/update_pretrained_model_listing.py
diff --git a/...g/samples/config_folder/ml-models/huggingface/intfloat/e5-small-v2/1.0.1/onnx/config.json b/...g/samples/config_folder/ml-models/huggingface/intfloat/e5-small-v2/1.0.1/onnx/config.json
@@ -0,0 +1 @@
+{"name": "intfloat/e5-small-v2", "version": "1.0.1", "description": "This is a sentence-transformers model: It maps sentences & paragraphs to a 384 dimensional dense vector space.", "model_format": "ONNX", "model_task_type": "TEXT_EMBEDDING", "model_config": {"model_type": "bert", "embedding_dimension": 384, "framework_type": "sentence_transformers", "pooling_mode": "MEAN", "normalize_result": true, "all_config": "{\"_name_or_path\": \"/root/.cache/torch/sentence_transformers/intfloat_e5-small-v2/\", \"architectures\": [\"BertModel\"], \"attention_probs_dropout_prob\": 0.1, \"classifier_dropout\": null, \"hidden_act\": \"gelu\", \"hidden_dropout_prob\": 0.1, \"hidden_size\": 384, \"initializer_range\": 0.02, \"intermediate_size\": 1536, \"layer_norm_eps\": 1e-12, \"max_position_embeddings\": 512, \"model_type\": \"bert\", \"num_attention_heads\": 12, \"num_hidden_layers\": 12, \"pad_token_id\": 0, \"position_embedding_type\": \"absolute\", \"torch_dtype\": \"float32\", \"transformers_version\": \"4.31.0\", \"type_vocab_size\": 2, \"use_cache\": true, \"vocab_size\": 30522}"}}
diff --git a/...ngface/sentence-transformers/clip-ViT-B-32-multilingual-v1/1.0.1/torch_script/config.json b/...ngface/sentence-transformers/clip-ViT-B-32-multilingual-v1/1.0.1/torch_script/config.json
@@ -0,0 +1 @@
+{"name": "sentence-transformers/clip-ViT-B-32-multilingual-v1", "version": "1.0.1", "description": "This is a multi-lingual version of the OpenAI CLIP-ViT-B32 model. You can map text  and images to a common dense vector space such that images and the matching texts are close. This model can be used for image search  and for multi-lingual zero-shot image classification .", "model_format": "TORCH_SCRIPT", "model_task_type": "TEXT_EMBEDDING", "model_config": {"model_type": "distilbert", "embedding_dimension": 512, "framework_type": "sentence_transformers", "pooling_mode": "MEAN", "normalize_result": false, "all_config": "{\"_name_or_path\": \"/root/.cache/torch/sentence_transformers/sentence-transformers_clip-ViT-B-32-multilingual-v1/\", \"activation\": \"gelu\", \"architectures\": [\"DistilBertModel\"], \"attention_dropout\": 0.1, \"dim\": 768, \"dropout\": 0.1, \"hidden_dim\": 3072, \"initializer_range\": 0.02, \"max_position_embeddings\": 512, \"model_type\": \"distilbert\", \"n_heads\": 12, \"n_layers\": 6, \"output_past\": true, \"pad_token_id\": 0, \"qa_dropout\": 0.1, \"seq_classif_dropout\": 0.2, \"sinusoidal_pos_embds\": false, \"tie_weights_\": true, \"torch_dtype\": \"float32\", \"transformers_version\": \"4.31.0\", \"vocab_size\": 119547}"}}
diff --git a/...odels/huggingface/sentence-transformers/multi-qa-mpnet-base-cos-v1/1.0.1/onnx/config.json b/...odels/huggingface/sentence-transformers/multi-qa-mpnet-base-cos-v1/1.0.1/onnx/config.json
@@ -0,0 +1 @@
+{"name": "sentence-transformers/multi-qa-mpnet-base-cos-v1", "version": "1.0.1", "description": "This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and was designed for semantic search. It has been trained on 215M  pairs from diverse sources.", "model_format": "ONNX", "model_task_type": "TEXT_EMBEDDING", "model_config": {"model_type": "mpnet", "embedding_dimension": 768, "framework_type": "sentence_transformers", "pooling_mode": "MEAN", "normalize_result": true, "all_config": "{\"_name_or_path\": \"/root/.cache/torch/sentence_transformers/sentence-transformers_multi-qa-mpnet-base-cos-v1/\", \"architectures\": [\"MPNetModel\"], \"attention_probs_dropout_prob\": 0.1, \"bos_token_id\": 0, \"eos_token_id\": 2, \"hidden_act\": \"gelu\", \"hidden_dropout_prob\": 0.1, \"hidden_size\": 768, \"initializer_range\": 0.02, \"intermediate_size\": 3072, \"layer_norm_eps\": 1e-05, \"max_position_embeddings\": 514, \"model_type\": \"mpnet\", \"num_attention_heads\": 12, \"num_hidden_layers\": 12, \"pad_token_id\": 1, \"relative_attention_num_buckets\": 32, \"torch_dtype\": \"float32\", \"transformers_version\": \"4.31.0\", \"vocab_size\": 30527}"}}
diff --git a/...ggingface/sentence-transformers/multi-qa-mpnet-base-cos-v1/1.0.1/torch_script/config.json b/...ggingface/sentence-transformers/multi-qa-mpnet-base-cos-v1/1.0.1/torch_script/config.json
@@ -0,0 +1 @@
+{"name": "sentence-transformers/multi-qa-mpnet-base-cos-v1", "version": "1.0.1", "description": "This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and was designed for semantic search. It has been trained on 215M  pairs from diverse sources.", "model_format": "TORCH_SCRIPT", "model_task_type": "TEXT_EMBEDDING", "model_config": {"model_type": "mpnet", "embedding_dimension": 768, "framework_type": "sentence_transformers", "pooling_mode": "MEAN", "normalize_result": true, "all_config": "{\"_name_or_path\": \"/root/.cache/torch/sentence_transformers/sentence-transformers_multi-qa-mpnet-base-cos-v1/\", \"architectures\": [\"MPNetModel\"], \"attention_probs_dropout_prob\": 0.1, \"bos_token_id\": 0, \"eos_token_id\": 2, \"hidden_act\": \"gelu\", \"hidden_dropout_prob\": 0.1, \"hidden_size\": 768, \"initializer_range\": 0.02, \"intermediate_size\": 3072, \"layer_norm_eps\": 1e-05, \"max_position_embeddings\": 514, \"model_type\": \"mpnet\", \"num_attention_heads\": 12, \"num_hidden_layers\": 12, \"pad_token_id\": 1, \"relative_attention_num_buckets\": 32, \"torch_dtype\": \"float32\", \"transformers_version\": \"4.31.0\", \"vocab_size\": 30527}"}}
diff --git a/...ggingface/sentence-transformers/multi-qa-mpnet-base-cos-v1/2.0.0/torch_script/config.json b/...ggingface/sentence-transformers/multi-qa-mpnet-base-cos-v1/2.0.0/torch_script/config.json
@@ -0,0 +1 @@
+{"name": "sentence-transformers/multi-qa-mpnet-base-cos-v1", "version": "2.0.0", "description": "This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and was designed for semantic search. It has been trained on 215M  pairs from diverse sources. (New Version)", "model_format": "TORCH_SCRIPT", "model_task_type": "TEXT_EMBEDDING", "model_config": {"model_type": "mpnet", "embedding_dimension": 768, "framework_type": "sentence_transformers", "pooling_mode": "MEAN", "normalize_result": true, "all_config": "{\"_name_or_path\": \"/root/.cache/torch/sentence_transformers/sentence-transformers_multi-qa-mpnet-base-cos-v1/\", \"architectures\": [\"MPNetModel\"], \"attention_probs_dropout_prob\": 0.1, \"bos_token_id\": 0, \"eos_token_id\": 2, \"hidden_act\": \"gelu\", \"hidden_dropout_prob\": 0.1, \"hidden_size\": 768, \"initializer_range\": 0.02, \"intermediate_size\": 3072, \"layer_norm_eps\": 1e-05, \"max_position_embeddings\": 514, \"model_type\": \"mpnet\", \"num_attention_heads\": 12, \"num_hidden_layers\": 12, \"pad_token_id\": 1, \"relative_attention_num_buckets\": 32, \"torch_dtype\": \"float32\", \"transformers_version\": \"4.31.0\", \"vocab_size\": 30527}"}}
diff --git a/..._folder/ml-models/other_source/jhgan/ko-sroberta-multitask/1.0.1/torch_script/config.json b/..._folder/ml-models/other_source/jhgan/ko-sroberta-multitask/1.0.1/torch_script/config.json
@@ -0,0 +1 @@
+{"name": "jhgan/ko-sroberta-multitask", "version": "1.0.1", "description": "This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.", "model_format": "TORCH_SCRIPT", "model_task_type": "TEXT_EMBEDDING", "model_config": {"model_type": "roberta", "embedding_dimension": 768, "framework_type": "sentence_transformers", "pooling_mode": "MEAN", "normalize_result": false, "all_config": "{\"_name_or_path\": \"/root/.cache/torch/sentence_transformers/jhgan_ko-sroberta-multitask/\", \"architectures\": [\"RobertaModel\"], \"attention_probs_dropout_prob\": 0.1, \"bos_token_id\": 0, \"classifier_dropout\": null, \"eos_token_id\": 2, \"gradient_checkpointing\": false, \"hidden_act\": \"gelu\", \"hidden_dropout_prob\": 0.1, \"hidden_size\": 768, \"initializer_range\": 0.02, \"intermediate_size\": 3072, \"layer_norm_eps\": 1e-05, \"max_position_embeddings\": 514, \"model_type\": \"roberta\", \"num_attention_heads\": 12, \"num_hidden_layers\": 12, \"pad_token_id\": 1, \"position_embedding_type\": \"absolute\", \"tokenizer_class\": \"BertTokenizer\", \"torch_dtype\": \"float32\", \"transformers_version\": \"4.31.0\", \"type_vocab_size\": 1, \"use_cache\": true, \"vocab_size\": 32000}"}}
diff --git a/tests/ml_model_listing/samples/config_paths.txt b/tests/ml_model_listing/samples/config_paths.txt
@@ -0,0 +1 @@
+ml-models/huggingface/intfloat/e5-small-v2/1.0.1/onnx/config.json ml-models/other_source/jhgan/ko-sroberta-multitask/1.0.1/torch_script/config.json ml-models/huggingface/sentence-transformers/clip-ViT-B-32-multilingual-v1/1.0.1/torch_script/config.json ml-models/huggingface/sentence-transformers/multi-qa-mpnet-base-cos-v1/1.0.1/onnx/config.json ml-models/huggingface/sentence-transformers/multi-qa-mpnet-base-cos-v1/1.0.1/torch_script/config.json ml-models/huggingface/sentence-transformers/multi-qa-mpnet-base-cos-v1/2.0.0/torch_script/config.json
diff --git a/tests/ml_model_listing/samples/pretrained_model_listing.json b/tests/ml_model_listing/samples/pretrained_model_listing.json
@@ -0,0 +1,53 @@
+[
+  {
+    "name": "huggingface/intfloat/e5-small-v2",
+    "versions": {
+      "1.0.1": {
+        "format": [
+          "onnx"
+        ],
+        "description": "This is a sentence-transformers model: It maps sentences & paragraphs to a 384 dimensional dense vector space."
+      }
+    }
+  },
+  {
+    "name": "huggingface/sentence-transformers/clip-ViT-B-32-multilingual-v1",
+    "versions": {
+      "1.0.1": {
+        "format": [
+          "torch_script"
+        ],
+        "description": "This is a multi-lingual version of the OpenAI CLIP-ViT-B32 model. You can map text  and images to a common dense vector space such that images and the matching texts are close. This model can be used for image search  and for multi-lingual zero-shot image classification ."
+      }
+    }
+  },
+  {
+    "name": "huggingface/sentence-transformers/multi-qa-mpnet-base-cos-v1",
+    "versions": {
+      "1.0.1": {
+        "format": [
+          "onnx",
+          "torch_script"
+        ],
+        "description": "This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and was designed for semantic search. It has been trained on 215M  pairs from diverse sources."
+      },
+      "2.0.0": {
+        "format": [
+          "torch_script"
+        ],
+        "description": "This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and was designed for semantic search. It has been trained on 215M  pairs from diverse sources. (New Version)"
+      }
+    }
+  },
+  {
+    "name": "other_source/jhgan/ko-sroberta-multitask",
+    "versions": {
+      "1.0.1": {
+        "format": [
+          "torch_script"
+        ],
+        "description": "This is a sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search."
+      }
+    }
+  }
+]
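The sample `pretrained_model_listing.json` above is the expected output for the sample `config_paths.txt` and `config_folder`. `update_pretrained_model_listing.py` itself is not shown in this diff, so the following is only a Python sketch, inferred from that input/output pair, of how the grouping could be done. Each key `ml-models/<source>/.../<model>/<version>/<format>/config.json` contributes one format to one version of one model. All function names are hypothetical.

```python
import json
import os
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_config_path(path: str) -> Tuple[str, str, str]:
    """Split ml-models/<source>/.../<model>/<version>/<format>/config.json
    into (listing name, version, format)."""
    parts = path.split("/")
    # Drop the leading "ml-models" and the trailing version/format/config.json.
    name = "/".join(parts[1:-3])
    return name, parts[-3], parts[-2]


def build_listing(configs: Iterable[Tuple[str, dict]]) -> List[dict]:
    """Group (S3 key, parsed config.json) pairs into the listing layout."""
    models: Dict[str, Dict[str, dict]] = defaultdict(dict)
    for path, config in configs:
        name, version, model_format = parse_config_path(path)
        # The first config seen for a version supplies its description.
        entry = models[name].setdefault(
            version, {"format": [], "description": config.get("description", "")}
        )
        entry["format"].append(model_format)

    listing = []
    for name in sorted(models):
        versions = {}
        # Plain string sort: fine for the sample data, but "10.0.0" sorts before "2.0.0".
        for version in sorted(models[name]):
            entry = models[name][version]
            versions[version] = {
                "format": sorted(entry["format"]),
                "description": entry["description"],
            }
        listing.append({"name": name, "versions": versions})
    return listing


def build_listing_from_files(config_paths_file: str, config_folder: str) -> List[dict]:
    """Read whitespace-separated keys, then load each config from config_folder/<key>."""
    with open(config_paths_file) as f:
        paths = f.read().split()

    def load(path: str) -> dict:
        with open(os.path.join(config_folder, path)) as cf:
            return json.load(cf)

    return build_listing((path, load(path)) for path in paths)
```

`build_listing_from_files` takes the same two inputs the workflow passes on the command line (`"config_paths.txt" "config_folder"`). Serializing its result with `json.dumps(listing, indent=2)` produces a layout like the sample file. The real script may behave differently in edge cases such as version ordering or conflicting descriptions across formats.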