Conversation

@yashuatla (Owner) commented Jun 23, 2025

PR Summary

Refactor Model Configuration and Training System

Overview

This PR refactors the model configuration system to use a unified approach for base and thinking models, adds CUDA availability checking, and updates the data pipeline to use configuration objects instead of environment variables.
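
As orientation, here is a minimal sketch of what the unified configuration shape could look like, assuming the PR folds base and thinking model settings into shared interfaces. The names model_name, is_cot, and user_llm_config appear in this PR; every other field and type name below is illustrative.

```typescript
// Hypothetical sketch only: apart from model_name, is_cot, and
// user_llm_config, the names here are illustrative, not from the diff.
export interface ModelConfig {
  model_name: string; // renamed from `baseModel` in this PR
  api_key?: string;   // assumed field
  endpoint?: string;  // assumed field
}

export interface ThinkingModelConfig extends ModelConfig {
  is_cot: boolean; // chain-of-thought flag added in this PR
}

// Combined object replacing per-variable .env lookups in the data pipeline.
export interface UserLlmConfig {
  base: ModelConfig;
  thinking?: ThinkingModelConfig;
}
```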

Change Types

| Type        | Description                                                    |
| ----------- | -------------------------------------------------------------- |
| Enhancement | Added CUDA availability checking functionality                  |
| Refactor    | Unified model configuration interfaces and updated stores       |
| Refactor    | Removed environment-variable dependencies in the data pipeline  |

Affected Modules

| Module / File                | Change Description                                                                 |
| ---------------------------- | ---------------------------------------------------------------------------------- |
| .eslintrc.js                 | Added TypeScript and Node.js import-resolver configuration                          |
| src/service/modelConfig.ts   | Refactored model config interfaces and added a thinking-config update function      |
| src/service/train.ts         | Added a CUDA availability check and refactored training interfaces (sketch below)  |
| src/store/*Store.ts          | Updated model config and training stores with new properties and methods           |
| L2/data_pipeline/data_prep/* | Removed .env dependencies and switched to the user_llm_config object               |
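
Below is a minimal sketch of how the CUDA availability check in src/service/train.ts might be surfaced to the frontend; the route, response shape, and function name are assumptions, not code from the diff.

```typescript
// Hypothetical response shape and route; the actual PR may expose
// CUDA availability differently.
export interface CudaStatus {
  cuda_available: boolean;
}

export async function checkCudaAvailability(): Promise<boolean> {
  const res = await fetch('/api/cuda/status'); // assumed endpoint
  if (!res.ok) {
    throw new Error(`CUDA status request failed: ${res.status}`);
  }
  const data = (await res.json()) as CudaStatus;
  return data.cuda_available;
}
```

Callers can then disable GPU-only training options when this resolves to false.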

Notes for Reviewers

  • The localStorage key changed from 'trainingConfig' to 'trainingParams' (see the migration sketch after this list)
  • The model name property changed from 'baseModel' to 'model_name'
  • Training status handling has been updated with new suspended/failed states
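
A hedged sketch of a one-time migration covering the first two notes; the helper name and the stored payload's shape are hypothetical.

```typescript
// Hypothetical migration helper; assumes the old 'trainingConfig'
// payload stored the model name under `baseModel`.
export function migrateTrainingParams(): void {
  const legacy = localStorage.getItem('trainingConfig');
  if (legacy === null) return;

  try {
    const parsed = JSON.parse(legacy) as Record<string, unknown>;
    const { baseModel, ...rest } = parsed;
    // Map the renamed property; keep everything else as-is.
    const migrated = { ...rest, model_name: parsed['model_name'] ?? baseModel };
    localStorage.setItem('trainingParams', JSON.stringify(migrated));
  } finally {
    // Drop the stale key either way so old state cannot resurface.
    localStorage.removeItem('trainingConfig');
  }
}
```

Running this once at store initialization would keep clients upgrading from older builds from losing their saved parameters.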

yexiangle and others added 24 commits April 23, 2025 20:46
* fix: fix L1 save problem

* fix:simplify the code

* fix: delete unused import

* fix: delete unused data
* fix: password update logic when there is more than one load

* update fix
* fix: modify thinking_model loading configuration

* feat: realize thinkModel ui

* feat: store

* feat: add combined_llm_config_dto

* add thinking_model_config & database migration

* directly add thinking model to user_llm_config

* delete thinking model repo dto service

* delete thinkingmodel table migration

* add is_cot config

* feat: allow defining is_cot

* feat: simplify logs info

* feat: add training model

* feat: fix is_cot problem

* fix: fix chat message

* fix: fix progress error

* fix: disable no settings thinking

* feat: add thinking warning

* fix: fix start service error

* feat: fix trainparams init problem

* feat: change playGround prompt

* feat: Add Dimension Mismatch Handling for ChromaDB (mindverse#157) (mindverse#207)

* Fix Issue mindverse#157

Added chroma_utils.py to manage ChromaDB and added explanatory docs

* Add logging and debugging process

- Enhanced the `reinitialize_chroma_collections` function in `chroma_utils.py` to properly check whether collections exist before attempting to delete them, preventing potential errors when collections don't exist.
- Improved error handling in the `_handle_dimension_mismatch` method in `embedding_service.py` by adding more robust exception handling and verification steps after reinitialization.
- Enhanced the collection initialization process in `embedding_service.py` to provide more detailed error messages and better handle cases where collections still have incorrect dimensions after reinitialization.
- Added additional verification steps to ensure that collection dimensions match the expected dimension after creation or retrieval.
- Improved logging throughout the code to provide more context in error messages, making debugging easier. (A schematic sketch of this flow follows.)
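
A schematic sketch (in TypeScript, to match the other snippets in this thread) of the check-before-delete and verify-after-reinitialize flow described above; VectorStore is a hypothetical interface, not the real ChromaDB client API.

```typescript
// Hypothetical store interface standing in for the ChromaDB client.
interface Collection {
  name: string;
  dimension: number;
}

interface VectorStore {
  listCollections(): Promise<Collection[]>;
  deleteCollection(name: string): Promise<void>;
  createCollection(name: string, dimension: number): Promise<Collection>;
}

async function reinitializeCollection(
  store: VectorStore,
  name: string,
  expectedDim: number,
): Promise<Collection> {
  // Check existence before deleting, so a missing collection is not an error.
  const existing = await store.listCollections();
  if (existing.some((c) => c.name === name)) {
    await store.deleteCollection(name);
  }

  const created = await store.createCollection(name, expectedDim);

  // Verify the dimension after reinitialization, as the commit describes.
  if (created.dimension !== expectedDim) {
    throw new Error(
      `Collection ${name} has dimension ${created.dimension}, expected ${expectedDim}`,
    );
  }
  return created;
}
```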

* Change topics_generator timeout to 30 (mindverse#263)

* quick fix

* fix: shade -> shade_merge_info (mindverse#265)

* fix: shade -> shade_merge_info

* add convert array

* quick fix import error

* add log

* add heartbeat

* new strategy

* sse version

* add heartbeat

* zh to en

* optimize code

* quick fix convert function

* Feat/new branch management (mindverse#267)

* feat: new branch management

* feat: fix multi-upload

* optimize contribute management

---------

Co-authored-by: Crabboss Mr <1123357821@qq.com>
Co-authored-by: Ye Xiangle <yexiangle@mail.mindverse.ai>
Co-authored-by: Xinghan Pan <sampan090611@gmail.com>
Co-authored-by: doubleBlack2 <108928143+doubleBlack2@users.noreply.github.com>
Co-authored-by: kevin-mindverse <kevin@mindverse.ai>
Co-authored-by: KKKKKKKevin <115385420+kevin-mindverse@users.noreply.github.com>
* feat: replace tutorial link

* replace video link

---------

Co-authored-by: kevin-mindverse <kevin@mindverse.ai>
* Add CUDA support

- CUDA detection
- Memory handling
- Ollama model release after training

* Fix logging issue

Added a CUDA support flag so the log accurately reflects the CUDA toggle

* Update llama.cpp rebuild

Changed llama.cpp to rebuild only during the first build when CUDA support is enabled, rather than on every run

* Improved VRAM management

Enabled memory pinning and optimizer state offload

* Fix CUDA check

Rewrote the llama.cpp rebuild logic and added a manual y/n toggle for users who want to enable CUDA support

* Added fast restart and fixed CUDA check command

Added make docker-restart-backend-fast to restart the backend and pick up code changes without triggering a full llama.cpp rebuild

Fixed the make docker-check-cuda command to correctly report CUDA support

* Added docker-compose.gpu.yml

Added docker-compose.gpu.yml to fix an error on machines without an NVIDIA GPU, and made sure "\n" is appended before modifying .env

* Fixed cuda toggle

The last push accidentally broke the CUDA toggle

* Code review fixes

Fixed errors resulting from removed code:
- Added `return save_path` to the end of the save_hf_model function
- Rolled back the download_file_with_progress function

* Update Makefile

Use CUDA by default when using docker-restart-backend-fast

* Minor cleanup

Removed an unnecessary Makefile command and fixed GPU logging

* Delete .gpu_selected

* Simplified cuda training code

- Removed the dtype setting to let torch handle it automatically
- Removed VRAM logging
- Removed unnecessary/old comments

* Fixed gpu/cpu selection

Made "make docker-use-gpu/cpu" command work with .gpu_selected flag and changed "make docker-restart-backend-fast" command to respect flag instead of always using gpu

* Fix Ollama embedding error

Added a custom exception class for Ollama embeddings, which seemed to return keyword arguments while the Python exception class only accepts positional ones

* Fixed model selection & memory error

Fixed training defaulting to the 0.5B model regardless of selection, and fixed the "free(): double free detected in tcache 2" error caused by the CUDA flag being passed incorrectly
…rse#279)

* feature: use uv to set up the Python environment

* TrainProcessService: add singleton method get_instance

* feat: fix code

* Added CUDA support (mindverse#228)

* fix: train service singleton

---------

Co-authored-by: Zachary Pitroda <30330004+zpitroda@users.noreply.github.com>
* fix: adjust status order

* fix: adjust train status

* fix: split service status from train status
* Update README.md

Changed to the updated tutorial link

* Update README.md with FAQ

New section for FAQ doc

* feat: adjust the train rule
* feat: what? no llama.cpp

* add cache
      });
    })
    .catch((error) => {
      console.error(error.message || 'Failed to fetch model config');

🐛 Correctness Issue

Silent API Error Handling.

The fetchModelConfig function fails silently: it only logs errors to the console without propagating them, which could hide API failures from the UI.

Current Code (Diff):

-         console.error(error.message || 'Failed to fetch model config');
+         console.error(error.message || 'Failed to fetch model config');
+         throw error; // Propagate error to caller
📝 Committable suggestion


Suggested change
- console.error(error.message || 'Failed to fetch model config');
+ console.error(error.message || 'Failed to fetch model config');
+ throw error; // Propagate error to caller
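
For context, here is the suggested fix shown as a complete function; the surrounding code is reconstructed from the excerpt above, and the route and ModelConfig shape are hypothetical.

```typescript
// Reconstructed sketch, not the exact source being reviewed.
interface ModelConfig {
  model_name: string;
}

export async function fetchModelConfig(): Promise<ModelConfig> {
  try {
    const res = await fetch('/api/model-config'); // assumed endpoint
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    return (await res.json()) as ModelConfig;
  } catch (error) {
    console.error((error as Error).message || 'Failed to fetch model config');
    throw error; // propagate so callers and the UI can react
  }
}
```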

const preStatus = get().status;

// Only trained and running can be interchanged.
if (statusRankMap[status] < statusRankMap[preStatus]) {

🐛 Correctness Issue

Potential undefined property access.

Accessing statusRankMap[status] without checking that status exists in the map could cause runtime errors if an invalid status is provided.

Current Code (Diff):

-     if (statusRankMap[status] < statusRankMap[preStatus]) {
+     if (status in statusRankMap && preStatus in statusRankMap && statusRankMap[status] < statusRankMap[preStatus]) {
📝 Committable suggestion


Suggested change
- if (statusRankMap[status] < statusRankMap[preStatus]) {
+ if (status in statusRankMap && preStatus in statusRankMap && statusRankMap[status] < statusRankMap[preStatus]) {
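
The guarded comparison in a self-contained form; the status names beyond the suspended/failed states mentioned in the PR notes, and their relative ranks, are assumptions.

```typescript
type TrainStatus = 'failed' | 'suspended' | 'trained' | 'running';

// Assumed ranks for illustration; the real map lives in the store.
const statusRankMap: Record<TrainStatus, number> = {
  failed: 0,
  suspended: 1,
  trained: 2,
  running: 3,
};

function shouldRejectTransition(status: string, preStatus: string): boolean {
  // Guard against statuses missing from the map before comparing ranks,
  // as the suggestion above recommends.
  if (!(status in statusRankMap) || !(preStatus in statusRankMap)) {
    return true; // unknown status: reject rather than crash
  }
  return (
    statusRankMap[status as TrainStatus] <
    statusRankMap[preStatus as TrainStatus]
  );
}
```

Rejecting unknown statuses outright, rather than defaulting them to a rank, keeps an invalid update from silently overriding a valid state.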
