Collaborator

@Cifko Cifko commented Jun 2, 2025

Add memory usage cap at 0.9

@Cifko Cifko changed the base branch from main to release/v0.1.14 June 2, 2025 09:52
Contributor

@jorgeantonio21 jorgeantonio21 left a comment

Looking good, but I believe some parts of the logic need to be refactored.

const DEFAULT_MAX_TOKENS: u64 = 8_192;

/// The ceiling for memory usage, above which the service will not accept new requests
const MEMORY_USAGE_CEILING: f64 = 0.9;
Contributor

I reckon these values should be set in config.toml, as we will most likely need to tweak them.
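
To illustrate the suggestion, here is a minimal sketch of how the hard-coded ceiling could become a configurable value with the current constant as the default. The struct name `MemoryLimits`, the field `usage_ceiling`, and the helper `from_configured` are hypothetical and not taken from the PR. The sketch assumes the value would be read from config.toml elsewhere (for example through a deserializer) and handed over as an `Option<f64>`.

```rust
/// Hypothetical config section; names are illustrative, not the project's actual schema.
#[derive(Debug, Clone, PartialEq)]
pub struct MemoryLimits {
    /// Fraction of memory in use above which new requests are rejected.
    pub usage_ceiling: f64,
}

impl Default for MemoryLimits {
    fn default() -> Self {
        // Mirrors the hard-coded MEMORY_USAGE_CEILING under review.
        Self { usage_ceiling: 0.9 }
    }
}

impl MemoryLimits {
    /// Builds limits from an optional configured value (assumed to come from
    /// config.toml), falling back to the default when the key is absent.
    pub fn from_configured(ceiling: Option<f64>) -> Result<Self, String> {
        let usage_ceiling = ceiling.unwrap_or(Self::default().usage_ceiling);
        // Written as a negated range check so that NaN is rejected too.
        if !(usage_ceiling > 0.0 && usage_ceiling <= 1.0) {
            return Err(format!(
                "usage_ceiling must be in (0, 1], got {usage_ceiling}"
            ));
        }
        Ok(Self { usage_ceiling })
    }

    /// Returns true while usage is at or below the ceiling.
    pub fn accepts_new_requests(&self, usage: f64) -> bool {
        usage <= self.usage_ceiling
    }
}
```

Validating the range at load time means a mistyped value in the config file fails fast at startup rather than silently disabling the cap.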

@jorgeantonio21 jorgeantonio21 merged commit 6b669a7 into release/v0.1.14 Jun 2, 2025
10 checks passed
jorgeantonio21 added a commit that referenced this pull request Jun 5, 2025
* fix: ensure client errors are correctly tracked (#635)

* fix: ensure client errors are correctly tracked

* chore: update error tracking

* chore: adjust clippy

* chore: grammatical error

* ci: use stable toolchain (#645)

* ci: use stable toolchain

* chore: fix clippy issues

* revert to use prometheus for queued requests (#646)

* revert to use prometheus for queued requests

* add start metrics collector

* update logs

* feat: turn on too many requests for a period of time (#647)

* feat: add request running cap (#649)

* feat: add request running cap

* fix clippy

---------

Co-authored-by: Jorge Antonio <matroid@outlook.com>

* refactor num running requests for prometheus check

* logs

* handle deadlock for too many requests timeout trigger check (#650)

* feat: add mem usage (#651)

* feat: add memusage to get_metrics

* add lower threshold for disabling the flag

* fix clippy

* address 2 comments

* add values to config

* fix

* fix tests

* fix name

* feat: update sui dependencies (#654)

* resolve compilation issues

* ci: add caching strategy for ci

* ci: optimize coverage job

* ci: adjust coverage job

* ci: update deny action

* ci: use grcov

* ci: use stable toolchain

* ci: only run tests once

* ci: move coverage to test file

* ci: use --codecov flag & stable toolchain

* ci: discard p2p tester

---------

Co-authored-by: chad <chad.nehemiah94@gmail.com>

* feat: add max number of queued requests configuration and update request handling (#656)

* fix: correct deadlock in `check_if_too_many_requests` (#658)

* correct deadlock in check_if_too_many_requests method

* resolve tests

* add changes

* add changes

* continue improving logic

* add changes

* fix: normalize model strings to lowercase in request handlers (#661)

* fix: normalize model strings to lowercase in request handlers

* fix test

* fix

---------

Co-authored-by: Chad Nehemiah <chad.nehemiah94@gmail.com>
Co-authored-by: Martin Stefcek <35243812+Cifko@users.noreply.github.com>
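
The commit message above mentions adding a "lower threshold for disabling the flag" (#651). One common way to read that is as hysteresis: the rejection flag turns on above an upper threshold and only turns off again once usage drops below a lower one. The sketch below shows that pattern under stated assumptions. The type `MemoryGate`, its fields, and the 0.8 lower threshold used in the usage note are hypothetical; only the 0.9 ceiling appears in this PR.

```rust
/// Hypothetical hysteresis gate; names and the lower threshold are assumptions.
#[derive(Debug)]
pub struct MemoryGate {
    upper: f64,
    lower: f64,
    rejecting: bool,
}

impl MemoryGate {
    /// Creates a gate that starts in the accepting state.
    pub fn new(upper: f64, lower: f64) -> Self {
        assert!(lower <= upper, "lower threshold must not exceed upper");
        Self { upper, lower, rejecting: false }
    }

    /// Feeds a new memory usage sample and returns whether new requests
    /// should currently be rejected.
    pub fn update(&mut self, usage: f64) -> bool {
        if usage > self.upper {
            // Crossing the ceiling raises the flag.
            self.rejecting = true;
        } else if usage < self.lower {
            // Only dropping below the lower threshold clears it.
            self.rejecting = false;
        }
        // Between the two thresholds the previous state is kept.
        self.rejecting
    }
}
```

With `MemoryGate::new(0.9, 0.8)`, a sample of 0.95 raises the flag, a later sample of 0.85 keeps it raised, and a sample of 0.75 clears it. The gap between the thresholds keeps the service from flapping between accepting and rejecting when usage hovers near the ceiling.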