Skip to content

Enhance configuration options #385

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

Cheelax
Copy link

@Cheelax Cheelax commented Jul 5, 2025

closes #285

  • Added examples for configuring processing limits in README.md.
  • Introduced new CLI options for max files, max total size, and max directory depth.
  • Updated environment variable support in config.py for various limits.
  • Modified ingestion functions to accept new parameters for file processing limits.
  • Enhanced limit checks in ingestion logic to utilize new configuration options.

Cheelax added 2 commits July 5, 2025 09:27
- Added examples for configuring processing limits in README.md.
- Introduced new CLI options for max files, max total size, and max directory depth.
- Updated environment variable support in config.py for various limits.
- Modified ingestion functions to accept new parameters for file processing limits.
- Enhanced limit checks in ingestion logic to utilize new configuration options.
Refactor

* Deduplicate `_get_env_var` by moving it to `utils/config_utils.py`.
* Remove redundant `local_path` parameter from `_process_file`.

Fix

* Add missing `tag` parameter to `_async_main`, `main`, and `_CLIArgs`.
* Introduce the missing `--tag` CLI flag.

Docs and consistency

* Update `README.md` for `markdownlint` compliance and other minor tweaks.
* Add missing argument docs to `_async_main` docstring.
* Re-order global variables in `config.py` for consistency.
* Swap the order of `include_patterns` and `ignore_patterns` in `parse_query` and `ingest_async`.
* Tidy docstrings for `_async_main`, `IngestionQuery`, `parse_query`, `ingest_async`, and `ingest`.

Tests

* Temporarily disable `[tool.ruff.lint.isort]` due to conflict with the `isort` pre-commit hook.
* Add new arguments to `expected` in `test_parse_query_without_host`.
* Run `pre-commit` hooks.
@filipchristiansen
Copy link
Contributor

Thanks for your work @Cheelax. I have rebase upon main and added a commit with the following:

Refactor

  • Deduplicate _get_env_var by moving it to utils/config_utils.py.
  • Remove redundant local_path parameter from _process_file.

Fix

  • Add missing tag parameter to _async_main, main, and _CLIArgs.
  • Introduce the missing --tag CLI flag.

Docs and consistency

  • Update README.md for markdownlint compliance and other minor tweaks.
  • Add missing argument docs to _async_main docstring.
  • Re-order global variables in config.py for consistency.
  • Swap the order of include_patterns and ignore_patterns in parse_query and ingest_async.
  • Tidy docstrings for _async_main, IngestionQuery, parse_query, ingest_async, and ingest.

Tests

  • Temporarily disable [tool.ruff.lint.isort] due to conflict with the isort pre-commit hook.
  • Add new arguments to expected in test_parse_query_without_host.
  • Run pre-commit hooks.

TODO

  • The environment variable names such as GITINGEST_MAX_FILE_SIZE) do not seem to align with the names used and looked for in _get_env_var`.

  • The _get_env_var function returns int | str which causes downstream problems when the variables set with this functions are used (Type "int | str" is incompatible with type "int").

Comment on lines 9 to 40
def _get_env_var(key: str, default: int | str, cast_func: Callable[[str], int | str] | None = None) -> int | str:
"""Get environment variable with ``GITINGEST_`` prefix and optional type casting.

Parameters
----------
key : str
The name of the environment variable.
default : int | str
The default value to return if the environment variable is not set.
cast_func : Callable[[str], int | str] | None
The function to cast the environment variable to the desired type.

Returns
-------
int | str
The value of the environment variable, cast to the desired type if provided.

"""
env_key = f"GITINGEST_{key}"
value = os.environ.get(env_key)

if value is None:
return default

if cast_func:
try:
return cast_func(value)
except (ValueError, TypeError):
print(f"Warning: Invalid value for {env_key}: {value}. Using default: {default}")
return default

return value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the approach but I think in the end having int | str as a return type might cause some unnecessary headaches in some situations.

Perhaps just returning str and casting whenever calling the function is fine?

@Cheelax
Copy link
Author

Cheelax commented Jul 7, 2025

@filipchristiansen it looks to me that the naming is good in the readme. Every var I am looking for should be prefixed by :

   env_key = f"GITINGEST_{key}"
    value = os.environ.get(env_key)

Cheelax and others added 2 commits July 7, 2025 15:05
- Introduced a new helper function `_get_int_env_var` to retrieve environment variables as integers with default fallback and error handling.
- Updated `config.py` and `server_config.py` to use the new helper function for better clarity and maintainability.
- Simplified `_get_env_var` in `config_utils.py` to focus solely on retrieving string values without type casting.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat: config parameters as environment variables
3 participants