
Conversation


@JeremyTar JeremyTar commented Sep 19, 2025

This PR adds a simple new provider to use LLM functionality in Timesketch, including a new file in:

/timesketch/timesketch/lib/llm/providers/contrib/azureai.py

We currently use it in our context for the summary feature; we need to improve it with the other LLM features and pass all the tests required for a good PR.

Checks

  • All tests succeed.
  • Unit tests added.
  • e2e tests added.
  • Documentation updated.

Closing issues

closes #3522 - Add azureai provider for LLM features

Have a nice day!


google-cla bot commented Sep 19, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up-to-date status, view the checks section at the bottom of the pull request.

@jkppr jkppr self-requested a review September 19, 2025 16:40
@jkppr jkppr added the LLM AI all the things label Sep 19, 2025
@jkppr jkppr removed their request for review September 19, 2025 16:40
@jkppr
Collaborator

jkppr commented Sep 23, 2025

Thanks for the contribution. 🎉

Can you please move the provider into a contrib subfolder and also sign the CLA mentioned in the comment above? Thanks!

@JeremyTar
Author

Hi @jkppr! I've made the change, moving the provider into the dedicated contrib folder. As for the CLA, I'm a little blocked by our company process. I'll try to fix it quickly and will send a message when it's sorted 👌

Have a great day!

@jkppr
Collaborator

jkppr commented Sep 24, 2025

Thanks for making the changes. Is the code ready for review? If yes, can you mark the PR as ready? Then I'll give it a quick review. If all is good and CLA sorted we can merge it.

@JeremyTar JeremyTar force-pushed the llm_new_azure_provider branch 3 times, most recently from c71c72e to 2d2f2d1 Compare September 24, 2025 09:19
@JeremyTar
Author

I've applied the Black formatting required by your GitHub Actions to azureai.py.

Do you need a final rebase to squash everything into one clean commit?

@JeremyTar JeremyTar marked this pull request as ready for review September 24, 2025 09:31
@jkppr
Collaborator

jkppr commented Sep 24, 2025

Please take a look at the linter issues.
You can check the linter locally following these steps:

bash utils/tsdev.sh shell

# in your docker shell
cd /usr/local/src/timesketch
pylint --rcfile=.pylintrc timesketch/lib/llms/providers/contrib/azureai.py

Once CLA is sorted we can review the code and when all is approved, we will merge in the latest changes from the master branch and squash & merge.

@JeremyTar JeremyTar force-pushed the llm_new_azure_provider branch 2 times, most recently from d32b219 to ce4518d Compare September 25, 2025 07:25
Move azureai provider into contrib folder

Format azureai.py to be compliant with Black formatting

Fix pylint feedback
@JeremyTar JeremyTar force-pushed the llm_new_azure_provider branch from ce4518d to 0df7977 Compare September 25, 2025 07:27
@JeremyTar
Author

Finally sorted... I just hadn't linked my personal profile with my work email address. Apologies!

@jkppr
Collaborator

jkppr commented Sep 29, 2025

/Gemini summary

Contributor

Summary of Changes

This pull request significantly enhances Timesketch's large language model capabilities by adding a new provider for Azure AI. This integration allows Timesketch to leverage Azure OpenAI services, expanding the range of available LLM backends for features like summarization and other text-based analyses. The change is implemented modularly, making it easy to extend with future providers.

Highlights

  • New Azure AI LLM Provider: Introduced a new LLM provider for Azure AI, allowing Timesketch to integrate with Azure OpenAI services for large language model functionalities.
  • Modular Integration: The new AzureAI provider is added within a new 'contrib' subfolder and registered with the existing LLMManager, ensuring modularity and extensibility.
  • Azure OpenAI API Interaction: The provider handles configuration for Azure OpenAI API (endpoint, API key, model), constructs chat completion requests, and processes responses, including basic JSON schema handling.
Changelog
  • timesketch/lib/llms/providers/__init__.py
    • Added an import statement for the new 'azureai' module from the 'contrib' subpackage, making the new provider discoverable.
  • timesketch/lib/llms/providers/contrib/azureai.py
    • New file created to implement the 'AzureAI' class, inheriting from 'interface.LLMProvider'.
    • The class includes an '__init__' method to configure the provider with Azure endpoint, API key, and model, raising an error if essential parameters are missing.
    • A 'generate' method is implemented to send prompts to the Azure OpenAI chat completion API, handle HTTP requests, and parse responses.
    • Basic support for 'response_schema' is included to attempt JSON parsing of the LLM output.
    • The 'AzureAI' provider is registered with the 'LLMManager' to be available for use within Timesketch.
Activity
  • google-cla[bot] commented, indicating that a Contributor License Agreement (CLA) is required before the PR can be reviewed.
  • jkppr thanked for the contribution and requested moving the provider to a 'contrib' subfolder and signing the CLA.
  • JeremyTar confirmed the change to the 'contrib' folder and mentioned facing company process blocks for the CLA, promising an update once resolved.
  • jkppr inquired if the code was ready for review and asked the author to mark the PR as such, noting that a quick review would follow once CLA is sorted.
  • JeremyTar asked about applying Black formatting and rebasing for a single clean commit.
  • jkppr advised on fixing linter issues locally using pylint and reiterated the process for CLA, code review, and eventual squash & merge.
  • JeremyTar apologized for the CLA delay, explaining it was due to an issue linking their personal profile with their work email address.
  • jkppr posted '/Gemini summary'.

@jkppr
Collaborator

jkppr commented Sep 29, 2025

/Gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new Azure AI provider for LLM functionality. While the initiative is great, the current implementation has several issues that need to be addressed. My review focuses on improving code quality, robustness, and adherence to the project's style guide. The feedback includes fixing incorrect type hints, addressing potential runtime errors like UnboundLocalError and unsafe dictionary access, ensuring consistency with the base provider class, and improving documentation. Addressing these points will make the new provider more reliable and maintainable.

"""Azure Ai Foundry LLM provider."""

import json
from typing import Optional, Any
Contributor


high

To correctly type hint the generate method's return value, Union is needed from the typing module.

Suggested change
from typing import Optional, Any
from typing import Optional, Any, Union

"endpoint, api_key, and model are required for AzureAI provider"
)

def generate(self, prompt: str, response_schema: Optional[dict] = None) -> dict:
Contributor


high

The return type hint for this function is dict, but it can also return a str on line 54. This is inconsistent and can lead to type errors. The return type should be Union[dict, str].

Additionally, the base class LLMProvider.generate is type-hinted to return str. Overriding it with a different return type violates the Liskov Substitution Principle, which could cause issues in code that uses the provider through the common interface.

This method is also missing a docstring. Please add a Google-style docstring as per the style guide.1

Suggested change
def generate(self, prompt: str, response_schema: Optional[dict] = None) -> dict:
def generate(self, prompt: str, response_schema: Optional[dict] = None) -> Union[dict, str]:

Style Guide References

Footnotes

  1. The style guide requires docstrings for functions.

Collaborator


@JeremyTar Is there a specific reason you changed the return type to dict instead of string?
If I'm not mistaken this would make the azureai provider incompatible with the existing features like Event summary or query generation? (Maybe you tested it and it works?)

Would it be an option to return only the LLM text answer from the dict like it's done in the aistudio provider for example?

Author


I'm a little confused about this point 😅 We mainly use the summarization feature and don't have the opportunity to use the other providers in our context.

I based the Azure provider on the other providers. In my initial tests, returning the string given by response_data = response.json()["choices"][0]["message"]["content"] led to JSON type errors.

Currently, the provider works for us; to achieve this, we add the key expected by the response_schema.

I'd appreciate your opinions, because I can't work out how the Ollama provider works, for example. I probably misunderstood the use of response_schema or of json.loads()?
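For what it's worth, the pattern in the existing providers seems to be: return the raw text answer by default, and only apply json.loads() when a response_schema was passed. A minimal sketch of that idea (the helper name parse_llm_response is hypothetical, not Timesketch API):

```python
import json
from typing import Optional, Union


def parse_llm_response(text: str, response_schema: Optional[dict] = None) -> Union[dict, str]:
    """Returns the raw LLM text by default, or a parsed dict when a schema is set.

    Args:
        text: The content string extracted from the chat-completion response.
        response_schema: If provided, the text is expected to be JSON and is
            parsed into a dict.

    Raises:
        ValueError: If a schema was requested but the text is not valid JSON.
    """
    if response_schema:
        try:
            return json.loads(text)
        except json.JSONDecodeError as error:
            raise ValueError(f"Error JSON parsing text: {text}: {error}") from error
    return text
```

Under this pattern the dict-vs-str question resolves itself: callers that pass a schema (like the summarization feature) get a dict, everyone else gets the plain string.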



class AzureAI(interface.LLMProvider):
"""AzureAI class for interacting with the Azure OpenAI API via Timesketch."""
Contributor


medium

The class docstring should follow the Google style guide as specified in the project's style guide.1 It should be more descriptive, explaining what the class does and what configuration it expects.

For example:

"""AzureAI provider for Timesketch.

This provider uses the Azure OpenAI API to generate text.
It requires the endpoint, api_key, and model to be configured.
"""

Style Guide References

Footnotes

  1. The style guide requires using Google-style docstrings.


NAME = "azureai"

def __init__(self, config: dict, **kwargs: Any):
Contributor


medium

This __init__ method is missing a docstring. According to the style guide, all functions should have a docstring explaining their purpose, arguments, and any exceptions they might raise.1

Please add a Google-style docstring, for example:

    """Initializes the AzureAI provider.

    Args:
        config (dict): A dictionary of provider-specific configuration options.
        **kwargs (Any): Additional arguments passed to the base class.

    Raises:
        ValueError: If required configuration keys (endpoint, api_key, model)
            are missing.
    """

Style Guide References

Footnotes

  1. The style guide requires docstrings for functions, following the Google style.

Collaborator

@jkppr jkppr left a comment


Please have a look at the unresolved comments by Gemini.

I'll leave the second round of review to @itsmvd

f"Error JSON parsing text: {formatted_data}: {error}"
) from error

return response_data
Collaborator


Is this response_data a string by default?

If yes, let's add the Union type hint as recommended by the Gemini code review, to ensure we can return a dict in case a response_schema is used and a str by default.
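Returning only the text answer, as suggested above, could look like the following sketch. It assumes the standard Azure OpenAI chat-completions response shape (choices[0].message.content); the helper name extract_text_answer is hypothetical:

```python
def extract_text_answer(response_json: dict) -> str:
    """Pulls the assistant's text out of a chat-completions response dict.

    Uses .get() with explicit checks instead of bare indexing, so a
    malformed or empty response raises a clear ValueError rather than an
    opaque KeyError/IndexError.
    """
    choices = response_json.get("choices") or []
    if not choices:
        raise ValueError(f"No choices in LLM response: {response_json}")
    content = choices[0].get("message", {}).get("content")
    if content is None:
        raise ValueError(f"No message content in LLM response: {response_json}")
    return content
```

Guarding each lookup also addresses the "unsafe dictionary access" point from the automated review.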

@jkppr jkppr requested a review from itsmvd September 29, 2025 13:09
@JeremyTar
Author

I've taken note of all the comments and will start working on them tomorrow.
