Skip to content

Conversation

@ylnsnv
Copy link
Contributor

@ylnsnv ylnsnv commented Nov 9, 2023

Summary

This PR refines the OpenAIEmbeddingOperator to accurately reflect and support the extended range of input types it can handle for generating embeddings (by OpenAI official API). The operator, which previously accepted a string or a list of any type, now has improved type annotations and validation to handle strings, lists of strings, lists of integers, and lists of lists of integers more explicitly.

Changes

  • Modified type annotations str | list[Any] to be str | list[str] | list[int] | list[list[int]]
  • Refined input validation to ensure input_text is non-empty and matches one of the expected types.
  • Expanded unit tests to cover the newly supported input types and to ensure the operator behaves as expected.

Impact

The operator's enhanced input type support provides users with clearer guidance on what data can be passed for embedding generation. This update facilitates the use of the operator in a broader range of scenarios, such as processing numerical data or sequences of data, which is common in machine learning and NLP tasks.

Tests

  • Extended the test suite to include cases for each of the supported input types, ensuring robustness.
  • Added negative test cases to verify that the operator raises exceptions for invalid inputs as expected.

With these changes, the OpenAIEmbeddingOperator becomes more intuitive and versatile, allowing for seamless integration into various data processing pipelines within Airflow environments.

…ation that it matches the type we need, and aligned docstring for the openai operator accordingly
@ylnsnv
Copy link
Contributor Author

ylnsnv commented Nov 9, 2023

@potiuk hi! can you please review my changes in your free time? :)

@potiuk
Copy link
Member

potiuk commented Nov 9, 2023

@potiuk hi! can you please review my changes in your free time? :)

Generally - unless you have good reasons to know someone SHOULD review something we do not have a custom to call people in comments. ESPECIALLY if you opened your PR few hours before. People here contribute and review when they have free time and see where they can help. By calling me (or anyone else) directly - you also limit your options of getting review from other people, because they might think I am already involved and engaged with a PR, where I am not. So please avoid doing that unless you have very good reason to believe called person is absolutely necessary and SHOULD review the PR for whatever reason.

@ylnsnv
Copy link
Contributor Author

ylnsnv commented Nov 9, 2023

@potiuk hi! can you please review my changes in your free time? :)

Generally - unless you have good reasons to know someone SHOULD review something we do not have a custom to call people in comments. ESPECIALLY if you opened your PR few hours before. People here contribute and review when they have free time and see where they can help. By calling me (or anyone else) directly - you also limit your options of getting review from other people, because they might think I am already involved and engaged with a PR, where I am not. So please avoid doing that unless you have very good reason to believe called person is absolutely necessary and SHOULD review the PR for whatever reason.

I apologize for any inconvenience my direct request may have caused. It was not my intention to limit the review opportunities by others or to impose unnecessarily. I appreciate the collaborative efforts here and will ensure to keep pr's open for the community to address collectively. Thank you for bringing this to my attention and sorry again!

@potiuk
Copy link
Member

potiuk commented Nov 12, 2023

LGTM but I'd love @utkarsharma2 to take a look as this is so fresh contribution :)

@utkarsharma2
Copy link
Contributor

@ylnsnv Overall changes look good to me, just a few minor things. Also, can you mention the docs where you found the specific types accepted by OpenAI API? just curious :)

@ylnsnv
Copy link
Contributor Author

ylnsnv commented Nov 13, 2023

@utkarsharma2

@ylnsnv Overall changes look good to me, just a few minor things. Also, can you mention the docs where you found the specific types accepted by OpenAI API? just curious :)

First of all, thank you for your feedback!
To clarify, my initial observation about the list[Any] type being used in Airflow's code for OpenAI Operator input prompted me to investigate further.

I didn't find the specific type information in the OpenAI API documentation directly. Instead, I referred to the actual implementation of the OpenAI API in their codebase. There, I found a line of code that clearly indicated the expected type. This discovery led me to open the PR to align Airflow's type indications with the actual implementation in the OpenAI API.

I'm glad to contribute and ensure our codebase is as accurate and up-to-date as possible. Thank you again for your thorough review and for pointing out areas that needed clarification. Your input is greatly valued! :)

…f input text to the constructor of the operator, changed value of invalid value to ValueError instead of AirflowException
Copy link
Contributor

@utkarsharma2 utkarsharma2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me.

@ylnsnv ylnsnv force-pushed the improve_openai_embedding_operator branch from cc0682c to 4f0982d Compare November 13, 2023 16:08
@ylnsnv
Copy link
Contributor Author

ylnsnv commented Nov 14, 2023

Hi @potiuk ,

I hope I'm not intruding or overstepping again, but I wanted to quickly check if there's anything else needed on my PR for it to be merge-ready.

Thanks for your time!

@potiuk potiuk merged commit acb9458 into apache:main Nov 14, 2023
@potiuk
Copy link
Member

potiuk commented Nov 14, 2023

LGTM :)

@ephraimbuddy ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Nov 20, 2023
@ephraimbuddy ephraimbuddy added this to the Airflow 2.8.0 milestone Nov 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants