feat: add py.typed; adjust Component protocol #9329

Open
wants to merge 9 commits into
base: main

anakin87
Member

@anakin87 anakin87 commented Apr 30, 2025

Related Issues

Proposed Changes:

  • Add a py.typed file to the repository: this makes Haystack's type information available to downstream projects, in line
    with PEP 561 (Python typing docs). Previously, users got the message Skipping analyzing "haystack": module is installed, but missing library stubs or py.typed marker, and all Haystack types were treated as Any.
  • Change how the run method is defined in the Component Protocol. This reverts fix: Update Component protocol to fix some type checking issues #7270, because I found that the previous definition caused type-checking issues (explained in more detail in the code comments).
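
For readers who want to check whether an installed package ships the PEP 561 marker, here is a minimal sketch. The helper name `has_py_typed` is hypothetical (it is not part of Haystack or of any library); it only uses the standard library to locate the package directory and look for a `py.typed` file in it.

```python
import importlib.util
from pathlib import Path


def has_py_typed(package_name: str) -> bool:
    """Return True if the top-level package ships a PEP 561 py.typed marker.

    Hypothetical helper for illustration only.
    """
    spec = importlib.util.find_spec(package_name)
    # No spec: the package is not installed.
    # No search locations: it is a single-file module, not a package.
    if spec is None or not spec.submodule_search_locations:
        return False
    return any(
        (Path(location) / "py.typed").is_file()
        for location in spec.submodule_search_locations
    )


if __name__ == "__main__":
    # Assumes the distribution is importable as "haystack".
    print(has_py_typed("haystack"))
```

If this reports False for a package, type checkers such as mypy will typically skip analyzing it and treat its types as Any, which is the behavior described above.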

How did you test it?

The main risk of this change is that users performing static type checking may start seeing new errors, since Haystack types were previously treated as Any.

While some errors are expected and are probably the temporary price to pay to leverage Haystack types in downstream projects, I wanted to evaluate the impact of this change. For this reason, I converted our tutorials into Python scripts (jupyter nbconvert --to python --RegexRemovePreprocessor.patterns '%%bash' ./tutorials/*.ipynb) and ran mypy on them.

  • Before this PR

    • Found 176 errors in 17 files (checked 17 source files)
    • most errors like tutorials/scripts/44_Creating_Custom_SuperComponents.py:52: error: Skipping analyzing "haystack": module is installed, but missing library stubs or py.typed marker [import-untyped]
  • After adding py.typed (22a089d)

    • Found 151 errors in 17 files (checked 17 source files)
    • most errors like tutorials/scripts/41_Query_Classification_with_TransformersTextRouter_and_TransformersZeroShotTextRouter.py:338: error: Argument 2 to "add_component" of "PipelineBase" has incompatible type "TransformersTextRouter"; expected "Component" [arg-type]
      note: Protocol member Component.run expected settable variable, got read-only attribute
      (these errors are related to the run method signature)
  • py.typed + Component protocol change (this PR)

    • Found 32 errors in 13 files (checked 17 source files)
    • Errors:
      tutorials/scripts/41_Query_Classification_with_TransformersTextRouter_and_TransformersZeroShotTextRouter.py:89: error: Library stubs not installed for "pandas" [import-untyped]
      tutorials/scripts/41_Query_Classification_with_TransformersTextRouter_and_TransformersZeroShotTextRouter.py:91: error: Need type annotation for "results" [var-annotated]
      tutorials/scripts/41_Query_Classification_with_TransformersTextRouter_and_TransformersZeroShotTextRouter.py:159: error: Need type annotation for "sent_results" [var-annotated]
      tutorials/scripts/41_Query_Classification_with_TransformersTextRouter_and_TransformersZeroShotTextRouter.py:182: error: Incompatible types in assignment (expression has type "TransformersZeroShotTextRouter", variable has type "TransformersTextRouter") [assignment]
      tutorials/scripts/41_Query_Classification_with_TransformersTextRouter_and_TransformersZeroShotTextRouter.py:216: error: Incompatible types in assignment (expression has type "TransformersZeroShotTextRouter", variable has type "TransformersTextRouter") [assignment]
      tutorials/scripts/41_Query_Classification_with_TransformersTextRouter_and_TransformersZeroShotTextRouter.py:275: error: Cannot find implementation or library stub for module named "datasets" [import-not-found]
      tutorials/scripts/40_Building_Chat_Application_with_Function_Calling.py:334: error: Cannot find implementation or library stub for module named "gradio" [import-not-found]
      tutorials/scripts/39_Embedding_Metadata_for_Improved_Retrieval.py:100: error: Cannot find implementation or library stub for module named "wikipedia" [import-not-found]
      tutorials/scripts/37_Simplifying_Pipeline_Inputs_with_Multiplexer.py:168: error: Cannot find implementation or library stub for module named "haystack.components.others" [import-not-found]
      tutorials/scripts/34_Extractive_QA_Pipeline.py:54: error: Cannot find implementation or library stub for module named "datasets" [import-not-found]
      tutorials/scripts/33_Hybrid_Retrieval.py:69: error: Cannot find implementation or library stub for module named "datasets" [import-not-found]
      tutorials/scripts/30_File_Type_Preprocessing_Index_Pipeline.py:59: error: Cannot find implementation or library stub for module named "gdown" [import-not-found]
      tutorials/scripts/28_Structured_Output_With_Loop.py:92: error: Library stubs not installed for "colorama" [import-untyped]
      tutorials/scripts/28_Structured_Output_With_Loop.py:92: note: Hint: "python3 -m pip install types-colorama"
      tutorials/scripts/28_Structured_Output_With_Loop.py:92: note: (or run "mypy --install-types" to install all missing stub packages)
      tutorials/scripts/28_Structured_Output_With_Loop.py:113: error: Argument 1 to "loads" has incompatible type "str | None"; expected "str | bytes | bytearray" [arg-type]
      tutorials/scripts/28_Structured_Output_With_Loop.py:137: error: Argument "pydantic_model" to "OutputValidator" has incompatible type "type[CitiesData]"; expected "BaseModel" [arg-type]
      tutorials/scripts/27_First_RAG_Pipeline.py:72: error: Cannot find implementation or library stub for module named "datasets" [import-not-found]
      tutorials/scripts/27_First_RAG_Pipeline.py:72: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
      tutorials/scripts/44_Creating_Custom_SuperComponents.py:58: error: Cannot find implementation or library stub for module named "datasets" [import-not-found]
      tutorials/scripts/44_Creating_Custom_SuperComponents.py:97: error: "HybridRetriever" has no attribute "run" [attr-defined]
      tutorials/scripts/44_Creating_Custom_SuperComponents.py:182: error: Incompatible types in assignment (expression has type "HybridRetrieverWithRanker", variable has type "HybridRetriever") [assignment]
      tutorials/scripts/44_Creating_Custom_SuperComponents.py:183: error: "HybridRetriever" has no attribute "run" [attr-defined]
      tutorials/scripts/44_Creating_Custom_SuperComponents.py:301: error: Incompatible types in assignment (expression has type "AdvancedHybridRetriever", variable has type "HybridRetriever") [assignment]
      tutorials/scripts/44_Creating_Custom_SuperComponents.py:302: error: "HybridRetriever" has no attribute "run" [attr-defined]
      tutorials/scripts/42_Sentence_Window_Retriever.py:131: error: Incompatible types in assignment (expression has type "TextIOWrapper[_WrappedBuffer]", variable has type "str") [assignment]
      tutorials/scripts/42_Sentence_Window_Retriever.py:148: error: Library stubs not installed for "requests" [import-untyped]
      tutorials/scripts/42_Sentence_Window_Retriever.py:148: note: Hint: "python3 -m pip install types-requests"
      tutorials/scripts/42_Sentence_Window_Retriever.py:155: error: "Document" has no attribute "iter_content" [attr-defined]
      tutorials/scripts/43_Building_a_Tool_Calling_Agent.py:217: error: Cannot find implementation or library stub for module named "IPython.display" [import-not-found]
      tutorials/scripts/35_Evaluating_RAG_Pipelines.py:81: error: Cannot find implementation or library stub for module named "datasets" [import-not-found]
      tutorials/scripts/35_Evaluating_RAG_Pipelines.py:305: error: Library stubs not installed for "pandas" [import-untyped]
      tutorials/scripts/35_Evaluating_RAG_Pipelines.py:305: note: Hint: "python3 -m pip install pandas-stubs"
      tutorials/scripts/35_Evaluating_RAG_Pipelines.py:307: error: Item "dict[str, list[Any]]" of "dict[str, list[Any]] | Any | str" has no attribute "nlargest" [union-attr]
      tutorials/scripts/35_Evaluating_RAG_Pipelines.py:307: error: Item "str" of "dict[str, list[Any]] | Any | str" has no attribute "nlargest" [union-attr]
      tutorials/scripts/35_Evaluating_RAG_Pipelines.py:308: error: Item "dict[str, list[Any]]" of "dict[str, list[Any]] | Any | str" has no attribute "nsmallest" [union-attr]
      tutorials/scripts/35_Evaluating_RAG_Pipelines.py:308: error: Item "str" of "dict[str, list[Any]] | Any | str" has no attribute "nsmallest" [union-attr]
      Found 32 errors in 13 files (checked 17 source files)

In short, this analysis suggests that no serious type issues related to Haystack types remain after these changes.
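
One way to back up a summary like this is to tally the mypy error codes and see how many fall into categories unrelated to Haystack types (for example import-not-found or import-untyped). The following is a minimal sketch, assuming mypy's default one-line output format; the function name `count_error_codes` and the regular expression are illustrative and not part of this PR.

```python
import re
from collections import Counter
from typing import Iterable

# Matches lines such as:
#   path/to/file.py:54: error: Some message [import-not-found]
# "note:" lines and the final "Found N errors" summary do not match.
ERROR_LINE = re.compile(r"^.+?:\d+: error: .*\[(?P<code>[a-z-]+)\]\s*$")


def count_error_codes(lines: Iterable[str]) -> Counter:
    """Count mypy errors per error code (hypothetical helper)."""
    counts: Counter = Counter()
    for line in lines:
        match = ERROR_LINE.match(line.strip())
        if match:
            counts[match.group("code")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        'tutorials/scripts/34_Extractive_QA_Pipeline.py:54: error: Cannot find '
        'implementation or library stub for module named "datasets" [import-not-found]',
        'tutorials/scripts/28_Structured_Output_With_Loop.py:92: note: Hint: '
        '"python3 -m pip install types-colorama"',
        'tutorials/scripts/35_Evaluating_RAG_Pipelines.py:307: error: Item "str" of '
        '"dict[str, list[Any]] | Any | str" has no attribute "nlargest" [union-attr]',
    ]
    for code, count in count_error_codes(sample).most_common():
        print(code, count)
```

Feeding the saved mypy output of each run through such a helper would give a per-code breakdown that can be compared across the three configurations listed above.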

I also verified that reverting #7270 does not cause regressions in VSCode/Pylance.

Notes for the reviewer

I also rely on the fact that platform will test this change before the next Haystack release. If any serious issues come up, the changes are easy to revert.

Checklist

  • I have read the contributors guidelines and the code of conduct
  • I have updated the related issue with new insights and changes
  • I added unit tests and updated the docstrings
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
  • I documented my code
  • I ran pre-commit hooks and fixed any issue

@coveralls
Collaborator

coveralls commented May 2, 2025

Pull Request Test Coverage Report for Build 14854285423

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 4 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.007%) to 90.403%

Files with Coverage Reduction | New Missed Lines | %
components/preprocessors/document_preprocessor.py | 2 | 95.56%
core/component/component.py | 2 | 98.9%

Totals (Coverage Status)
  • Change from base Build 14838765775: -0.007%
  • Covered Lines: 10908
  • Relevant Lines: 12066

💛 - Coveralls

@github-actions github-actions bot added the type:documentation Improvements on the docs label May 5, 2025
@anakin87 anakin87 changed the title experimenting with py.typed feat: add py.typed May 5, 2025
@anakin87 anakin87 marked this pull request as ready for review May 5, 2025 15:22
@anakin87 anakin87 requested review from a team as code owners May 5, 2025 15:22
@anakin87 anakin87 requested review from dfokina, mpangrazzi, julian-risch and sjrl and removed request for a team and mpangrazzi May 5, 2025 15:22
Comment on lines +167 to +168
# Using `run: Callable[..., Dict[str, Any]]` directly leads to type errors: the protocol would expect a settable
# attribute `run`, while the actual implementation is a read-only method.

This can be addressed by defining run as a property getter instead: #9344

Member Author

Yes, yours also seems like a valid solution.
The one proposed in this PR seems more explicit to me, but I would say it's a matter of taste.
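
To make the options discussed in this thread concrete, here is a minimal sketch of three ways a protocol can declare `run`. All class names (`AttributeStyle`, `MethodStyle`, `PropertyStyle`, `Greeter`) and the `execute` function are hypothetical and are not Haystack's actual code. It relies on the typing spec's rule that a `(*args: Any, **kwargs: Any)` signature is treated like `...` in `Callable`, so a method declared that way should accept implementations with any input signature.

```python
from typing import Any, Callable, Dict, Protocol


class AttributeStyle(Protocol):
    # Declares `run` as a settable attribute holding a callable.
    # As reported in this PR, mypy can reject classes that implement
    # `run` as an ordinary method ("expected settable variable").
    run: Callable[..., Dict[str, Any]]


class MethodStyle(Protocol):
    # Declares `run` as a method that accepts any arguments.
    def run(self, *args: Any, **kwargs: Any) -> Dict[str, Any]: ...


class PropertyStyle(Protocol):
    # Declares `run` as a read-only property returning a callable,
    # the alternative suggested in the review comment above.
    @property
    def run(self) -> Callable[..., Dict[str, Any]]: ...


class Greeter:
    """A toy component with a specific `run` signature."""

    def run(self, name: str) -> Dict[str, Any]:
        return {"greeting": f"Hello, {name}!"}


def execute(component: MethodStyle, **inputs: Any) -> Dict[str, Any]:
    """Call `run` on anything that structurally matches MethodStyle."""
    return component.run(**inputs)


if __name__ == "__main__":
    print(execute(Greeter(), name="Ada"))
```

At runtime the three declarations behave the same; the difference only shows up under a static type checker, so running mypy or Pylance on a file like this is the way to compare them.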

@anakin87 anakin87 changed the title feat: add py.typed feat: add py.typed; adjust Component protocol May 6, 2025
Contributor

@sjrl sjrl left a comment

Looks good!

I removed a few more type: ignore comments in one of the end-to-end tests.

@anakin87
Member Author

anakin87 commented May 6, 2025

@julian-risch do you want to take a look as well?

Labels
topic:core type:documentation Improvements on the docs
Development

Successfully merging this pull request may close these issues.

haystack-ai==2.13.0 missing py.typed file, which causes pylance to complain about missing stub file
4 participants