AntObi
Collaborator

@AntObi AntObi commented Jul 31, 2025

ElementEmbeddings Integration

Description

  • Adds wrappers for the ElementEmbeddings library

Type of change

  • New feature (non-breaking change which adds functionality)

  • This change requires a documentation update

How Has This Been Tested?

(TODO)

  • Test A
  • Test B

Test Configuration:

  • Python version: 3.11
  • Operating System: macOS

Reviewers

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

Summary by CodeRabbit

  • New Features

    • Introduced a Jupyter notebook tutorial demonstrating how to generate feature vectors for chemical compositions using element embeddings.
    • Added new options to filter and output chemical compositions in various formats, including formula strings and dictionaries.
    • Provided functions for generating feature vectors from chemical formulas and species compositions with flexible output options.
  • Documentation

    • Added a tutorial notebook illustrating the integration of element embeddings with chemical composition screening.
  • Refactor

    • Introduced new enumerations and output formats for improved flexibility in filtering and featurising chemical compositions.


codecov bot commented Jul 31, 2025

Codecov Report

❌ Patch coverage is 28.78788% with 47 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.48%. Comparing base (515a369) to head (a34e4e8).

Files with missing lines          Patch %   Lines
smact/io/elementembeddings.py     0.00%     37 Missing ⚠️
smact/screening.py                66.66%    9 Missing ⚠️
smact/utils/composition.py        50.00%    1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #442      +/-   ##
===========================================
- Coverage    80.66%   79.48%   -1.18%     
===========================================
  Files           33       34       +1     
  Lines         2871     2935      +64     
===========================================
+ Hits          2316     2333      +17     
- Misses         555      602      +47     

Contributor

coderabbitai bot commented Jul 31, 2025

Walkthrough

This update introduces a new tutorial notebook demonstrating the integration of element embeddings with chemical screening using the smact library. Supporting this, new modules and utility functions are added for feature vector generation and flexible output formatting. The smact_filter function is enhanced with multiple output formats, and new enums and wrappers are provided for embedding operations.

Changes

Cohort / File(s): Change Summary
  • Tutorial Example (docs/tutorials/element_embeddings_integration.ipynb): New Jupyter notebook tutorial added, showing how to filter element combinations and generate feature vectors using smact_filter and composition_featuriser.
  • Element Embeddings Interface (smact/io/elementembeddings.py): New interface module providing enums and wrapper functions for element and species embeddings, exposing composition_featuriser and species_composition_featuriser with detailed type annotations and docstrings.
  • IO Module Init (smact/io/__init__.py): New __init__.py file added with a docstring; no functional code.
  • Screening Output Flexibility (smact/screening.py): Enhanced the smact_filter function to support multiple output formats via the new SmactFilterOutputs enum; updated function signature and return types, with branching logic for format selection.
  • Composition Utilities (smact/utils/composition.py): Added a composition_dict_maker function to convert smact_filter outputs into composition dictionaries, complementing existing conversion utilities.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Notebook
    participant smact.screening
    participant smact.io.elementembeddings

    User->>Notebook: Run tutorial cells
    Notebook->>smact.screening: smact_filter(elements, return_output)
    smact.screening-->>Notebook: allowed_formulas (various formats)
    Notebook->>smact.io.elementembeddings: composition_featuriser(allowed_formulas)
    smact.io.elementembeddings-->>Notebook: feature_vectors
    Notebook-->>User: Display formulas and feature vectors

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~15–20 minutes

Possibly related PRs

  • Utility module updates #307: Implements and tests utility functions (comp_maker, formula_maker) for converting smact_filter outputs, which are used or extended in this PR for flexible output and embedding integration.

Suggested labels

python




📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 403bcea and a34e4e8.

📒 Files selected for processing (2)
  • smact/io/elementembeddings.py (1 hunks)
  • smact/screening.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • smact/io/elementembeddings.py
  • smact/screening.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: test (3.11, ubuntu-latest)
  • GitHub Check: test (3.11, macos-latest)
  • GitHub Check: test (3.11, windows-latest)
  • GitHub Check: test (3.13, ubuntu-latest)
  • GitHub Check: test (3.13, macos-latest)
  • GitHub Check: test (3.12, macos-latest)
  • GitHub Check: test (3.12, windows-latest)
  • GitHub Check: test (3.10, ubuntu-latest)
  • GitHub Check: test (3.12, ubuntu-latest)
  • GitHub Check: test (3.10, windows-latest)
  • GitHub Check: test (3.10, macos-latest)
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (5)
smact/utils/composition.py (1)

96-105: Fix type annotation inconsistency in docstring.

The function signature correctly shows tuple[str, int, int] | tuple[str, int] but the docstring only mentions tuple[str, int, int].

    Args:
-        smact_filter_output (tuple[str, int, int]): An item in the list returned from smact_filter
+        smact_filter_output (tuple[str, int, int] | tuple[str, int]): An item in the list returned from smact_filter

    Returns:
        composition_dict (dict[str, float]): A composition dictionary

The implementation correctly reuses comp_maker and follows the same pattern as formula_maker.
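A minimal pure-Python sketch of what the broader annotation has to cover, assuming each smact_filter item is either (symbols, oxidation states, ratios) or (symbols, ratios). `composition_dict_sketch` is a hypothetical stand-in, not the PR's `composition_dict_maker` (which goes through `comp_maker` and may keep species labels); this sketch drops oxidation states and sums repeated symbols.

```python
from __future__ import annotations


def composition_dict_sketch(
    smact_filter_output: tuple[tuple[str, ...], tuple[int, ...], tuple[int, ...]]
    | tuple[tuple[str, ...], tuple[int, ...]],
) -> dict[str, float]:
    """Hypothetical stand-in for composition_dict_maker (not the smact code)."""
    if len(smact_filter_output) == 3:
        # (symbols, oxidation states, ratios); oxidation states are dropped here
        symbols, _oxidation_states, ratios = smact_filter_output
    else:
        # (symbols, ratios), the shape produced when species_unique=False
        symbols, ratios = smact_filter_output
    composition: dict[str, float] = {}
    for symbol, ratio in zip(symbols, ratios):
        # Sum repeated symbols, e.g. Fe(II) and Fe(III) in the same compound
        composition[symbol] = composition.get(symbol, 0.0) + float(ratio)
    return composition
```

For example, `composition_dict_sketch((("Fe", "Fe", "O"), (2, 3, -2), (1, 2, 4)))` gives `{'Fe': 3.0, 'O': 4.0}`.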

smact/screening.py (2)

337-338: Consider refactoring to reduce code duplication in match statements.

The match statements at lines 426-433 and 438-444 are nearly identical. Consider extracting the format conversion logic into a helper function.

+def _format_compositions(compositions, return_output):
+    """Helper function to format compositions based on return_output type."""
+    match return_output:
+        case SmactFilterOutputs.default:
+            return compositions
+        case SmactFilterOutputs.formula:
+            return [formula_maker(smact_filter_output=comp) for comp in compositions]
+        case SmactFilterOutputs.dict:
+            return [composition_dict_maker(smact_filter_output=comp) for comp in compositions]
+        case _:
+            raise ValueError(f"Unsupported return_output: {return_output!r}")

    # Return list depending on whether we are interested in unique species combinations
    # or just unique element combinations.
    if species_unique:
-        match return_output:
-            case SmactFilterOutputs.default:
-                return compositions
-            case SmactFilterOutputs.formula:
-                return [formula_maker(smact_filter_output=comp) for comp in compositions]
-            case SmactFilterOutputs.dict:
-                return [composition_dict_maker(smact_filter_output=comp) for comp in compositions]
+        return _format_compositions(compositions, return_output)

    else:
        compositions = [(i[0], i[2]) for i in compositions]

        compositions = list(set(compositions))
-        match return_output:
-            case SmactFilterOutputs.default:
-                return compositions
-            case SmactFilterOutputs.formula:
-                return [formula_maker(smact_filter_output=comp) for comp in compositions]
-            case SmactFilterOutputs.dict:
-                return [composition_dict_maker(smact_filter_output=comp) for comp in compositions]
+        return _format_compositions(compositions, return_output)
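The shared tail of both branches can be sketched as below. This is a hypothetical, self-contained stand-in: `OutputFormat`, `to_formula`, `to_dict`, `FORMATTERS` and `format_compositions` are illustrative names, the two formatters are simplified versions of `formula_maker` and `composition_dict_maker`, and a lookup table replaces `match` (a swapped-in technique, not the suggestion above). It also uses `dict.fromkeys` instead of `list(set(...))`, on the assumption that a reproducible order is wanted: set order for tuples of strings can change between interpreter runs because string hashing is randomised.

```python
from __future__ import annotations

from collections.abc import Callable
from enum import Enum


class OutputFormat(str, Enum):
    """Hypothetical stand-in for SmactFilterOutputs."""

    default = "default"
    formula = "formula"
    dict = "dict"


def to_formula(comp: tuple) -> str:
    """Simplified formula string; symbols come first and ratios last in each tuple."""
    # Does not merge repeated symbols or reduce ratios, unlike a full formula maker
    return "".join(f"{s}{r if r != 1 else ''}" for s, r in zip(comp[0], comp[-1]))


def to_dict(comp: tuple) -> dict[str, float]:
    """Simplified composition dict; amounts of repeated symbols are summed."""
    out: dict[str, float] = {}
    for s, r in zip(comp[0], comp[-1]):
        out[s] = out.get(s, 0.0) + float(r)
    return out


# Dispatch table: one formatter per output format
FORMATTERS: dict[OutputFormat, Callable[[tuple], object]] = {
    OutputFormat.default: lambda comp: comp,
    OutputFormat.formula: to_formula,
    OutputFormat.dict: to_dict,
}


def format_compositions(
    compositions: list[tuple],
    return_output: OutputFormat | str = OutputFormat.default,
    species_unique: bool = True,
) -> list:
    """Hypothetical sketch of the tail of smact_filter."""
    if not species_unique:
        # Keep (symbols, ratios), dropping oxidation states, and de-duplicate
        # while preserving first-seen order (unlike list(set(...)))
        compositions = list(dict.fromkeys((c[0], c[2]) for c in compositions))
    try:
        formatter = FORMATTERS[OutputFormat(return_output)]
    except ValueError:
        raise ValueError(f"Unsupported return_output: {return_output!r}") from None
    return [formatter(c) for c in compositions]
```

Unknown values raise `ValueError` instead of falling through and returning `None`, and plain strings such as `"formula"` are accepted because of the `str` mixin.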

354-354: Improve docstring formatting and add return type documentation.

The docstring should better document the different return types based on the return_output parameter.

-        return_output (SmactFilterOutputs): If set to 'default', the function will return a list of tuples containing the tuples of symbols, oxidation states and stoichiometry values. "Formula" returns a list of formulas and "dict" returns a list of dictionaries.
+        return_output (SmactFilterOutputs): Controls the output format:
+            - 'default': List of tuples containing symbols, oxidation states and stoichiometry values
+            - 'formula': List of chemical formula strings  
+            - 'dict': List of composition dictionaries

    Returns:
    -------
-        allowed_comps (list): Allowed compositions for that chemical system
-        in the form [(elements), (oxidation states), (ratios)] if species_unique=True and tuple=False
-        or in the form [(elements), (ratios)] if species_unique=False and tuple=False.
+        allowed_comps (list): Allowed compositions for that chemical system.
+        Return type depends on return_output parameter:
+        - SmactFilterOutputs.default: List[tuple] with format depending on species_unique
+        - SmactFilterOutputs.formula: List[str] of chemical formulas
+        - SmactFilterOutputs.dict: List[dict] of composition dictionaries
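If static type checkers should see the same mapping the docstring describes, `typing.overload` with `Literal` enum members is one option. A sketch under stated assumptions: `Output` stands in for `SmactFilterOutputs`, and `filter_sketch` is a hypothetical function that takes already-filtered `(symbols, ratios)` tuples rather than smact_filter's real arguments.

```python
from __future__ import annotations

from enum import Enum
from typing import Literal, overload

# (symbols, ratios), the shape used when species_unique=False
Comp = tuple[tuple[str, ...], tuple[int, ...]]


class Output(str, Enum):
    """Hypothetical stand-in for SmactFilterOutputs."""

    default = "default"
    formula = "formula"
    dict = "dict"


@overload
def filter_sketch(comps: list[Comp], return_output: Literal[Output.default] = ...) -> list[Comp]: ...
@overload
def filter_sketch(comps: list[Comp], return_output: Literal[Output.formula]) -> list[str]: ...
@overload
def filter_sketch(comps: list[Comp], return_output: Literal[Output.dict]) -> list[dict[str, float]]: ...
def filter_sketch(comps: list[Comp], return_output: Output = Output.default) -> list:
    """Runtime body; the overloads above only inform static type checkers."""
    if return_output == Output.formula:
        return ["".join(f"{s}{r if r != 1 else ''}" for s, r in zip(syms, ratios)) for syms, ratios in comps]
    if return_output == Output.dict:
        return [{s: float(r) for s, r in zip(syms, ratios)} for syms, ratios in comps]
    return comps
```

A checker such as mypy should then infer `list[str]` for `filter_sketch(comps, Output.formula)`, while the runtime behaviour is unchanged.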
smact/io/elementembeddings.py (2)

17-31: Address the TODO comment about moving enums to ElementEmbeddings.

The comment suggests this enum should be moved to the ElementEmbeddings codebase. Consider whether this duplication is necessary or if the enums should be imported from ElementEmbeddings directly.

If these enums are temporary, consider adding a TODO comment with a timeline or GitHub issue reference. If they're permanent, remove the comment and ensure they stay in sync with ElementEmbeddings capabilities.


39-49: Consider the same enum consolidation for PoolingStats.

Similar to the previous enum, this also has a comment suggesting it should be moved to ElementEmbeddings codebase.
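For context on what these statistics do: pooling collapses the per-element vectors of a composition into one fixed-length feature vector. The sketch below shows a fraction-weighted mean and an element-wise max in plain Python; `EMBEDDINGS` and `pool` are hypothetical, the embedding values are invented, and the exact set of statistics and formulas used by ElementEmbeddings may differ.

```python
# Invented 3-d element vectors, for illustration only
EMBEDDINGS: dict[str, list[float]] = {
    "Fe": [1.0, 0.0, 2.0],
    "O": [0.0, 1.0, -1.0],
}


def pool(composition: dict[str, float], stat: str = "mean") -> list[float]:
    """Pool the element vectors of a composition into one feature vector."""
    vectors = [EMBEDDINGS[el] for el in composition]
    dim = len(vectors[0])
    if stat == "mean":
        # Weighted mean: weight each element vector by its fractional amount
        total = sum(composition.values())
        weights = [amount / total for amount in composition.values()]
        return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    if stat == "max":
        # Element-wise max over the elements present; amounts are ignored
        return [max(v[i] for v in vectors) for i in range(dim)]
    raise ValueError(f"Unsupported stat: {stat!r}")


print(pool({"Fe": 2, "O": 3}, "max"))  # [1.0, 1.0, 2.0]
```

Whatever the statistic, the output length equals the embedding dimension, which is why compositions with different numbers of elements end up as comparable feature vectors.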

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 515a369 and 403bcea.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • docs/tutorials/element_embeddings_integration.ipynb (1 hunks)
  • smact/io/__init__.py (1 hunks)
  • smact/io/elementembeddings.py (1 hunks)
  • smact/screening.py (5 hunks)
  • smact/utils/composition.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
smact/screening.py (1)
smact/structure_prediction/structure.py (1)
  • composition (575-595)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: test (3.12, macos-latest)
  • GitHub Check: test (3.13, ubuntu-latest)
  • GitHub Check: test (3.13, macos-latest)
🔇 Additional comments (6)
smact/io/__init__.py (1)

1-1: LGTM! Clean package initialisation.

The docstring appropriately describes the purpose of this package as an interface to external libraries.

docs/tutorials/element_embeddings_integration.ipynb (2)

732-753: Excellent tutorial demonstrating the integration workflow.

This cell effectively shows the complete workflow from element screening to feature vector generation. The imports are correct and the progression from smact_filter with the new SmactFilterOutputs.formula option to composition_featuriser demonstrates the practical value of the integration.


898-902: Good demonstration of the raw output format.

This second cell shows users what the actual feature vector arrays look like, which is valuable for understanding the output format.

smact/screening.py (1)

27-33: Well-designed enum for output format options.

The SmactFilterOutputs enum provides clear, descriptive options for the different return formats. The use of auto() ensures consistent string values.

smact/io/elementembeddings.py (2)

70-102: Well-documented wrapper function with clear parameter descriptions.

The species_composition_featuriser function provides good documentation and maintains consistency with the ElementEmbeddings interface. The parameter passing is straightforward and appropriate.


53-67: Verify ElementEmbeddings wrapper parameters

Please ensure that our composition_featuriser wrapper remains aligned with the upstream ElementEmbeddings API:

  • Confirm that ee_composition_featuriser in elementembeddings.composition actually expects parameters named data, formula_column, embedding, stats and inplace.
  • Verify that the allowed types (pd.DataFrame | pd.Series | CompositionalEmbedding | list for data; Embedding | AllowedElementEmbeddings for embedding; and PoolingStats | list[PoolingStats] for stats) match the upstream function’s signature.
  • Pin and document the minimum ElementEmbeddings version that our wrapper was tested against, so future releases don’t silently break this interface.
  • Add an integration test (or CI check) that exercises the wrapper end-to-end with a known ElementEmbeddings version to catch breaking changes early.
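As a sketch of the first point, a test can compare parameter names using `inspect.signature`. To stay self-contained it runs against `fake_featuriser`, a hypothetical stand-in; in a real test it would be replaced by the upstream function referred to above as `ee_composition_featuriser` (the exact import name is an assumption). The expected names are the ones listed above; checking types and a pinned version would need more than this.

```python
import inspect
from collections.abc import Callable

EXPECTED_PARAMS = {"data", "formula_column", "embedding", "stats", "inplace"}


def missing_params(func: Callable, expected: set[str]) -> set[str]:
    """Return the expected parameter names that func does not accept."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter accepts any keyword, so nothing is reported missing
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return set()
    return expected - set(params)


# Hypothetical stand-in; in a real test, import the upstream function instead, e.g.
# from elementembeddings.composition import composition_featuriser  # assumed path
def fake_featuriser(data, formula_column=None, embedding=None, stats=None, inplace=None):
    return data


def test_wrapper_matches_upstream():
    assert missing_params(fake_featuriser, EXPECTED_PARAMS) == set()
```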

Projects

None yet

1 participant