-
Notifications
You must be signed in to change notification settings - Fork 28
[WIP] ElementEmbeddings integration #442
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #442 +/- ##
===========================================
- Coverage 80.66% 79.48% -1.18%
===========================================
Files 33 34 +1
Lines 2871 2935 +64
===========================================
+ Hits 2316 2333 +17
- Misses 555 602 +47 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
WalkthroughThis update introduces a new tutorial notebook demonstrating the integration of element embeddings with chemical screening using the Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Notebook
participant smact.screening
participant smact.io.elementembeddings
User->>Notebook: Run tutorial cells
Notebook->>smact.screening: smact_filter(elements, return_output)
smact.screening-->>Notebook: allowed_formulas (various formats)
Notebook->>smact.io.elementembeddings: composition_featuriser(allowed_formulas)
smact.io.elementembeddings-->>Notebook: feature_vectors
Notebook-->>User: Display formulas and feature vectors
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~15–20 minutes Possibly related PRs
Suggested labels
Poem
Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (5)
smact/utils/composition.py (1)
96-105
: Fix type annotation inconsistency in docstring.The function signature correctly shows
tuple[str, int, int] | tuple[str, int]
but the docstring only mentionstuple[str, int, int]
.Args: - smact_filter_output (tuple[str, int, int]): An item in the list returned from smact_filter + smact_filter_output (tuple[str, int, int] | tuple[str, int]): An item in the list returned from smact_filter Returns: composition_dict (dict[str, float]): An composition dictionaryThe implementation correctly reuses
comp_maker
and follows the same pattern asformula_maker
.smact/screening.py (2)
337-338
: Consider refactoring to reduce code duplication in match statements.The match statements at lines 426-433 and 438-444 are nearly identical. Consider extracting the format conversion logic into a helper function.
+def _format_compositions(compositions, return_output): + """Helper function to format compositions based on return_output type.""" + match return_output: + case SmactFilterOutputs.default: + return compositions + case SmactFilterOutputs.formula: + return [formula_maker(smact_filter_output=comp) for comp in compositions] + case SmactFilterOutputs.dict: + return [composition_dict_maker(smact_filter_output=comp) for comp in compositions] # Return list depending on whether we are interested in unique species combinations # or just unique element combinations. if species_unique: - match return_output: - case SmactFilterOutputs.default: - return compositions - case SmactFilterOutputs.formula: - return [formula_maker(smact_filter_output=comp) for comp in compositions] - case SmactFilterOutputs.dict: - return [composition_dict_maker(smact_filter_output=comp) for comp in compositions] + return _format_compositions(compositions, return_output) else: compositions = [(i[0], i[2]) for i in compositions] compositions = list(set(compositions)) - match return_output: - case SmactFilterOutputs.default: - return compositions - case SmactFilterOutputs.formula: - return [formula_maker(smact_filter_output=comp) for comp in compositions] - case SmactFilterOutputs.dict: - return [composition_dict_maker(smact_filter_output=comp) for comp in compositions] + return _format_compositions(compositions, return_output)
354-354
: Improve docstring formatting and add return type documentation.The docstring should better document the different return types based on the
return_output
parameter.- return_output (SmactFilterOutputs): If set to 'default', the function will return a list of tuples containing the tuples of symbols, oxidation states and stoichiometry values. "Formula" returns a list of formulas and "dict" returns a list of dictionaries. + return_output (SmactFilterOutputs): Controls the output format: + - 'default': List of tuples containing symbols, oxidation states and stoichiometry values + - 'formula': List of chemical formula strings + - 'dict': List of composition dictionaries Returns: ------- - allowed_comps (list): Allowed compositions for that chemical system - in the form [(elements), (oxidation states), (ratios)] if species_unique=True and tuple=False - or in the form [(elements), (ratios)] if species_unique=False and tuple=False. + allowed_comps (list): Allowed compositions for that chemical system. + Return type depends on return_output parameter: + - SmactFilterOutputs.default: List[tuple] with format depending on species_unique + - SmactFilterOutputs.formula: List[str] of chemical formulas + - SmactFilterOutputs.dict: List[dict] of composition dictionariessmact/io/elementembeddings.py (2)
17-31
: Address the TODO comment about moving enums to ElementEmbeddings.The comment suggests this enum should be moved to the ElementEmbeddings codebase. Consider whether this duplication is necessary or if the enums should be imported from ElementEmbeddings directly.
If these enums are temporary, consider adding a TODO comment with a timeline or GitHub issue reference. If they're permanent, remove the comment and ensure they stay in sync with ElementEmbeddings capabilities.
39-49
: Consider the same enum consolidation for PoolingStats.Similar to the previous enum, this also has a comment suggesting it should be moved to ElementEmbeddings codebase.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lock
is excluded by!**/*.lock
📒 Files selected for processing (5)
docs/tutorials/element_embeddings_integration.ipynb
(1 hunks)smact/io/__init__.py
(1 hunks)smact/io/elementembeddings.py
(1 hunks)smact/screening.py
(5 hunks)smact/utils/composition.py
(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
smact/screening.py (1)
smact/structure_prediction/structure.py (1)
composition
(575-595)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: test (3.12, macos-latest)
- GitHub Check: test (3.13, ubuntu-latest)
- GitHub Check: test (3.13, macos-latest)
🔇 Additional comments (6)
smact/io/__init__.py (1)
1-1
: LGTM! Clean package initialisation.The docstring appropriately describes the purpose of this package as an interface to external libraries.
docs/tutorials/element_embeddings_integration.ipynb (2)
732-753
: Excellent tutorial demonstrating the integration workflow.This cell effectively shows the complete workflow from element screening to feature vector generation. The imports are correct and the progression from
smact_filter
with the newSmactFilterOutputs.formula
option tocomposition_featuriser
demonstrates the practical value of the integration.
898-902
: Good demonstration of the raw output format.This second cell shows users what the actual feature vector arrays look like, which is valuable for understanding the output format.
smact/screening.py (1)
27-33
: Well-designed enum for output format options.The
SmactFilterOutputs
enum provides clear, descriptive options for the different return formats. The use ofauto()
ensures consistent string values.smact/io/elementembeddings.py (2)
70-102
: Well-documented wrapper function with clear parameter descriptions.The
species_composition_featuriser
function provides good documentation and maintains consistency with the ElementEmbeddings interface. The parameter passing is straightforward and appropriate.
53-67
: Verify ElementEmbeddings wrapper parametersPlease ensure that our
composition_featuriser
wrapper remains aligned with the upstream ElementEmbeddings API:
- Confirm that
ee_composition_featuriser
inelementembeddings.composition
actually expects parameters nameddata
,formula_column
,embedding
,stats
andinplace
.- Verify that the allowed types (
pd.DataFrame | pd.Series | CompositionalEmbedding | list
for data;Embedding | AllowedElementEmbeddings
for embedding; andPoolingStats | list[PoolingStats]
for stats) match the upstream function’s signature.- Pin and document the minimum ElementEmbeddings version that our wrapper was tested against, so future releases don’t silently break this interface.
- Add an integration test (or CI check) that exercises the wrapper end-to-end with a known ElementEmbeddings version to catch breaking changes early.
ElementEmbeddings Integration
Description
Type of change
New feature (non-breaking change which adds functionality)
This change requires a documentation update
How Has This Been Tested?
(TODO)
Test Configuration:
Reviewers
Checklist
Summary by CodeRabbit
New Features
Documentation
Refactor