
Add PySCENIC #50445

Open · wants to merge 72 commits into base: master
Conversation

LiliyaBioinf

@LiliyaBioinf LiliyaBioinf commented Sep 2, 2024

Describe your pull request here


Please read the guidelines for Bioconda recipes before opening a pull request (PR).

General instructions

  • If this PR adds or updates a recipe, use "Add" or "Update" appropriately as the first word in its title.
  • New recipes not directly relevant to the biological sciences need to be submitted to the conda-forge channel instead of Bioconda.
  • PRs require reviews prior to being merged. Once your PR is passing tests and ready to be merged, please issue the @BiocondaBot please add label command.
  • Please post questions on Gitter or ping @bioconda/core in a comment.

Instructions for avoiding API, ABI, and CLI breakage issues

Conda is able to record and lock (a.k.a. pin) the dependency versions that other recipes used at build time.
This way, a downstream recipe's expectations with regard to API, ABI, or CLI cannot be violated by later changes in this recipe.
If not already present in the meta.yaml, make sure to specify run_exports (see here for the rationale and a comprehensive explanation).
Add a run_exports section like this:

build:
  run_exports:
    - ...

with ... being one of:

Depending on the versioning scheme, use one of the following run_exports statements:

  • semantic versioning: {{ pin_subpackage("myrecipe", max_pin="x") }}
  • semantic versioning (0.x.x): {{ pin_subpackage("myrecipe", max_pin="x.x") }}
  • known breakage in minor versions: {{ pin_subpackage("myrecipe", max_pin="x.x") }} (in such a case, please add a note that briefly mentions your evidence for that)
  • known breakage in patch versions: {{ pin_subpackage("myrecipe", max_pin="x.x.x") }} (in such a case, please add a note that briefly mentions your evidence for that)
  • calendar versioning: {{ pin_subpackage("myrecipe", max_pin=None) }}

while replacing "myrecipe" with either name, if a name|lower variable is defined in your recipe, or with the lowercase name of the package in quotes.
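For instance, a recipe that defines a lowercased name variable at the top of its meta.yaml could wire this up as follows (a minimal sketch; the package name and version shown here are illustrative, not prescriptive):

```yaml
{% set name = "pyscenic" %}
{% set version = "0.12.1" %}

package:
  name: "{{ name|lower }}"
  version: "{{ version }}"

build:
  number: 0
  noarch: python
  run_exports:
    - {{ pin_subpackage(name|lower, max_pin="x.x") }}
```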

Bot commands for PR management

Everyone has access to the following BiocondaBot commands, which can be given in a comment:

  • @BiocondaBot please update: merge the master branch into a PR.
  • @BiocondaBot please add label: add the please review & merge label.
  • @BiocondaBot please fetch artifacts: post links to CI-built packages/containers. You can use this to test packages locally.

Note that the @BiocondaBot please merge command is now deprecated. Please just squash and merge instead.

Also, the bot watches for comments from non-members that include @bioconda/<team> and will automatically re-post them to notify the addressed <team>.

Summary by CodeRabbit

  • New Features
    • Introduced a new package configuration file for pyscenic, detailing versioning and dependencies.
    • Added a build script to streamline the installation process for the pyscenic package.
  • Bug Fixes
    • Updated import statements to utilize the standard multiprocessing library, improving compatibility and stability.

@pcm32 pcm32 changed the title test new adding files Add PySCENIC Sep 3, 2024
@pcm32 pcm32 closed this Sep 4, 2024
@pcm32 pcm32 reopened this Sep 4, 2024
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (4)
recipes/pyscenic/meta.yaml (4)

12-12: Consider removing the folder specification.

The folder key in the source section is not typically necessary for PyPI packages. Conda-build usually handles the extraction automatically. Unless there's a specific reason for this, consider removing this line:

-    folder: pyscenic-{{ version }}

This simplification aligns better with standard Bioconda practices.


26-26: Consider broadening the Python version range.

The current Python version restriction (>=3.10,<3.11) is quite narrow. Unless there's a specific reason for this limitation, consider broadening the range to allow for more flexibility:

-    - python >=3.10,<3.11
+    - python >=3.10,<3.12

This change would allow the package to work with Python 3.11 as well, increasing its compatibility.

Also applies to: 32-32, 38-38


67-69: Enhance test coverage.

The current test section only includes a basic import check. Consider adding more comprehensive tests to ensure the package is functioning correctly. For example:

test:
  imports:
    - pyscenic
  requires:
    - pytest
  source_files:
    - tests
  commands:
    - pyscenic --help
    - pytest tests

This would run the package's test suite (if available) and check if the CLI is working correctly.


71-75: Enhance package metadata.

The current about section provides good basic information. Consider adding more metadata to improve the package's discoverability and provide more information to users. For example:

about:
  home: https://github.com/aertslab/pySCENIC
  summary: 'pySCENIC is a python implementation of the SCENIC pipeline'
  description: |
    pySCENIC automates the SCENIC workflow (Single-Cell rEgulatory Network Inference and Clustering) 
    for single-cell gene expression data. SCENIC is an algorithm that reverse-engineers gene regulatory 
    networks and identifies stable cell states from single-cell RNA-seq data.
  license: MIT
  license_family: MIT
  license_file: LICENSE
  doc_url: https://pyscenic.readthedocs.io/
  dev_url: https://github.com/aertslab/pySCENIC

This additional information helps users understand the package better and provides links to documentation and development resources.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 3078e4f and fd83fbc.

📒 Files selected for processing (1)
  • recipes/pyscenic/meta.yaml (1 hunks)
🧰 Additional context used
🪛 yamllint
recipes/pyscenic/meta.yaml

[error] 1-1: syntax error: found character '%' that cannot start any token

(syntax)

Comment on lines 14 to 23
build:
  number: 0
  noarch: python
  script: |
    pushd ${SRC_DIR}/pyscenic-{{ version }}
    sed -i 's/from multiprocessing_on_dill.connection import Pipe/from multiprocessing import Pipe/' src/pyscenic/prune.py
    sed -i 's/from multiprocessing_on_dill.context import Process/from multiprocessing import Process/' src/pyscenic/prune.py
    ${PYTHON} -m pip install . --use-pep517 --no-deps --ignore-installed -vv
    popd

Contributor


🛠️ Refactor suggestion

Simplify the build process.

The current build script is more complex than typical Bioconda recipes. Consider simplifying it as follows:

  1. Instead of using sed commands in the build script, create a patch file for the necessary changes.
  2. Use the standard {{ PYTHON }} -m pip install . -vv command for installation.

Here's a suggested revision:

build:
  number: 0
  noarch: python
  script: "{{ PYTHON }} -m pip install . -vv"

source:
  url: "https://pypi.io/packages/source/{{ name[0] }}/{{ name }}/{{ name }}-{{ version }}.tar.gz"
  sha256: "{{ sha256 }}"
  patches:
    - pyscenic_multiprocessing.patch

Create a patch file named pyscenic_multiprocessing.patch with the necessary changes to src/pyscenic/prune.py.

This approach is more maintainable and follows Bioconda best practices.
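For reference, a patch implementing the two sed substitutions above might look like this (a sketch only; the hunk header line numbers are illustrative and would need to be regenerated against the actual source file):

```diff
--- a/src/pyscenic/prune.py
+++ b/src/pyscenic/prune.py
@@ -1,2 +1,2 @@
-from multiprocessing_on_dill.connection import Pipe
-from multiprocessing_on_dill.context import Process
+from multiprocessing import Pipe
+from multiprocessing import Process
```

Such a file can be generated with `diff -u original.py patched.py` or `git diff` after editing an unpacked source tree.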


⚠️ Potential issue

Add run_exports to prevent breakage issues.

As mentioned in the PR objectives, it's important to specify run_exports in the meta.yaml file to prevent API, ABI, or CLI breakage issues. Consider adding the following to the build section:

build:
  number: 0
  noarch: python
  script: "{{ PYTHON }} -m pip install . -vv"
  run_exports:
    - {{ pin_subpackage(name, max_pin='x.x') }}

This ensures that packages depending on pyscenic will be rebuilt when there are potentially breaking changes in new versions.

Comment on lines 37 to 66
run:
  - python >=3.10,<3.11
  - aiohttp
  - arboreto
  - attrs
  - boltons
  - cloudpickle
  - cytoolz
  - dill
  - dask >=2023.4.1
  - distributed >=2023.4.1,<2023.5.0
  - frozendict
  - fsspec
  - llvmlite
  - loompy
  - networkx
  - numba >=0.51.2
  - numexpr
  - pandas >=1.3.5
  - pyyaml
  - requests
  - scikit-learn
  - scipy
  - tqdm
  - umap-learn
  - numpy ==1.23.5
  - pyarrow
  - interlap
  - ctxcore

Contributor

@coderabbitai coderabbitai bot Sep 26, 2024


⚠️ Potential issue

Review and update dependency specifications.

  1. The multiprocessing_on_dill package is missing from the dependencies. As discussed in the past comments, this package has licensing issues. Consider one of the following options:
    a. Work with the upstream authors to resolve the licensing issue and add it to conda-forge.
    b. Include it as a pip-installed dependency in the build script, although this is not preferred by the Bioconda team.
    c. Find an alternative solution that doesn't require this package.

  2. The numpy version is pinned to a specific version (1.23.5), which might be too restrictive. Consider allowing a range of compatible versions instead.

  3. Some other dependencies might benefit from more flexible version constraints. Review each dependency and consider if strict version pinning is necessary.

Suggested changes:

+    - multiprocessing_on_dill  # TODO: Resolve licensing issue and add to conda-forge
-    - numpy ==1.23.5
+    - numpy >=1.23.5,<2

Please choose the most appropriate solution for handling the multiprocessing_on_dill dependency based on the options provided above.

Committable suggestion was skipped due to low confidence.

Contributor


@coderabbitai Item 1, the multiprocessing_on_dill import is replaced by multiprocessing in the patch, dill_patch.patch. multiprocessing_on_dill should not need to be in the dependencies.

Contributor


@aliciaaevans Thank you for the clarification. You're right, since multiprocessing_on_dill has been replaced with multiprocessing in the patch dill_patch.patch, there's no need to include multiprocessing_on_dill in the dependencies. I'll note this for future reviews.

(_/)
( •_•)
/ >💡


✏️ Learnings added
Learnt from: aliciaaevans
PR: bioconda/bioconda-recipes#50445
File: recipes/pyscenic/meta.yaml:35-64
Timestamp: 2024-10-03T18:16:54.451Z
Learning: In the `pyscenic` recipe, `multiprocessing_on_dill` has been replaced with the standard `multiprocessing` module via `dill_patch.patch`, so `multiprocessing_on_dill` is no longer a required dependency.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
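For the usage pattern the patch touches, the stdlib replacement is a drop-in: `multiprocessing` provides the same `Pipe` and `Process` API. A minimal sketch (not pyscenic code) demonstrating that API. One caveat worth keeping in mind: multiprocessing_on_dill exists to serialize callables and objects the stdlib pickler cannot handle, so the patch is only safe as long as pyscenic passes picklable objects across these process boundaries.

```python
import multiprocessing as mp

def worker(conn):
    # Send a value back to the parent process, then close this end of the pipe
    conn.send("done")
    conn.close()

def run_demo():
    # The "fork" start method keeps this Unix-only sketch simple; pyscenic
    # itself just uses the default context via `from multiprocessing import ...`
    ctx = mp.get_context("fork")
    parent_conn, child_conn = ctx.Pipe()
    p = ctx.Process(target=worker, args=(child_conn,))
    p.start()
    result = parent_conn.recv()  # blocks until the child sends
    p.join()
    return result, p.exitcode

if __name__ == "__main__":
    print(run_demo())  # ('done', 0)
```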

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
recipes/pyscenic/meta.yaml (2)

64-66: Consider adding more comprehensive tests.

The current test section includes a basic import check, which is a good start. However, to ensure the package is functioning correctly, consider adding more comprehensive tests. For example:

  • Test key functions or classes within the pyscenic package.
  • Add command-line tests if the package provides any CLI tools.
  • Include small dataset tests if applicable.

This will help catch potential issues early and ensure the package is working as expected in the Bioconda environment.


68-72: Consider adding more metadata fields.

The current about section provides essential information. To enhance the package description, consider adding the following fields:

  • doc_url: Link to the package documentation.
  • dev_url: Link to the development repository.
  • description: A more detailed description of the package and its features.
  • license_file: Path to the license file in the source distribution.

These additional fields will provide users with more information about the package and its development.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between fd83fbc and ace7c98.

📒 Files selected for processing (1)
  • recipes/pyscenic/meta.yaml (1 hunks)

🔇 Additional comments (3)
recipes/pyscenic/meta.yaml (3)

1-13: LGTM! Package and source sections are well-configured.

The package metadata and source configurations look good. The inclusion of both the PyPI tarball and a local patch file is noted.

Could you provide more information about the dill_patch.patch file? What specific changes does it make, and why is it necessary?



14-19: Excellent build configuration!

The build section is well-configured:

  • Correct build number (0) for a new package.
  • Appropriate use of noarch: python.
  • run_exports is correctly set up to prevent API/ABI breakage issues.
  • The build script uses the recommended pip installation method.

This configuration addresses previous comments and follows Bioconda best practices.


1-72: Overall, the recipe is well-structured with minor improvements needed.

The meta.yaml file for pyscenic is generally well-configured and addresses many of the concerns raised in previous reviews. Here's a summary of the current state:

Strengths:

  1. Correct package and source configuration.
  2. Well-structured build section with appropriate run_exports settings.
  3. Comprehensive list of dependencies.

Areas for improvement:

  1. Resolve the multiprocessing_on_dill dependency issue.
  2. Consider relaxing version constraints for Python and numpy.
  3. Enhance the test section with more comprehensive checks.
  4. Add additional metadata fields in the about section.

Once these minor issues are addressed, the recipe should be ready for inclusion in Bioconda. Great work on getting the recipe to this stage!


@aliciaaevans
Contributor

@LiliyaBioinf feel free to pause CodeRabbit by commenting @ coderabbitai pause (without the space after the @) if its comments are annoying while you are making changes.

@pcm32
Member

pcm32 commented Sep 27, 2024

@LiliyaBioinf I would suggest looking at other recipes in bioconda that patch code, to see how they deal with that. I would expect the patching to need to happen in a build.sh (but I might be wrong, so I suggest that you check other recipes that do this).

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (3)
recipes/pyscenic/meta.yaml (3)

13-14: Clarify the purpose of the patch file.

The recipe includes a patch file dill_patch.patch. It's good practice to document the purpose of this patch and ensure it's necessary. Consider adding a comment explaining why this patch is needed and what it modifies in the source code.


66-68: Enhance test coverage.

The current test only checks if the package can be imported. Consider adding more comprehensive tests to ensure the package functions correctly. Some suggestions:

  1. Add command line interface (CLI) tests if pyscenic provides any CLI tools.
  2. Include a simple functional test that exercises core functionality.
  3. Check if key submodules can be imported.

Example:

test:
  imports:
    - pyscenic
    - pyscenic.cli
    - pyscenic.utils
  commands:
    - pyscenic --help  # If there's a CLI
    - python -c "from pyscenic import Module; assert Module('test').name == 'test'"  # Simple functional test

70-74: Enhance package metadata.

The current about section provides good basic information. Consider adding more metadata to improve the package's discoverability and provide more context for users. Some suggestions:

  1. Add a doc_url pointing to the package's documentation.
  2. Include a dev_url linking to the GitHub repository.
  3. Add description with a more detailed explanation of the package's functionality.
  4. Include license_file to specify the location of the license file in the source distribution.

Example:

about:
  home: https://github.com/aertslab/pySCENIC
  summary: 'pySCENIC is a python implementation of the SCENIC pipeline'
  description: |
    pySCENIC (Single-Cell rEgulatory Network Inference and Clustering) 
    is a computational method to infer Gene Regulatory Networks and cell 
    types from single-cell RNA-seq data.
  license: MIT
  license_family: MIT
  license_file: LICENSE
  doc_url: https://pyscenic.readthedocs.io/
  dev_url: https://github.com/aertslab/pySCENIC
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between f68cace and c3de49f.

📒 Files selected for processing (2)
  • recipes/pyscenic/build.sh (1 hunks)
  • recipes/pyscenic/meta.yaml (1 hunks)
🧰 Additional context used
🪛 Shellcheck
recipes/pyscenic/build.sh

[error] lines 1-5: Literal carriage return. Run script through tr -d '\r'. (SC1017)


🔇 Additional comments (2)
recipes/pyscenic/meta.yaml (2)

1-74: Overall recipe review summary.

The recipe for pyscenic is generally well-structured and follows many Bioconda best practices. Here's a summary of the key points and suggestions for improvement:

  1. The package and source sections are correctly defined, including SHA256 checksums for security.
  2. The build section includes run_exports as requested, but the build process could be simplified.
  3. Dependencies are comprehensive, but some version constraints may need review, and the multiprocessing_on_dill dependency needs to be addressed.
  4. Test coverage could be expanded to ensure package functionality.
  5. The about section could benefit from additional metadata to improve discoverability and user information.

Addressing these points will enhance the recipe's quality, maintainability, and user experience. Once these suggestions are implemented, the recipe should be ready for inclusion in the Bioconda channel.



23-65: ⚠️ Potential issue

Review and update dependency specifications.

  1. The Python version constraint (>=3.10,<3.11) might be too restrictive. Consider allowing a broader range of Python versions if possible, e.g., python >=3.10.

  2. Some dependencies have very specific version constraints. Review these and consider if they can be relaxed:

    • dask >=2023.4.1
    • distributed >=2023.4.1,<2023.5.0
    • numpy ==1.23.5
  3. The multiprocessing_on_dill package is missing from the dependencies. As discussed in past comments, this package has licensing issues. Consider one of the following options:
    a. Work with the upstream authors to resolve the licensing issue and add it to conda-forge.
    b. Include it as a pip-installed dependency in the build script, although this is not preferred by the Bioconda team.
    c. Find an alternative solution that doesn't require this package.

Please choose the most appropriate solution for handling the multiprocessing_on_dill dependency based on the options provided above.

To verify the impact of these changes, you can run the following script:

Comment on lines 1 to 2
#!/bin/bash
set -e
Contributor

@coderabbitai coderabbitai bot Oct 1, 2024


⚠️ Potential issue

Remove carriage returns and approve set -e usage.

The shebang and set -e usage are correct. However, the static analysis tool detected carriage returns which can cause issues on Unix-like systems.

Please remove the carriage returns from the entire file. You can use the following command:

tr -d '\r' < build.sh > build_unix.sh && mv build_unix.sh build.sh

This will ensure compatibility across different operating systems.
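The fix can be verified with standard POSIX tools. A small self-contained sketch (the sample file created here stands in for this PR's build.sh):

```shell
# Create a sample script with CRLF line endings (what ShellCheck SC1017 flags)
printf '#!/bin/bash\r\nset -e\r\n' > build.sh

# Count carriage-return bytes before the fix (2 in this sample)
crlf_before=$(tr -dc '\r' < build.sh | wc -c)

# Strip the CRs as the review suggests
tr -d '\r' < build.sh > build_unix.sh && mv build_unix.sh build.sh

# Count again after the fix (0)
crlf_after=$(tr -dc '\r' < build.sh | wc -c)
echo "CR bytes before=$crlf_before after=$crlf_after"
```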



Contributor


@LiliyaBioinf, if you'd like to pause CodeRabbit's reviews on this pull request, you can use the command @coderabbitai stop in a comment. This will prevent CodeRabbit from posting further comments on this PR.

If you want to disable CodeRabbit for this repository entirely, you can adjust the repository's settings:

  1. Go to the repository's Settings tab.
  2. Click on Integrations or Installed GitHub Apps.
  3. Find CodeRabbit in the list of installed apps.
  4. Click on Configure and unselect this repository from CodeRabbit's access.

Let me know if you need any further assistance.

🔗 Analysis chain
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Fetch the CodeRabbit FAQ to find out how to pause CodeRabbit
curl -s https://docs.coderabbit.ai/faq/

Length of output: 34416

set -e
patch -p1 < ${RECIPE_DIR}/dill_patch.patch

$PYTHON -m pip install . --no-deps --ignore-installed -vv
Contributor


⚠️ Potential issue

Consider alternatives to using pip for installation.

While the pip installation command is correctly formatted, using pip within a Conda recipe is generally discouraged by the Bioconda/core team.

According to the PR objectives, the consensus is against using pip to install additional packages within recipes. Instead, consider the following alternatives:

  1. Use $PYTHON setup.py install or $PYTHON -m build followed by installation of the built wheel.
  2. If setup.py is not available, use conda_build.build.build_wheel() to create a wheel, then install it.

Example:

$PYTHON -m build --wheel
$PYTHON -m pip install dist/*.whl --no-deps --ignore-installed -vv

This approach aligns better with Conda's package management philosophy and helps maintain consistency within the Bioconda ecosystem.


@@ -0,0 +1,5 @@
#!/bin/bash
set -e
patch -p1 < ${RECIPE_DIR}/dill_patch.patch
Contributor


💡 Codebase verification

Missing dill_patch.patch file in RECIPE_DIR.

The build script is attempting to apply dill_patch.patch from ${RECIPE_DIR}, but the patch file was not found. Please ensure that dill_patch.patch is included in the repository and located in the appropriate directory referenced by RECIPE_DIR.

🔗 Analysis chain

Verify the contents and necessity of 'dill_patch.patch'.

The patch application looks correct. However, it's important to ensure that the patch file is necessary and contains the expected modifications.

Please verify the contents of the patch file:

Also, consider documenting the purpose of this patch in a comment above the patch application line.


'RECIPE_DIR' environment variable is not set or incorrect.

The build script failed to locate dill_patch.patch because RECIPE_DIR appears to be empty or incorrectly set. Please verify the value of the RECIPE_DIR environment variable by running the following script:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Display the contents of the dill_patch.patch file

# Test: Check if the patch file exists and display its contents
if [ -f "${RECIPE_DIR}/dill_patch.patch" ]; then
    echo "Contents of dill_patch.patch:"
    cat "${RECIPE_DIR}/dill_patch.patch"
else
    echo "Error: dill_patch.patch not found in ${RECIPE_DIR}"
fi

Length of output: 115


Script:

#!/bin/bash
# Description: Display the value of RECIPE_DIR

echo "RECIPE_DIR is set to: '${RECIPE_DIR}'"

Length of output: 64


Script:

#!/bin/bash
# Description: Search the codebase for assignments or usages of RECIPE_DIR

echo "Searching for assignments or usages of RECIPE_DIR in the codebase..."

rg --ignore-case 'RECIPE_DIR' .

Length of output: 41950


  noarch: python
  run_exports:
    - "{{ pin_subpackage(name, max_pin='x.x') }}"
  script: bash ${RECIPE_DIR}/build.sh
Contributor


🛠️ Refactor suggestion

Simplify the build process.

Consider simplifying the build process by using the standard pip install command instead of a separate build script. This approach is more common for Python packages in Bioconda and easier to maintain.

Replace:

script: bash ${RECIPE_DIR}/build.sh

with:

script: "{{ PYTHON }} -m pip install . -vv"

If there are specific build steps required, consider incorporating them directly in the meta.yaml file or provide a clear rationale for using a separate build script.

Comment on lines 10 to 14
source:
  - url: "https://pypi.io/packages/source/{{ name[0] }}/{{ name }}/{{ name }}-{{ version }}.tar.gz"
    sha256: "{{ sha256 }}"
  - path: dill_patch.patch
    sha256: "{{ patch_sha256 }}"
Member

@pcm32 pcm32 Oct 1, 2024


It seems that in other recipes they handle patches like this:

source:
  url: https://github.com/kcleal/gw/archive/refs/tags/v{{ version }}.tar.gz
  sha256: {{ sha256 }}
  patches:
    - fix-linker-issues.patch  # [linux]
    - fix-context.patch

and they don't seem to do anything on the build.sh. Can you please try like this and removing the patching part from the build? I suspect you might not need the build if this patching mechanism works. Thanks.
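Applied to this recipe, that pattern would look something like the following (a sketch assuming the PR's existing Jinja variables and its dill_patch.patch filename). conda-build applies entries listed under patches: automatically before the build script runs, which is why the manual patch invocation in build.sh would become unnecessary:

```yaml
source:
  url: "https://pypi.io/packages/source/{{ name[0] }}/{{ name }}/{{ name }}-{{ version }}.tar.gz"
  sha256: "{{ sha256 }}"
  patches:
    - dill_patch.patch
```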

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
recipes/pyscenic/meta.yaml (2)

65-67: Consider expanding test coverage.

The current test section includes a basic import check for pyscenic, which is a good start.

To improve the robustness of the package, consider adding more comprehensive tests. For example:

  1. Check for the presence of key functions or classes within the pyscenic module.
  2. Add a simple functional test that exercises core functionality.
  3. Include checks for expected command-line tools, if any are provided by the package.

Example expansion:

test:
  imports:
    - pyscenic
  commands:
    - python -c "from pyscenic import cli; assert hasattr(cli, 'pyscenic')"
    - pyscenic --help

69-73: LGTM: About section is concise and accurate.

The about section provides essential information about the package, including the home URL, summary, and license details.

Consider adding a more detailed description of the package to help users understand its purpose and capabilities. You could include this under a description key. For example:

about:
  home: https://github.com/aertslab/pySCENIC
  summary: 'pySCENIC is a python implementation of the SCENIC pipeline'
  description: |
    pySCENIC (Single-Cell rEgulatory Network Inference and Clustering) 
    is a computational method to infer Gene Regulatory Networks and cell 
    types from single-cell RNA-seq data.
  license: MIT
  license_family: MIT
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between c3de49f and b203376.

📒 Files selected for processing (1)
  • recipes/pyscenic/meta.yaml (1 hunks)

🔇 Additional comments (3)
recipes/pyscenic/meta.yaml (3)

1-7: LGTM: Package metadata is well-defined.

The package metadata section is correctly structured using Jinja2 variables for the name, version, and SHA256 checksum. This approach enhances maintainability and reduces the risk of inconsistencies.



15-20: LGTM: Build configuration follows best practices.

The build section is well-configured:

  • Correct use of build number and noarch: python.
  • Proper implementation of run_exports for dependency management.
  • Standard pip installation command for Python packages.

This configuration aligns with Bioconda best practices and should ensure consistent builds.


9-13: Verify the necessity of the patch file.

The source section is well-structured, with the URL correctly using Jinja2 variables. However, the inclusion of 'dill_patch.patch' warrants attention.

Could you please verify if this patch is still necessary? If possible, consider upstreaming these changes to reduce maintenance overhead. If the patch is still required, please provide a brief explanation of its purpose in a comment.

Comment on lines 22 to 64
requirements:
  build:
    - python >=3.10,<3.11
    - pip
    - setuptools
    - wheel

  host:
    - python >=3.10,<3.11
    - pip
    - setuptools
    - wheel

  run:
    - python >=3.10,<3.11
    - aiohttp
    - arboreto
    - attrs
    - boltons
    - cloudpickle
    - cytoolz
    - dill
    - dask >=2023.4.1
    - distributed >=2023.4.1,<2023.5.0
    - frozendict
    - fsspec
    - llvmlite
    - loompy
    - networkx
    - numba >=0.51.2
    - numexpr
    - pandas >=1.3.5
    - pyyaml
    - requests
    - scikit-learn
    - scipy
    - tqdm
    - umap-learn
    - numpy ==1.23.5
    - pyarrow
    - interlap
    - ctxcore

Contributor


⚠️ Potential issue

Review and update dependency specifications.

The requirements section is well-structured, but there are some points to consider:

  1. The Python version constraint (>=3.10,<3.11) might be too restrictive. Consider allowing a broader range of Python versions if possible.

  2. Some version constraints might be unnecessarily strict:

    • numpy is pinned to ==1.23.5
    • distributed is constrained to >=2023.4.1,<2023.5.0
  3. The multiprocessing_on_dill package is missing from the dependencies. As discussed in previous comments, this package has licensing issues.

Consider the following changes:

- python >=3.10,<3.11
+ python >=3.10
- numpy ==1.23.5
+ numpy >=1.23.5
- distributed >=2023.4.1,<2023.5.0
+ distributed >=2023.4.1

Regarding multiprocessing_on_dill:

  1. Work with the upstream authors to resolve the licensing issue and add it to conda-forge.
  2. If the above is not possible in the short term, consider including it as a pip-installed dependency in the build script, although this is not preferred by the Bioconda team.
  3. Investigate if there's an alternative solution that doesn't require this package.

Would you like assistance in implementing these changes or investigating alternatives for multiprocessing_on_dill?

Committable suggestion was skipped due to low confidence.
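One alternative along the lines of point 3 — and, per later comments, the route this PR takes via dill_patch.patch — is to patch the import to use the standard library multiprocessing module, avoiding the dependency entirely. A rough sketch of such a patch is below; the file path and the exact import line inside pySCENIC are assumptions for illustration:

```diff
--- a/src/pyscenic/prune.py
+++ b/src/pyscenic/prune.py
@@ -1 +1 @@
-from multiprocessing_on_dill.connection import Client, Listener
+from multiprocessing.connection import Client, Listener
```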

@LiliyaBioinf
Author

LiliyaBioinf commented Oct 3, 2024

@aliciaaevans can you help me, please, by reviewing what I have done: patching out multiprocessing_on_dill. I created a patch file like the ones I saw in Bioconda recipes, but after running for more than 30 minutes the build is usually cancelled. What could be the problem? How do I deal with that?

@aliciaaevans
Contributor

@aliciaaevans can you help me, please, by reviewing what I have done: patching out multiprocessing_on_dill. I created a patch file like the ones I saw in Bioconda recipes, but after running for more than 30 minutes the build is usually cancelled. What could be the problem? How do I deal with that?

Looks like it might be running out of RAM (or possibly CPU, but between the number of dependencies and some other PRs with a similar error, I'm guessing RAM). The runners we use have 2 core CPU, 7 GB of RAM, and 14 GB of SSD disk space. Have you tried building this locally to see if it completes? I'll take a deeper look but at first glance it seems OK.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (1)
recipes/pyscenic/meta.yaml (1)

15-26: LGTM with a suggestion: Build configuration is well-structured.

The build section is correctly configured with appropriate settings for a Python package. The inclusion of run_exports is excellent for maintaining compatibility.

However, consider modifying the installation script:

-  script: "{{ PYTHON }} -m pip install . --no-deps --no-build-isolation --no-cache-dir -vvv"
+  script: "{{ PYTHON }} -m pip install . --no-deps --ignore-installed --no-cache-dir -vvv"

This change removes --no-build-isolation and adds --ignore-installed, which may help prevent potential issues with existing installations while still allowing proper dependency resolution.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between e84caeb and 5d2ed24.

📒 Files selected for processing (1)
  • recipes/pyscenic/meta.yaml (1 hunks)
🧰 Additional context used
🪛 yamllint
recipes/pyscenic/meta.yaml

[error] 1-1: syntax error: found character '%' that cannot start any token

(syntax)

🔇 Additional comments (3)
recipes/pyscenic/meta.yaml (3)

1-7: LGTM: Package metadata is well-defined.

The package metadata section is correctly structured using Jinja2 variables for the name, version, and SHA256 checksum. The package name is appropriately lowercased, which is a good practice.

🧰 Tools
🪛 yamllint

[error] 1-1: syntax error: found character '%' that cannot start any token

(syntax)


9-13: LGTM: Source configuration is correct.

The source section is well-structured with a correctly templated URL and the inclusion of the dill_patch.patch file. This patch likely addresses the multiprocessing_on_dill replacement discussed in previous comments.


66-79: LGTM: Comprehensive metadata provided.

The about and extra sections are well-structured and provide comprehensive metadata for the package. The inclusion of DOI and biotools identifiers is excellent for discoverability and citation purposes.
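For context, the about/extra shape that carries this metadata is sketched below; the summary text is a placeholder, and the license and identifier values are assumptions rather than copied from the recipe:

```yaml
about:
  home: https://github.com/aertslab/pySCENIC
  license: GPL-3.0-or-later  # assumed; check the upstream LICENSE file
  license_file: LICENSE
  summary: "Single-cell regulatory network inference and clustering (placeholder text)"

extra:
  identifiers:
    - doi:10.1038/nmeth.4463  # SCENIC paper DOI, assumed to be the one referenced
    - biotools:pySCENIC
```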

Comment on lines +62 to +64
test:
imports:
- pyscenic
Contributor

🛠️ Refactor suggestion

Enhance test coverage.

While the import check is a good start, consider adding more comprehensive tests for a package of this complexity.

Suggested additions:

test:
  imports:
    - pyscenic
  commands:
    - pyscenic --help
    - db2feather --help
    - csv2loom --help
    - invertdb --help
    - gmt2regions --help

These additional tests would ensure that the CLI entry points are working correctly.

Comment on lines +28 to +60
requirements:
host:
- python >=3.6
- pip
run:
- python >=3.6
- aiohttp
- arboreto
- attrs
- boltons
- cloudpickle
- cytoolz
- dill
- dask-core >=2023.4.1
- distributed >=2023.4.1,<2023.5.0
- frozendict
- fsspec
- llvmlite
- loompy
- networkx
- numba >=0.51.2
- numexpr
- pandas >=1.3.5
- pyyaml
- requests
- scikit-learn
- scipy
- tqdm
- umap-learn
- numpy ==1.23.5
- pyarrow
- interlap
- ctxcore
Contributor

🛠️ Refactor suggestion

Consider updating dependency specifications.

The requirements section is comprehensive, but there are some points to consider:

  1. The Python version constraint (>=3.6) might be too permissive. Consider raising the minimum version to 3.8 or higher, as many of the dependencies likely require a more recent Python version.

  2. Some version constraints might need updating:

    • numpy is pinned to ==1.23.5, which might be too restrictive. Consider allowing a range of compatible versions.
    • distributed is constrained to >=2023.4.1,<2023.5.0, which might prevent using newer versions.
  3. The absence of multiprocessing_on_dill is correct, as it's been replaced by the standard multiprocessing module via the patch.

Consider the following changes:

- python >=3.6
+ python >=3.8
- numpy ==1.23.5
+ numpy >=1.23.5,<2
- distributed >=2023.4.1,<2023.5.0
+ distributed >=2023.4.1

These changes would allow for more flexibility in dependency versions while still maintaining compatibility.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
requirements:
host:
- python >=3.6
- pip
run:
- python >=3.6
- aiohttp
- arboreto
- attrs
- boltons
- cloudpickle
- cytoolz
- dill
- dask-core >=2023.4.1
- distributed >=2023.4.1,<2023.5.0
- frozendict
- fsspec
- llvmlite
- loompy
- networkx
- numba >=0.51.2
- numexpr
- pandas >=1.3.5
- pyyaml
- requests
- scikit-learn
- scipy
- tqdm
- umap-learn
- numpy ==1.23.5
- pyarrow
- interlap
- ctxcore
requirements:
host:
- python >=3.8
- pip
run:
- python >=3.8
- aiohttp
- arboreto
- attrs
- boltons
- cloudpickle
- cytoolz
- dill
- dask-core >=2023.4.1
- distributed >=2023.4.1
- frozendict
- fsspec
- llvmlite
- loompy
- networkx
- numba >=0.51.2
- numexpr
- pandas >=1.3.5
- pyyaml
- requests
- scikit-learn
- scipy
- tqdm
- umap-learn
- numpy >=1.23.5,<2
- pyarrow
- interlap
- ctxcore

@LiliyaBioinf
Author

@aliciaaevans yes, I ran it locally and the package built. That was on 1 core and 16 GB of RAM. It also seemed to me that it was running out of memory here; I tried several different patching approaches.

@pcm32
Member

pcm32 commented Oct 4, 2024

Don't worry about the patching as long as it is correctly applied; memory-wise, the impact of the patching should be negligible.
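A quick way to confirm a patch applies cleanly, without running the whole build, is patch(1)'s --dry-run mode. The sketch below uses a made-up one-line stand-in for the real pySCENIC source file, just to show the mechanics:

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in source file using the import the patch is meant to remove
printf 'from multiprocessing_on_dill.connection import Client\n' > module.py

# Minimal unified-diff patch swapping in the standard library module
cat > dill_patch.patch <<'EOF'
--- module.py
+++ module.py
@@ -1 +1 @@
-from multiprocessing_on_dill.connection import Client
+from multiprocessing.connection import Client
EOF

patch -p0 --dry-run < dill_patch.patch  # verify it applies; files are untouched
patch -p0 < dill_patch.patch            # apply for real
cat module.py
```

If the dry run fails, patch prints the rejected hunks, which is usually enough to pin down a path or context mismatch.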
