DEP Remove pytorch from environment.yml #49796

Closed
@MarcoGorelli

PyTorch is one of the heaviest dependencies, and I think it's responsible for bringing in cudatoolkit, cudnn, and mkl:

$ grep '"size":' ${CONDA_PREFIX}/conda-meta/*.json | sort -k3rn | column -t | sed 's|/home/marcogorelli/mambaforge/envs/pandas-dev/conda-meta||g' | head -n 20
/cudatoolkit-11.7.0-hd8887f6_10.json:                          "size":  871987469,
/cudnn-8.4.1.50-hed8a83a_0.json:                               "size":  648438607,
/pytorch-1.12.1-cuda112py38hd94e077_201.json:                  "size":  514342178,
/mkl-2022.1.0-h84fe81f_915.json:                               "size":  209326825,
/nccl-2.14.3.1-h0800d71_0.json:                                "size":  152235202,
/magma-2.5.4-h6103c52_2.json:                                  "size":  93350293,
/qt-main-5.15.6-hc525480_0.json:                               "size":  64435513,
/gcc_impl_linux-64-10.4.0-h7ee1905_16.json:                    "size":  48931715,
/pillow-9.2.0-py38ha3b2c9c_2.json:                             "size":  47380059,
/libllvm14-14.0.6-he0ac6c6_0.json:                             "size":  36954351,
/sysroot_linux-64-2.12-he073ed8_15.json:                       "size":  32940552,
/arrow-cpp-9.0.0-py38hc370d79_10_cpu.json:                     "size":  32731227,
/pandoc-2.19.2-h32600fe_1.json:                                "size":  31408180,
/libllvm11-11.1.0-hf817b99_3.json:                             "size":  30536060,
/scipy-1.9.3-py38h8ce737c_2.json:                              "size":  27570711,
/python-3.8.13-h582c2e5_0_cpython.json:                        "size":  26366309,
/libdb-6.2.32-h9c3ff4c_0.json:                                 "size":  24409456,
/mypy-0.981-py38h0a891b7_0.json:                               "size":  16946789,
/icu-70.1-h27087fc_0.json:                                     "size":  14191488,
/bokeh-2.4.3-pyhd8ed1ab_3.json:                                "size":  13940985,
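
For anyone who would rather not chain `grep`, `sort`, `column`, and `sed`, roughly the same listing can be produced in Python. This is a minimal sketch, not something from the pandas repo: the helper name `largest_packages` is made up for illustration, and it assumes each `conda-meta/*.json` record has `name`, `version`, and `size` fields (only `size` is confirmed by the output above).

```python
import json
import os
from pathlib import Path


def largest_packages(conda_meta_dir, top=20):
    """Return (name, version, size) tuples, largest size first.

    Hypothetical helper: reads every *.json record in a conda-meta directory.
    """
    records = []
    for path in Path(conda_meta_dir).glob("*.json"):
        with open(path) as fh:
            meta = json.load(fh)
        # Fall back to the file name / 0 if a field is missing (assumption).
        name = meta.get("name", path.stem)
        version = meta.get("version", "")
        size = meta.get("size") or 0
        records.append((name, version, size))
    records.sort(key=lambda rec: rec[2], reverse=True)
    return records[:top]


if __name__ == "__main__":
    prefix = os.environ.get("CONDA_PREFIX")
    if prefix:
        for name, version, size in largest_packages(Path(prefix) / "conda-meta"):
            print(f"{name}-{version}: {size / 1e6:.0f} MB")
```

Run inside the activated environment so that `CONDA_PREFIX` is set; without it the script prints nothing.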

and as far as I can tell it's only used in a single ASV benchmark:

class From3rdParty:
    # GH#44616

    def setup(self):
        try:
            import torch
        except ImportError:
            raise NotImplementedError

        row = 700000
        col = 64
        self.val_tensor = torch.randn(row, col)

    def time_from_torch(self):
        DataFrame(self.val_tensor)

In which case, it feels a bit wasteful to have everyone install it. We could noticeably cut down the environment size by removing it. Not all contributors have fast internet connections, so this would make a difference.
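
To put a rough number on "noticeably", the sketch below adds up the `size` values from the listing above for pytorch and the three packages it is suspected of pulling in. It assumes all four would actually disappear from the environment once pytorch is dropped, which has not been verified here (mkl in particular could be required by something else).

```python
# Sizes in bytes, copied from the conda-meta listing above.
sizes = {
    "cudatoolkit": 871_987_469,
    "cudnn": 648_438_607,
    "pytorch": 514_342_178,
    "mkl": 209_326_825,
}

# Upper-bound estimate: assumes none of these is needed by another package.
total = sum(sizes.values())
print(f"{total} bytes, about {total / 1e9:.2f} GB")
```

Under that assumption the total comes to 2,244,095,079 bytes, or about 2.24 GB.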
