Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unified graph samplers for PyG backend #95

Draft
wants to merge 2 commits into
base: dev
Choose a base branch
from
Draft

Conversation

TheLuminary
Copy link

As far as I realize, various representative graph sampling mechanisms (e.g., Neighbor Sampling, GraphSAINT, cluster sampling, etc.) provided by PyTorch-Geometric can be unified as a unified interface, all the provided sampling methods are wrapped as subclasses of module.sampling.graph_sampler.PyGGraphSampler, and thus can be instantiated by a generic __init__(self, data, sampler_configurations, **kwargs) method. Besides, an auxiliary factory method instantiate_graph_sampler is implemented in module.sampling.graph_sampler package to instantiate different provided samplers with a unified constructor.
However, different graph samplers require diverse configurations, according to different corresponding methodology. It's intractable for me to figure out a more elegant approach to unify different sampling methods. A simple example to demonstrate the unified sampler interface is provided in test_pyg_graph_sampler_pipeline.py.
Configurations for certain sampler to be adopted are assumed to be specified by user via either YAML file or programatic arguments, which should be handled by solver

If you have any reasonable suggestions on the unified sampling interface, please don't hesitate to communicate with me.

Upcoming: I would introduce a novel base class SearchableGraphSampler to support auto (HP-optimizable) sampler

raise NotImplementedError


class SampledSubgraph:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个类的存在是为了什么?能不能直接用一些已有的表Graph的类

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

是的,这个类的存在目前只是一个没有意义的容器,只是为了预备给未来可能的需求。可以不要(听您的)

raise NotImplementedError


class PyGHomogeneousGraphSampler(PyGGraphSampler):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

既然同构图没有统一需要的接口,也还没有异构图sampling,这个类是不是可以不用?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

是的,可以不要(听您的)

class PyGSampledSubgraph(_graph_sampler.SampledSubgraph):
@property
def data(self) -> torch_geometric.data.Data:
raise NotImplementedError
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

同理,这个能不能复用已有的类

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

是的,可以不要(听您的)

sampler_configurations: typing.Mapping[str, typing.Any], **kwargs
):
super(PyGClusterSampler, self).__init__()
_filtered_configurations, _remaining_configurations = _sampler_utility.ConfigurationsFilter(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这一系列筛选参数的模式虽然是可以,但是感觉比较繁琐。如果直接把kwargs扔给pyg里的类做初始化会有一方么问题吗

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyG在实现某些Sampling方式的时候,例如ClusterGCN,是分为两个功能类实现的。

ClusterData: https://pytorch-geometric.readthedocs.io/en/latest/modules/loader.html#torch_geometric.loader.ClusterData
ClusterLoader: https://pytorch-geometric.readthedocs.io/en/latest/modules/loader.html#torch_geometric.loader.ClusterLoader

两个功能类所需要的参数不同。由于我们把几个代表性的sampling方法做了这样的一个统一的抽象,因此我们必须从接收到的configurations中提取两个类所分别需要的参数,分别实例化两个功能类。
诚然,可以简单地针对kwargs提取所需的参数,但是由于各个sampler所需的配置参数是不同的,为了User-friendly,在用户传入错误的配置组合(未指定需要的参数,或传入参数类型错误)时提供比较好的报错信息,抽象了ConfigurationsFilter这一util类,是内部类。

def __init__(self, data: torch_geometric.data.Data, *_args, **_kwargs):
if not isinstance(data, torch_geometric.data.Data):
raise TypeError
self._data: torch_geometric.data.Data = data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这块同理,是不是可以先不用这个类

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

是的,可以不用。(听您的)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants