-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unified graph samplers for PyG backend #95
base: dev
Are you sure you want to change the base?
Conversation
raise NotImplementedError | ||
|
||
|
||
class SampledSubgraph: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个类的存在是为了什么?能不能直接用一些已有的表Graph的类
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
是的,这个类的存在目前只是一个没有意义的容器,只是为了预备给未来可能的需求。可以不要(听您的)
raise NotImplementedError | ||
|
||
|
||
class PyGHomogeneousGraphSampler(PyGGraphSampler): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
既然同构图没有统一需要的接口,也还没有异构图sampling,这个类是不是可以不用?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
是的,可以不要(听您的)
class PyGSampledSubgraph(_graph_sampler.SampledSubgraph): | ||
@property | ||
def data(self) -> torch_geometric.data.Data: | ||
raise NotImplementedError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
同理,这个能不能复用已有的类
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
是的,可以不要(听您的)
sampler_configurations: typing.Mapping[str, typing.Any], **kwargs | ||
): | ||
super(PyGClusterSampler, self).__init__() | ||
_filtered_configurations, _remaining_configurations = _sampler_utility.ConfigurationsFilter( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这一系列筛选参数的模式虽然是可以,但是感觉比较繁琐。如果直接把kwargs扔给pyg里的类做初始化会有一方么问题吗
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PyG在实现某些Sampling方式的时候,例如ClusterGCN,是分为两个功能类实现的。
ClusterData: https://pytorch-geometric.readthedocs.io/en/latest/modules/loader.html#torch_geometric.loader.ClusterData
ClusterLoader: https://pytorch-geometric.readthedocs.io/en/latest/modules/loader.html#torch_geometric.loader.ClusterLoader
两个功能类所需要的参数不同。由于我们把几个代表性的sampling方法做了这样的一个统一的抽象,因此我们必须从接收到的configurations中提取两个类所分别需要的参数,分别实例化两个功能类。
诚然,可以简单地针对kwargs提取所需的参数,但是由于各个sampler所需的配置参数是不同的,为了User-friendly,在用户传入错误的配置组合(未指定需要的参数,或传入参数类型错误)时提供比较好的报错信息,抽象了ConfigurationsFilter
这一util类,是内部类。
def __init__(self, data: torch_geometric.data.Data, *_args, **_kwargs): | ||
if not isinstance(data, torch_geometric.data.Data): | ||
raise TypeError | ||
self._data: torch_geometric.data.Data = data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这块同理,是不是可以先不用这个类
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
是的,可以不用。(听您的)
As far as I realize, various representative graph sampling mechanisms (e.g., Neighbor Sampling, GraphSAINT, cluster sampling, etc.) provided by PyTorch-Geometric can be unified as a unified interface, all the provided sampling methods are wrapped as subclasses of
module.sampling.graph_sampler.PyGGraphSampler
, and thus can be instantiated by a generic__init__(self, data, sampler_configurations, **kwargs)
method. Besides, an auxiliary factory methodinstantiate_graph_sampler
is implemented inmodule.sampling.graph_sampler
package to instantiate different provided samplers with a unified constructor.However, different graph samplers require diverse configurations, according to different corresponding methodology. It's intractable for me to figure out a more elegant approach to unify different sampling methods. A simple example to demonstrate the unified sampler interface is provided in test_pyg_graph_sampler_pipeline.py.
Configurations for certain sampler to be adopted are assumed to be specified by user via either YAML file or programatic arguments, which should be handled by solver
If you have any reasonable suggestions on the unified sampling interface, please don't hesitate to communicate with me.