A library to fetch Reddit data using the Reddit WebAPI (gateway), with Pushshift support for historical submissions.
Although existing libraries (e.g. praw) can interact with Reddit's developer API, they have several drawbacks when collecting large amounts of data. gatered addresses these problems with the following features:
- No authentication (API key) is needed to access the data.
- The Reddit WebAPI exposes extra attributes that the public developer API does not.
- Fully async.
- Proxy support via httpx.
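Because the library is async, many requests can be awaited concurrently. The sketch below is illustrative only: it assumes the gatered.func helpers are coroutine functions, and `gather_limited` and `fake_fetch` are hypothetical names, not part of gatered. `fake_fetch` stands in for a real call whose parameters are not documented here. A semaphore caps how many requests run at once, which is a common way to stay polite when collecting large amounts of data.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")


async def gather_limited(
    factories: Iterable[Callable[[], Awaitable[T]]],
    limit: int = 5,
) -> List[T]:
    """Run coroutine factories concurrently, at most `limit` at a time.

    Hypothetical helper (not a gatered API). Results keep the input order.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run(factory: Callable[[], Awaitable[T]]) -> T:
        # Only `limit` coroutines may hold the semaphore at the same time.
        async with semaphore:
            return await factory()

    return await asyncio.gather(*(run(f) for f in factories))


async def fake_fetch(subreddit: str) -> str:
    """Hypothetical stand-in for a gatered call such as gatered.func.get_posts."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"posts from r/{subreddit}"


async def main() -> List[str]:
    subreddits = ["python", "learnpython", "asyncio"]
    # Default argument binds each subreddit name at lambda creation time.
    factories = [lambda s=s: fake_fetch(s) for s in subreddits]
    return await gather_limited(factories, limit=2)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Passing zero-argument factories instead of already-created coroutines means no request object exists until a semaphore slot is free.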
You can install this library from PyPI:
# with pip
pip install gatered
# with poetry
poetry add gatered
The library provides convenience functions to get started quickly:
- gatered.func.get_post_comments
- gatered.func.get_posts_with_subreddit_info
- gatered.func.get_posts
- gatered.func.get_comments
- gatered.func.get_pushshift_posts
Alternatively, you can use the gatered.client.Client and gatered.pushshift.PushShiftAPI classes directly as a base for implementing your own logic.
Errors can be handled by importing gatered.RequestError or gatered.HTTPStatusError; see the httpx exceptions documentation to learn more.
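When collecting large amounts of data, transient failures are often worth retrying. The following is a minimal sketch, not part of gatered: `with_retries`, `TransientError` and `make_flaky` are hypothetical names. It assumes gatered.RequestError and gatered.HTTPStatusError are ordinary exception classes (they mirror the httpx exceptions), so you could pass them as the `errors` tuple; the demo uses a stand-in exception so it runs without network access.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple, Type, TypeVar

T = TypeVar("T")


async def with_retries(
    factory: Callable[[], Awaitable[T]],
    errors: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Await factory(), retrying on `errors` with exponential backoff.

    Hypothetical helper. With gatered you might pass
    errors=(gatered.RequestError, gatered.HTTPStatusError).
    The last error is re-raised once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return await factory()
        except errors:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            # Wait base_delay, then 2x, 4x, ... before the next try.
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


class TransientError(Exception):
    """Stand-in for a request error in this demo."""


def make_flaky(failures: int) -> Callable[[], Awaitable[str]]:
    """Return a coroutine factory that fails `failures` times, then succeeds."""
    state: Dict[str, int] = {"calls": 0}

    async def fetch() -> str:
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientError(f"failure #{state['calls']}")
        return "ok"

    return fetch


if __name__ == "__main__":
    print(asyncio.run(with_retries(make_flaky(2), (TransientError,), base_delay=0.01)))
```

Only the exception types you list are retried; anything else (for example a bug in your own parsing code) propagates immediately.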
See examples/ for more examples.
Alternatively, you can fork the example repo on Replit and experiment with it online.
Copyright 2022 CounterTek
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.