Feature/twitter post metrics #135

Merged: 4 commits into master from feature/twitter-post-metrics on May 25, 2024

Conversation

henry410213028
Collaborator

Types of changes

  • New feature

Description

A new DAG for scraping Twitter post and insights data.

Checklist:

  • Add test cases to all the changes you introduce
  • Run poetry run pytest locally to ensure all linter checks pass
  • Update the documentation if necessary

    python_callable=udfs.save_twitter_posts_and_insights,
)

CREATE_TABLE_IF_NEEDED >> SAVE_TWITTER_POSTS_AND_INSIGHTS
Collaborator

Nice

Comment on lines +44 to +48
new_posts = [
    post
    for post in posts
    if post["timestamp"] > last_post["created_at"].timestamp()
]
Collaborator

Nice
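
For readers following the thread, the filter above can be exercised in isolation. This is a minimal sketch, not the PR's actual code: the function name `filter_new_posts`, the `tweet_id` key, and the sample data are hypothetical, and it assumes `last_post["created_at"]` is a timezone-aware datetime (a naive datetime would be interpreted in local time by `.timestamp()`).

```python
from datetime import datetime, timezone
from typing import List


def filter_new_posts(posts: List[dict], last_post: dict) -> List[dict]:
    """Keep only posts strictly newer than the last stored post."""
    # Assumption: created_at is timezone-aware, so .timestamp() is unambiguous.
    cutoff = last_post["created_at"].timestamp()
    return [post for post in posts if post["timestamp"] > cutoff]


# Sample data invented for illustration only.
last_post = {"created_at": datetime(2024, 5, 1, tzinfo=timezone.utc)}
posts = [
    {"tweet_id": "a", "timestamp": datetime(2024, 4, 30, tzinfo=timezone.utc).timestamp()},
    {"tweet_id": "b", "timestamp": datetime(2024, 5, 1, tzinfo=timezone.utc).timestamp()},
    {"tweet_id": "c", "timestamp": datetime(2024, 5, 2, tzinfo=timezone.utc).timestamp()},
]
new_posts = filter_new_posts(posts, last_post)
```

Because the comparison is strict, a post whose timestamp equals the last stored one is skipped, which avoids appending the same row twice.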


def request_posts_data() -> List[dict]:
    url = "https://twitter154.p.rapidapi.com/user/tweets"
    # 499339900 is PyConTW's twitter id
Collaborator

Nice

    # 499339900 is PyConTW's twitter id
    querystring = {
        "username": "pycontw",
        "user_id": "96479162",
Collaborator

Why is the user id different from the one in the comment?

Collaborator Author

Fixed.

Collaborator

🎉
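
One way to keep a comment and a value from drifting apart like this is to hold the ID in a single named constant. The sketch below is only an illustration: the constant and function names are hypothetical, and the value is the one quoted in the diff's comment; this thread does not confirm which of the two IDs is the correct one.

```python
# Hypothetical constant names. The ID is the value quoted in the diff's comment;
# which of the two IDs in the diff is correct is not confirmed by this thread.
PYCONTW_TWITTER_USERNAME = "pycontw"
PYCONTW_TWITTER_USER_ID = "499339900"


def build_querystring() -> dict:
    """Build the request parameters from the named constants."""
    return {
        "username": PYCONTW_TWITTER_USERNAME,
        "user_id": PYCONTW_TWITTER_USER_ID,
    }


querystring = build_querystring()
```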

Comment on lines +110 to +112
    if response.ok:
        return response.json()["results"]
    raise RuntimeError(f"Failed to fetch posts data: {response.text}")
Collaborator

Nice
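
The fail-loudly pattern in these lines can be tried without network access. This is a rough sketch under stated assumptions: `extract_results` and `FakeResponse` are hypothetical names, and `FakeResponse` is only a stand-in exposing the `ok`, `text`, and `json()` members that the snippet reads from the real response object.

```python
from typing import List


class FakeResponse:
    """Hypothetical stand-in for an HTTP response, for illustration only."""

    def __init__(self, ok: bool, payload: dict, text: str = "") -> None:
        self.ok = ok
        self.text = text
        self._payload = payload

    def json(self) -> dict:
        return self._payload


def extract_results(response) -> List[dict]:
    """Return the "results" list, or raise so the task is marked as failed."""
    if response.ok:
        return response.json()["results"]
    raise RuntimeError(f"Failed to fetch posts data: {response.text}")


good = extract_results(FakeResponse(True, {"results": [{"tweet_id": "a"}]}))

try:
    extract_results(FakeResponse(False, {}, text="rate limited"))
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)
```

Raising instead of returning an empty list means a failed request surfaces as a failed run rather than silently writing nothing.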

        bigquery.SchemaField("created_at", "TIMESTAMP", mode="REQUIRED"),
        bigquery.SchemaField("message", "STRING", mode="REQUIRED"),
    ],
    write_disposition="WRITE_APPEND",
Collaborator

Nice

client = bigquery.Client(project=os.getenv("BIGQUERY_PROJECT"))
sql = """
SELECT
*
Collaborator

It seems we only need created_at here? Querying fewer fields can save us money.
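
A narrower query along the lines of this suggestion might look like the sketch below. It is an assumption, not the PR's code: the helper name and the table name are placeholders, and it presumes the query only exists to find the most recent stored post. BigQuery's on-demand pricing is based on the bytes read from the referenced columns, so selecting one column instead of `*` reduces the scanned data; `LIMIT` by itself generally does not.

```python
def build_last_post_query(table: str) -> str:
    """Build a query that reads only the created_at column."""
    # `table` is a fully qualified name such as project.dataset.table.
    return (
        "SELECT created_at\n"
        f"FROM `{table}`\n"
        "ORDER BY created_at DESC\n"
        "LIMIT 1"
    )


# Placeholder table name, not taken from the PR.
sql = build_last_post_query("my-project.my_dataset.twitter_posts")
```

The resulting string can be passed to the BigQuery client's `query` method in place of the `SELECT *` version.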

@henry410213028 henry410213028 merged commit 3c37063 into master May 25, 2024
2 checks passed
@henry410213028 henry410213028 deleted the feature/twitter-post-metrics branch May 25, 2024 08:00