Question regarding the time zones in the review.json file #55

Open
jeroen-bos opened this issue Jul 9, 2020 · 1 comment

Comments

@jeroen-bos

I have a question. I'm currently writing my master's thesis on the relationship between the time at which an online review is posted and the review's rating, length, and emotional tone.

For the data analysis I want to use the Yelp Academic Dataset (2020). Its variables are documented at https://www.yelp.com/dataset/documentation/main and include the variable 'date'. This variable obviously holds the date, but I suppose it also holds the time the review was written (the documentation does not say so explicitly), since the values in the download look like "2016-08-29 00:41:13".

My main question: in which time zone did Yelp record (or convert) the 'date' values in the 'review.json' file of the academic dataset? I need to convert every timestamp back to the local time of the review's original location, so it is important that I get this right. The Yelp API states the time zone explicitly: "The time that the review was created in PST." The dataset documentation linked above does not mention a time zone, and Yelp may have converted the timestamps for the academic dataset, so I want to validate this. Could someone please help me out? Thank you very much in advance.
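To make the problem concrete, here is a minimal sketch of reading the 'date' field, assuming `review.json` holds one JSON object per line and that every value follows the format quoted above. The sample record and its `review_id` are made up for illustration. The parsed value is a naive datetime: the string carries no UTC offset, which is exactly why the time zone has to be known from elsewhere.

```python
import json
from datetime import datetime

# Format of the quoted example "2016-08-29 00:41:13" (assumed to hold for all rows).
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_review_date(line: str) -> datetime:
    """Parse the 'date' field of one review.json line into a naive datetime."""
    record = json.loads(line)
    return datetime.strptime(record["date"], DATE_FORMAT)


# Hypothetical record, not taken from the dataset.
sample_line = '{"review_id": "hypothetical-id", "stars": 4.0, "date": "2016-08-29 00:41:13"}'
parsed = parse_review_date(sample_line)

# tzinfo is None: the value itself does not say which time zone it is in.
print(parsed, parsed.tzinfo)
```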

Link to the data in 2020: https://www.yelp.com/dataset

@JennyZhou395

Hey, I wonder whether your question has been resolved, as I have the same one myself. My study also relies heavily on the accuracy of the time at which each review was posted. At first I assumed the times were in PST, but after converting them to EST, a large share of the reviews appear to have been posted between 0 and 2 am, which does not seem right.
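This kind of plausibility check can be sketched as follows: interpret the naive timestamps under a candidate source time zone, convert them to a target zone, and count reviews per hour of day. The function name `local_hours` and the two sample timestamps are hypothetical, and the sketch assumes Python 3.9+ with time zone data available to `zoneinfo`. It only shows how the hour distribution shifts with the assumed source zone; it does not establish which zone is correct.

```python
from collections import Counter
from datetime import datetime
from zoneinfo import ZoneInfo

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def local_hours(timestamps, source_tz: str, target_tz: str) -> Counter:
    """Count reviews per hour of day in target_tz, assuming the raw
    timestamps were recorded in source_tz."""
    src, dst = ZoneInfo(source_tz), ZoneInfo(target_tz)
    hours = Counter()
    for text in timestamps:
        naive = datetime.strptime(text, DATE_FORMAT)
        aware = naive.replace(tzinfo=src)  # attach the assumed source zone
        hours[aware.astimezone(dst).hour] += 1
    return hours


# Made-up timestamps, one in summer and one in winter.
sample = ["2016-08-29 00:41:13", "2016-01-15 23:10:00"]

# Candidate 1: data recorded in US Pacific time, viewed in US Eastern time.
as_pacific = local_hours(sample, "America/Los_Angeles", "America/New_York")
# Candidate 2: data recorded in UTC, viewed in US Eastern time.
as_utc = local_hours(sample, "UTC", "America/New_York")
print(as_pacific, as_utc)
```

Run over the reviews of businesses in one known time zone, a distribution that peaks in the middle of the night would argue against the candidate source zone.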

Then I tried to match reviews in the dataset with the same reviews on the Yelp website. The website shows only the date of a review, but I found that a review timestamped between 7 and 8 am in the dataset is shown under the previous day in winter and under the same day in summer. A review timestamped before 7 am in the dataset is shown under the previous day on the website. This holds in the three states I tested: PA, NV, and AZ. The Yelp website should show reviews in real local time (?), so my guess is that time in database - 8 = real local time in winter, and time in database - 7 = real local time in summer. But I really need confirmation of this. I hope someone can answer, as it is critical for any analysis based on the actual posting time.
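The guess above can be written down as a small check. Offsets of -8 hours in winter and -7 hours in summer are the UTC offsets of US Pacific time, so this sketch assumes, without any confirmation from Yelp, that the dataset stores UTC and that the website date follows Pacific time. The function name `hypothesized_site_date` and the timestamps are made up for illustration, and `zoneinfo` requires Python 3.9+ with time zone data installed.

```python
from datetime import date, datetime, timezone
from zoneinfo import ZoneInfo

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"
PACIFIC = ZoneInfo("America/Los_Angeles")


def hypothesized_site_date(text: str) -> date:
    """Date the website would show IF the dataset stores UTC and the
    website uses US Pacific time (an unconfirmed assumption)."""
    naive = datetime.strptime(text, DATE_FORMAT)
    as_utc = naive.replace(tzinfo=timezone.utc)
    return as_utc.astimezone(PACIFIC).date()


# Made-up timestamps around the 7 to 8 am boundary described above.
winter = hypothesized_site_date("2019-01-15 07:30:00")      # 07:30 - 8 h falls on the previous day
summer = hypothesized_site_date("2019-07-15 07:30:00")      # 07:30 - 7 h stays on the same day
before_seven = hypothesized_site_date("2019-07-15 06:30:00")  # previous day in either season
print(winter, summer, before_seven)
```

Comparing these predicted dates with the dates shown on the website for a larger sample of matched reviews would support or refute the assumption. It would still not say anything about the local time at the business location, which needs a separate conversion per location.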
