writeLines(toJSON(res_tbl, pretty = TRUE), con = "../data/Star_Trek.json")
```
-
{fig-alt="The Movie Database logo"}
+
{fig-alt="The Movie Database logo" width="50%"}
In this section, we'll work with some data gathered from TMDB (The Movie Database).
I submitted a query for all movies that Patrick Stewart was involved with, and you can find the resulting JSON file [here](https://raw.githubusercontent.com/srvanderplas/stat-computing-r-python/main/data/Patrick_Stewart.json).
<details><summary>Exploring the output structure (long version)</summary>
+
```{r}
# Top-level objects (show the first object in the list)
ps_messy$cast[[1]]
ps_messy$crew[[1]]
ps_messy$id
```
+
</details>

Let's start with the cast list. Most fields seem to hold a single value; the only one that doesn't is `genre_ids`. So let's see whether we can convert each list entry to a data frame, and then deal with the `genre_ids` column afterwards.
```{r, error = T}
cast_list <- ps_messy$cast
+
```
+
+
<details><summary>Data frame conversion</summary>
+
```{r}
as.data.frame(cast_list[[1]])
+
```
+
</details>
+
```{r}
map(cast_list, as.data.frame)
```
Well, that didn't work, but the error message at least tells us which index is causing the problem: 6. Let's look at that data:
Ok, so `backdrop_path` is `NULL`, and `as.data.frame` can't handle the fact that some fields are defined (length 1) and others are NULL (length 0). We could possibly replace the NULL with NA first?
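To make the NULL-to-NA idea concrete, here is a minimal sketch on a made-up record. The record contents and the helper name `null_to_na` are assumptions for illustration only; they are not taken from the TMDB data or from this chapter's own code. The sketch uses base R's `lapply` instead of `map` so that it runs without loading any packages.

```{r}
# Toy record standing in for one element of cast_list (values are made up)
record <- list(id = 1L, title = "Example", backdrop_path = NULL)

# Hypothetical helper: swap every NULL field for NA so each field has length 1
null_to_na <- function(x) {
  lapply(x, function(field) if (is.null(field)) NA else field)
}

record_fixed <- null_to_na(record)

lengths(record)        # backdrop_path has length 0
lengths(record_fixed)  # every field now has length 1

# With all fields at length 1, the record converts to a one-row data frame
record_df <- as.data.frame(record_fixed)
record_df
```

After the replacement, the record converts to a single row with `NA` in `backdrop_path`, which is the shape we want before stacking the rows together.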
@@ -482,11 +503,12 @@ fix_nulls <- function(x) {
}
cast_list_fix <- map(cast_list, fix_nulls)
-
cast_list_fix[[6]]
+
+
cast_list_fix[[6]][1:5]
map(cast_list_fix, as.data.frame)
-
cast_list_fix[[8]]
+
cast_list_fix[[8]][1:5]
```
Ok, well, this time the issue is at position 8, where `genre_ids` is an empty list.
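One possible way to handle a variable-length field such as `genre_ids` is to set it aside, convert the remaining single-valued fields, and then store the genre ids in a list-column. The sketch below uses made-up records and a hypothetical helper named `record_to_row`. It is one option among several, and not necessarily the approach taken in the rest of this chapter.

```{r}
# Made-up records: one with two genre ids, one with an empty genre list
with_genres <- list(id = 1L, title = "Example A", genre_ids = list(18L, 878L))
no_genres   <- list(id = 2L, title = "Example B", genre_ids = list())

# Hypothetical helper: convert the single-valued fields to a one-row data
# frame, then attach genre_ids as a single list-column cell
record_to_row <- function(x) {
  genres <- as.integer(unlist(x$genre_ids))  # integer(0) when the list is empty
  x$genre_ids <- NULL                        # remove the variable-length field
  row <- as.data.frame(x)                    # remaining fields all have length 1
  row$genre_ids <- list(genres)              # one cell holding all genre ids
  row
}

row_a <- record_to_row(with_genres)
row_b <- record_to_row(no_genres)

row_a$genre_ids[[1]]          # both genre ids in one cell
length(row_b$genre_ids[[1]])  # the empty list is kept as an empty vector
```

Keeping the ids in a list-column means an empty `genre_ids` no longer changes the number of rows, so every record yields exactly one row.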
{fig-alt="The Movie Database logo"}
+
{fig-alt="The Movie Database logo" width="50%"}
I used TMDB to find all movies resulting from the query "Star Trek" and stored the resulting JSON file [here](https://raw.githubusercontent.com/srvanderplas/stat-computing-r-python/main/data/Star_Trek.json).