Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Archive import appears to be broken on recent exports #54

Open
jacobian opened this issue Jan 5, 2021 · 5 comments
Open

Archive import appears to be broken on recent exports #54

jacobian opened this issue Jan 5, 2021 · 5 comments

Comments

@jacobian
Copy link
Contributor

jacobian commented Jan 5, 2021

I requested a Twitter export yesterday, and unfortunately they seem to have changed it such that twitter-to-sqlite import can't handle it anymore 😢

So far I've ran into two issues. The first was easy to work around, but the second will take more investigation. If I can find the time I'll keep working on it and update this issue accordingly.

The issues (so far):

1. Data seems to have moved to a data/ subdirectory

Running twitter-to-sqlite import on the raw zip file reports a bunch of "not yet implemented" errors, and then exits without actually importing anything:

❯ twitter-to-sqlite import tarchive.db twitter.zip
...
data/manifest: not yet implemented
data/account-creation-ip: not yet implemented
data/account-suspension: not yet implemented
... (dozens of more lines like this, including critical stuff like data/tweets) ...

(tarchive.db now exists, but is empty)

Workaround: unpack the zip file, and run twitter-to-sqlite import tarchive.db path/to/archive/data

That gets further, but:

2. Some schema(s?) have changed

At least, the blocks schema seems different now:

❯ twitter-to-sqlite import tarchive.db archive/data
direct-messages-group: not yet implemented
branch-links: not yet implemented
periscope-expired-broadcasts: not yet implemented
direct-messages: not yet implemented
mute: not yet implemented
Traceback (most recent call last):
  File "/Users/jacob/Library/Caches/pypoetry/virtualenvs/jacobian-dogsheep-4AXaN4tu-py3.8/bin/twitter-to-sqlite", line 8, in <module>
    sys.exit(cli())
  File "/Users/jacob/Library/Caches/pypoetry/virtualenvs/jacobian-dogsheep-4AXaN4tu-py3.8/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/jacob/Library/Caches/pypoetry/virtualenvs/jacobian-dogsheep-4AXaN4tu-py3.8/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/jacob/Library/Caches/pypoetry/virtualenvs/jacobian-dogsheep-4AXaN4tu-py3.8/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/jacob/Library/Caches/pypoetry/virtualenvs/jacobian-dogsheep-4AXaN4tu-py3.8/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/jacob/Library/Caches/pypoetry/virtualenvs/jacobian-dogsheep-4AXaN4tu-py3.8/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/jacob/Library/Caches/pypoetry/virtualenvs/jacobian-dogsheep-4AXaN4tu-py3.8/lib/python3.8/site-packages/twitter_to_sqlite/cli.py", line 772, in import_
    archive.import_from_file(db, filepath.name, open(filepath, "rb").read())
  File "/Users/jacob/Library/Caches/pypoetry/virtualenvs/jacobian-dogsheep-4AXaN4tu-py3.8/lib/python3.8/site-packages/twitter_to_sqlite/archive.py", line 215, in import_from_file
    to_insert = transformer(data)
  File "/Users/jacob/Library/Caches/pypoetry/virtualenvs/jacobian-dogsheep-4AXaN4tu-py3.8/lib/python3.8/site-packages/twitter_to_sqlite/archive.py", line 115, in lists_member
    return {"lists-member": _list_from_common(data)}
  File "/Users/jacob/Library/Caches/pypoetry/virtualenvs/jacobian-dogsheep-4AXaN4tu-py3.8/lib/python3.8/site-packages/twitter_to_sqlite/archive.py", line 200, in _list_from_common
    for url in block["userListInfo"]["urls"]:
KeyError: 'urls'

That's as far as I got before I needed to work on something else. I'll report back if I get further!

@jacobian
Copy link
Contributor Author

jacobian commented Jan 5, 2021

Correction: the failure is on lists-member.js (I was thrown by the block variable name, but that's just a coincidence)

jacobian added a commit to jacobian/twitter-to-sqlite that referenced this issue Jan 5, 2021
See dogsheep#54 for discussion. This also ignores files in the new "assets"
directory, which appear to be some stuff for a browser interface
Twitter's created.
@jacobian
Copy link
Contributor Author

jacobian commented Jan 5, 2021

I was able to fix this, at least enough to get my archive to import. Not sure if there's more work to be done here or not.

@henry501
Copy link

My import got much further with the applied fixes than 0.21.3, but not 100%. I do appear to have all of the tweets imported at least.
Not sure when I'll have a chance to look further to try to fix or see what didn't make it into the import.

Here's my output:

direct-messages-group: not yet implemented
branch-links: not yet implemented
periscope-expired-broadcasts: not yet implemented
direct-messages: not yet implemented
mute: not yet implemented
periscope-comments-made-by-user: not yet implemented
periscope-ban-information: not yet implemented
periscope-profile-description: not yet implemented
screen-name-change: not yet implemented
manifest: not yet implemented
fleet: not yet implemented
user-link-clicks: not yet implemented
periscope-broadcast-metadata: not yet implemented
contact: not yet implemented
fleet-mute: not yet implemented
device-token: not yet implemented
protected-history: not yet implemented
direct-message-mute: not yet implemented
Traceback (most recent call last):
  File "/Users/henry/.local/share/virtualenvs/python-sqlite-testing-mF3G2xKl/bin/twitter-to-sqlite", line 33, in <module>
    sys.exit(load_entry_point('twitter-to-sqlite==0.21.3', 'console_scripts', 'twitter-to-sqlite')())
  File "/Users/henry/.local/share/virtualenvs/python-sqlite-testing-mF3G2xKl/lib/python3.9/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/henry/.local/share/virtualenvs/python-sqlite-testing-mF3G2xKl/lib/python3.9/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/henry/.local/share/virtualenvs/python-sqlite-testing-mF3G2xKl/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/henry/.local/share/virtualenvs/python-sqlite-testing-mF3G2xKl/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/henry/.local/share/virtualenvs/python-sqlite-testing-mF3G2xKl/lib/python3.9/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/henry/.local/share/virtualenvs/python-sqlite-testing-mF3G2xKl/lib/python3.9/site-packages/twitter_to_sqlite/cli.py", line 772, in import_
    archive.import_from_file(db, filepath.name, open(filepath, "rb").read())
  File "/Users/henry/.local/share/virtualenvs/python-sqlite-testing-mF3G2xKl/lib/python3.9/site-packages/twitter_to_sqlite/archive.py", line 233, in import_from_file
    to_insert = transformer(data)
  File "/Users/henry/.local/share/virtualenvs/python-sqlite-testing-mF3G2xKl/lib/python3.9/site-packages/twitter_to_sqlite/archive.py", line 21, in callback
    return {filename: [fn(item) for item in data]}
  File "/Users/henry/.local/share/virtualenvs/python-sqlite-testing-mF3G2xKl/lib/python3.9/site-packages/twitter_to_sqlite/archive.py", line 21, in <listcomp>
    return {filename: [fn(item) for item in data]}
  File "/Users/henry/.local/share/virtualenvs/python-sqlite-testing-mF3G2xKl/lib/python3.9/site-packages/twitter_to_sqlite/archive.py", line 88, in ageinfo
    return item["ageMeta"]["ageInfo"]
KeyError: 'ageInfo'

simonw pushed a commit that referenced this issue Aug 20, 2021
* Find data files in subdirectories in archives

See #54 for discussion. This also ignores files in the new "assets"
directory, which appear to be some stuff for a browser interface
Twitter's created.

* Fix list-member importer

It appears in list data that some rows contain a `urls` key with a list
of URLs, while others contain a `url` key with just a single one. This
change supports either way.

* Fix tweet import

This was working, sorta, but wasn't properly unpacking the tweet
data into columns. This commit fixes that in what I think should
be a backwards-compatible way.
@danp
Copy link

danp commented Sep 26, 2021

Similar trouble with ageinfo using 0.22. Here's what my ageinfo.js file looks like:

window.YTD.ageinfo.part0 = [
  {
    "ageMeta" : { }
  }
]

Commenting out the registration for ageinfo in archive.py gets my archive to import.

@swyxio
Copy link

swyxio commented Jan 4, 2023

as of 2023 it appears that tweets: not yet implemented is happening.. pretty important for a twitter export functionality!

this seems an impossible task with twitter liable to switch this around every other day

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants