Advisories
Update 6 Feb 2014: This issue is now resolved. See this commit for details.
Late last week, a reader reported an issue with Twitter's Streaming API, as illustrated in Example 9-8, whose symptom was that no data was ever returned from the stream. As it turns out, the issue stems from a recent breaking API change: Twitter now requires HTTPS for all API traffic.
As a result of this breaking API change, the `twitter` Python package that's used throughout the examples needs to be updated in order for Example 9-8 to work again. However, all other examples that don't make use of the Streaming API should be unaffected and continue to work in the meantime.
You can read more about the situation, along with a fine workaround, here. The basic workaround consists of the following simple steps:
- First, restart your notebook's kernel to make sure that you have a clean working session. From the menubar, click "Kernel" => "Restart".
- Next, create a new notebook cell with the following "Bash magic" contents, and execute it:
%%bash
sudo pip install --upgrade git+https://github.com/adonoho/twitter.git@db751260f733ed5f7833dc4b186717934ff18ebe#egg=adonoho-pr-fix-stream-db751260f733ed5f7833dc4b186717934ff18ebe
Assuming the standard output you see from executing that cell doesn't contain any kind of error (and it shouldn't), you should now have a `twitter` package installed that is capable of properly interfacing with the Streaming API by adopting the patch from https://github.com/sixohsix/twitter/pull/196.
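As an optional sanity check after the upgrade, the sketch below confirms that the interpreter can import the `twitter` module. It is only an illustration: the `PYTHON_BIN` variable and the `module_importable` helper are names made up for this example, and a successful import shows that a `twitter` package is installed, not that it is the patched revision.

```shell
# Interpreter to test; "python" is an assumption, so override PYTHON_BIN if
# your environment uses a different name.
PYTHON_BIN="${PYTHON_BIN:-python}"

# Hypothetical helper: succeeds only if the module named in $1 imports cleanly.
module_importable() {
    "$PYTHON_BIN" -c "import $1" >/dev/null 2>&1
}

if module_importable twitter; then
    echo "twitter is importable"
else
    echo "twitter is not importable; re-check the output of the pip cell"
fi
```

If the import fails, the output of the `%%bash` cell is the first place to look for the cause.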
Once a final patch is adopted by the `twitter` package, I'll update the pip requirements file (mtsw2e-requirements.txt) to reflect the change so that Example 9-8 will "just work" again without any additional intervention, and I'll also update this advisory with strikethrough to reflect its final resolution.
Thanks for your patience, and I apologize if this issue has inconvenienced you in any way.
Earlier in the week, I approved the final manuscript of Mining the Social Web, 2nd Edition! The O'Reilly catalog, Amazon, and other retailers should soon show the 2nd Edition as being "in stock," and I hope to more officially launch the book at Strata/Hadoop World in NYC the last week of October. I hope you can drop by the O'Reilly Media booth so that I can sign your copy if you are able to attend.
To keep the repository in good order, I've tagged the source code that appears in the first printing of the 2nd Edition as "v2.0" here in the GitHub repository for reference. The major number "2" refers to the 2nd Edition, and the minor number "0" refers to the initial printing. (After all, programmers start counting with the number "0", right?) In subsequent updates to the text of the manuscript, I'll increment the minor number accordingly to keep a correspondence between code that appeared "in print" and tags here in the repository.
I've also done a little bit of extra work this evening to add a couple of enhancements to the source code repository since I want nothing less than the best for my readers:
- HTML web pages for each of the IPython Notebooks are now checked in at the `ipynb/html` location and will be maintained/updated as the source code for the corresponding .ipynb file updates. (These HTML files are the same as what the IPython Notebook Viewer produces online, but they no longer depend on it as a third-party service and can instead be served directly through GitHub. See the main README.md for links to each of the web pages as served by GitHub.)
- All numbered examples in IPython Notebooks can now be anchor-linked because example titles have been converted to "header" cells as opposed to "markdown" cells. Previously, the best that we could do was to refer to a particular example number in a notebook and link to the page. Now, however, we can refer to a particular example and link directly to it. To illustrate how this works, click on the link for Example 9-3. From within the source code repository, you just navigate to the HTML file, view the "raw" source, and then remove the first "." in the URL so that the domain is "rawgithub" as opposed to "raw.github". From there, you just click on the corresponding heading to get the link to the particular example of interest.
- All numbered examples are available at Numbered Examples as a convenient catalog page.
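The URL rewrite described in the second item above can be sketched as a small shell function. The function name `to_rawgithub` and the sample URL are hypothetical, and the "rawgithub" service itself may no longer be available, so treat this purely as an illustration of the string substitution.

```shell
# Hypothetical helper: turn a "raw.github" URL into its "rawgithub" form by
# dropping the dot between "raw" and "github".
to_rawgithub() {
    printf '%s\n' "$1" | sed 's/raw\.github/rawgithub/'
}

# Made-up URL, for illustration only.
to_rawgithub "https://raw.github.com/example-user/example-repo/master/ipynb/html/Example.html"
# prints: https://rawgithub.com/example-user/example-repo/master/ipynb/html/Example.html
```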
Thoughts and comments are always welcome.
Enjoy!
This advisory rectifies the issues described in the two advisories from 7 August and includes important information about some 3rd party dependencies that have been upgraded.
As of 13 August at approximately 11pm (U.S. Central Time, GMT-5) the following changes were pushed to GitHub:
- The Vagrant Berkshelf plugin that was previously required was removed. Installation of the virtual machine is now one step simpler for everyone since there is no longer a need to install the Vagrant Berkshelf dependency. Mac OS X users no longer have to worry about installing gcc and developer tools, which was never intended to be a requirement in the first place.
- Users who are working on 32-bit systems with the precise32 image should no longer be affected by the 7 August advisory now that a newer version of `jpype` (hosted as `JPype1` at PyPI) is being used.
- IPython Notebook was upgraded to v1.0.0, which provides a better UX and many enhancements.
- The python-boilerpipe and jpype dependencies were updated to be installed from PyPI instead of from forked GitHub repositories, which simplifies behind-the-scenes configuration management. (Thanks especially to @marsam, @misja, and @originell for helping to make this happen. In the configuration for the virtual machine, it's now as simple as `pip install boilerpipe` to install this package and its `jpype` dependency.)
- A couple of other minor updates to IPython Notebooks also took place involving code tweaks.
In order to take advantage of these changes, existing virtual machine users are encouraged to perform the following steps:
- `vagrant destroy` the existing virtual machine
- `git pull` to update the working code repository with the latest changes
- `vagrant up` to bootstrap the new virtual machine
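The three steps above can be chained so that a failure stops the sequence. This is a minimal sketch rather than a script shipped with the repository: `run_step`, `rebuild_vm`, and the `DRY_RUN` switch are hypothetical names introduced here, and the sketch only prints the commands unless you opt in to running them.

```shell
# DRY_RUN is a hypothetical switch for this sketch. It defaults to 1 (print
# only); set DRY_RUN=0 to actually run the commands.
run_step() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Run from the root of your clone of the repository. Each step must succeed
# before the next one starts.
rebuild_vm() {
    run_step vagrant destroy &&
    run_step git pull &&
    run_step vagrant up
}

rebuild_vm
```

Defaulting to a dry run is a deliberate choice here because `vagrant destroy` deletes the existing virtual machine; by default Vagrant also asks for confirmation before doing so.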
I realize that asking you to `vagrant destroy` and `vagrant up` is annoying, and my hope is that this is the last major update to the virtual machine before Mining the Social Web, 2nd Ed. is published. It was important to get in these enhancements to make the virtual machine the best possible experience for everyone. Thanks for your patience and understanding as the code approaches its official launch in tandem with the official release of the book.
As always, please log an issue here on GitHub or reach out via Twitter/Facebook with any questions or concerns.
This advisory was obsoleted by the 13 August 2013 advisory
It appears that the precise32 image that users with 32-bit systems should use for their Vagrant base image installs Java in a somewhat unexpected location, so the installation of `jpype` (one of the 3rd party dependencies that is handled by the virtual machine) currently fails.
The workaround is rather simple, but needs to be tested before it is rolled out so that no regressions occur. If you are a user with a 32-bit system and need support before a fix is in place (estimated turnaround time on the fix is no later than 11 August), please reach out on Twitter, Facebook, or here on GitHub.
Hopefully, this affects a relatively small number of users. An update will be added to this advisory reflecting that it is no longer an issue once the problem is fixed in the GitHub repository.
This advisory was obsoleted by the 13 August 2013 advisory
In order for Mac OS X users to complete the `vagrant plugin install vagrant-berkshelf` step of the setup process, it turns out you need to have `gcc`, a developer tool that compiles C-based source code, on your machine. The reason is that `vagrant-berkshelf` has dependencies that require C code to be compiled behind the scenes.
If you are a developer and have Xcode or Homebrew installed, there is a good chance that you already have `gcc` on your machine and need to do nothing, or you might be technically inclined to go about acquiring it in your own preferred way. However, if you use a Mac, are not a developer, and haven't installed developer tools such as the command line tools that come with Xcode or a gcc toolchain from Homebrew, you will experience an error in trying to install `vagrant-berkshelf`, which prevents you from ever completing a successful `vagrant up`.
Fortunately, the fix is simple, although it does involve the one extra step of getting `gcc` on your machine. By far, the simplest route is by installing it from the osx-gcc-installer. Just download the package file, follow along with the wizard, and then complete the installation of your `vagrant-berkshelf` plugin. That's all.
However, be sure that you don't already have `gcc` installed before installing this additional package. You'll know that you don't have it if your attempt to install the plugin failed and directed you to this advisory. You can also check proactively by typing `gcc` in a terminal before attempting any installation: a result to the effect of "gcc: fatal error: no input files" indicates that you do have gcc on your system.
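A less ambiguous way to check for `gcc` than reading its error message is to ask the shell whether the command exists. A minimal sketch, where `have_command` is a hypothetical helper name:

```shell
# Hypothetical helper: succeeds if the command named in $1 is on the PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

if have_command gcc; then
    echo "gcc found: $(command -v gcc)"
else
    echo "gcc not found; install it before installing vagrant-berkshelf"
fi
```

On some OS X versions a placeholder `gcc` may exist that only prompts you to install the developer tools, so treat a positive result as a hint rather than a guarantee.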
Other options for getting the developer tools involve a little bit of work that you can read about here.
This advisory does not affect Linux or Windows users. If you are a Linux developer, then it is almost guaranteed that you have `gcc` (or know how to get it), and it turns out that Vagrant for Windows ships with an embedded version of `gcc`.
An annotation has been added to the YouTube video, and the Appendix A instructions have been updated to reflect this advisory.
As always, reach out on Twitter, Facebook, or here on GitHub if you need help.
A significant pull request was merged in on 26 July 2013 around 9pm (CST) that provides an enhancement to the way the book's Vagrant-based virtual machine is configured. If you were an adopter of the book's virtual machine on or before 26 July 2013, you will probably want to complete the following steps to get in a good spot for moving forward with the code. If you pulled the code after 26 July, this advisory will not affect you.
- Save any work in IPython Notebook that you don't want to lose by copying the ipynb file to another location (or by using `git stash`)
- `vagrant destroy` - Kill the existing virtual machine
- `git pull` - Update your repository's code
- `vagrant plugin install vagrant-berkshelf` - Install a required Vagrant plugin
- `vagrant up` - Re-bootstrap your virtual machine with Chef-based configuration management. The first bootstrap takes ~20 minutes, which is significantly faster than the previous approach.
Update on 29 July: Thanks to David Rush (@DDucks) for pointing out that you should do the `vagrant destroy` before doing the `git pull`.
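The first step, saving your notebooks before `vagrant destroy`, could look something like the sketch below. The function name and both directory arguments are hypothetical; point them at wherever your notebooks live and wherever you want the copies kept.

```shell
# Hypothetical helper: copy every .ipynb file in directory $1 into directory $2.
backup_notebooks() {
    src_dir="$1"
    dest_dir="$2"
    mkdir -p "$dest_dir" || return 1
    for nb in "$src_dir"/*.ipynb; do
        # If nothing matched, the pattern is left as-is; skip it.
        [ -e "$nb" ] || continue
        cp -p "$nb" "$dest_dir"/ || return 1
    done
}

# Example call (paths are placeholders):
#   backup_notebooks ipynb "$HOME/notebook-backup"
```

Copying to a directory outside the repository keeps the backups out of the way of the later `git pull`.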
Enjoy!
In completing a final review of the manuscript and doing another round of testing on the code, I realized that Vagrant's synchronization of the thousands of files that are unarchived from the Enron corpus can take a very long time, potentially upwards of 2 hours on some systems. Although preprocessing the original data is a very worthwhile exercise, a 2-hour delay is far from acceptable for users who are taking advantage of the virtual machine to streamline learning.
As a reasonable workaround, I checked in a highly compressed version of the output from Example 6.3 so that readers who are either not interested in the details of preprocessing the original corpus or just don't have the time to wait can proceed through the real substance of the chapter without additional delays. The IPython Notebook for Chapter 6 has been updated with notes that clearly explain the steps involved.
Bottom line: You can opt out of downloading the original Enron corpus, and you won't need to execute Examples 6.2 or 6.3 if you'd like to skip the potentially time-consuming preprocessing.
Enjoy!