This repository provides template code for running Spark on SOPHI. The code will include a mixture of Scala, PySpark and SparkR.
| Topics |
|---|
| Twitter Gnip SQL-DataFrame Manipulation with PySpark |
| Twitter Gnip Summary Count Files with PySpark |
| Twitter Gnip Latent Dirichlet Allocation with Scala |

To access SOPHI, you must have an active UNCC ID username (student, faculty or staff) and be connected to the UNCC network, either directly (eduroam) or through VPN. See this link on how to set up VPN access.
SOPHI's Hue interface is available at https://cci-hadoopm3.uncc.edu.
To start, open that link and, when prompted, enter your UNCC ID and password.
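
If the login page does not load, the usual cause is that the machine is not on the UNCC network or VPN. The following is a minimal sketch for checking this from your own machine using only the Python standard library. The helper name `can_reach` is hypothetical, and port 443 is an assumption based on the `https://` scheme of the Hue address.

```python
import socket

HUE_HOST = "cci-hadoopm3.uncc.edu"  # host from the Hue address in this README
HUE_PORT = 443  # assumption: default HTTPS port, since the address uses https://


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and opens a TCP socket;
        # DNS failures, refusals and timeouts all surface as OSError.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if can_reach(HUE_HOST, HUE_PORT):
        print("Hue host is reachable; try logging in through the browser.")
    else:
        print("Hue host is not reachable; check the UNCC network or VPN connection.")
```

A successful TCP connection only shows that the host is reachable. It does not confirm that your UNCC ID has been granted access to SOPHI.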
Within SOPHI, click the "Notebook" button on the top ribbon and click the "+ Notebook" button to create a new Notebook.
Once inside the new Notebook, create a PySpark or Scala session (SparkR is not available yet).
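
A quick way to confirm that a new PySpark session works is to build a tiny DataFrame and count its rows. The sketch below assumes the notebook predefines a session object named `spark`. This is an assumption about how SOPHI is configured; on older Spark versions the predefined object may be `sqlContext` instead, which also provides `createDataFrame`. The function name `smoke_test` is hypothetical.

```python
def smoke_test(session):
    """Build a two-row DataFrame on the given session and return its row count."""
    rows = [(1, "scala"), (2, "pyspark")]
    df = session.createDataFrame(rows, ["id", "language"])
    # count() is an action, so it forces Spark to actually run a job.
    return df.count()


# Assumption: the notebook has already defined `spark`.
# The guard keeps this cell from failing if it has not.
if "spark" in globals():
    print(smoke_test(spark))
```

If the session is healthy, the function returns 2, the number of rows built above. If `spark` is not defined, try passing `sqlContext` to `smoke_test` instead.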