This is a collection of material from the IGERT Data and Network Science bootcamp, held at UCSB from 9/8/2015 to 9/18/2015. The goal of the course is to introduce students in the IGERT program to data science. The methods and concepts presented are mainly oriented toward analyzing networks and network science in general.
- Please provide a general outline in the syllabus below, filling in the part pertaining to your session and following the example for Day 1.
- Please add any relevant material to the folder that has been set up for your module (and linked from the syllabus).
- Try to follow the directory structure of Day 1 as closely as possible.
- Provide a detailed goal description and outline in the Readme.md file in the module folder, following the example for Day 1.
To get started, follow the instructions here: Course setup
- Unix Basics
- How to open and use the terminal
- How to connect to Unix servers (ssh)
- Text manipulation and command-line magic
- Git
- The importance of version control
- GitHub, reproducibility, and the scientific method
- Python and Jupyter notebooks
- Introduction to Python
- Jupyter (née IPython) notebooks
- Used throughout the rest of the bootcamp
- What is data? Data representation in a computer
- Data types in Python: native types (integer, list, dict) and library types (NumPy arrays, pandas data structures)
- From simple to complex: text, time series, networks, geometric objects
- Discuss the complexity of manipulating these objects
- Load and visualize different datasets in Python
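
The data-representation topics above can be illustrated with a minimal sketch that stores a short text, a time series, and a small network using only native Python types. The sample data and variable names are hypothetical; for larger data, NumPy arrays and pandas data structures would take over from plain lists and dicts.

```python
from collections import Counter

# Text: a string, tokenized into a list of words
text = "the cat sat on the mat"
words = text.split()
word_counts = Counter(words)  # dict-like mapping: word -> count

# Time series: a list of (timestamp, value) tuples kept in time order
series = [(0, 1.5), (1, 2.0), (2, 2.5), (3, 3.0)]
values = [v for _, v in series]
mean_value = sum(values) / len(values)

# Network: an adjacency list stored as a dict of sets
network = {
    "a": {"b", "c"},
    "b": {"a"},
    "c": {"a"},
}
degrees = {node: len(nbrs) for node, nbrs in network.items()}

print(word_counts["the"])  # 2
print(mean_value)          # 2.25
print(degrees)             # {'a': 2, 'b': 1, 'c': 1}
```

Each step up in complexity (string, ordered sequence, linked structure) changes which operations are cheap: counting words is a single pass, while questions about the network (paths, communities) need graph algorithms.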
- Small data, big data. Do you really have big data?
- Storage latencies: cache, RAM, SSD, Redis, S3
- Computation engines: single-core, multi-core, distributed in-memory, distributed on-disk; multi-core extensions for pandas and NumPy
- Examples: single-machine, SSD-backed operation; caveats (sequential access needed)
- Introduction to Visualization: visual variables, design, types, etc.
- Visualization of Social Media Data (Demos and Techniques)
- Visualizing Live Twitter Data with Python and D3.js (Hands-on Project)
- Visual Analytics
- Visualizing live feeds using Python and Plot.ly (Hands-on)
- Interactive Visualization (Demo)
- Visualization in Academia and Industry
- Anatomy of a research paper in Visualization
- Review of Linear Algebra Fundamentals
- Matrix arithmetic
- Inversion and Linear Systems
- Vector spaces
- Angles, lengths, projection
- Eigenproblem, SVD
- Linear Algebra and Graphs
- Graphs: definitions, properties, representation
- Spectral graph theory
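
As a hedged illustration of how the eigenproblem connects to graphs, the sketch below estimates the largest eigenvalue of an adjacency matrix by power iteration, using plain Python lists. The helper names (`adjacency_matrix`, `mat_vec`, `power_iteration`) are hypothetical, and power iteration is only one of several ways to solve the eigenproblem; it assumes a connected, non-bipartite graph (or a regular one) so that the iteration converges.

```python
import math

def adjacency_matrix(n, edges):
    """Dense symmetric adjacency matrix (list of lists) of an undirected graph."""
    A = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = 1.0
        A[j][i] = 1.0
    return A

def mat_vec(A, x):
    """Matrix-vector product A @ x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def power_iteration(A, iters=200):
    """Estimate the largest eigenvalue and its eigenvector by repeated multiplication.
    Assumes A has at least one edge, so the vector norm never becomes zero."""
    x = [1.0] * len(A)
    lam = 0.0
    for _ in range(iters):
        y = mat_vec(A, x)
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
        # Rayleigh quotient x^T A x (x has unit length)
        lam = sum(xi * yi for xi, yi in zip(x, mat_vec(A, x)))
    return lam, x

# Complete graph on 4 nodes: every node has degree 3
k4 = adjacency_matrix(4, [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])
lam_k4, _ = power_iteration(k4)
print(lam_k4)  # 3.0

# Triangle with one pendant node (connected, not bipartite)
paw = adjacency_matrix(4, [(0, 1), (1, 2), (0, 2), (2, 3)])
lam_paw, _ = power_iteration(paw)
print(lam_paw)
```

For a regular graph the leading eigenvalue equals the common degree; in general it lies between the average and the maximum degree (here 2 and 3 for the second graph). In practice one would call a library routine such as `numpy.linalg.eigvalsh` on the symmetric adjacency matrix.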
- Computational thinking and algorithm complexity
- Basic data structures: arrays, lists, balanced binary trees (ordered sets), hash tables (Python sets and dicts)
- Use of data structures in sorting and searching
- Introduction to NP-hard problems
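
A small sketch of how the choice of data structure changes the cost of searching, using only the standard library: a linear scan over a list, binary search (via the `bisect` module) over a sorted list, and a hash-based `set` membership test. The function names and data are illustrative, not taken from the course material.

```python
import bisect

def linear_search(items, target):
    """O(n): scan every element until the target is found; -1 if absent."""
    for i, item in enumerate(items):
        if item == target:
            return i
    return -1

def binary_search(sorted_items, target):
    """O(log n): requires sorted input; -1 if absent."""
    i = bisect.bisect_left(sorted_items, target)
    if i < len(sorted_items) and sorted_items[i] == target:
        return i
    return -1

data = [42, 7, 19, 3, 88, 23]
sorted_data = sorted(data)  # O(n log n) sort, paid once: [3, 7, 19, 23, 42, 88]
lookup = set(data)          # hash table: O(1) average-case membership test

print(linear_search(data, 19))         # 2
print(binary_search(sorted_data, 19))  # 2
print(19 in lookup, 5 in lookup)       # True False
```

Sorting once and then using binary search pays off when many lookups follow; a hash table is faster still for membership tests but gives up ordering.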
- Graph definitions: directed, undirected, weighted, unweighted, trees, cycles, bipartite, complete, etc.
- Graph representations (adjacency matrix, adjacency list): pros and cons
- Graph generation: the Erdős-Rényi model
- Defining simple graphs in Python with NetworkX
- Algorithms on graphs
- Introduction to special classes of graphs
- Demonstration of a few of the algorithms above in NetworkX
- Examples in NetworkX on real networks (social, brain)
- Preferential attachment
- Small-world networks
- Hands-on
- Generating and characterizing several graphs (both synthetic and real)
- Counting triangles
- Visualization using Graphviz
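
The Erdős-Rényi model and triangle counting can be sketched with an adjacency-list representation (a dict of sets). This is a minimal, hypothetical implementation of the G(n, p) variant, in which each possible edge appears independently with probability p; it is not the NetworkX code used in the session.

```python
import itertools
import random

def erdos_renyi(n, p, seed=None):
    """G(n, p): include each of the n*(n-1)/2 possible edges independently
    with probability p. Returns an adjacency list (dict of sets)."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def count_edges(adj):
    """Each undirected edge appears in two neighbour sets."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2

def count_triangles(adj):
    """Count each triangle exactly once via ordered node triples u < v < w."""
    total = 0
    for u in adj:
        for v in adj[u]:
            if v > u:
                # common neighbours w > v of u and v close a triangle
                total += sum(1 for w in adj[u] & adj[v] if w > v)
    return total

G = erdos_renyi(50, 0.1, seed=1)
print(count_edges(G), count_triangles(G))
```

With n = 50 and p = 0.1, one expects about p·C(50, 2) = 122.5 edges and p³·C(50, 3) = 19.6 triangles on average. NetworkX provides `nx.erdos_renyi_graph(n, p, seed=...)` and `nx.triangles(G)`, which returns per-node counts (their sum divided by 3 gives the total), so the two approaches can be cross-checked.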
- Introduction to dynamics
- Why is it important to study dynamics on networks?
- First order dynamics - flows on a line
- Linear vs nonlinear dynamics
- Stability analysis: an intuitive explanation
- Lyapunov equations: an intuitive explanation
- Introduction to second order dynamics
- Eigenvalues and eigenvectors, stability
- Introduction to bifurcations and hysteresis
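
For first-order dynamics dx/dt = f(x) on a line, fixed points x* satisfy f(x*) = 0, and linear stability analysis looks at the sign of f'(x*): negative means nearby trajectories return (stable), positive means they move away (unstable). The sketch below applies this to the logistic equation, chosen here only as an example, and checks the result with a forward Euler simulation; the step size, number of steps, and function names are arbitrary assumptions.

```python
def f(x, r=1.0):
    """Logistic flow on the line: dx/dt = r * x * (1 - x)."""
    return r * x * (1.0 - x)

def df(x, r=1.0, h=1e-6):
    """Central finite-difference estimate of f'(x)."""
    return (f(x + h, r) - f(x - h, r)) / (2 * h)

def classify(x_star, r=1.0):
    """Linear stability: f'(x*) < 0 is stable, > 0 unstable, 0 inconclusive."""
    slope = df(x_star, r)
    if slope < 0:
        return "stable"
    if slope > 0:
        return "unstable"
    return "inconclusive"

def euler(x0, dt=0.01, steps=2000, r=1.0):
    """Integrate dx/dt = f(x) from x0 with the forward Euler method."""
    x = x0
    for _ in range(steps):
        x += dt * f(x, r)
    return x

for x_star in (0.0, 1.0):  # the two fixed points of the logistic flow
    print(x_star, classify(x_star))  # 0.0 unstable, then 1.0 stable
print(euler(0.1))  # trajectory approaches the stable fixed point x* = 1
```

Analytically f'(x) = r(1 - 2x), so f'(0) = r > 0 and f'(1) = -r < 0, matching the numerical classification.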
- Basic probability and combinatorics
- Bernoulli trials, expectation, variance, tail bounds
- Significance and p-values
- Regression and controlling for covariates, with an example in R
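
Bernoulli trials, expectation, variance, and a tail bound fit in one short simulation. Each trial X is 1 with probability p, so E[X] = p and Var[X] = p(1 - p); Chebyshev's inequality bounds how often the sample mean of n trials strays from p by at least eps. The values of p, n, eps, the number of repetitions, and the random seed are arbitrary choices for illustration.

```python
import random

def bernoulli_trials(n, p, rng):
    """Return n independent Bernoulli(p) outcomes (1 = success, 0 = failure)."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

p, n = 0.3, 1000
rng = random.Random(0)
x = bernoulli_trials(n, p, rng)

mean = sum(x) / n
var = sum((xi - mean) ** 2 for xi in x) / n
print(mean, var)  # close to E[X] = p = 0.3 and Var[X] = p(1 - p) = 0.21

# Chebyshev's inequality for the sample mean of n trials:
# P(|mean - p| >= eps) <= p(1 - p) / (n * eps**2)
eps = 0.05
bound = p * (1 - p) / (n * eps ** 2)

# Empirical tail frequency over many repeated experiments
reps = 500
hits = sum(abs(sum(bernoulli_trials(n, p, rng)) / n - p) >= eps for _ in range(reps))
print(bound, hits / reps)
```

Here the Chebyshev bound is 0.21 / (1000 · 0.05²) = 0.084, while the observed frequency is typically far smaller; Chebyshev uses only the variance, so sharper bounds such as Chernoff or Hoeffding bounds are often preferred for sums of independent trials.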
- Supervised learning
- Decision Tree and Random Forest
- Linear Regression and Support Vector Machine
- Logistic Regression and Neural Network
- Unsupervised learning
- k-Means, k-Medoids, and Hierarchical Clustering
- Mixture Modeling
- Classification on graphs
- Community detection
- Frequent patterns
- Fun with scikit-learn: two end-to-end examples of supervised and unsupervised learning
- Introduction to the concepts of Deep Learning
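
As a companion to the unsupervised-learning topics, here is a minimal sketch of k-means (Lloyd's algorithm) on 2-D points in plain Python. It uses farthest-first initialization as a simple deterministic stand-in for the k-means++ initialization that libraries typically use; the data and helper names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def init_farthest_first(points, k):
    """Start from the first point, then repeatedly add the point farthest
    from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, iters=100):
    """Lloyd's algorithm: alternate an assignment step and an update step."""
    centroids = init_farthest_first(points, k)
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((sum(p[0] for p in members) / len(members),
                                      sum(p[1] for p in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, labels = kmeans(points, k=2)
print(centroids)  # [(0.5, 0.5), (10.5, 10.5)]
print(labels)     # [0, 0, 0, 0, 1, 1, 1, 1]
```

Lloyd's algorithm only finds a local optimum, so initialization matters. scikit-learn's `sklearn.cluster.KMeans(n_clusters=2)` implements the same idea with k-means++ initialization by default and an `n_init` parameter controlling the number of restarts.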