happyrabbit/DataScienceWorkshop2019

👉 This course has been updated. The 2020 version is available here: website, GitHub repo

Big Data, Data Science, and Deep Learning for Statisticians

With the recent revolution in big data, data science, and deep learning, enterprises ranging from Fortune 100 companies to startups around the world are hungry for data scientists and machine learning scientists who can draw actionable insight from the vast amounts of data they collect. In the past couple of years, deep learning has gained traction in many application areas and has become an essential tool in the data scientist’s toolbox. In this course, students will develop a clear understanding of the big data cloud platform, technical skills in data science and machine learning, and especially the motivation for and use cases of deep learning through hands-on exercises. We will also cover the “art” of data science and machine learning: the typical agile data science project flow, common pitfalls in data science and machine learning, and the soft skills needed to communicate effectively with business stakeholders. This course will prepare statisticians to be successful data scientists and deep learning scientists in various industries and business sectors.

The big data platform, data science, and deep learning overviews are designed specifically for an audience with a statistics background. The data science workflow, pitfalls, and soft skills are highlighted through real-world data science and machine learning problems. The Databricks Community Edition cloud platform will be used throughout the course for hands-on sessions, including:

(1) Big data platform using Spark through the R sparklyr package;

(2) Introduction to Deep Neural Networks, Convolutional Neural Networks, and Recurrent Neural Networks and their applications;

(3) Deep learning examples using TensorFlow through the R keras package.

The primary audience for this course is:

(1) Statisticians in traditional industry sectors such as manufacturing, pharmaceuticals, and banking;

(2) Statisticians in government agencies;

(3) Statistical researchers in universities;

(4) Graduate students in statistics departments.

The prerequisites are an MS-level education in statistics and entry-level knowledge of R. No software needs to be installed on students’ laptops; the cloud platform is accessed through a browser such as Chrome or Firefox with an internet connection.

Schedule

| Topic | Time |
| --- | --- |
| Introduction to Data Science | 09:00 - 09:45 |
| FFNN | 09:45 - 11:00 |
| Morning break (drinks/snacks) | 11:00 - 11:15 |
| CNN and RNN | 11:15 - 12:00 |
| Lunch | 12:00 - 13:30 |
| Deep learning hands-on | 13:30 - 15:00 |
| Afternoon break (drinks/snacks) | 15:00 - 15:15 |
| Big Data Pipeline | 15:15 - 15:45 |
| Cloud Platform and Hands-on | 15:45 - 16:30 |

Some links
