TTDS6000: Data Science Overview | Technologies, Tools & Modern Roles in the Data-Driven Enterprise

About this Course

The Data Science & Big Data Overview | Tools, Tech & Modern Roles in the Data-Driven Enterprise is an introductory-level course that introduces the entire multi-disciplinary data science team to the field's many evolving and related terms, with a focus on Big Data, Data Science, Predictive Analytics, Artificial Intelligence, Data Mining, and Data Warehousing. The overview explores the current state of the art and science, the major components of a modern data science infrastructure, team roles and responsibilities, and the outcomes you can realistically expect from your investment. The goal of this course is to give students a baseline understanding of core concepts that can serve as a foundation for more in-depth training and real-world practice.

Audience Profile

This introductory-level primer course is an overview for Business Analysts, Data Analysts, Data Architects, DBAs, Network (Grid) Administrators, Developers, or anyone else in the data science realm who needs a baseline understanding of the core areas of modern data science technologies, practices, and available tools.

At Course Completion

This course provides a high-level view of core, current data science-related technologies, strategies, skillsets, initiatives, and supporting tools in common enterprise use. The list below covers a general range of topics current at the time of course distribution.

Students will explore:

· The Hadoop Ecosystem: HDFS, Resource Negotiators, MapReduce, Spark, Distributions

· Big Data, NoSQL, and ETL

· ETL: Extract, Transform, Load

· Handling Data & a Survey of Useful Tools

· Enterprise Integration Patterns and Message Buses

· Developing in the Hadoop Ecosystem: R, Python, Java, Scala, Pig, and BPMN

· Artificial Intelligence and Business Systems

· Who’s on the Team? Evolving Roles and Functions in Data Science

· Growing your Infrastructure

Outline

1. Exploring the Hadoop Ecosystem

· HDFS: Hadoop Distributed File System

· Resource Negotiators: YARN, Mesos, and Spark; ZooKeeper

· Hadoop MapReduce

· Spark

· Hadoop Ecosystem Distributions: Cloudera, Hortonworks, Open Source

2. Artificial Intelligence and Business Systems

· Artificial Intelligence: Myths, Legends, and Reality

· The Math

· Statistics

· Probability

· Clustering Algorithms, Mahout, MLlib, scikit-learn, and MADlib

· Business Rule Systems: Drools, JRules, Pegasus

3. The Modern Data Team

· Agile Data Science

· NoSQL Data Architects and Administrators

· Developers

· Grid Administrators

· Business and Data Analysts

· Management

· Evolving your Team

· Growing your Infrastructure

4. Supervised Learning with Big Data

· Demo

· Exploratory Data Analysis (Demo – Credit Card Risk Model)

· Wrangling and Cleaning Data

· Building a Supervised Machine Learning Model with MyBinder and JupyterLab

5. Deep Learning with Big Data

· Exploratory Data Analysis

· Data Cleaning and Wrangling

· Build a Deep Learning Model

· Demo with TensorFlow

Prerequisites

Attendees should have prior exposure to enterprise information technology and be familiar with relational databases.