Course details

ML0001 - Practical Data Science with Cortana Intelligence: Azure Machine Learning, SQL Data Mining and Microsoft R

EDU Trainings s.r.o.

Course description

There is plenty of readily available information about algorithms, deep-learning frameworks, and statistical software packages, but how do you put it all together to solve a real-world problem with data science? This 5-day course will teach you about the tools you need and, above all, carefully explain the working methods and processes that successful data scientists use. Not only will you know the algorithms, but you will also know how, and when, to start and finish your projects, and which ones are likely to succeed only with significant extra effort.

  • You will learn machine learning, data mining, some statistics, data preparation, and how to interpret the results.
  • You will see how to formulate business questions in terms of data science hypotheses and experiments, and how to prepare inputs to answer those questions.
  • We will cover common issues and mistakes, such as overtraining, and how to resolve them, as well as how to cope with rare events, such as fraud.

At the end of this course you will be able to plan and run data science projects.

Course contents

Module 1: Data Science Fundamentals

  • Introduction to data science and its components
  • Machine learning vs data mining vs artificial intelligence
  • Tools landscape
  • Statistics
  • Big data
  • Data wrangling
  • Teamwork

Module 2.1: Tools (SQL & R)

  • Getting started with and using SSAS DM, and SQL R
  • Structures, models, data flows
  • Configuration concerns and pricing
  • Using Rattle with R and RStudio
  • Using SQL Server 2016 R Server and Services
  • Getting a feel for the data: interpreting notched boxplots in R

Module 2.2: Tools (R & Azure ML)

  • Overview of Cortana Intelligence Suite
  • Getting started with and using Azure ML and Cortana R
  • Azure requirements and dependencies
  • Provisioning workspaces
  • Uploading and connecting to SQL Azure data
  • Creating and running Azure ML experiments (programs)
  • Embedding R in Azure ML

Module 3: Data

  • Inputs and outputs, features and labels
  • Data formats, discretized vs continuous values
  • Cases, observations, signatures
  • Feature engineering
  • Azure ML data preparation and manipulation modules
  • Preparing unstructured text for text analysis
  • Feature hashing
  • Moving data around and its storage
  • Briefly: other Cortana Intelligence Suite tools for data management and storage, including data lakes, BLOBs, and other Hadoop

Module 4: Process

  • Stating business questions in data science terms
  • CRISP-DM
  • Scientific method of reasoning
  • Hypothesis testing and experiments
  • Student’s t-test
  • Pearson chi-squared test
  • Iterative hypothesis refinement

Module 5: Algorithms

  • What does data mining do?
  • Algorithm classes in Azure ML, R, and SSAS
  • Supervised vs Unsupervised learning
  • Classifiers
  • Clustering
  • Regression
  • Similarity Matching
  • Recommenders

Module 6: Clustering, Segmentation, and Anomaly Detection and Prediction

  • Introduction to segmentation
  • Clustering algorithms (k-means, EM, and others)
  • Interpreting clusters
  • Cluster characteristics
  • Discrimination
  • Tornado charts
  • Using clustering for text analysis
  • Anomaly detection with clustering, PCA and SVMs

Module 7: Classification

  • Introduction to classifiers
  • Two-class (binary) vs multi-class
  • Decision trees, forests, and boosting
  • Decision jungles
  • Neural networks and logistic regression
  • Overfitting (overtraining) concerns
  • Using classifiers for text analysis
  • Associative decision trees

Module 8: Basic Statistics

  • Basic concepts of statistics: population vs sample, measure types, means and dispersion, distributions
  • Confidence intervals, p-values
  • Correlation
  • Descriptive statistics with R
  • Basic concepts of probability
  • Finding important features using p-values, linear regression and ANOVA

Module 9: Model Validation

  • Testing accuracy
  • Lift charts
  • Testing reliability
  • Testing usefulness

Module 10: Classifier Precision

  • Testing classifiers
  • False positives vs. false negatives
  • Classification (confusion) matrix
  • Precision
  • Recall
  • Balancing precision with recall vs business goals and constraints
  • Charting precision-recall (sensitivity-specificity)
  • ROC curves
  • Other measures of accuracy
  • Cross-validation
  • Optimising binary classifier thresholds for a known business goal of prediction quality
  • Refining models to improve accuracy and reliability
  • Hyperparameter tuning
  • Class imbalance problem (fraud analytics and rare event prediction)

Module 11: Regressions

  • Introduction to simple regressions
  • Linear regression (classic)
  • Regression decision trees and other ensemble regression algorithms
  • Relationship to ANOVA
  • Measuring linear regression quality (R-squared, predictor p-values, RMSE, MAE, RAE, RSE, and additional testing using R)

Module 12: Similarity Matching & Recommenders

  • Introduction to recommender concepts
  • Model-based, similarity-based, and hybrid recommenders
  • Association rules
  • Understanding itemsets and rules
  • Rule importance vs. rule probability
  • Data structures for association rules
  • Market Basket Analysis
  • Collaborative filtering
  • Matchbox recommenders
  • Validating recommenders

Module 13: Other Algorithms (Brief Overview)

  • Sequence clustering and Markov chains
  • SVM (Support Vector Machines)
  • Time series
  • Image recognition
  • Text analysis

Module 14: Production & Model Maintenance

  • Deploying models to production
  • SSAS models and DMX queries
  • Azure ML web services: preparation and publishing
  • REST APIs: request/response vs batch
  • On-going maintenance and model updates

Contact person

Lukáš Vallo
+420 724 792 023
lukas.vallo@edutrainings.cz

Organizer