Principles of Machine Learning

Ivan Moreno / Equity American School

What is Machine Learning?

Machine learning is a field of artificial intelligence that uses statistical techniques to give computer systems the ability to "learn" from data.

Simplified Process

- Acquire and ingest data

- Explore data

- Prepare data

- Feature engineering

- Split dataset

- Construct and evaluate machine learning model

- Repeat above steps if needed

Exploring data

First, look for the general characteristics of the data, such as:

  • How big is it?
  • What data types are there?
  • How many features are there?
  • What kind of label is there?
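
The questions above can be answered with a few lines of code. The following is a minimal sketch using only the Python standard library; the toy dataset, its column names, and the helper `summarize` are hypothetical and exist only for illustration.

```python
# Hypothetical toy dataset: each row is a dict of feature values plus a "label".
rows = [
    {"height_cm": 170.0, "weight_kg": 65.2, "city": "Lima", "label": "adult"},
    {"height_cm": 120.5, "weight_kg": 30.1, "city": "Quito", "label": "child"},
    {"height_cm": 181.3, "weight_kg": 80.0, "city": "Lima", "label": "adult"},
]


def summarize(rows, label_key="label"):
    """Answer the four exploration questions for a list of row dicts."""
    columns = list(rows[0].keys())
    features = [c for c in columns if c != label_key]
    # Data type observed in each column (taken from the first row only).
    dtypes = {c: type(rows[0][c]).__name__ for c in columns}
    # Distinct label values show what kind of label the dataset has.
    label_values = sorted({r[label_key] for r in rows})
    return {
        "n_rows": len(rows),            # How big is it?
        "dtypes": dtypes,               # What data types are there?
        "n_features": len(features),    # How many features are there?
        "label_values": label_values,   # What kind of label is there?
    }


summary = summarize(rows)
print(summary)
```

A small set of distinct label values, as here, suggests a categorical label; a numeric label with many distinct values would suggest a regression problem instead.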

Visualization

Visualization is the most powerful technique for data exploration. The first tool to reach for is aesthetics: the visual properties of a plot, chosen according to which aspects of statistical or scientific graphics people understand best.

* Position: Categorical and numeric values

* Length: Numeric values, and categorical values if there is an order

* Shape: Generally for categorical variables

* Size: Generally for numeric variables

* Color: Generally for categorical variables but can also be used for numeric
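
As an illustration of the aesthetics listed above, here is a small sketch that maps position, size, shape, and color to variables in one scatter plot. It assumes the matplotlib library is installed; the data points and the category names `a` and `b` are made up for the example.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical measurements: (x, y, magnitude, category)
points = [
    (1.0, 2.0, 10, "a"),
    (2.0, 3.5, 40, "a"),
    (3.0, 1.5, 25, "b"),
    (4.0, 4.0, 60, "b"),
]
markers = {"a": "o", "b": "s"}                  # shape: categorical variable
colors = {"a": "tab:blue", "b": "tab:orange"}   # color: categorical variable

fig, ax = plt.subplots()
for category in markers:
    subset = [p for p in points if p[3] == category]
    ax.scatter(
        [p[0] for p in subset],         # x position: numeric variable
        [p[1] for p in subset],         # y position: numeric variable
        s=[p[2] * 5 for p in subset],   # size: numeric variable
        marker=markers[category],
        color=colors[category],
        label=category,
    )
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend(title="category")
```

Calling `fig.savefig("aesthetics.png")` afterwards would write the figure to a file; the file name is arbitrary.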

Frequency Tables

Frequency tables help you gain knowledge about:

  • The distribution of values
  • The frequency of each category
  • Duplicate or irrelevant categories
  • Labels
  • Classifier
  • The balance of categories
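
A frequency table can be built with `collections.Counter` from the Python standard library. The sketch below uses a made-up label column to show how a frequency table exposes duplicate categories and class balance.

```python
from collections import Counter

# Hypothetical label column with inconsistent spellings of the same category.
labels = ["spam", "ham", "ham", "Spam", "ham", "spam ", "ham", "ham"]

# The raw table reveals near-duplicate categories ("spam", "Spam", "spam ").
raw_counts = Counter(labels)
print(raw_counts)

# Normalizing case and whitespace merges the duplicate categories.
clean_counts = Counter(label.strip().lower() for label in labels)

# Print each category with its count and its share of the total (the balance).
total = sum(clean_counts.values())
for category, count in clean_counts.most_common():
    print(f"{category:<6} {count:>3} {count / total:6.1%}")
```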

Data Preparation

Cleaning and Preparing Data

Cleaning and preparing data is vital to success in machine learning. Data preparation follows a set of steps and is an iterative process: you repeatedly identify problems and test your results.

Data Preparation Steps

  • Exploration to understand data problems
  • Remove duplicates
  • Removal strategies
  • Treat missing values
  • Treat errors and outliers
  • Scale features
  • Split dataset
  • Visualization to check results

Just as visualization is necessary to understand the relationships in data, proper preparation, or data munging, is required to ensure machine learning models work optimally.
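
Several of the preparation steps above can be sketched with the Python standard library alone. The records, the median fill, the income cap, and the 80/20 split below are all assumptions chosen for this toy example, not recommendations for real data.

```python
import random
from statistics import median

# Hypothetical raw records: (age, income); None marks a missing value.
raw = [
    (25, 40000), (32, 52000), (25, 40000),   # third row duplicates the first
    (47, None), (51, 88000), (38, 61000),
    (29, 45000), (44, 990000),               # 990000 looks like an entry error
]

# 1. Remove exact duplicates while keeping the original order.
rows = list(dict.fromkeys(raw))

# 2. Treat missing values: fill a missing income with the median income.
known = [income for _, income in rows if income is not None]
fill = median(known)
rows = [(age, income if income is not None else fill) for age, income in rows]

# 3. Treat errors and outliers: cap income at an assumed plausible maximum.
INCOME_CAP = 200000  # assumption for this toy data, not a general rule
rows = [(age, min(income, INCOME_CAP)) for age, income in rows]


# 4. Scale each feature to the range [0, 1] (min-max scaling).
def min_max(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


ages = min_max([age for age, _ in rows])
incomes = min_max([income for _, income in rows])
scaled = list(zip(ages, incomes))

# 5. Split into training and test sets, with a fixed seed for repeatability.
rng = random.Random(0)
shuffled = scaled[:]
rng.shuffle(shuffled)
cut = int(0.8 * len(shuffled))
train, test = shuffled[:cut], shuffled[cut:]
print(len(train), len(test))
```

The sketch follows the order of the list above. In practice, many practitioners compute scaling parameters on the training set only, after splitting, so that no information from the test set leaks into training.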

Supervised Learning

Supervised learning is a type of machine learning in which both the input and the desired output data are provided. The input and output data are labelled, which gives the algorithm a learning basis for processing future data.

Supervised machine learning systems provide the learning algorithms with known quantities to support future judgments.
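
To make the idea of labelled inputs and outputs concrete, here is a minimal sketch of one simple supervised method, a 1-nearest-neighbour classifier, written with the standard library only. The training points and the labels `small` and `large` are hypothetical.

```python
import math

# Labelled training data (hypothetical): inputs paired with desired outputs.
train_inputs = [(1.0, 1.2), (0.8, 1.0), (4.0, 4.2), (4.5, 3.9)]
train_labels = ["small", "small", "large", "large"]


def predict(point):
    """Return the label of the closest training input (1-nearest neighbour)."""
    distances = [math.dist(point, x) for x in train_inputs]
    nearest = distances.index(min(distances))
    return train_labels[nearest]


# The known labels support a judgment about new, unlabelled points.
print(predict((1.1, 0.9)))
print(predict((4.2, 4.0)))
```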

Unsupervised Learning

Unsupervised learning is the training of an artificial intelligence (AI) algorithm using information that is neither classified nor labeled and allowing the algorithm to act on that information without guidance.

Unsupervised learning algorithms can tackle tasks that supervised learning systems cannot, such as discovering groups or structure in data that has no labels, although their results can be less predictable.
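
As one example of learning without labels, the sketch below groups numbers into two clusters with a basic k-means loop. The function `two_means` and the sample values are hypothetical, and the code assumes the input contains at least two distinct numbers.

```python
def two_means(values, iterations=10):
    """Group 1-D values into two clusters with a basic k-means loop."""
    centers = [min(values), max(values)]  # simple deterministic starting point
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for v in values:
            # Assign each value to the nearer center.
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[nearest].append(v)
        # Move each center to the mean of the values assigned to it.
        centers = [sum(c) / len(c) for c in clusters]
    return centers, clusters


# No labels are given; the algorithm finds the two groups on its own.
values = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centers, clusters = two_means(values)
print(centers)
print(clusters)
```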

Basic Scikit-Learn

Scikit-Learn is a Python package that provides efficient versions of a large number of common algorithms. It is characterized by a clean, uniform, and streamlined API, as well as by very useful and complete online documentation.
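
The uniform API means that most estimators follow the same pattern: create, `fit`, `predict`, `score`. The sketch below assumes scikit-learn is installed and uses its bundled iris dataset; the choice of `KNeighborsClassifier`, the 25% test size, and the random seed are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small labelled dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = KNeighborsClassifier(n_neighbors=3)  # choose and configure an estimator
model.fit(X_train, y_train)                  # learn from the training data
predictions = model.predict(X_test)          # predict labels for unseen data
accuracy = model.score(X_test, y_test)       # mean accuracy on the test set
print(f"accuracy: {accuracy:.2f}")
```

Swapping in a different estimator usually means changing only the import and the line that creates `model`.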

Classification & Regression

Classification and Regression Trees, or CART for short, is a term introduced by Leo Breiman to refer to decision tree algorithms that can be used for classification or regression predictive modeling problems.
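
The same tree idea covers both problem types. As a rough sketch, assuming scikit-learn is installed, its `DecisionTreeClassifier` and `DecisionTreeRegressor` can be fitted on a made-up dataset of hours studied. Limiting each tree to a single split with `max_depth=1` is a simplification for readability.

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Hypothetical toy data: one feature (hours studied) and two kinds of target.
hours = [[1], [2], [3], [6], [7], [8]]
passed = [0, 0, 0, 1, 1, 1]                     # categorical: classification
score = [35.0, 40.0, 45.0, 70.0, 75.0, 80.0]    # numeric: regression

# max_depth=1 limits each tree to a single split to keep the example small.
clf = DecisionTreeClassifier(max_depth=1, random_state=0).fit(hours, passed)
reg = DecisionTreeRegressor(max_depth=1, random_state=0).fit(hours, score)

new_hours = [[2.5], [7.5]]
class_predictions = clf.predict(new_hours)   # predicted category per student
score_predictions = reg.predict(new_hours)   # predicted numeric score
print(class_predictions)
print(score_predictions)
```

The classifier returns a category for each new student, while the regressor returns the mean score of the training examples that fall in the same leaf.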

Classification

In machine learning and statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known.

Regression

In statistical modeling, regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, where the focus is on the relationship between a dependent variable and one or more independent variables.
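
The simplest case, one dependent and one independent variable, can be sketched with ordinary least squares in plain Python. The helper `fit_line` and the sample data are hypothetical; the data was generated from y = 1 + 2x so the fit is easy to check.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one independent variable: y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


xs = [1.0, 2.0, 3.0, 4.0]   # independent variable
ys = [3.0, 5.0, 7.0, 9.0]   # dependent variable, generated from y = 1 + 2x
intercept, slope = fit_line(xs, ys)
print(intercept, slope)
```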
