
Artificial Intelligence & Machine Learning

Introduction to AI

Artificial Intelligence (AI) is a technology that performs cognitive tasks that normally only humans can perform.

Artificial Intelligence leverages computers and machines to mimic the problem-solving and decision-making capabilities of the human mind.

AI ALGORITHMS

  • Algorithms are clearly defined procedures that "tell" a machine how to perform a certain action
  • There are two types of algorithms: traditional and AI algorithms
  • Traditional algorithms have limitations, e.g. it is hard to write an algorithm that understands what is inside a photo, or one that understands the human voice
  • Unlike traditional algorithms, AI algorithms "tell" a machine how to learn something
  • AI algorithms learn from large amounts of data
  • Example: Google Translate (it used traditional algorithms; now it uses AI algorithms)

Fields of artificial intelligence

a) Computer Vision

-is a field of artificial intelligence (AI) that focuses on enabling machines to interpret and understand visual information from the world.

b) NLP (Natural Language Processing)

-is a field of artificial intelligence (AI) that focuses on enabling machines to understand, interpret, and generate human language.

c) Robotics

-is a field of artificial intelligence (AI) that focuses on enabling machines to move and behave like humans

d) Speech Recognition

-is a field of artificial intelligence (AI) that focuses on enabling machines to recognize and understand spoken language the way humans do

Types of artificial intelligence

There are three types of AI:

a) Artificial Narrow Intelligence (ANI, weak AI)

-is a type of AI that can match human performance in one specific domain

Example: image recognition, speech recognition

b) Artificial General Intelligence (AGI, strong AI)

-refers to the theoretical concept of creating machines that can perform any intellectual task that a human being can do

Example: it does not exist yet (AlphaZero, which learned several board games with a single algorithm, is sometimes cited as a step in this direction, but it is still narrow AI)

c) Artificial Super Intelligence (ASI)

-refers to a theoretical form of artificial intelligence that is vastly more intelligent than human beings across all domains and tasks.

Example: it does not exist yet
Intro to Machine Learning

Machine Learning is a subset of AI in which algorithms let machines learn from experience, in the form of large data sets, instead of following explicitly programmed rules.

The output of machine learning is a model.

Parts of machine learning:

  • Training data
  • ML Algorithms
  • Models

training data + ML algorithm = ML model
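
The equation above can be illustrated with a minimal sketch. It assumes ordinary least squares as the ML algorithm and a small, made-up data set (hours studied and exam scores); neither comes from the slides.

```python
def fit_line(xs, ys):
    """ML algorithm (assumed here): least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    # The returned function is the "model": it maps new inputs to predictions.
    return lambda x: slope * x + intercept


# Hypothetical training data: hours studied -> exam score.
hours = [1.0, 2.0, 3.0, 4.0]
scores = [52.0, 54.0, 56.0, 58.0]

model = fit_line(hours, scores)  # training data + ML algorithm = ML model
print(model(5.0))                # prediction for an input the model has not seen
```

The training data and the algorithm are only needed once; afterwards the model alone is used to make predictions.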

How does ML learn from the data?

[Figure: a baby compared with an ML model]

Learning Methods

  • Supervised Learning
  • Unsupervised Learning
  • Reinforcement Learning

Supervised Learning

  • It is used by ML and DL models

  • A learning method where we show the model exactly what we want it to do

  • All of these models require data that is labeled according to what we want the model to learn
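
As an illustration of learning from labeled data, here is a minimal sketch of a nearest-neighbour classifier. The algorithm choice, the feature values (weight and height) and the labels are assumptions made for this example only.

```python
def nearest_neighbor_predict(labeled_examples, features):
    """Return the label of the training example closest to `features`."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    closest = min(labeled_examples,
                  key=lambda example: squared_distance(example[0], features))
    return closest[1]


# Hypothetical labeled data: (weight in kg, height in cm) -> label.
training_data = [
    ((4.0, 25.0), "cat"),
    ((5.0, 23.0), "cat"),
    ((20.0, 55.0), "dog"),
    ((30.0, 60.0), "dog"),
]

prediction = nearest_neighbor_predict(training_data, (25.0, 58.0))
print(prediction)
```

Every training example carries the answer we want the model to give, which is what makes the method supervised.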

Unsupervised Learning

  • Learning method where we let the machine identify patterns on its own

  • These models do not need labeled data

  • Instead of labeled data, we let the machine decide for itself what the common features are in the data it observes

  • The patterns the machine finds determine how new data will be grouped or identified

  • Example: clustering
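
Clustering can be sketched with a small k-means implementation on one-dimensional data. This is only an illustration: the data values and the starting centers are made up, and k-means is just one of many clustering algorithms.

```python
def k_means_1d(values, centers, iterations=10):
    """Group 1-D values around `centers` using plain k-means."""
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: each center moves to the mean of its cluster.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Unlabeled data: no value says which group it belongs to.
data = [1.0, 1.2, 0.8, 9.0, 9.5, 10.5]
centers, clusters = k_means_1d(data, centers=[0.0, 5.0])
print(clusters)
```

No labels are given; the two groups emerge only from how close the values are to each other.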

Reinforcement Learning

  • Learning method where we give the machine a goal and let the machine figure out how to reach that goal
  • It is similar to supervised learning in that we know what the desired output is, but we don't give the machine labeled data
  • Instead of labeled data, rules are defined
  • The machine is allowed to learn through trial and error
  • It is used in the fields of robotics and gaming
  • Example: AlphaGo
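
Trial-and-error learning can be sketched with tabular Q-learning on a toy problem. Everything here is an assumption for illustration: the environment is a hypothetical corridor of five cells, the goal is the rightmost cell, and the only rule is a reward of 1 for reaching it. Systems such as AlphaGo are far more complex.

```python
import random


def train_corridor_agent(episodes=50, n_states=5, alpha=0.5, gamma=0.9,
                         epsilon=0.2, seed=0):
    """Tabular Q-learning on a corridor; the goal is the rightmost state."""
    rng = random.Random(seed)
    goal = n_states - 1
    # q[state][action]: action 0 moves left, action 1 moves right.
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Trial and error: act randomly sometimes, and whenever undecided.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            # The rule that replaces labels: reward 1 only for reaching the goal.
            reward = 1.0 if next_state == goal else 0.0
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train_corridor_agent()
policy = ["left" if left > right else "right" for left, right in q_table[:-1]]
print(policy)
```

The agent is never told which move is correct in a given cell; it only discovers, from the reward, that moving right leads to the goal.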

Deep Learning

  • Deep learning is a subset of machine learning that involves the use of artificial neural networks with multiple layers to model and solve complex problems.

  • These networks are designed to learn and improve through experience, allowing them to identify patterns and relationships in data that would be difficult or impossible for humans to detect.

  • The term "deep" refers to the fact that these neural networks typically have multiple layers of interconnected nodes that process information and make predictions.

  • By leveraging the power of deep learning, researchers and developers are able to create intelligent systems that can perform tasks previously thought to be beyond the capabilities of machines.
  • Machine learning is inspired by the way a human learns, while deep learning is inspired by the way the human brain works

Types of deep learning models

There are several types of deep learning models:

  • NN: Neural Network
  • CNN: Convolutional Neural Network
  • RNN: Recurrent Neural Network

Neural Network (ANN)

  • A neural network is a type of machine learning algorithm that is loosely inspired by the structure and function of the human brain.
  • It is composed of a network of interconnected nodes or "neurons" that process input data and generate output predictions.
  • The basic idea behind a neural network is that information is processed through layers of interconnected neurons, with each neuron processing some aspect of the input data and passing its output on to the next layer.

Architecture of ANN Deep Learning

Input Layer

  • The data enters the network through the input layer
  • Each input node holds one value of the input data and passes it on to the next layer

Hidden Layer

  • A hidden layer is a layer of neurons between the input layer and the output layer.

  • Complex transformations and analyses are applied to the input data, using weights and biases, and the results are forwarded to the next layer.

  • The results are passed to the next hidden layer until the output is reached

  • The term "hidden" refers to the fact that the layer is not directly visible to the outside world

  • Hidden layers work "behind the scenes"

Output Layer

  • The output layer in a neural network is the final layer of neurons that produces the network's output. The type of output layer depends on the task the network is performing.

  • Classification example: the output layer can have one node for each possible class, and the predicted class will be the node with the highest activation value

  • Regression example: the output layer can have one node that gives a continuous value

Components of ANN

A bias is a constant value that is added to the weighted sum of the inputs of each neuron.

Biases are initialized with small random values and then adjusted during training to minimize the loss function.

Activation functions are a key component of artificial neural networks (ANNs). They are used to introduce nonlinearity into the output of each neuron in the network, which is essential for the network to learn complex patterns in data.

Activation functions operate on the weighted sum of the inputs to a neuron plus its bias, producing a non-linear output.

Weights are numerical values that determine the strength of the connection between two neurons.

During training, the network learns the optimal values of these weights to minimize the error between the predicted output and the actual output.
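
How weights, bias and activation function work together in one neuron can be sketched as follows. The sigmoid activation and all numeric values are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """A common non-linear activation function (assumed for this example)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias, activation=sigmoid):
    # Weighted sum of the inputs plus the bias ...
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    # ... passed through the activation function to make the output non-linear.
    return activation(weighted_sum)


# Hypothetical values: two inputs, two weights, one bias.
output = neuron_output(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
print(output)
```

Training changes only `weights` and `bias`; the activation function stays fixed.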

How to train a NN?

1. Data Preparation
2. Model Architecture
3. Initialization
4. Forward propagation
5. Calculate the error
6. Backpropagation
7. Update the weights and biases
8. Repeat steps 4-7 for multiple epochs
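
The training steps can be sketched for the smallest possible network, a single sigmoid neuron. This is a simplified illustration, not a full implementation: the OR-gate data, the learning rate and the cross-entropy loss are assumptions, and with only one neuron backpropagation reduces to a single application of the chain rule.

```python
import math
import random


def train_neuron(samples, epochs=2000, learning_rate=0.5, seed=0):
    """Train one sigmoid neuron with per-sample gradient descent."""
    rng = random.Random(seed)
    # Initialization: small random weights and bias.
    weights = [rng.uniform(-0.1, 0.1) for _ in samples[0][0]]
    bias = rng.uniform(-0.1, 0.1)
    for _ in range(epochs):  # repeat for multiple epochs
        for inputs, target in samples:
            # Forward propagation.
            z = sum(w * x for w, x in zip(weights, inputs)) + bias
            prediction = 1.0 / (1.0 + math.exp(-z))
            # Calculate the error (gradient of cross-entropy loss w.r.t. z).
            error = prediction - target
            # Backpropagation: chain rule gives the gradient per parameter.
            weight_grads = [error * x for x in inputs]
            bias_grad = error
            # Update the weights and bias.
            weights = [w - learning_rate * g for w, g in zip(weights, weight_grads)]
            bias -= learning_rate * bias_grad
    return weights, bias


def predict(weights, bias, inputs):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else 0


# Data preparation: the OR gate as (inputs, label) pairs.
# Model architecture: one neuron with two inputs.
or_gate = [([0.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 0.0], 1), ([1.0, 1.0], 1)]
weights, bias = train_neuron(or_gate)
predictions = [predict(weights, bias, inputs) for inputs, _ in or_gate]
print(predictions)
```

Real networks repeat the same loop over many layers and neurons, usually with a library that computes the gradients automatically.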

Convolutional Neural Network (CNN)

  • A Convolutional Neural Network (CNN) is a type of artificial neural network (ANN) commonly used for image recognition and processing.

  • CNN is mostly used for classification (insert an image as input and the output is a class)

  • The architecture consists of many different layers: the image enters at the input of the network, and the final layer outputs information about what is recognized in the image.

  • The following layers are of great importance:

1. Convolution

2. Pooling

3. Flattening

4. Full Connection

Architecture of CNN Deep Learning

Convolution

  • Convolution is one of the most important operations in image processing. It can be done in 1-D (e.g. speech processing), 2-D (e.g. image processing) or 3-D (e.g. video processing).
  • The central element of image convolution is the mask (also called a kernel or filter).
  • A mask is a small matrix. Depending on its values, convolving the mask with the image can produce smoothing, sharpening, edge detection, etc.

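Convolution of an image f with a mask w is defined by the following formula, where g is the resulting image and the sums run over the positions (s, t) of the mask:

$$ g(x, y) = \sum_{s} \sum_{t} w(s, t)\, f(x - s,\; y - t) $$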

Masks for filtration:
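
A minimal sketch of 2-D convolution with two commonly used masks. The masks (a box blur for smoothing and a Sobel-style mask for vertical edges) and the tiny test image are assumptions for illustration and are not necessarily the ones shown on the original slide. The function computes only the "valid" region, i.e. positions where the mask fits entirely inside the image.

```python
def convolve2d(image, mask):
    """'Valid' 2-D convolution of a grayscale image (list of rows) with a mask."""
    mask_h, mask_w = len(mask), len(mask[0])
    # Convolution flips the mask in both directions before sliding it.
    flipped = [row[::-1] for row in mask[::-1]]
    out_h = len(image) - mask_h + 1
    out_w = len(image[0]) - mask_w + 1
    result = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            total = sum(
                flipped[i][j] * image[y + i][x + j]
                for i in range(mask_h)
                for j in range(mask_w)
            )
            row.append(total)
        result.append(row)
    return result


# Two example 3x3 masks (assumed, see the note above).
box_blur = [[1 / 9] * 3 for _ in range(3)]               # smoothing
vertical_edges = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]    # edge detection

# Hypothetical image: dark on the left, bright on the right.
image = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]
edges = convolve2d(image, vertical_edges)
print(edges)
```

The edge mask responds strongly wherever the brightness changes from left to right and gives zero on flat regions.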

ReLU

  • After the convolution operation, a non-linear function is applied to increase the non-linearity of the final CNN function.

  • ReLU (Rectified Linear Unit) has proven to work well, above all because it speeds up the training process.

  • Applied to the images that result from filtering, the ReLU layer sets all negative values to zero (shown as black) and keeps the positive values, which results in an image where the important details are much more pronounced
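
The ReLU operation itself is very small. A minimal sketch, assuming the feature map is a list of rows with made-up values:

```python
def relu(feature_map):
    """Replace every negative value with zero and keep the positive values."""
    return [[max(0, value) for value in row] for row in feature_map]


# Hypothetical feature map after convolution.
feature_map = [[-40, 12], [3, -7]]
rectified = relu(feature_map)
print(rectified)  # [[0, 12], [3, 0]]
```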

[Figure: image after convolution, and a view of the operation of the ReLU layer on it]

Pooling

  • Pooling reduces the dimensionality of each feature map but retains important information.
  • There are several different types, such as Max Pooling, Average Pooling and Sum Pooling.
  • Max Pooling takes the largest element from each window of the rectified feature map
  • Average Pooling takes the average of the elements in the window
  • Sum Pooling takes the sum of the elements in the window
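
The three pooling types can be sketched with one function that applies a different reduction to each window. The 2x2 non-overlapping windows and the feature map values are assumptions for illustration.

```python
def pool2d(feature_map, size=2, reduce=max):
    """Apply `reduce` to each non-overlapping size x size window."""
    pooled = []
    for y in range(0, len(feature_map) - size + 1, size):
        row = []
        for x in range(0, len(feature_map[0]) - size + 1, size):
            window = [
                feature_map[y + i][x + j]
                for i in range(size)
                for j in range(size)
            ]
            row.append(reduce(window))
        pooled.append(row)
    return pooled


def average(values):
    return sum(values) / len(values)


# Hypothetical 4x4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [5, 7, 4, 6],
    [0, 2, 8, 8],
    [2, 4, 8, 8],
]
max_pooled = pool2d(feature_map, reduce=max)      # [[7, 6], [4, 8]]
avg_pooled = pool2d(feature_map, reduce=average)  # [[4.0, 3.0], [2.0, 8.0]]
sum_pooled = pool2d(feature_map, reduce=sum)      # [[16, 12], [8, 32]]
```

In each case the 4x4 map shrinks to 2x2, which is the dimensionality reduction described above.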

Flattening

  • Flattening is typically used to convert the output of the convolutional and pooling layers into a one-dimensional feature vector that can be fed into a fully connected layer for classification or regression.

  • The process of flattening involves taking the output of the last convolutional or pooling layer, which is typically a three-dimensional tensor, and reshaping it into a one-dimensional vector.
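
Flattening can be sketched as follows, assuming the three-dimensional tensor is stored as nested lists in channels x height x width order (the ordering is an assumption; libraries differ).

```python
def flatten(tensor):
    """Reshape a channels x height x width nested list into one flat list."""
    return [value for channel in tensor for row in channel for value in row]


# Hypothetical output of the last pooling layer: 2 channels of 2x2 values.
tensor = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
]
vector = flatten(tensor)
print(vector)  # [1, 2, 3, 4, 5, 6, 7, 8]
```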

Full Connection

  • In a fully connected layer, each neuron is connected to every neuron in the previous layer, resulting in a fully connected graph.

  • The fully connected layer is typically placed at the end of the network after the convolutional and pooling layers. The output of the last pooling layer is flattened into a one-dimensional vector and fed into the fully connected layer.

  • The fully connected layer then performs a linear transformation on the input data, followed by an activation function, to produce the final output of the network.
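
A fully connected layer can be sketched as a linear transformation followed by an activation function. The softmax activation, the weights and the input vector below are assumptions chosen for a two-class example.

```python
import math


def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(values)
    shifted = [math.exp(v - largest) for v in values]
    total = sum(shifted)
    return [s / total for s in shifted]


def fully_connected(inputs, weights, biases, activation):
    # weights[k] holds one weight per input for output neuron k,
    # so every output neuron is connected to every input value.
    sums = [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]
    return activation(sums)


# Hypothetical flattened feature vector and parameters for 2 classes.
features = [1.0, 2.0, 3.0]
weights = [[0.2, 0.1, 0.0], [0.0, 0.3, 0.5]]
biases = [0.0, -1.0]

probabilities = fully_connected(features, weights, biases, softmax)
predicted_class = probabilities.index(max(probabilities))
print(probabilities, predicted_class)
```

The predicted class is the output node with the highest activation value.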

Common concepts in Computer Vision

Bounding boxes

  • Bounding boxes are rectangular shapes that are drawn around an object in an image to identify its location and size. Bounding boxes are often used in object detection tasks, where the goal is to identify all instances of a particular object in an image.

Masks

  • Masks are pixel-level annotations that are used to identify the boundaries of objects in an image. Masks are often used in semantic segmentation tasks, where the goal is to classify each pixel in an image according to the object or category it belongs to. Masks can also be used to generate more accurate bounding boxes or to refine the segmentation of objects.

Ground truth

  • Ground truth refers to the actual or true value of a particular object or attribute that is being analyzed or predicted. In computer vision, ground truth data is often used to train and evaluate models, and it provides a reference against which the model's predictions can be compared.

Intersection over Union (IoU)

  • IoU measures the overlap between the predicted bounding box and the ground truth bounding box of an object in an image.
  • First, we calculate the area of intersection between the two bounding boxes
  • Then we calculate the area of union between the two bounding boxes
  • Finally, we calculate IoU as the ratio of the intersection area to the union area
  • If IoU is high (e.g., greater than 0.5), the algorithm has successfully detected the object (a cat in the example image). If IoU is low (e.g., less than 0.5), the algorithm has not successfully detected it.
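
The IoU calculation can be sketched as follows, assuming each bounding box is given as (x_min, y_min, x_max, y_max). The box format and the coordinates are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes (x_min, y_min, x_max, y_max)."""
    # Area of intersection (zero if the boxes do not overlap).
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    # Area of union = both areas minus the part counted twice.
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical boxes: the prediction is shifted to the right.
ground_truth = (0, 0, 10, 10)
predicted = (5, 0, 15, 10)

score = iou(predicted, ground_truth)
detected = score > 0.5  # threshold taken from the 0.5 example value
print(score, detected)
```

Here half of each box overlaps, yet the IoU is only one third, so with a 0.5 threshold this prediction would not count as a successful detection.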

Object Detection

  • Object detection is a computer vision technique that involves identifying and localizing objects of interest in an image or video stream. Object detection can be used for a wide range of applications, such as surveillance, autonomous vehicles, and robotics.

In two-stage detectors such as Faster R-CNN, a region proposal network is responsible for generating a set of bounding boxes that are likely to contain objects of interest in an image. These bounding boxes are then passed to the object detection network, which evaluates each box to determine whether it contains an object and, if so, what class the object belongs to.

Instance Segmentation

  • Instance segmentation is a computer vision technique that involves identifying and localizing objects of interest in an image or video stream, while also segmenting each individual instance of the object. In other words, instance segmentation is like object detection with the added ability to distinguish between different instances of the same object.

  • Instance segmentation is a challenging task that requires the model to not only identify the object and its location in the image but also to precisely segment the object's boundaries for each instance.

Confusion Matrix

-is used to judge the performance of a machine learning model

-also known as an Error Matrix

-is a way of visualising the performance of an ML model

  • In the picture we have 15 images of dogs and cats
  • Every circled image is one that the AI model predicted to be a dog
  • The model correctly identified 9 images as dogs and correctly rejected 3 images that are not dogs
  • Accuracy: 80% (12 of the 15 images were classified correctly)

N=15

  • TN (True Negative): we start from the top-left corner of the matrix. This counts how many images the model predicted were not dogs and that actually were not dogs. TN=3
  • FP (False Positive): how many images the model predicted to be a dog when in reality they are not dogs (incorrect prediction). FP=2
  • FN (False Negative): how many images the model predicted not to be a dog when in fact they were dogs (incorrect prediction). FN=1
  • TP (True Positive): how many images the model predicted to be a dog that really were dogs. TP=9

                     Predicted: not a dog    Predicted: dog
Actual: not a dog    TN=3                    FP=2
Actual: dog          FN=1                    TP=9

  • We concluded from the example that the model's strength is its true positives (TP = 9) and its main weakness is its false positives (FP = 2)

  • The confusion matrix provides the counts needed for the main evaluation metrics of classification models: accuracy, precision and recall

Precision, recall, accuracy

  • Accuracy tells us how many of all the images the model classified correctly: (TP + TN) / N (12 of 15, 80%)

  • Recall tells us how many of the actual dog images the model identified as dogs: TP / (TP + FN) (9 of 10, 90%)

  • Precision tells us how many of the images identified as dogs really are dogs: TP / (TP + FP) (9 of 11, 82%)

Precision, recall formulas
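
The three metrics can be computed directly from the four counts of the confusion matrix. A minimal sketch using the counts from the dog example (TP = 9, FP = 2, FN = 1, TN = 3); the function name is hypothetical. Accuracy counts every correct prediction, so the 3 correctly rejected non-dogs are included alongside the 9 correctly identified dogs.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,   # correct predictions / all images
        "precision": tp / (tp + fp),     # real dogs / predicted dogs
        "recall": tp / (tp + fn),        # found dogs / actual dogs
    }


metrics = classification_metrics(tp=9, fp=2, fn=1, tn=3)
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
```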
