DETECTION OF POTHOLES ON ROAD SURFACE
DAEN 690 PROJECT
SPRING 2020
TEAM ALPHA
Team Members & Roles
Pavithra Yandamuri
Product Owner
Sahithi Theratipally
Scrum Master
Sushmitha Reddy Sama
Developer 1
Chandini Shaik
Developer 2
ABOUT
- Accure's mission is to help data-driven companies connect to all their data sources; aggregate, join, and transform their data; and build big data analytics based on machine learning.
About
Problem Statement
- Poor road conditions due to potholes are a major cause of road accidents and vehicle damage.
- A study shows that over the past 5 years, pothole damage has cost U.S. drivers $15 billion in vehicle repairs, or around $3 billion annually.
- If potholes are detected in real time, drivers can be warned to take evasive action.
Project
About Data
The datasets used in this project are open-source datasets from Kaggle.
A total of 6 datasets were merged to form a single dataset of 2562 images.
Data Cleaning
The images in the dataset contain either a single pothole or multiple potholes.
We removed images that were large in file size or not in JPEG format.
Images of Indian roads and images with very large potholes were also removed.
Data Labeling
A data augmentation technique is applied to crop the images.
The Microsoft VoTT tool is used to label the images in our project.
A bounding box is drawn around each pothole in the image and a tag is generated.
The labeled images are exported as TensorFlow records.
To estimate the average labeling time, we recorded the time taken in a spreadsheet.
Labeled image
Snip of a labeled image in the VoTT tool
Spreadsheet Findings
On average, each image in the dataset contained approximately 3 potholes.
Artificial Neural Network
- An artificial neural network is a network of artificial neurons used for solving artificial intelligence problems.
- The connections between neurons are represented as weights. A positive weight represents an excitatory connection, while a negative weight represents an inhibitory connection.
- Artificial neural networks can be used for predictive analysis, adaptive control, and other applications that can be trained with a dataset.
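The effect of positive and negative weights can be illustrated with a single neuron. This is a minimal sketch; the function name `neuron_output` and the input and weight values are made up for illustration and are not taken from the project.

```python
def neuron_output(inputs, weights, bias=0.0):
    """Return the weighted sum of the inputs plus a bias (no activation)."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias


# Two equally strong inputs: the first connection is excitatory (+0.8),
# the second is inhibitory (-0.5), so it pulls the output down.
inputs = [1.0, 1.0]
weights = [0.8, -0.5]
output = neuron_output(inputs, weights)
print(round(output, 2))
```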
Types of Neural Network
Feedforward Neural Network
Radial Basis Function Neural Network
Recurrent Neural Network (RNN)
Sequence-To-Sequence Models
Convolutional Neural Network (CNN)
CONVOLUTIONAL NEURAL NETWORKS
- A Convolutional Neural Network (CNN) is one of the most popular deep learning algorithms. It takes an input image, assigns learnable weights and biases to various objects in the input, and learns to differentiate one object from another.
- Its main advantage is that it automatically detects the important features of images without any human supervision.
- ConvNets are also used in image and video recognition, image analysis and classification, media recreation, recommendation systems, natural language processing, etc.
PARAMETERS
Weights:
In a neural network, a series of linear functions, represented as matrices, is applied to the features (usually with a nonlinearity between them). These functions are determined by the values in the matrices, referred to as weights.
Filters:
A filter is a matrix of weights that is convolved with the input. The result of the convolution measures how closely a patch of the input resembles a feature. A feature may be a vertical edge, an arch, or any other shape.
Biases:
A bias vector is an additional set of weights in a neural network that requires no input; it thus corresponds to the output of the network when the input is zero.
No. of parameters of the Conv Layer = No. of weights of the Conv Layer + No. of biases of the Conv Layer.
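As an illustration, the parameters of a convolutional layer can be counted as in the sketch below. It assumes the common convention of one weight per kernel position per input channel per filter, and one bias per filter; the function name and the layer sizes are hypothetical and not taken from the project.

```python
def conv_layer_params(kernel_h, kernel_w, in_channels, num_filters):
    """Count the weights, biases, and total parameters of a conv layer.

    Assumes one bias per filter, which is the usual convention.
    """
    weights = kernel_h * kernel_w * in_channels * num_filters
    biases = num_filters
    return weights, biases, weights + biases


# Example: 32 filters of size 3x3 applied to a 3-channel (RGB) image.
weights, biases, total = conv_layer_params(3, 3, 3, 32)
print(weights, biases, total)  # 864 32 896
```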
Architecture
- The main role of the convolutional network is to reduce the images into a form that is easier to process, without losing the features that are critical for a good prediction.
- It begins by taking the input image and converting it into a matrix of pixel values.
ARCHITECTURE OF CNN
Convolution
- The element that carries out the convolution operation in the first part of the convolutional layer is called the kernel or filter, K.
- The kernel shifts across the image in steps set by the stride length, each time performing an element-wise multiplication, followed by a sum, between K and the portion P of the image over which the kernel is hovering.
- Its main objective is to extract high-level features, so the network need not be limited to one convolutional layer.
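The sliding-kernel operation described above can be sketched in plain Python. This is an illustrative sketch, not the project's code: it uses no padding, it does not flip the kernel (as is usual in CNN libraries), and the image and kernel values are invented for the example.

```python
def conv2d(image, kernel, stride=1):
    """Slide `kernel` over `image` and return the resulting feature map.

    At each position the kernel is multiplied element-wise with the
    image patch under it and the products are summed. No padding.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for a in range(kh):
                for b in range(kw):
                    total += image[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output


# A 4x4 image whose left half is bright (1) and right half is dark (0).
image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
# A 2x2 kernel that responds to a bright-to-dark vertical edge.
vertical_edge = [
    [1, -1],
    [1, -1],
]
feature_map = conv2d(image, vertical_edge, stride=1)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The non-zero column in the output marks where the vertical edge sits in the input.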
Non-Linearity (ReLU)
- An activation function determines the output of a node. ReLU, which stands for rectified linear unit, is one type of non-linear activation function.
- This function helps the model adapt to a variety of data and differentiate between the outputs.
- It also speeds up training of the neural network.
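ReLU itself is simple: it keeps positive values and replaces negative values with zero. A minimal sketch, with illustrative input values:

```python
def relu(x):
    """Rectified linear unit: return x if it is positive, otherwise 0."""
    return max(0.0, x)


values = [-2.0, -0.5, 0.0, 1.5]
activated = [relu(v) for v in values]
print(activated)  # [0.0, 0.0, 0.0, 1.5]
```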
Pooling or Sub-Sampling
- The main objective of this layer is to decrease the computational power required to process the data through dimensionality reduction.
- It is also useful for extracting dominant features, which helps the model train effectively.
- There are two types of pooling:
Max Pooling - It returns the maximum value from the portion of the image covered by the kernel.
Average Pooling - It returns the average of all the values from the portion of the image covered by the kernel.
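Both pooling types can be sketched with one small function. This is an illustrative sketch with a hypothetical function name and made-up feature-map values; a 2x2 window with stride 2 is assumed because it is a common choice, not because the project states it.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Downsample `feature_map` with max or average pooling."""
    output = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            # Collect the values covered by the pooling window.
            window = [
                feature_map[i + a][j + b]
                for a in range(size)
                for b in range(size)
            ]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        output.append(row)
    return output


feature_map = [
    [1, 3, 2, 0],
    [5, 7, 1, 1],
    [0, 2, 4, 8],
    [2, 0, 6, 2],
]
print(pool2d(feature_map, mode="max"))      # [[7, 2], [2, 8]]
print(pool2d(feature_map, mode="average"))  # [[4.0, 1.0], [1.0, 5.0]]
```

In both cases the 4x4 input is reduced to 2x2, which is the dimensionality reduction described above.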
Classification (Fully Connected Layer)
- A fully connected layer is added to learn non-linear combinations of the high-level features.
- Once the image is in a suitable form, we flatten it into a column vector and feed it to a feed-forward neural network; backpropagation is then applied in every training iteration.
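The flatten-then-classify step can be sketched as follows. This shows only the forward pass of one fully connected layer; the function names, feature values, weights, and biases are invented for illustration, and backpropagation is omitted.

```python
def flatten(feature_maps):
    """Flatten a list of 2D feature maps into a single vector."""
    return [value for fmap in feature_maps for row in fmap for value in row]


def dense(vector, weights, biases):
    """Forward pass of a fully connected layer: one output per weight row."""
    return [
        sum(x * w for x, w in zip(vector, weight_row)) + bias
        for weight_row, bias in zip(weights, biases)
    ]


feature_maps = [[[1.0, 2.0], [3.0, 4.0]]]  # a single 2x2 feature map
vector = flatten(feature_maps)             # [1.0, 2.0, 3.0, 4.0]
weights = [
    [0.5, 0.0, 0.0, 0.5],
    [0.0, 1.0, -1.0, 0.0],
]
biases = [0.0, 1.0]
scores = dense(vector, weights, biases)
print(scores)  # [2.5, 0.0]
```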
SINGLE SHOT MULTIBOX DETECTOR (SSD)
SSD ARCHITECTURE
- It is based on a feed-forward convolutional network.
- It consists of 2 main components, a base network and a multi-scale feature block, connected in series.
- Base Network:
- The early layers of the network are based on a standard CNN architecture.
- It extracts features from the original images and feeds them to the extra feature layers of the network.
- Multi-scale feature:
- Convolutional feature layers are added to the end of the base network; they decrease in size progressively and allow predictions at multiple scales.
- SSD uses the MultiBox technique, a method for fast class-agnostic bounding box proposals.
- A non-maximum suppression step is applied to produce the final detections.
- The MultiBox loss function combines two critical components: confidence loss and location loss.
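The non-maximum suppression step mentioned above can be sketched as a greedy procedure: keep the highest-scoring box, then discard any remaining box that overlaps a kept box too much. This is a generic illustration, not the SSD implementation used in the project; the box format `(x_min, y_min, x_max, y_max)`, the 0.5 overlap threshold, and the example boxes are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    intersection = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes kept, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap strongly with a kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

The second box is dropped because it overlaps the first, higher-scoring box with an IoU of about 0.68.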
Advantages of SSD
One of the fastest algorithms for object detection, with a high mAP and the ability to run at a higher fps.
Higher precision than R-CNN, Fast R-CNN, Faster R-CNN and YOLO in most cases.
Uses low-level feature maps with high resolution to detect small targets.
PERFORMANCE METRICS
MODEL IMPLEMENTATION
- To apply the CNN model, we used the weights of the pre-trained SSD Inception V2 COCO model.
- For experimentation, we split the dataset of 2562 labeled images into training and testing sets in the ratio of 80:20.
- The model is trained in Google Colab with 100,000 epochs, and the pipeline is configured with:
• number of classes = 1
• batch size = 24
• matched threshold = 0.7 and unmatched threshold = 0.3
• number of layers in SSD anchor = 6
• optimizer = RMSProp optimizer
The model is trained for approximately 23 hours.
- After training, the model is exported as a frozen inference graph (.pb file), which contains the object detection classifier.
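The 80:20 split of the 2562 labeled images described above can be sketched as follows. The function name, the fixed random seed, and the file names are hypothetical; the project does not state how the split was performed.

```python
import random


def split_dataset(items, train_fraction=0.8, seed=42):
    """Shuffle `items` reproducibly and split them into train and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical file names standing in for the 2562 labeled images.
image_names = [f"image_{i:04d}.jpg" for i in range(2562)]
train, test = split_dataset(image_names)
print(len(train), len(test))  # 2049 513
```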
VIDEO PREDICTION
- To predict potholes in a video, OpenCV is used and the model is loaded using the TensorFlow library.
- The video stream is initialized and a frames-per-second counter is started; the program then loops over every frame.
- For every frame, the object detection model detects potholes and returns a dictionary containing the parameters of the detected potholes.
- If the detection score is above 0.5, a bounding box is drawn showing the detection score and class label.
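The per-frame score check can be sketched without OpenCV or TensorFlow. The sketch below assumes the model returns a dictionary with `detection_boxes`, `detection_scores`, and `detection_classes` entries; that layout, the function name, and the example values are assumptions, and the drawing of the boxes is left out.

```python
def filter_detections(output_dict, min_score=0.5):
    """Return the detections of one frame whose score is above `min_score`."""
    kept = []
    for box, score, class_id in zip(
        output_dict["detection_boxes"],
        output_dict["detection_scores"],
        output_dict["detection_classes"],
    ):
        if score > min_score:
            kept.append({"box": box, "score": score, "class_id": class_id})
    return kept


# Hypothetical model output for a single frame (class 1 = pothole).
frame_output = {
    "detection_boxes": [(0.40, 0.30, 0.55, 0.50), (0.10, 0.60, 0.20, 0.75)],
    "detection_scores": [0.82, 0.31],
    "detection_classes": [1, 1],
}
to_draw = filter_detections(frame_output)
print(len(to_draw), to_draw[0]["score"])  # 1 0.82
```

Only the detections returned here would have a bounding box, score, and class label drawn on the frame.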
VIDEOS COLLECTED
VIDEO 1
- The video was recorded from a car's dashboard in a parking lot in Fairfax.
- The car was traveling at 15 mph.
- The length of the video is 40 seconds.
- The frame size is 1920 x 1080 pixels.
VIDEO 2
- The video, which shows Michigan roads, was taken from YouTube.
- The car was traveling at approximately 60 mph.
- The length of the video is 45 seconds.
PERFORMANCE METRICS(GRAPHS)
MEAN AVERAGE PRECISION (mAP) GRAPH
PREDICTION
• The average accuracy of the detections for video 1 is 0.66007.
• The number of pothole detections is 337.
• The total number of frames in the video is 1177.
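One way such a summary could be computed is sketched below. It assumes the reported average is the mean score of all detections above the 0.5 threshold, which the slides do not state explicitly; the function name and the per-frame scores are invented for illustration.

```python
def summarize_detections(frame_scores, min_score=0.5):
    """Return (number of detections, average score) over all frames.

    `frame_scores` holds one list of detection scores per video frame.
    Only scores above `min_score` are counted as detections.
    """
    kept = [s for scores in frame_scores for s in scores if s > min_score]
    count = len(kept)
    average = sum(kept) / count if count else 0.0
    return count, average


# Hypothetical scores for three frames; the second frame has no detections.
frame_scores = [[0.8, 0.6], [], [0.7, 0.4]]
count, average = summarize_detections(frame_scores)
print(count, round(average, 2), len(frame_scores))
```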
SPEED ANALYSIS
- The video of Michigan roads is used for this analysis.
- The video is converted into multiple videos with different frame rates (fps) using OpenCV.
- For uniformity, the videos are then converted back to 30 fps, without changing the playback speed, using an online application.
Original Video Prediction:
• The average accuracy of the detections for the original video is 0.72162.
• The number of pothole detections is 1211.
• The total number of frames in the video is 1359.
SPEED ANALYSIS
ANALYSIS
• As the speed increased, the number of detection boxes decreased.
• Accuracy remained roughly constant across all speeds, at approximately 73%.
OBSERVATIONS
From our observations, beyond 80 fps the model misses some potholes in the video, as the number of detection boxes decreases.
CONCLUSION
SUMMARY
• Pothole detection using the SSD algorithm enables real-time detection of potholes with good detection box scores.
• Compared with real-world driving scenarios, the speeds covered in the above analysis are close to or faster than 80 mph.
• We conclude that the model can detect most potholes when tested on real roads.
Challenges Faced
The Google Colab GPU disconnected frequently while training the model, and training took a long time to run.
Due to the Covid-19 situation, recording videos was difficult, as we could not find any roads with potholes nearby.
Because high-quality GPS-enabled cameras were unavailable, we were unable to extract the frame-by-frame location of the potholes.
- Data cleaning was time-consuming, as we had to remove large images and images in different formats to make the dataset consistent.
- We also removed the Indian road images to avoid a biased model. This was done manually by inspecting the dataset after merging images from the various Kaggle datasets.
FUTURE WORK
- Potholes are one of the major causes of road accidents. This work can be extended by developing a real-time mobile application that detects potholes on the road and helps the driver avoid them.
- Detecting potholes manually is a labor-intensive and time-consuming task. To address this, the geo-location of potholes could be captured using high-quality GPS-enabled cameras and reported to the traffic authorities for quick repairs.
LESSONS LEARNT
• This project extended our knowledge of computer vision and deep learning algorithms.
• It also taught us how to label images using the Microsoft VoTT tool and what types of images need to be given to the model.
• We learnt how to train a CNN model using the pre-trained weights of the SSD Inception V2 COCO model and which parameters to tweak in order to improve the performance of the model.
• This project introduced us to new Python packages such as OpenCV and Exif. We also learnt how to change video parameters such as frames per second (fps) and frame size in pixels.