YOLO-like network for vehicle detection using the KITTI dataset

  1. Getting anchor boxes (Link here): YOLO9000, like SSD, predicts corrections on top of anchor boxes of different shapes and aspect ratios. In YOLO9000, however, the anchor boxes are computed directly from the data instead of being hand-selected beforehand. One way to generate these anchor boxes is clustering: YOLO9000 uses k-means with 5 centroids, with 1 minus intersection over union (IOU) as the distance measure. The number 5 was obtained by varying the number of centroids and picking the value that gave the best trade-off between mean IOU and the number of centroids. In this post, I will go over the steps needed to load the image data and compute candidate anchor boxes using k-means clustering.
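The anchor-clustering step can be sketched as below. This is a minimal NumPy version, assuming box widths and heights are already extracted from the KITTI labels; only the (w, h) pairs matter, since boxes are compared as if they shared the same center. The function names are mine, not from any library.

```python
import numpy as np

def iou_wh(boxes, centroids):
    """IOU between (w, h) boxes and (w, h) centroids, assuming shared centers.

    boxes: (N, 2), centroids: (K, 2) -> (N, K) IOU matrix."""
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0])
             * np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    union = ((boxes[:, 0] * boxes[:, 1])[:, None]
             + (centroids[:, 0] * centroids[:, 1])[None, :] - inter)
    return inter / union

def kmeans_anchors(boxes, k=5, iters=100, seed=0):
    """Cluster (w, h) pairs with distance = 1 - IOU; returns (k, 2) anchors."""
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        # minimizing (1 - IOU) is the same as maximizing IOU
        assign = np.argmax(iou_wh(boxes, centroids), axis=1)
        new = np.array([boxes[assign == j].mean(axis=0) if np.any(assign == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids
```

Sweeping `k` and plotting the mean best IOU against `k` reproduces the trade-off curve used to settle on 5 centroids.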
  2. Preprocessing ground truth bounding box and image data, making a test network (to write): In YOLO9000, the output of the network is a large convolution map, where each filter corresponds to a specific prediction. For each anchor box, 5 predictions are made about the quality of the anchor box: 4 corrections between the ground truth bounding box and the anchor box, and 1 prediction of the IOU between the predicted box and the ground truth, plus the class of the object at each grid point.
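That layout fixes the number of output channels of the final convolution map. A small sketch, assuming a 13x13 grid, 5 anchors, and 3 KITTI classes (car, pedestrian, cyclist) — the sizes are illustrative, not taken from the post:

```python
import numpy as np

# Hypothetical sizes: 13x13 grid, 5 anchors, 3 classes.
GRID, NUM_ANCHORS, NUM_CLASSES = 13, 5, 3
CHANNELS = NUM_ANCHORS * (5 + NUM_CLASSES)  # 4 box terms + 1 IOU + classes

# The final conv map, reshaped so the per-anchor predictions are explicit.
net_out = np.zeros((GRID, GRID, CHANNELS), dtype=np.float32)
per_anchor = net_out.reshape(GRID, GRID, NUM_ANCHORS, 5 + NUM_CLASSES)

box_terms   = per_anchor[..., 0:4]   # corrections relative to each anchor
objectness  = per_anchor[..., 4]     # predicted IOU with the ground truth
class_probs = per_anchor[..., 5:]    # class scores at each grid point
```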
  3. The next step is to preprocess the ground truth bounding box labels and image data to generate target predictions (to write). Note that in YOLO9000 all the layers are convolution layers. I have seen implementations where a fully connected output layer is used and then reshaped into a convolution-style map to compute losses and predictions, but I will stay true to the original fully convolutional implementation and present that.
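One way to build those targets is sketched below, assuming box coordinates normalized to [0, 1] in (center-x, center-y, w, h) form; the function name and target layout are my own illustration, matching the per-anchor layout described above.

```python
import numpy as np

def build_target(gt_boxes, gt_classes, anchors, grid=13, num_classes=3):
    """Assign each ground-truth box (cx, cy, w, h in [0, 1]) to one grid
    cell and the best-matching anchor (by w/h IOU). Sketch only."""
    target = np.zeros((grid, grid, len(anchors), 5 + num_classes),
                      dtype=np.float32)
    for (cx, cy, w, h), cls in zip(gt_boxes, gt_classes):
        col, row = int(cx * grid), int(cy * grid)
        # best anchor by IOU of (w, h) alone, centers assumed aligned
        inter = np.minimum(anchors[:, 0], w) * np.minimum(anchors[:, 1], h)
        iou = inter / (anchors[:, 0] * anchors[:, 1] + w * h - inter)
        a = int(np.argmax(iou))
        # offsets within the cell, log-scale size ratios vs. the anchor
        target[row, col, a, 0:4] = [cx * grid - col, cy * grid - row,
                                    np.log(w / anchors[a, 0]),
                                    np.log(h / anchors[a, 1])]
        target[row, col, a, 4] = 1.0          # objectness / IOU target
        target[row, col, a, 5 + cls] = 1.0    # one-hot class
    return target
```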
  4. Overfitting a deep learning model for detection and localization (to write). This is perhaps the most crucial step of designing a neural network model. The objective in this step is to design a neural network and fit the input-output mapping for a training set comprised of a single example. If a neural network cannot overfit and predict the ground truth for a single image, it is most likely to fail on a larger dataset. Note that the converse is not true. This is perhaps the most crucial and most overlooked step in training deep learning models. It is always easier to overfit a model on your data set and then add measures against overfitting, than to start from a network with poor representation power and modify it.
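The sanity check looks like this in miniature. A tiny two-layer NumPy network stands in for the real detection network — only the overfit-one-example procedure is the point, not the model:

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=(1, 8)), rng.normal(size=(1, 4))   # ONE training pair
W1 = rng.normal(size=(8, 16)) * 0.1
W2 = rng.normal(size=(16, 4)) * 0.1

for step in range(2000):
    h = np.maximum(x @ W1, 0)           # ReLU hidden layer
    pred = h @ W2
    err = pred - y
    loss = float(np.mean(err ** 2))
    # backprop by hand for this toy model
    dpred = 2 * err / err.size
    gW2 = h.T @ dpred
    gW1 = x.T @ ((dpred @ W2.T) * (h > 0))
    W1 -= 0.1 * gW1
    W2 -= 0.1 * gW2

# After training, the single example should be fit almost exactly;
# if it is not, the model or the training code is broken.
```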
  5. Data augmentation (to write) can be used to generate new ‘unseen’ data. Data augmentation for detection is more difficult than for classification tasks, because the bounding box labels must be transformed along with the image.
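A horizontal flip shows why detection augmentation needs extra care — the boxes must be mirrored too. A minimal sketch, assuming boxes stored as pixel (x_min, y_min, x_max, y_max):

```python
import numpy as np

def hflip(image, boxes):
    """Horizontally flip an image and its (x_min, y_min, x_max, y_max) boxes.

    Unlike classification, the labels change with the pixels: after the
    flip, the old right edge becomes the new left edge."""
    w = image.shape[1]
    flipped = image[:, ::-1]
    new_boxes = boxes.astype(float).copy()
    new_boxes[:, 0] = w - boxes[:, 2]   # new x_min from old x_max
    new_boxes[:, 2] = w - boxes[:, 0]   # new x_max from old x_min
    return flipped, new_boxes
```

Translations, crops, and scale jitter follow the same pattern: apply the geometric transform to the corner coordinates, then drop or clip boxes that leave the frame.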
  6. Using a pretrained classifier (to write): The last step is to use a pretrained classifier, like VGG16 or an Inception model, to precompute bottleneck features (the output from the last layer of the base model), and then train only the final convolution layers for bounding box prediction and classification on those features. This makes training much faster, and because the bottleneck features are precomputed, larger batch sizes can be used.
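The idea behind bottleneck caching can be shown with a stand-in: a frozen random projection plays the role of the VGG16/Inception base (names and sizes below are illustrative, not the actual models). Because the base never changes, its outputs are computed once and every subsequent epoch trains only the small head on the cached features:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "base model" stand-in (in practice: VGG16/Inception conv stack).
# Its weights never change, so its outputs can be cached once.
W_base = rng.normal(size=(100, 32)) * 0.1
def bottleneck(x):
    return np.maximum(x @ W_base, 0)

images = rng.normal(size=(64, 100))     # toy stand-in for a dataset
features = bottleneck(images)           # precomputed ONCE, then reused

# Only the small head is trained, directly on the cached features, so
# each epoch skips the expensive base forward pass entirely.
targets = rng.normal(size=(64, 4))
W_head = np.zeros((32, 4))
for _ in range(500):
    grad = features.T @ (features @ W_head - targets) / len(features)
    W_head -= 0.01 * grad
```

The memory cost of holding `features` is far smaller than holding activations of the full base model, which is why larger batch sizes become practical.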




Staff Software Engineer at Lockheed Martin-Autonomous System with research interest in control, machine learning/AI. Lifelong learner with glassblowing problem.

Vivek Yadav
