How to use YOLOv8 and YOLO-NAS for Object Detection

Introduction

In the field of computer vision, object detection is a fundamental technology that enables machines to perceive and interpret the visual environment with exceptional accuracy. This advanced technique empowers computers not only to identify objects within images or videos, but also to localize them by drawing bounding boxes around them. Object detection has transformed numerous industries, from self-driving cars to surveillance systems, and is constantly advancing the frontiers of artificial intelligence.

The core concept of object detection resides in its capacity to efficiently merge two essential tasks: object localization and object classification. The process of object localization entails determining the precise position of an object within an image or video frame, usually by means of bounding boxes or masks.

What are YOLO and YOLO-NAS?

YOLO (You Only Look Once) comprises a range of algorithms developed by Joseph Redmon, et al. for real-time object detection. It is widely used owing to its high speed and accuracy for detecting objects. YOLO algorithms detect objects in a single stage, thereby making them much faster than two-stage detectors that need to examine an image twice. Nevertheless, YOLO algorithms usually exhibit less accuracy compared to the two-stage detectors.

YOLO-NAS is a cutting-edge object detection model created by Deci AI. It’s founded on the YOLOv5 framework and implements Neural Architecture Search (NAS) to determine the optimal model architecture for any task. YOLO-NAS surpasses YOLOv5 in terms of accuracy and speed, and it’s more efficient,which enables it to operate on lower-powered devices.

How to use YOLO for images and videos

Step 1: Installing the necessary libraries

pip install opencv-python ultralytics

Step 2: Importing libraries

import cv2
from ultralytics import YOLO

Step 3: Choose your model

model = YOLO("yolov8n.pt")

On this website, you can compare different models and weigh up their respective advantages and disadvantages. In this case we have chosen yolov8n.pt.

Step 4: Write a function to predict and detect objects in images and videos

def predict(chosen_model, img, classes=[], conf=0.5):
    if classes:
        results = chosen_model.predict(img, classes=classes, conf=conf)
    else:
        results = chosen_model.predict(img, conf=conf)

    return results


def predict_and_detect(chosen_model, img, classes=[], conf=0.5):
    results = predict(chosen_model, img, classes, conf=conf)

    for result in results:
        for box in result.boxes:
            cv2.rectangle(img, (int(box.xyxy[0][0]), int(box.xyxy[0][1])),
                          (int(box.xyxy[0][2]), int(box.xyxy[0][3])), (255, 0, 0), 2)
            cv2.putText(img, f"{result.names[int(box.cls[0])]}",
                        (int(box.xyxy[0][0]), int(box.xyxy[0][1]) - 10),
                        cv2.FONT_HERSHEY_PLAIN, 1, (255, 0, 0), 1)
    return img, results

predict() function

This function takes three arguments:

chosen_model: The trained model to use for prediction
img: The image to make a prediction on
classes: (Optional) A list of class names to filter predictions to
conf: (Optional) The minimum confidence threshold for a prediction to be considered

The function first checks if the classes argument is provided. If it is, then the chosen_model.predict() method is called with the classes argument, which filters the predictions to only those classes. Otherwise, the chosen_model.predict() method is called without the classes argument, which returns all predictions.

The conf argument is used to filter out predictions with a confidence score lower than the specified threshold. This is useful for removing false positives.

The function returns a list of prediction results, where each result contains the following information:

name: The name of the predicted class
conf: The confidence score of the prediction
box: The bounding box of the predicted object

predict_and_detect() function

This function takes the same arguments as the predict() function, but it also returns the annotated image in addition to the prediction results.

The function first calls the predict() function to get the prediction results. Then, it iterates over the prediction results and draws a bounding box around each predicted object. The name of the predicted class is also written above the bounding box.

The function returns a tuple containing the annotated image and the prediction results.

Here is a summary of the differences between the two functions:

The predict() function only returns the prediction results, while the predict_and_detect() function also returns the annotated image.
The predict_and_detect() function is a wrapper around the predict() function, which means that it calls the predict() function internally.

Step 5: Detecting Objects in Images with YOLOv8

# read the image
image = cv2.imread("YourImagePath")

result_img = predict_and_detect(model, image, classes=[], conf=0.5)

If you want to detect specific classes, which you can find here, simply write the ID number of the object in the list of classes.

Step 6: Save and Plot the result Image

cv2.imshow("Image", result_img)
cv2.imwrite("YourSavePath", result_img)
cv2.waitKey(0)

Step 7: Detecting Objects in Videos with YOLOv8

video_path = r"YourVideoPath"

cap = cv2.VideoCapture(video_path)

while True:

    success, img = cap.read()

    if not success:
        break

    result_img, _ = predict_and_detect(model, img, classes=[], conf=0.5)

    cv2.imshow("Image", result_img)
    
    cv2.waitKey(1)

Step 8: Save the result Video

# defining function for creating a writer (for mp4 videos)

def create_video_writer(video_cap, output_filename):
    # grab the width, height, and fps of the frames in the video stream.
    frame_width = int(video_cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(video_cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = int(video_cap.get(cv2.CAP_PROP_FPS))

    # initialize the FourCC and a video writer object
    fourcc = cv2.VideoWriter_fourcc(*'MP4V')
    writer = cv2.VideoWriter(output_filename, fourcc, fps,
                             (frame_width, frame_height))

    return writer

Just use the function and code above

output_filename = "YourFilename"
writer = create_video_writer(cap, output_filename)

video_path = r"YourVideoPath"

cap = cv2.VideoCapture(video_path)

while True:

    success, img = cap.read()

    if not success:
        break

    result_img, _ = predict_and_detect(model, img, classes=[], conf=0.5)
    writer.write(result_img)
    cv2.imshow("Image", result_img)
    
    cv2.waitKey(1)

writer.release()

How to use YOLO-NAS for images and videos

Step 1: Installing the necessary libraries

pip install opencv-python torch super-gradients

Step 2: Importing libraries

from super_gradients.training import models
import cv2
import torch

Step 3: Choose your model

device = 'cuda' if torch.cuda.is_available() else "cpu"

model = models.get("yolo_nas_l", pretrained_weights="coco")

On this website, you can compare different models and weigh up their respective advantages and disadvantages. In this case we have chosen yolo_nas_l.

Step 4: Detecting Objects in Images with YOLO-NAS

img_path = r"YourImagePath"
model.predict(img_path, conf=0.25).show()

Step 5: Detecting Objects in Videos with YOLO-NAS

input_video_path = "YourInputVideoPath"
output_video_path = "YourOutputVideoPath"

model.to(device).predict(input_video_path).save(output_video_path)

Conclusion

In this tutorial we have learned how to detect objects with YOLOv8 and YOLO-NAS in images and videos. If you found this code helpful, please clap your hands and comment on this post! I would also love for you to follow me to learn more about Data Science and other related topics. Thanks for reading!