An Easy Way to Add NMS to YOLOv8 graph

A journey to seamlessly incorporate NMS into the YOLOv8 graph, streamlining the inference process and simplifying your workflow.

Sep 11, 2023

In the rapidly advancing field of computer vision, object detection plays a crucial role in a wide range of applications, from autonomous vehicles and surveillance systems to medical imaging. Among the many object detection frameworks available, YOLO (You Only Look Once) has become hugely popular for its real-time speed and accuracy. YOLOv8, the latest evolution in the YOLO lineage, brings substantial improvements in detection performance. However, it presents a challenge: the exported graph does not include Non-Maximum Suppression (NMS), so you have to implement it yourself as part of your post-processing code.

In this article, we embark on a journey to seamlessly incorporate NMS into the graph, streamlining the inference process and simplifying your workflow.

You can find the code on GitHub.

First of all, we need to understand the essential functions that we are going to use from `onnx.helper`:

make_node
- Creates a processing node (a NodeProto) that is added to the graph later.
- Arguments:
  - op_type (string) – the name of the operator to construct
  - inputs (list of string) – list of input names
  - outputs (list of string) – list of output names
  - name (string, default None) – optional unique identifier for the NodeProto
  - doc_string (string, default None) – optional documentation string for the NodeProto
  - domain (string, default None) – optional domain for the NodeProto; if it is None, the default domain (empty, i.e. the standard ONNX operators) is used
  - **kwargs (dict) – the attributes of the node
- example
```
transpose_bboxes_node = onnx.helper.make_node("Transpose", inputs=["/model.22/Mul_2_output_0"], outputs=["bboxes"], perm=(0, 2, 1))
```
make_tensor
- Creates a TensorProto holding actual data, e.g. a constant stored as a graph initializer
- Arguments:
  - name (string) – tensor name
  - data_type (int) – a value such as onnx.TensorProto.FLOAT
  - dims (List[int]) – shape; an empty list gives a scalar
  - vals – the values
  - raw (bool) – if True, vals contains the serialized content of the tensor; otherwise, vals should be a list of values of the type defined by data_type
- example
```
score_threshold = onnx.helper.make_tensor("score_threshold", onnx.TensorProto.FLOAT, [], [0.25])
```
make_tensor_value_info
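- Creates a ValueInfoProto: a description of a tensor (name, element type, and shape) that holds no data, used for example to declare graph inputs and outputs
- Arguments:
  - name (string) – tensor name
  - elem_type (int) – a value such as onnx.TensorProto.FLOAT
  - shape (list of string or int, or None) – string entries become symbolic dimensions, integers fixed ones
  - doc_string (string, optional)
  - shape_denotation (list of string, optional)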
- example
```
sigmoid_node = onnx.helper.make_tensor_value_info("/model.22/Sigmoid_output_0", onnx.TensorProto.FLOAT, shape=["batch", 80, 8400])
```

Now that we are familiar with these helpers, let's start with the imports and a function that reads the YOLOv8 ONNX file.

```
import onnx
import torch
import onnxsim
import onnxruntime as ort
import numpy as np

from torch import nn
from onnx.tools import update_model_dims
from onnx.compose import merge_models
from onnx.version_converter import convert_version

def load_model(path):
    onnx_model = onnx.load_model(path)
    print(f"{path} Loaded")
    onnx.checker.check_model(onnx_model)  # fail early if the file is not a valid ONNX model
    return onnx_model

onnx_model = load_model("best.onnx")  # the exported YOLOv8 model
graph = onnx_model.graph
```

Next, we define three tensors: the score threshold, the IoU threshold, and the maximum number of boxes to keep per class.

```
# Scalar (0-d) constants that NonMaxSuppression will read as inputs
score_threshold = onnx.helper.make_tensor(
    "score_threshold", onnx.TensorProto.FLOAT, [], [0.25])

iou_threshold = onnx.helper.make_tensor(
    "iou_threshold", onnx.TensorProto.FLOAT, [], [0.45])

max_output_boxes_per_class = onnx.helper.make_tensor(
    "max_output_boxes_per_class", onnx.TensorProto.INT64, [], [300])

# Storing them as initializers makes them constant tensors inside the graph
graph.initializer.append(score_threshold)
graph.initializer.append(iou_threshold)
graph.initializer.append(max_output_boxes_per_class)
```

Next, we need the names of the box and score tensors in the graph. You can find them by opening the ONNX model in Netron, scrolling down to the end of the graph, and selecting the Mul, Sigmoid, and Concat nodes of the detection head.

After selecting the Sigmoid node, you can see its output name at the bottom right of the window: “/model.22/Sigmoid_output_0”. These names depend on the model and the export version, so check them in your own file.

Now let's define three variables holding these names so they are easy to use later: the outputs of the Mul (boxes) and Sigmoid (class scores) nodes, and the name of the final Concat node.

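```
Mul = "/model.22/Mul_5_output_0"    # boxes: [batch, 4, 8400]
Sig = "/model.22/Sigmoid_output_0"  # class scores: [batch, 80, 8400]
Con = "/model.22/Concat_25"         # final Concat node (a node name, not an output name)
```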

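Now let's create a node that transposes the boxes from [batch, 4, 8400] to [batch, 8400, 4], the layout NonMaxSuppression expects for its boxes input ([num_batches, spatial_dimension, 4]), and add it to the graph.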

```
# [batch, 4, 8400] -> [batch, 8400, 4]
transpose_bboxes_node = onnx.helper.make_node(
    "Transpose",
    inputs=[Mul],
    outputs=["bboxes"],
    perm=(0, 2, 1))
graph.node.append(transpose_bboxes_node)
```

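Now let's define the NMS node. It takes the transposed boxes, the Sigmoid output as scores, and the three tensors defined earlier. The scores can be used as-is, since their [batch, 80, 8400] layout already matches the [num_batches, num_classes, spatial_dimension] shape the operator expects.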

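Note that we need to set center_point_box=1: by default (center_point_box=0), the NonMaxSuppression operator expects boxes as corner coordinates ([y1, x1, y2, x2]), while the YOLOv8 PyTorch model outputs [x_center, y_center, width, height]. Setting center_point_box=1 tells the operator to interpret the boxes in this center format.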
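To make the box-format point concrete, here is the conversion that `center_point_box=1` spares us from adding to the graph: a NumPy sketch turning `[x_center, y_center, w, h]` rows into corner form. The helper name `xywh_to_xyxy` is ours, not part of any library.

```python
import numpy as np

def xywh_to_xyxy(boxes):
    """Convert [..., 4] boxes from [x_center, y_center, w, h] to [x1, y1, x2, y2]."""
    xy, wh = boxes[..., :2], boxes[..., 2:]
    return np.concatenate([xy - wh / 2, xy + wh / 2], axis=-1)

print(xywh_to_xyxy(np.array([[50.0, 50.0, 20.0, 10.0]])))
```

IoU does not depend on whether a corner pair is written x-first or y-first, so once boxes are in corner form the axis order only has to be consistent.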

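```
# The operator fixes the input order: boxes, scores, max_output_boxes_per_class,
# iou_threshold, score_threshold
inputs = ["bboxes", Sig, "max_output_boxes_per_class", "iou_threshold", "score_threshold"]

outputs = ["selected_indices"]

nms_node = onnx.helper.make_node(
    "NonMaxSuppression",
    inputs,
    outputs,
    center_point_box=1)  # boxes are [x_center, y_center, w, h]

graph.node.append(nms_node)
```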
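The operator's contract is easier to see in a plain reference implementation. Below is a simplified NumPy sketch of what NonMaxSuppression computes for our inputs (boxes `[batch, num_anchors, 4]` in center format, scores `[batch, num_classes, num_anchors]`): greedy suppression run independently per image and per class, emitting `[batch_index, class_index, box_index]` rows. `nms_reference` is a hypothetical name written for illustration; edge cases such as scores exactly equal to the threshold or tied scores may be handled differently by ONNX Runtime.

```python
import numpy as np

def nms_reference(boxes, scores, max_per_class, iou_threshold, score_threshold):
    """Greedy per-image, per-class NMS returning int64 [N, 3] selected indices."""
    xy, wh = boxes[..., :2], boxes[..., 2:]
    corners = np.concatenate([xy - wh / 2, xy + wh / 2], axis=-1)  # [B, A, 4]
    area = (corners[..., 2] - corners[..., 0]) * (corners[..., 3] - corners[..., 1])

    selected = []
    for b in range(scores.shape[0]):
        for c in range(scores.shape[1]):
            s = scores[b, c]
            order = np.argsort(-s, kind="stable")      # highest score first
            order = order[s[order] > score_threshold]  # drop low-scoring boxes
            keep = []
            for i in order:
                if len(keep) == max_per_class:
                    break
                if keep:
                    k = np.array(keep)
                    lt = np.maximum(corners[b, i, :2], corners[b, k, :2])
                    rb = np.minimum(corners[b, i, 2:], corners[b, k, 2:])
                    inter = np.prod(np.clip(rb - lt, 0, None), axis=-1)
                    iou = inter / (area[b, i] + area[b, k] - inter)
                    if np.any(iou > iou_threshold):
                        continue  # overlaps an already kept box of this class
                keep.append(i)
            selected.extend([b, c, i] for i in keep)
    return np.array(selected, dtype=np.int64).reshape(-1, 3)

# One image, one class, three anchors: the first two boxes overlap heavily
boxes = np.array([[[50, 50, 20, 20], [52, 50, 20, 20], [150, 150, 20, 20]]], dtype=np.float32)
scores = np.array([[[0.9, 0.8, 0.7]]], dtype=np.float32)
print(nms_reference(boxes, scores, 300, 0.45, 0.25))
```

The second box has an IoU of about 0.82 with the first, above the 0.45 threshold, so only anchors 0 and 2 survive.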

Now let's register "selected_indices" as a graph output. Each of its rows holds [batch_index, class_index, box_index].

```
# One row per kept box; None leaves the number of rows unknown
output_value_info = onnx.helper.make_tensor_value_info(
    "selected_indices",
    onnx.TensorProto.INT64,
    shape=[None, 3])

graph.output.append(output_value_info)
```

Next, we remove the now-unused Concat node and the original output (output0) from the model, and expose the Mul and Sigmoid outputs as graph outputs instead. The resulting model returns the selected indices, the original boxes, and the class scores.

```
nc = 80  # number of classes (80 for COCO; change it for a custom-trained model)

# Drop the final Concat node that merged boxes and scores into output0...
last_concat_node = next(node for node in graph.node if node.name == Con)
graph.node.remove(last_concat_node)

# ...and the graph output it produced
output0 = next(o for o in graph.output if o.name == "output0")
graph.output.remove(output0)

# Expose the boxes and the class scores as graph outputs instead
# (8400 = number of anchors for a 640x640 input)
mul_node = onnx.helper.make_tensor_value_info(
    Mul,
    onnx.TensorProto.FLOAT,
    shape=["batch", 4, 8400])

sig_node = onnx.helper.make_tensor_value_info(
    Sig,
    onnx.TensorProto.FLOAT,
    shape=["batch", nc, 8400])

graph.output.append(mul_node)
graph.output.append(sig_node)
```

Now let's validate the model and save it to check our progress.

```
onnx.checker.check_model(onnx_model)
onnx.save(onnx_model, "best_nms.onnx")
```

Our model now produces three outputs: the selected indices, the bounding box coordinates, and the class scores. However, post-processing would still have to pick the selected rows out of the 8400 candidate boxes using those indices. Can we handle this step inside the graph as well?

Indeed, we can. Let's explore how.

First, we create a transformation module that takes the selected indices, the bounding boxes, and the class scores as input, and returns only the selected boxes along with their scores and classes.

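```
class Transform(nn.Module):
    """Select the NMS-kept detections from the raw YOLOv8 outputs.

    idxTensor: [N, 3] int64 rows of [batch_index, class_index, box_index]
    boxes:     [B, 4, num_anchors] as (x_center, y_center, w, h)
    scores:    [B, num_classes, num_anchors] sigmoid class scores
    """

    def forward(self, idxTensor, boxes, scores):
        batches = idxTensor[:, 0]
        bbox_result = self.gather(boxes, idxTensor)  # [N, 4]

        # Best class score and class id for each selected box
        score_intermediate_result = self.gather(scores, idxTensor).max(dim=-1)
        score_result = score_intermediate_result.values     # [N]
        classes_result = score_intermediate_result.indices  # [N], int64

        concatenated = torch.cat([
            bbox_result,
            score_result.unsqueeze(-1),
            classes_result.unsqueeze(-1).to(bbox_result.dtype),  # one dtype for all columns
        ], dim=-1)  # [N, 6]

        return concatenated, batches

    def gather(self, target, idxTensor):
        # [B, C, A] -> [B * A, C], so one flat index selects both the image
        # (batch_index) and the anchor (box_index) of every selected box
        num_anchors = target.shape[2]
        flat = target.permute(0, 2, 1).reshape(-1, target.shape[1])
        flat_idx = idxTensor[:, 0] * num_anchors + idxTensor[:, 2]
        return flat[flat_idx]
```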

As you can see, this module outputs two tensors:

- The selected boxes with their scores and classes, as rows of [x_center, y_center, w, h, score, class], with shape [number of boxes, 6]
- The batch (image) index of each box, with shape [number of boxes]
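When checking the exported module, an independent reference helps. Here is a NumPy sketch of the same selection: it follows the module's convention of taking the class as the arg-max over all class scores at the selected anchor (rather than the class index reported by NMS), and it looks boxes up per image using both `batch_index` and `box_index`. `select_detections` is a hypothetical name, and the arrays below are toy data.

```python
import numpy as np

def select_detections(selected_indices, boxes, scores):
    """selected_indices: [N, 3]; boxes: [B, 4, A]; scores: [B, C, A].

    Returns [N, 6] rows of (x_c, y_c, w, h, score, class) and the [N] batch indices.
    """
    batch_idx = selected_indices[:, 0]
    box_idx = selected_indices[:, 2]
    picked_boxes = boxes[batch_idx, :, box_idx]    # [N, 4]
    picked_scores = scores[batch_idx, :, box_idx]  # [N, C]
    best_score = picked_scores.max(axis=-1)
    best_class = picked_scores.argmax(axis=-1)
    det = np.concatenate(
        [picked_boxes, best_score[:, None], best_class[:, None].astype(boxes.dtype)],
        axis=-1,
    )
    return det, batch_idx

boxes = np.arange(2 * 4 * 3, dtype=np.float32).reshape(2, 4, 3)  # 2 images, 3 anchors
scores = np.zeros((2, 2, 3), dtype=np.float32)                    # 2 classes
scores[0, 0, 0] = 0.8
scores[1, 1, 2] = 0.9
det, batch_ids = select_detections(np.array([[0, 0, 0], [1, 1, 2]]), boxes, scores)
print(det)
```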

Now we need to export this module to ONNX.

```
# Run the NMS model on a random batch to get example inputs for tracing.
# (Assumes best_nms.onnx accepts a batch of 8; use a batch of 1 if its batch size is fixed.)
session = ort.InferenceSession("best_nms.onnx")
outname = [o.name for o in session.get_outputs()]  # selected_indices, Mul, Sig
inname = [i.name for i in session.get_inputs()]
image = np.random.rand(8, 3, 640, 640).astype(np.float32)
output = session.run(outname, {inname[0]: image})

torch.onnx.export(
    Transform(),
    (
        torch.tensor(output[0], dtype=torch.int64),
        torch.from_numpy(output[1]),
        torch.from_numpy(output[2]),
    ),
    "./NMS_after.onnx",
    input_names=outname,
    output_names=["det_bboxes", "batches"],
    # Keep the number of detections and the batch size dynamic on both sides
    dynamic_axes={
        outname[0]: {0: "num_results"},
        outname[1]: {0: "batch"},
        outname[2]: {0: "batch"},
        "det_bboxes": {0: "num_results"},
        "batches": {0: "num_results"},
    },
)

nms_postprocess_onnx_model = onnx.load_model("./NMS_after.onnx")
nms_postprocess_onnx_model_sim, check = onnxsim.simplify(nms_postprocess_onnx_model)
assert check, "onnxsim could not validate the simplified model"
onnx.save(nms_postprocess_onnx_model_sim, "./NMS_after_sim.onnx")
```

After saving, we now have two ONNX files that need to be combined into one:

- best_nms.onnx: the model + NMS
- NMS_after_sim.onnx: the transformation module

```
# Make the batch dimension (and the other head dims) symbolic in the model + NMS graph
input_dims = {
    "images": ["batch", 3, 640, 640],
}

output_dims = {
    "selected_indices": ["num_results", 3],  # first dim is the number of kept boxes
    Mul: ["batch", "boxes", "num_anchors"],
    Sig: ["batch", "classes", "num_anchors"],
}

target_opset = 18  # convert_version takes an opset version, not an IR version
updated_onnx_model = update_model_dims.update_inputs_outputs_dims(
    onnx_model, input_dims, output_dims)
core_model = convert_version(updated_onnx_model, target_opset)
onnx.checker.check_model(core_model)
core_model.ir_version = 8  # both models must share the same IR version to be merged

post_process_model = convert_version(nms_postprocess_onnx_model_sim, target_opset)
onnx.checker.check_model(post_process_model)
post_process_model.ir_version = 8

# Wire the outputs of the first model into the inputs of the second one
combined_onnx_model = merge_models(core_model, post_process_model, io_map=[
    (Mul, Mul),
    (Sig, Sig),
    ("selected_indices", "selected_indices"),
])
onnx.save(combined_onnx_model, "./final_model.onnx")
```

Now let's try our final YOLOv8 model with NMS.
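A minimal way to try `final_model.onnx` with ONNX Runtime is sketched below. It assumes the usual Ultralytics input convention (RGB, NCHW, float32 scaled to [0, 1]) and images already resized or letterboxed to 640x640; resizing is left out, and `preprocess` and `run_final_model` are hypothetical helpers.

```python
import numpy as np

def preprocess(images_hwc):
    """Stack 640x640x3 uint8 RGB images into a float32 NCHW batch in [0, 1]."""
    batch = np.stack(images_hwc).astype(np.float32) / 255.0
    return np.ascontiguousarray(batch.transpose(0, 3, 1, 2))

def run_final_model(path, images_hwc):
    import onnxruntime as ort  # imported here so preprocess() works without it

    session = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name  # "images" for this model
    det_bboxes, batches = session.run(None, {input_name: preprocess(images_hwc)})
    # det_bboxes: [N, 6] rows of x_center, y_center, w, h, score, class
    # batches:    [N] index of the image each row belongs to
    return det_bboxes, batches
```

Usage would look like `det_bboxes, batches = run_final_model("final_model.onnx", [img1, img2])`, where `img1` and `img2` are 640x640 RGB arrays.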

Note that you may need to add the following code block to your post-processing function to group the detections by image, where boxes is the first output of the ONNX model (det_bboxes) and indx is the second one (batches).

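```
# One list of detections per image. max(indx) + 1 misses trailing images that have
# no detections, so prefer the input batch size for num_images when you have it.
num_images = int(indx.max()) + 1 if len(indx) else 0
final_result = [[] for _ in range(num_images)]
for value, batch in zip(boxes, indx):
    final_result[batch].append(value)
```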

Reference: Stitching non max suppression (NMS) to YOLOv8n on exported ONNX model

Deep Diaries By Ali
