Practical Challenges in Machine Vision


In this blog post, I'll discuss some of the challenges we faced and how we are addressing them, particularly around our deep learning object detection model, YOLOv8-nano.


Our Current Setup




Our robot's vision system performs two tasks:

  1. Object Detection
  2. Coordinate Transformation

We achieve object detection by running inference on a YOLOv8-nano model fine-tuned on a custom dataset. More about the model selection criteria and the training process will be covered in a separate blog post.

Coordinate transformation converts camera coordinates (in pixels) into world coordinates (in mm) so that the robot arm can move to, and successfully pick, the detected object.
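As a concrete illustration, here is a minimal sketch of such a transform, assuming a simple planar (homography-based) calibration. The matrix values below are made up for illustration, not our actual calibration:

```python
import numpy as np

def pixel_to_world(u, v, H):
    """Map a pixel coordinate (u, v) to world coordinates (x, y) in mm
    using a 3x3 planar homography H (pixels -> mm)."""
    p = H @ np.array([u, v, 1.0])
    return p[0] / p[2], p[1] / p[2]  # normalise homogeneous coordinates

# Hypothetical calibration: 0.5 mm per pixel plus a fixed offset.
H = np.array([
    [0.5, 0.0, 100.0],  # x_mm = 0.5 * u + 100
    [0.0, 0.5,  50.0],  # y_mm = 0.5 * v + 50
    [0.0, 0.0,   1.0],
])

x_mm, y_mm = pixel_to_world(320, 240, H)  # centre of a 640x480 frame
```

In practice, the homography would be estimated from known pixel/world point pairs, for example with OpenCV's `cv2.findHomography`.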

In summary: the robot stops in front of an object; we use OpenCV to capture a frame from the camera; the frame is passed to the YOLO model for inference; and the detected object's coordinates are passed to a transform function that converts them from camera coordinates to world coordinates.

The world coordinates are then sent to the robot arm's microcontroller over serial (UART) so the arm can move and pick the object.


Challenge 1: Noise from the Environment




As you can see from the image on the left above, the model incorrectly identifies the white tile at the bottom of the frame as a trailer. This would obviously lead to the arm attempting to pick an unwanted object (the white tile).

Using a contrasting background, i.e. the cardboard, immediately removes this error. Furthermore, the inference result shows that the model detects the trailer with 91% confidence.

If the background cannot be physically changed, image processing techniques can be applied - to extract the foreground from the background, for example - before passing the frame to the model for inference. OpenCV has numerous tools to aid in image processing.
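OpenCV offers tools such as `cv2.threshold` and `cv2.grabCut` for this. As a minimal sketch of the idea in plain NumPy - assuming dark objects sitting on a bright background like the white tile - one could simply suppress the bright pixels (the cutoff value here is arbitrary):

```python
import numpy as np

def suppress_bright_background(gray, cutoff=200):
    """Zero out pixels brighter than `cutoff`, leaving only the darker
    foreground objects. `gray` is a 2-D uint8 grayscale image."""
    out = gray.copy()
    out[out > cutoff] = 0
    return out

# Toy 2x3 "image": a dark object (50) on a bright white-tile background (240).
frame = np.array([[240, 50, 240],
                  [240, 50, 240]], dtype=np.uint8)
clean = suppress_bright_background(frame)
```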

For our case, however, we will not need further image processing as the objects will be placed on a black background.


Another example: the image above shows the model incorrectly identifying my colleague's hand as a red wheel, albeit with a relatively low confidence of 36%.

This problem can easily be eliminated by placing a threshold on the model's confidence, i.e. discarding objects detected with a confidence of less than 50%. This is as easy as passing a confidence threshold to YOLO's Python inference function.

Placing a high confidence threshold risks actual objects going undetected, so the ideal value needs to be determined experimentally. 45% seems to work well in our case.
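With the Ultralytics Python API this is a single argument, e.g. `model(frame, conf=0.45)`. Conceptually, it amounts to a filter like the sketch below (the detection tuples here are hypothetical, not the library's actual result objects):

```python
def filter_detections(detections, conf_threshold=0.45):
    """Keep only detections at or above the confidence threshold.
    Each detection is a (label, confidence, box) tuple."""
    return [d for d in detections if d[1] >= conf_threshold]

# Hypothetical raw model output: the 36%-confidence "red wheel" is dropped.
raw = [("trailer", 0.91, (120, 80, 260, 190)),
       ("red wheel", 0.36, (300, 210, 380, 290))]
kept = filter_detections(raw)
```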

Challenge 2: Detecting Multiple Objects (The Wheel Rack Problem)





From the images above, we can note that the model doesn't detect all the wheels in the image. In the first image, it fails to detect the blue and white wheel at the top, while in the second image, the model fails to detect the blue wheel. This inconsistency is a big problem as it will lead to some objects not being picked up by our robot.

The model is very unreliable at the wheel rack: sometimes it detects a few of the wheels, sometimes just one. In our tests so far, it has rarely detected all four.

Lighting

Lighting is part of the problem here. Let's look at the image on the right.

Let's focus on the white wheels for now. The white wheel at the top is well-lit, while the one at the bottom of the shelf isn't (furthermore, the white wheel at the bottom sits on a white tile - background noise that further hurts the model's performance).

The model detects the white wheel at the top with 74% confidence while it detects the white wheel at the bottom with 27% confidence⚠️

One solution is adding LEDs to the vision system to illuminate the scene. We could also re-train the model on low-light examples, or combine both approaches and test to determine the best way forward.
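If we go the software route, a cheap pre-processing step worth trying is gamma correction to brighten dark frames before inference. A minimal NumPy sketch (the gamma value is arbitrary and would need tuning; this is not something we have adopted yet):

```python
import numpy as np

def brighten(gray, gamma=0.5):
    """Apply gamma correction to a uint8 grayscale image.
    gamma < 1 brightens dark regions more than bright ones."""
    normalised = gray.astype(np.float32) / 255.0
    return np.clip((normalised ** gamma) * 255.0, 0, 255).astype(np.uint8)

dark = np.array([[16, 64]], dtype=np.uint8)
bright = brighten(dark)  # both pixels come out brighter than they went in
```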


Challenge 3: Similar Objects




The objects shown in the images above are very similar, so I won't blame the model for misidentifying them. The image on the left contains an engine, which is incorrectly identified as a white wheel, while the image on the right contains a cabin, which is incorrectly identified as both a trailer and a white wheel.

The images below show the trailer, engine, and cabin respectively.




The objects (trailer, cabin, and engine) are quite distinct, but from our camera's position they all have a similar profile. As a result, the model consistently detects the cabin as a trailer or a white wheel.

Secondly, our dataset was not balanced during training: it had more images of the trailer than of any other object, resulting in the model being biased toward the trailer. [More about the model training process will be discussed in a different blog post.]

There are several proposed solutions. One is to make the robot move close enough to the objects to distinguish their profiles. This might not work in practice: the robot arm needs some clearance from the objects to pick them up, and that clearance might be too far for the profiles to be distinguishable.

The second proposed solution is to mount the camera in a different position or configuration to get a better view of the objects.

Thirdly, the YOLO model could be re-trained on a balanced dataset, with more examples (images) captured from the robot's perspective.
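Before re-training, it is worth quantifying the imbalance. YOLO-format label files store one object per line, starting with the class index, so a quick tally reveals the skew. A sketch (the class names and sample label lines below are hypothetical stand-ins for our dataset):

```python
from collections import Counter

def class_counts(label_lines, names):
    """Tally instances per class from YOLO-format label lines,
    where each line is "<class_index> <x> <y> <w> <h>"."""
    counts = Counter()
    for line in label_lines:
        if line.strip():
            counts[names[int(line.split()[0])]] += 1
    return counts

# Hypothetical class order and a few sample label lines.
NAMES = ["trailer", "cabin", "engine", "red wheel", "white wheel"]
lines = ["0 0.5 0.5 0.2 0.2",   # trailer
         "0 0.3 0.4 0.1 0.1",   # trailer again
         "1 0.6 0.6 0.2 0.2"]   # cabin
counts = class_counts(lines, NAMES)
```

Running this over every `.txt` file in the training label directory would show exactly how over-represented the trailer is, and how many extra images of each other class we need to collect.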


Summary


  • Background noise: can be reduced by image processing where the physical environment cannot be altered. Additionally, applying a confidence threshold to the model reduces false positives.
  • Low-light scenes: add LEDs to illuminate the scene. Additionally, re-train the model with more examples in low-light conditions.
  • Similar profiles: reduce the distance to objects where possible, or change the camera's position relative to the objects. Additionally, re-train the model with more examples from the robot's perspective.


Further Reading

Get your feet wet by understanding the tech behind our Vision Stack: YOLOv8, OpenCV & the Raspberry Pi Camera





