
OpenCV Adaptive Thresholding in Python with cv2.adaptiveThreshold()


Introduction

Thresholding is a simple and efficient technique to perform basic segmentation in an image, and to binarize it (turn it into a binary image) where pixels are either 0 or 1 (or 255 if you’re using integers to represent them).

Typically, you can use thresholding to perform simple background-foreground segmentation in an image, and it boils down to variants on a simple technique for each pixel:

if pixel_value > threshold:
    pixel_value = MAX
else:
    pixel_value = 0
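The rule above is pseudocode. A minimal, dependency-free sketch of the same idea on a small grid of grayscale values might look like the following. The function name `global_threshold` and the sample pixel values are made up for illustration; on real images you would typically call `cv2.threshold()` on a NumPy array instead:

```python
def global_threshold(image, threshold, max_value=255):
    """Apply one cut-off to every pixel of a 2D list of grayscale values."""
    return [
        [max_value if pixel > threshold else 0 for pixel in row]
        for row in image
    ]


# A tiny hypothetical 3x4 "image": dark strokes (low values) on bright paper
image = [
    [200, 210,  40, 205],
    [198,  35,  30, 202],
    [201, 207,  45, 199],
]

mask = global_threshold(image, threshold=127)
for row in mask:
    print(row)
```

Every pixel is compared against the same value, which is exactly what makes the technique fast and simple, and also what makes it brittle.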

Simple thresholding has glaring issues and requires fairly pristine input, which makes it not-so-practical for many use cases. The main offender is a global threshold which is applied to the entire image, whereas images are rarely uniform enough for blanket thresholds to work, unless they’re artificial.

A global threshold would work well for separating characters on scanned pages of a black-and-white book. It will very likely fail on a phone picture of that same page, since the lighting conditions may vary between parts of the page, making a single cut-off point too sensitive to real data.

To combat this, we can employ local thresholds, using a technique known as adaptive thresholding. Instead of treating all parts of the image with the same rule, we compute a separate threshold for each local area, based on the pixels around it. This makes thresholding partly invariant to changes in lighting, noise and other factors. While much more useful than global thresholding, thresholding itself is a limited, rigid technique, and is best applied to help with image preprocessing (especially when it comes to identifying images to discard), rather than segmentation.

For more delicate applications that require context, you’re better off employing more advanced techniques, including deep learning, which has been driving the recent advancements in computer vision.

Adaptive Thresholding with OpenCV

Let’s load in an image with variable lighting conditions, where one part of the image is in more focus than another, with the picture being taken from an angle. A picture I took of Harold McGee’s “On Food and Cooking” will serve great!

import cv2
import matplotlib.pyplot as plt

img = cv2.imread('book.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; Matplotlib expects RGB
plt.imshow(img)

Now, using regular thresholding, we can try to separate out the letters from the background, since there’s a clear color difference between them. All paper-color will be treated as the background. Since we don’t really know what the threshold should be – let’s apply Otsu’s method to find a good value, anticipating that the image is somewhat bi-modal (dominated by two colors mostly):

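img = cv2.imread('book.jpg')

# Otsu's method expects a single-channel image; blurring suppresses noise first
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (7, 7), 0)

# With THRESH_OTSU, the threshold argument (0) is ignored and computed automatically
ret, mask = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print(f'Threshold: {ret}')

fig, ax = plt.subplots(1, 2, figsize=(12, 5))
ax[0].imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
# The mask has a single channel, so display it with a grayscale colormap
ax[1].imshow(mask, cmap='gray')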

Let’s take a look at the result:

Ouch. The left part of the text is mainly faded, the shadow around the gutter totally ate a portion of the image, and the text is too saturated! This is an image “in the wild”, and blanket rules such as global thresholding don’t work well. What should the threshold be? It depends on the part of the image!

The cv2.adaptiveThreshold() method allows us to do exactly this:

cv2.adaptiveThreshold(img, 
                      max_value, 
                      adaptive_method, 
                      threshold_method, 
                      block_size, 
                      C)

The adaptive_method can be either cv2.ADAPTIVE_THRESH_MEAN_C or cv2.ADAPTIVE_THRESH_GAUSSIAN_C, and C is a constant that gets subtracted from the value computed for each neighborhood. Both of these methods calculate the threshold according to the neighbors of the pixel in question, where block_size is the side length of the square neighborhood (a block_size × block_size window centered on the pixel).

ADAPTIVE_THRESH_MEAN_C takes the mean of the neighbors and deducts C, while ADAPTIVE_THRESH_GAUSSIAN_C takes the Gaussian-weighted sum of the neighbors and deducts C.
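To make the mean variant concrete, here is a rough, dependency-free sketch of the idea: for every pixel, average its block_size × block_size neighborhood, subtract C, and compare the pixel against that local value. The function name `adaptive_mean_threshold` and the sample row are made up for illustration, and the border handling (reusing the nearest edge pixel) is an assumption of this sketch. OpenCV's own implementation is optimized and may differ in rounding details, so treat this as intuition rather than a drop-in replacement:

```python
def adaptive_mean_threshold(image, block_size, c, max_value=255):
    """Threshold each pixel against the mean of its block_size x block_size
    neighborhood minus c. Border pixels reuse the nearest edge value."""
    if block_size % 2 == 0 or block_size < 3:
        raise ValueError("block_size must be an odd number >= 3")

    height, width = len(image), len(image[0])
    radius = block_size // 2
    mask = [[0] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            total = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    # Clamp coordinates so the window never leaves the image
                    ny = min(max(y + dy, 0), height - 1)
                    nx = min(max(x + dx, 0), width - 1)
                    total += image[ny][nx]
            local_threshold = total / (block_size * block_size) - c
            if image[y][x] > local_threshold:
                mask[y][x] = max_value
    return mask


# Hypothetical page: lighting brightens from left to right,
# with two dark strokes (values 30 and 110) on top of it
row = [60, 30, 100, 120, 140, 110, 180]
image = [row[:] for _ in range(3)]

mask = adaptive_mean_threshold(image, block_size=3, c=5)
print(mask[0])  # [255, 0, 255, 255, 255, 0, 255]
```

Both strokes are picked out even though the one on the bright side (110) is lighter than the paper on the dark side (60 and 100). A single global cut-off of 100 would mark that dark-side paper as ink and miss the bright-side stroke entirely. The Gaussian variant follows the same pattern, but weights neighbors closer to the center more heavily instead of averaging them uniformly.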


cv2.adaptiveThreshold() also lets you set a binarization strategy through threshold_method, but it is limited to THRESH_BINARY and THRESH_BINARY_INV. Switching between them effectively swaps what’s “background” and what’s “foreground”.

Unlike cv2.threshold(), which returns both the threshold value it used and the mask, this method returns only the mask. Let’s try segmenting the characters in the same image as before, using adaptive thresholding:


img = cv2.imread('book.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (7, 7), 0)


mask = cv2.adaptiveThreshold(blurred, 
                              255, 
                              cv2.ADAPTIVE_THRESH_MEAN_C, 
                              cv2.THRESH_BINARY, 
                              31, 
                              10)


fig, ax = plt.subplots(1, 2, figsize=(12, 5))
ax[0].imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
# The mask is single-channel, so show it with a grayscale colormap
ax[1].imshow(mask, cmap='gray')
plt.tight_layout()

This results in a much cleaner image:

Note: The block_size argument must be an odd number greater than 1.

In much the same way, we can apply Gaussian thresholding:

mask = cv2.adaptiveThreshold(blurred, 
                              255, 
                              cv2.ADAPTIVE_THRESH_GAUSSIAN_C, 
                              cv2.THRESH_BINARY, 
                              31, 
                              10)

This also produces a pretty satisfactory image in the end:

Both the block size (the side of the neighborhood) and C are hyperparameters to tune here. Try out different values and choose the ones that work best on your image. In general, Gaussian thresholding is less sensitive to noise and will produce slightly bleaker, cleaner images, but this varies and depends on the input.

Limitations of Adaptive Thresholding

With adaptive thresholding, we were able to avoid the overarching limitation of thresholding, but it’s still relatively rigid and doesn’t work great for colorful inputs. For example, if we load in an image of scissors and a small kit with differing colors, even adaptive thresholding will have issues truly segmenting it right, with certain dark features being outlined, but without entire objects being considered:

If we tweak the block size and C, we can make it consider larger patches to be part of the same object, but then run into issues with making the neighbor sizes too global, falling back to the same overarching issues with global thresholding:

Conclusion

In recent years, binary segmentation (like what we did here) and multi-label segmentation (where you can have an arbitrary number of classes encoded) have been successfully modeled with deep learning networks, which are much more powerful and flexible. In addition, they can take both global and local context into account for the images they’re segmenting. The downside is – you need data to train them, as well as time and expertise.

For on-the-fly, simple thresholding, you can use OpenCV, and battle some of the limitations using adaptive thresholding rather than global thresholding strategies. For accurate, production-level segmentation, you’ll want to use neural networks.
