Detect tip of tongue using OpenCV

Mansi Kataria
3 min read · Feb 10, 2022


Quite often we find ourselves in a situation where we want to do more with a face image than just detect the common facial landmarks like mouth, lips, eyes, jaw, etc.

I recently worked on a social media related project where I had to detect the tip of the tongue in real-time video and track the tip as it moves.

My plan was to first look for a solution using existing libraries and, if the accuracy was not up to the mark, to create a custom model.

In this article, I’ll show you a solution using existing libraries.

List of libraries required for the project:

  1. imutils — for real-time video capture
  2. OpenCV — for image processing
  3. dlib — for facial landmark detection
  4. scipy — for distance calculation on facial landmarks
  5. math
  6. numpy

Let’s create two Python scripts:

  1. face_utils.py: for facial landmark detection and for detecting whether the mouth is open
  2. detect-tongue-tip-real-time.py: for real-time video capture and for processing each frame to detect the tip of the tongue

I won’t explain the facial landmark detection part in detail here, because that is covered in a separate article.

Utility functions in face_utils.py:

get_mouth_loc_with_height

Returns the mouth coordinates, the heights of the mouth and of the inner mouth, and the shape coordinates of the mouth.

draw_mouth

Draws a dotted line around the mouth in the image, based on the landmarks.

mouth_aspect_ratio

Calculates the aspect ratio of the mouth based on the landmarks (Ref: https://github.com/mauckc/mouth-open/blob/master/detect_open_mouth.py)
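To make the mouth-open check concrete, here is a minimal sketch of a mouth aspect ratio function. It is modeled on the referenced script, not copied from face_utils.py. It assumes that mouth holds the 20 dlib mouth landmarks (points 49 to 68) in order. The is_mouth_open helper and its threshold argument are hypothetical, and the threshold would need tuning on your own footage. I use math.dist here; scipy.spatial.distance.euclidean gives the same distances.

```python
from math import dist  # Python 3.8+


def mouth_aspect_ratio(mouth):
    """Ratio of mouth height to mouth width.

    mouth: sequence of 20 (x, y) points, assumed to be dlib landmarks 49-68.
    """
    a = dist(mouth[2], mouth[10])  # vertical distance, landmarks 51 and 59
    b = dist(mouth[4], mouth[8])   # vertical distance, landmarks 53 and 57
    c = dist(mouth[0], mouth[6])   # horizontal distance, landmarks 49 and 55
    return (a + b) / (2.0 * c)


def is_mouth_open(mouth, threshold):
    """Hypothetical helper: the mouth counts as open above the given ratio."""
    return mouth_aspect_ratio(mouth) > threshold
```

A wide-open mouth gives a larger ratio than a closed one, so a single threshold is enough to gate the tongue detection step.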

Moving on to the actual logic. Let’s go through it step by step:

1. Enhance the image so that the rest of the image processing is more accurate.

2. Check whether a face is detected and the mouth is open.

3. If the mouth is open, run ORB, a local feature detector in OpenCV. After some experimentation, I found that ORB works better than blob detectors.

Once we have the key points, we take the one that is lowest in the image as the tip of the tongue. Because the image was resized in the first step, we then map that key point back to the corresponding coordinates in the original image.
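A minimal sketch of this last step, assuming the key points are available as (x, y) tuples in the resized image (for OpenCV key points, that is the pt attribute of each cv2.KeyPoint) and that scale is the resized width divided by the original width. The function name is made up for illustration.

```python
def tongue_tip(points, scale):
    """Pick the lowest point and map it back to original-image coordinates.

    points: iterable of (x, y) positions in the resized image.
    scale: resized size divided by original size (0.5 if the image was halved).
    Returns integer (x, y) in the original image, or None if there are no points.
    """
    points = list(points)
    if not points:
        return None
    # Image y grows downward, so the largest y is the lowest point on screen.
    x, y = max(points, key=lambda p: p[1])
    return int(round(x / scale)), int(round(y / scale))
```

For example, with points (10, 20), (30, 50), and (25, 40) and a scale of 0.5, the lowest point is (30, 50), which maps back to (60, 100) in the original frame.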

And that is all…

Here is a demo in the real-time video:

For the entire codebase, more demos, and other details, here is the project.

Is this an accurate approach?

This process gives us a pretty good approximation of the tip of the tongue. Still, it relies on a few assumptions that will not always be true:

  1. The tip of the tongue is always at the bottom of the mouth area.
  2. The tip of the tongue does not go below the lower lip.

In order to make the process more accurate, I am working on a custom model (will publish an article about it once that is ready).

That’s all for this one, folks! Ciao…

Feel free to connect with me on Twitter or LinkedIn and drop me a note if you have any questions. If you like this article, follow me right here on Medium.

Mansi Kataria

Email: mkataria920@gmail.com
LinkedIn: https://www.linkedin.com/in/mansirkataria/
Twitter: https://twitter.com/_mansi___
Medium: https://zoomout.medium.com
