Real-time scoreboard digit recognition (OCR) with a webcam
June 7th, 2016
Broadcasts and streams of sports matches require clear and accurate graphics of the game clock and current score. Building an all-in-one hardware solution to read this data from the venue scoreboard is difficult, as protocols vary widely between vendors and scoreboard types. Reading the numbers in real time with a regular webcam and optical character recognition (OCR) is efficient and works with any scoreboard type and production system.
The first attempt was to use an existing OCR program called SSOCR (seven-segment optical character recognition).
This proved to be too resource-intensive, as multiple processes had to be run for each digit of the scoreboard (up to 10 digits, depending on the sport). It also couldn’t handle decimal numbers or read scoreboards that do not use the traditional 7-segment font, like the shot clock seen later.
The USB webcam I tested with had a replaceable lens with a fixed focal length, similar to the lenses used on FPV cameras for RC planes. A replaceable lens is essential because the resolution and picture quality of USB webcams are quite poor: fitting a longer lens zooms in on the scoreboard, so more of its detail is captured.
Here are a couple examples of the video quality coming from the webcam with a longer focal length lens:
Some webcams have a software configurable exposure setting which helps with getting the best video signal for character recognition.
The incoming video needs to be processed before character recognition can take place. The goal is to provide a clear binary black and white image. The following transformations are applied in order.
The basic transformations are frame rate, crop, and rotation, which can be configured from the GUI. A lower frame rate reduces the computing power needed. As each digit on the scoreboard only changes at most once every 0.1 sec, a frame rate of 10 fps (rather than the default 30 fps) is sufficient. Other frames coming in from the webcam are discarded.
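The frame-rate reduction can be sketched in a few lines. This is only an illustration of the idea, not the project's code: the function name `frames_to_keep` and its parameters are made up for this example.

```python
def frames_to_keep(source_fps, target_fps, frame_count):
    """Return the indices of the frames kept when lowering the frame rate.

    Hypothetical helper: every frame whose index is not returned is discarded.
    """
    if target_fps <= 0 or source_fps < target_fps:
        raise ValueError("target_fps must be positive and no higher than source_fps")
    step = source_fps / target_fps  # source frames per kept frame
    next_due = 0.0                  # index at which the next frame is due
    kept = []
    for index in range(frame_count):
        if index >= next_due:
            kept.append(index)
            next_due += step
    return kept


print(frames_to_keep(30, 10, 12))  # [0, 3, 6, 9]
```

Going from 30 fps to 10 fps keeps every third frame, so only a third of the frames go through thresholding and recognition.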
Thresholding is applied to convert the color image to black and white. The values depend heavily on the lighting conditions, webcam, scoreboard brightness, exposure settings, etc., and must be tuned to get the best possible black and white image. The red, green, and blue colors are each thresholded individually.
At the end of thresholding, the image consists only of full-black and full-white pixels. Erosion is then applied to enlarge the individual segments of the digits: the digits are black on a white background at this point, so eroding the white regions thickens the black ones. This gives a cleaner picture for character recognition. Multiple iterations can be configured in the GUI to enlarge the dots further.
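    import numpy
    from cv2 import COLOR_BGR2HSV, THRESH_BINARY_INV, cvtColor, erode, inRange, threshold

    # Select the scoreboard colors in HSV space (OpenCV hue runs from 0 to 179).
    img_HSV = cvtColor(img_cropped, COLOR_BGR2HSV)
    threshA = inRange(img_HSV, (20, 40, 40), (40, 255, 255))    # yellow hues
    threshB = inRange(img_HSV, (170, 60, 60), (180, 255, 255))  # red, upper end of the hue range
    threshC = inRange(img_HSV, (0, 60, 60), (10, 255, 255))     # red, lower end of the hue range
    th3 = threshA + threshB + threshC
    # Invert the mask so the lit segments are black on a white background.
    ret3, th3 = threshold(th3, 127, 255, THRESH_BINARY_INV)
    # Eroding the white background thickens the black segments.
    img_processed = erode(th3, numpy.ones((2, 2), numpy.uint8), iterations=self.erosion)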
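To see why erosion makes the digits thicker rather than thinner, here is a small pure-Python sketch of erosion on a binary image. It is an illustration only: it uses a 3x3 neighbourhood instead of the 2x2 kernel in the snippet above, and the function below is a stand-in written for this example, not the OpenCV call.

```python
def erode(image, iterations=1):
    """Erode a binary image (rows of 0/255 values) with a 3x3 minimum filter.

    Pixels outside the image are ignored, so the border is not darkened.
    """
    height, width = len(image), len(image[0])
    for _ in range(iterations):
        result = []
        for y in range(height):
            row = []
            for x in range(width):
                neighbours = [
                    image[ny][nx]
                    for ny in range(max(0, y - 1), min(height, y + 2))
                    for nx in range(max(0, x - 1), min(width, x + 2))
                ]
                # A pixel stays white only if its whole neighbourhood is white.
                row.append(min(neighbours))
            result.append(row)
        image = result
    return image


W, B = 255, 0
dot = [
    [W, W, W, W, W],
    [W, W, W, W, W],
    [W, W, B, W, W],
    [W, W, W, W, W],
    [W, W, W, W, W],
]
eroded = erode(dot)
black_before = sum(row.count(B) for row in dot)
black_after = sum(row.count(B) for row in eroded)
print(black_before, black_after)  # 1 9
```

Because erosion shrinks the white regions, a single black dot grows into a 3x3 block, which is the effect wanted for the small dots of a dot-matrix display.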
Each individual digit of the scoreboard is cropped to a manually configured bounding box, as shown with the red outlines above. For basketball, there would be 6 total digits (4 from the game clock, 2 from the shot clock) and 2 indicators. Some scoreboards count in 0.1s intervals when below a certain clock time, so the indicators are used to determine if the game clock is below 60 seconds and also if the shot clock is below 10.
The indicators are placed on the top dot of the colon (‘:’) for the game clock, and where the decimal would be for the shot clock.
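One simple way to read an indicator is to check how much of its bounding box is black after thresholding. The post does not say how the indicators are actually evaluated, so the function name `indicator_is_lit` and the 50% cut-off below are assumptions made for this sketch.

```python
def indicator_is_lit(region, min_dark_fraction=0.5):
    """Return True if enough pixels in the indicator box are black (0).

    Assumes the image has been thresholded so that lit dots are black.
    The 0.5 default is an arbitrary illustrative value.
    """
    pixels = [pixel for row in region for pixel in row]
    if not pixels:
        return False
    dark = sum(1 for pixel in pixels if pixel == 0)
    return dark / len(pixels) >= min_dark_fraction


lit_box = [[0, 0, 255], [0, 0, 0]]
unlit_box = [[255, 255, 255], [255, 0, 255]]
print(indicator_is_lit(lit_box), indicator_is_lit(unlit_box))  # True False
```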
Each digit on the scoreboard can be expressed by a 5x7 array of pixels. This is the traditional dot-matrix character size.
After each digit is cropped to its bounding box, it is resized to 5x7 pixels. Nearest-neighbor interpolation is used to preserve the sharp edges between white and black.
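As a rough sketch of the resize step, the following pure-Python function picks, for each of the 5x7 output pixels, the source pixel nearest to its centre. It is written for illustration and may pick slightly different pixels than a library resize would; the name `resize_nearest` is hypothetical.

```python
def resize_nearest(image, new_width=5, new_height=7):
    """Resize a binary image (rows of 0/255 values) by nearest-neighbor sampling."""
    old_height, old_width = len(image), len(image[0])
    resized = []
    for y in range(new_height):
        # Map the centre of the output row back to a source row.
        src_y = min(old_height - 1, int((y + 0.5) * old_height / new_height))
        row = []
        for x in range(new_width):
            src_x = min(old_width - 1, int((x + 0.5) * old_width / new_width))
            row.append(image[src_y][src_x])
        resized.append(row)
    return resized


W, B = 255, 0
pattern = [
    [W, W, B, W, W],
    [W, B, B, W, W],
    [W, W, B, W, W],
    [W, W, B, W, W],
    [W, W, B, W, W],
    [W, W, B, W, W],
    [W, B, B, B, W],
]
# Blow the pattern up to 10x14, as if it had been cropped from the video frame.
big = [[p for p in row for _ in range(2)] for row in pattern for _ in range(2)]
small = resize_nearest(big)
print(small == pattern)  # True
```

Since every output pixel is copied from a single source pixel, the result contains only pure black and pure white values, with no grey pixels introduced at the edges.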
All the digits are compared to a database of pre-generated reference digits. For example, here are all the reference digits for the number ‘1’:
Artifacts are introduced because of webcam quality, poor configuration, bad cropping, and just general noise. Every digit has 10-40 reference digits that were generated from an actual recording of the scoreboard. By comparing every cropped digit to every reference digit, a match can be found.
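The comparison can be sketched as a nearest-match search over the reference set. The details here are assumptions for the sake of the example: the grids are written as strings with `#` for black pixels, the distance is the number of differing pixels, and the `max_mismatches` limit is an invented parameter.

```python
def pixel_difference(a, b):
    """Count the pixels that differ between two 5x7 grids."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def recognise(digit, references, max_mismatches=3):
    """Return the label of the closest reference grid, or None if none is close.

    references maps a label (e.g. "1") to a list of reference grids.
    """
    best_label, best_distance = None, max_mismatches + 1
    for label, grids in references.items():
        for grid in grids:
            distance = pixel_difference(digit, grid)
            if distance < best_distance:
                best_label, best_distance = label, distance
    return best_label


ONE = ["..#..", ".##..", "..#..", "..#..", "..#..", "..#..", ".###."]
SEVEN = ["#####", "....#", "...#.", "..#..", "..#..", "..#..", "..#.."]
references = {"1": [ONE], "7": [SEVEN]}

noisy_one = ["..#..", ".##..", "..#..", "..#.#", "..#..", "..#..", ".###."]
blank = ["....."] * 7
print(recognise(noisy_one, references), recognise(blank, references))  # 1 None
```

With only 35 pixels per digit, each comparison is cheap, which is why checking a digit against dozens of references stays fast.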
Once the digits are read, the final step formats the clocks to a human-readable form. Depending on the indicators, decimals or colons are placed. If a digit is unrecognizable, the previous recognized digit is used, so no change to the digit will be made.
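The formatting step might look like the sketch below. It assumes the caller passes only the digits currently shown on the clock, in display order, and uses `None` for an unrecognized digit; both function names are hypothetical.

```python
def merge_digits(current, previous):
    """Fall back to the previously recognized value for unrecognized digits (None)."""
    return [cur if cur is not None else prev for cur, prev in zip(current, previous)]


def format_clock(digits, tenths_mode, separator=":"):
    """Join digits into a clock string.

    tenths_mode: True when the indicator says the clock is counting in 0.1 s steps,
    in which case the last digit is treated as the tenths digit.
    """
    text = "".join(str(d) for d in digits)
    if tenths_mode:
        return text[:-1] + "." + text[-1]
    if separator and len(text) > 2:
        return text[:-2] + separator + text[-2:]
    return text


digits = merge_digits([1, None, 3, 4], [1, 2, 0, 0])
print(format_clock(digits, tenths_mode=False))     # 12:34
print(format_clock([5, 9, 9], tenths_mode=True))   # 59.9
print(format_clock([2, 4], tenths_mode=False))     # 24
```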
The method of character recognition developed in this post is quite brute-force: at its core, it just compares the digit images to a fixed list of reference images. However, the simplicity of this solution gives it some advantages over more traditional OCR methods:
- it is faster and significantly less resource-intensive, able to handle hundreds of digits in real time
- with the faster processing, 0.1s clocks will not lag behind
- it is able to handle different fonts, from 7-segment displays to 5x7 dot matrix displays
- indicators can be used to detect when a clock drops below 60 s (game clock) or 10 s (shot clock) and starts counting in 0.1 s steps
The implementation is quite rudimentary at the moment as it was developed for a specific tournament, so many improvements can be made to make it more general-purpose:
- Cleaner GUI for webcam configurations
- More advanced video processing options
- Auto tracking of where the digits drift as webcam position drifts
- Easier way to mark the bounding boxes
- More output options (serial ports, websockets, CSV files, etc) for compatibility with graphics generators (LiveText, Chyron, VMix, etc)
- Finer tuning of reference digits and comparison algorithm
Code is published on GitHub at Scoreboard-webcam-OCR.