AI-generated image from Midjourney representing AI + OCR in auto lending
Optical Character Recognition (OCR) is often treated as a magic wand that transforms documents into usable data with a single “wave”. That myth fades as documents get more complex while business use cases still demand high-accuracy data extraction. That is where AI comes in. In this blog post, we examine the challenges posed by auto loan documents that contain checkboxes, radio buttons, signatures, tables, itemized text in complex formats, and text encased in boxes. Here, traditional OCR methods wrapped in simple heuristics struggle.
Limitations of Traditional OCR in Auto Lending
OCR’s one-size-fits-all approach is a significant blocker in specific business use cases. One such challenge is reading text encased in boxes in documents such as the Title Application and the Odometer Statement. See Figure 1 for examples. Underneath each of the six image segments, we show the corresponding OCR text and the desired text.
Although this looks simple to the human eye, it is a unique challenge for traditional OCR, which is designed to “faithfully” interpret every black pixel in an image segment as part of some character. See, for example, the image segment “025707” in Figure 1. Here the OCR text obtained is “0||2||59yg7 07”. It contains every character of the desired text, but also many extraneous characters. Recovering the desired text in the presence of so many intervening extraneous characters is hard for a traditional computer program.
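To see why a simple clean-up rule is not enough, consider the most obvious heuristic: keep only the digits in the OCR text. The short Python sketch below applies it to the OCR string quoted above. The function name is ours and is used only for illustration.

```python
import re


def digits_only(ocr_text: str) -> str:
    """Naive heuristic: drop every character that is not a digit."""
    return re.sub(r"\D", "", ocr_text)


ocr_text = "0||2||59yg7 07"  # OCR output quoted above for the segment "025707"
desired = "025707"

cleaned = digits_only(ocr_text)
print(cleaned)             # 0259707
print(cleaned == desired)  # False: the spurious "9" is itself a digit and survives
```

The filter removes the box edges that were read as “|” and the stray letters, but it cannot tell a misread digit from a real one, so the result is one character too long.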
We showcase more scenarios in Figure 1, such as wrong characters (“3983”), missing characters (“079221”), mixing up text from different lines (“3983”), etc. These result in considerable inaccuracies, highlighting the limitations of OCR when applied to complex document formats.
Experience, Expertise & Innovation
Informed has been automating auto loans for over six years, and we have a deep understanding of the unique challenges inherent in auto loan deal jackets. We have not only navigated these challenges but built a robust arsenal of tailored AI tools. These go beyond the constraints of traditional OCR, delivering solutions ensuring reliable information extraction, reduced manual effort, and significantly shorter processing times.
One recent innovation solves the problem of mileage and VIN extraction from Title Applications in which characters are encased in individual boxes. Several states use this format, including CA, MI, UT, WI, MO, and MN, and failure to extract these fields results in a lot of manual work. Let’s face it, squinting at 17 seemingly random characters (for a VIN) and making sure they are accurate is tedious and error-prone.
A Deep Dive into Segment OCR
Segment OCR is an image-to-text model that, given an image segment of text (for example, the ones shown in Figure 1), is designed to output just the desired text and omit unnecessary characters. For the mileage use case, we constrained this sequence-to-sequence neural network to output only the digits 0 through 9, and further restricted it to output sequences no longer than 6 characters.
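The post does not say how these constraints are enforced. One common approach, shown here as a minimal sketch, is to restrict the decoder at inference time: at each step only digit tokens (and an end-of-sequence token) are eligible, and decoding stops after six characters. The vocabulary, the `<eos>` token, greedy decoding, and the toy scoring function are all assumptions made for illustration, not details of the actual model.

```python
from typing import Callable, List, Sequence

DIGITS = set("0123456789")
EOS = "<eos>"                                 # hypothetical end-of-sequence token
VOCAB = list("0123456789") + [EOS, "|", "y", "g"]  # toy vocabulary with noise tokens
MAX_LEN = 6                                   # mileage is at most six digits


def constrained_greedy_decode(step_scores: Callable[[str], Sequence[float]]) -> str:
    """Greedy decoding where only digits and EOS may be chosen, capped at MAX_LEN."""
    allowed = [i for i, tok in enumerate(VOCAB) if tok in DIGITS or tok == EOS]
    out = ""
    while len(out) < MAX_LEN:
        scores = step_scores(out)             # one score per VOCAB entry
        best = max(allowed, key=lambda i: scores[i])
        if VOCAB[best] == EOS:
            break
        out += VOCAB[best]
    return out


def toy_scores(prefix: str) -> List[float]:
    """Stand-in for the network: rates '|' highest, then the next digit of 025707."""
    target = "025707"
    scores = [0.0] * len(VOCAB)
    scores[VOCAB.index("|")] = 5.0            # noise would win without the constraint
    nxt = target[len(prefix)] if len(prefix) < len(target) else EOS
    scores[VOCAB.index(nxt)] = 3.0
    return scores


print(constrained_greedy_decode(toy_scores))  # 025707
```

Because the noise tokens are never eligible, the decoder cannot emit them even when the underlying scores favor them, and the length cap guarantees the output fits the mileage field.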
Figure 2 shows the overall system pipeline for extracting relevant fields from a Title Application, as an example. From this document type we wish to extract the values of numerous fields: applicant name, VIN, make (of the car), model, mileage, etc. Here we show Segment OCR being applied only to the mileage field.
Given an IMAGE of a Title Application page, we first run traditional OCR to get the TEXT on that page. For all fields other than mileage, we simply take the output of the Extraction Model. This model could be built using any one or a combination of methods: heuristics, neural networks, transformers, large language models, etc. Note that the Extraction Model not only outputs the values but also knows where on the page each value resides.
For mileage, we use only the bounding box from the Extraction Model, so that the image segment containing the mileage text can be cropped. This image segment is then processed by Segment OCR to obtain the mileage value. It must be emphasized that the OCR text in a mileage image segment is often unreliable (see Figure 1 for examples), which is what necessitates a second round of image-to-text conversion. Furthermore, we have found that the OCR text for mileage is sometimes so bad that a neural-network implementation of the Extraction Model struggles to output reliable bounding boxes at all. For example, we have encountered truncated bounding boxes (which prevent Segment OCR from reliable image-to-text conversion), boxes much larger than necessary (for example, segment “3983” in Figure 1; this is not a fundamental impediment), and sometimes no bounding box at all (which blocks the Segment OCR pipeline). We have devised custom heuristics to help derive reliable bounding boxes in such cases.
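The routing described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the production pipeline: the page is a plain list of pixel rows, `extraction` and `segment_ocr` are hypothetical stand-ins for the Extraction Model output and the Segment OCR model, the padding is our own choice, and the field values are made up. When no bounding box is available, the sketch simply falls back to the Extraction Model's value as a placeholder, since the custom heuristics mentioned above are not described.

```python
from typing import Callable, Dict, List, Optional, Tuple

Image = List[List[int]]           # grayscale page as rows of pixel values
Box = Tuple[int, int, int, int]   # (left, top, right, bottom), right/bottom exclusive


def crop(image: Image, box: Box, pad: int = 0) -> Image:
    """Crop `box` from `image`, optionally padded and clamped to the page bounds."""
    height, width = len(image), len(image[0])
    left, top, right, bottom = box
    left, top = max(0, left - pad), max(0, top - pad)
    right, bottom = min(width, right + pad), min(height, bottom + pad)
    return [row[left:right] for row in image[top:bottom]]


def extract_fields(
    image: Image,
    extraction: Dict[str, Tuple[str, Optional[Box]]],
    segment_ocr: Callable[[Image], str],
) -> Dict[str, str]:
    """Use the Extraction Model value for every field except mileage, which is
    re-read from the cropped image segment by Segment OCR."""
    result = {}
    for name, (value, box) in extraction.items():
        if name == "mileage" and box is not None:
            result[name] = segment_ocr(crop(image, box, pad=1))
        else:
            result[name] = value  # placeholder fallback when mileage has no box
    return result


# Toy page (5 rows x 8 columns) and made-up Extraction Model output.
page = [[r * 10 + c for c in range(8)] for r in range(5)]
extraction = {
    "make": ("HONDA", (0, 0, 3, 1)),
    "mileage": ("0||2||59yg7 07", (2, 1, 5, 3)),
}
fields = extract_fields(page, extraction, segment_ocr=lambda segment: "025707")
print(fields)  # {'make': 'HONDA', 'mileage': '025707'}
```

Padding and clamping the box is one simple guard against slightly truncated boxes. An oversized box is passed through unchanged, consistent with the observation that larger-than-necessary boxes are not a fundamental impediment.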
The Segment OCR neural network for mileage was trained on a dataset of more than 100k image segments with known ground truths. Because these are text segments with horizontally separable digits, we can produce a large training set using data augmentation techniques, and thereby get away with only a small set of annotations to start with.
Figure 3 illustrates how a single image segment is used to generate multiple image segments with known ground truths. In this example, “3201”, we localize the four digits “3”, “2”, “0”, “1” on the image, and then use the four crops to compose up to 4^4 = 256 different image segments with known ground truths, since each of the four positions can be filled by any of the four crops.
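The counting argument is easy to check in code. The sketch below recombines four labeled digit crops into every possible four-digit sequence, pairing each composed image with its ground-truth label. It assumes the crops have already been localized and share the same height. The tiny 2x2 "images" and the function names are placeholders of ours, not the actual augmentation code.

```python
from itertools import product
from typing import List, Tuple

Image = List[List[int]]  # rows of pixel values


def hconcat(crops: List[Image]) -> Image:
    """Join equal-height crops side by side into one image segment."""
    height = len(crops[0])
    return [[px for c in crops for px in c[row]] for row in range(height)]


def augment(digit_crops: List[Tuple[str, Image]], length: int) -> List[Tuple[Image, str]]:
    """Compose every sequence of `length` crops, with the matching label."""
    samples = []
    for combo in product(digit_crops, repeat=length):
        label = "".join(digit for digit, _ in combo)
        image = hconcat([img for _, img in combo])
        samples.append((image, label))
    return samples


# Placeholder crops for the digits of "3201": 2x2 blocks filled with the digit value.
crops = [(d, [[int(d)] * 2 for _ in range(2)]) for d in "3201"]
samples = augment(crops, length=4)
print(len(samples))  # 256
```

Each generated sample carries a label that is correct by construction, which is what lets a small annotated set grow into a large training set.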
The Segment OCR neural network was fitted to this dataset and achieved 97.8% accuracy on the test set. For end-to-end mileage extraction on customer documents, we saw the passing rate rise from 28% to 86% on a population of 170 Title Applications from MI, UT, and FL. On another test set of 800, we saw an increase from 55% to 85%.
Lastly, we show a collage of image segments of varied quality, format, and visual features that the Segment OCR can successfully process.
Nishit Kumar is Head of Machine Learning at Informed and has been an engineering leader in tech startups for over 20 years. He is a developer of AI products with expertise in machine learning, deep learning, computer vision, and AI.
Jatin Agrawal is a product lead at Informed. With over five years of experience in deep learning and a background at AI startups and Microsoft, he is committed to improving auto lending with innovative AI products. His work has appeared in numerous publications.