Has anyone used OCR software to batch extract invoice data during due diligence?

June 27, 2025
by a searcher from INSEAD in San Francisco, CA, USA
We’re midway through diligence on a deal and discovered that the seller’s homegrown portal stores ~20,000 invoices as PDFs, with no built-in reporting. We may need to extract key fields such as invoice amount, vendor, and date to support the QoE (quality of earnings) analysis.
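For a sense of what batch extraction could look like, here is a minimal sketch using Amazon Textract's AnalyzeExpense API through boto3. It assumes AWS credentials are configured, the PDFs have been exported to a local folder (the folder and region are placeholders), and each PDF is a single page. As far as I know, the synchronous call only handles single-page documents, so multi-page PDFs would need the asynchronous StartExpenseAnalysis flow through S3. `FIELD_MAP`, `parse_expense_response`, `to_amount`, and `extract_folder` are hypothetical helper names, not part of any library.

```python
"""Hypothetical batch-extraction sketch built on Amazon Textract's AnalyzeExpense API."""
from __future__ import annotations

from decimal import Decimal, InvalidOperation
from pathlib import Path

# AnalyzeExpense summary-field types mapped to the columns we want for QoE work.
FIELD_MAP = {
    "VENDOR_NAME": "vendor",
    "INVOICE_RECEIPT_DATE": "date",
    "TOTAL": "amount",
    "INVOICE_RECEIPT_ID": "invoice_id",
}


def parse_expense_response(response: dict) -> dict:
    """Pull vendor, date, total, and invoice ID out of an AnalyzeExpense response.

    Keeps the highest-confidence value when a field type appears more than once,
    and records the lowest confidence among the kept fields for review triage.
    """
    best: dict[str, tuple[float, str]] = {}
    for doc in response.get("ExpenseDocuments", []):
        for field in doc.get("SummaryFields", []):
            field_type = field.get("Type", {}).get("Text")
            value = field.get("ValueDetection", {})
            text = value.get("Text")
            if field_type not in FIELD_MAP or not text:
                continue
            key = FIELD_MAP[field_type]
            conf = value.get("Confidence", 0.0)
            if key not in best or conf > best[key][0]:
                best[key] = (conf, text)
    record = {key: text for key, (_, text) in best.items()}
    record["min_confidence"] = min((c for c, _ in best.values()), default=0.0)
    return record


def to_amount(text: str) -> Decimal | None:
    """Strip '$' and thousands separators; return None if the text is not a number."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def extract_folder(folder: str, region: str = "us-west-2") -> list[dict]:
    """Run AnalyzeExpense on every PDF in a folder (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the parsing helpers work without AWS set up

    client = boto3.client("textract", region_name=region)
    rows = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        response = client.analyze_expense(Document={"Bytes": pdf.read_bytes()})
        rows.append({"file": pdf.name, **parse_expense_response(response)})
    return rows
```

Rows with a low `min_confidence` are the natural candidates for a manual check against the source PDF before anything flows into the QoE numbers.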
Has anyone gone through something similar and used OCR tools (e.g., Amazon Textract) to automate this? My initial thought is that we can just pull a representative sample, but I'm trying to assess the feasibility of full extraction in case it ends up being needed.
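If the sampling route is enough, a minimal sketch of drawing a reproducible simple random sample is below. The `invoices/` folder, the sample size of 300, and the seed are placeholders rather than recommendations; stratifying by month or vendor would be an alternative design if the sample needs to cover specific periods.

```python
"""Draw a reproducible random sample of invoice PDFs for manual review (hypothetical paths)."""
import random
from pathlib import Path


def sample_invoices(paths: list[str], n: int, seed: int = 2025) -> list[str]:
    """Return n paths chosen uniformly at random without replacement.

    Sorting first and fixing the seed means the same sample comes back on every run,
    so reviewers can reproduce exactly which invoices were tested.
    """
    ordered = sorted(paths)
    if n >= len(ordered):
        return ordered
    rng = random.Random(seed)
    return sorted(rng.sample(ordered, n))


if __name__ == "__main__":
    # "invoices" is a placeholder for wherever the portal export lands.
    all_pdfs = [str(p) for p in Path("invoices").glob("*.pdf")]
    for path in sample_invoices(all_pdfs, n=300):
        print(path)
```

Fixing the seed matters more than the exact sample size here: it lets anyone re-run the draw and confirm which invoices were pulled.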
Thanks in advance!