From Machine Learning

[D] Large-scale OCR

I need to OCR 50 million pages of legal documents. I'm only interested in the text; layout is not very important.

What is the most cost-effective way to tackle this without it taking longer than one week?
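To put the constraint in numbers, here is a rough back-of-the-envelope sketch of the sustained throughput that 50 million pages in one week implies. The per-worker rate of 2 pages per second is purely a placeholder assumption for illustration, not a measured figure for any OCR engine, and the function names are made up for this example.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def required_pages_per_second(total_pages: int, deadline_days: float) -> float:
    """Sustained throughput needed to finish total_pages within deadline_days."""
    return total_pages / (deadline_days * SECONDS_PER_DAY)


def workers_needed(total_pages: int, deadline_days: float,
                   pages_per_second_per_worker: float) -> int:
    """Number of parallel workers needed, rounded up to a whole worker."""
    rate = required_pages_per_second(total_pages, deadline_days)
    return math.ceil(rate / pages_per_second_per_worker)


if __name__ == "__main__":
    total_pages = 50_000_000   # from the post
    deadline_days = 7          # from the post
    assumed_worker_rate = 2.0  # ASSUMPTION: pages/sec per worker, placeholder only

    rate = required_pages_per_second(total_pages, deadline_days)
    workers = workers_needed(total_pages, deadline_days, assumed_worker_rate)
    print(f"Required throughput: {rate:.2f} pages/sec")
    print(f"Workers at {assumed_worker_rate} pages/sec each: {workers}")
```

That works out to roughly 83 pages per second around the clock, with no allowance for retries or downtime, so whatever I use has to run in parallel.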

submitted by /u/vroemboem