http://www.cse.lehigh.edu/~baird/Pubs/drr07.pdf

“We report an investigation into strategies, algorithms, and software tools for document image content extraction