
OCR is only one step in document processing. To get qualified access to the information in your paper-based documents, several steps and techniques are usually required:

Scanning
Before documents are available as images, they have to be digitized. This process is called 'scanning'. Two important standards are used for interacting with the scanning hardware: TWAIN and WIA. There are (at least) two good articles on CodeProject on how to use these APIs.

Image Processing
Although scanning devices are getting better, a couple of methods can be used to increase the image quality. These pre-processing functions include noise reduction and angle correction, for instance.

OCR Itself
As a next step, OCR itself interprets pixel-based images into layout and text elements. OCR can be called the 'highest' bottom-up technology, where the system has little or no knowledge about the business context. Recognizing handwritten documents is often called ICR (Intelligent Character Recognition).

In most business cases, you have certain target structures you want to fill with the document information.

Good news: Office 2007 and Vista both support MODI! It is not installed by default, but you can easily add the package via the installation options of your Office 2007. You just need to rerun the setup.exe of your Office installation and choose the package as in the screenshot below.

The following code iterates through the document's structure and does some statistics:

    // iterating through the document's structure doing some statistics.
    string statistic = "";
    for (int i = 0; i < _MODIDocument.Images.Count; i++)
    {
        int numOfCharacters = 0;
        int charactersHeights = 0;
        MODI.Image image = (MODI.Image)_MODIDocument.Images[i];
        MODI.Layout layout = image.Layout;

        // getting the page's words
        for (int j = 0; j < layout.Words.Count; j++)
        {
            MODI.Word word = (MODI.Word)layout.Words[j];

            // getting the word's characters
            for (int k = 0; k < word.Rects.Count; k++)
            {
                MODI.MiRect rect = (MODI.MiRect)word.Rects[k];
                // height of the character's bounding rectangle
                charactersHeights += rect.Bottom - rect.Top;
                numOfCharacters++;
            }
        }

        // guard against pages on which no characters were recognized
        float avHeight = numOfCharacters > 0
            ? (float)charactersHeights / numOfCharacters
            : 0f;
        statistic += "Page " + i + ": Average character height is: " + avHeight + "\r\n";
    }
    MessageBox.Show("Document Statistic:\r\n" + statistic);

MODI also offers a full-featured built-in search. Since a document may contain several pages, you can use the search method to browse through the pages. MODI offers several arguments to customize your search.

    // 'search' is the MODI search object, initialized beforehand (not shown here)
    MODI.IMiSelectableItem SelectableItem = null;
    search.Search(null, ref SelectableItem);

You will find the search results in the referenced SelectableItem argument.

The MODI search has impressive features and works very well. Admittedly, it is restricted to searching for plain text. In most real-world applications, you will need some kind of fuzzy searching, since your text results may be corrupted by single OCR errors. But for a few lines of integration code, it is an impressive functionality.
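The fuzzy searching mentioned in the search discussion can be sketched without MODI at all. The following is a minimal illustration, not part of the MODI API: it assumes the recognized words have already been collected into a list of strings, and it uses the Levenshtein edit distance to tolerate a small number of OCR errors per word. The names FuzzySearch, EditDistance and FindWords are hypothetical and chosen only for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of MODI.
public static class FuzzySearch
{
    // Levenshtein edit distance: the minimum number of single-character
    // insertions, deletions and substitutions that turn a into b.
    // Only two rows of the distance table are kept in memory.
    public static int EditDistance(string a, string b)
    {
        int[] prev = new int[b.Length + 1];
        int[] curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(
                    Math.Min(curr[j - 1] + 1, prev[j] + 1), // insert, delete
                    prev[j - 1] + cost);                    // substitute or match
            }
            // the row just computed becomes the previous row
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.Length];
    }

    // Returns the indices of all words whose case-insensitive distance
    // to the pattern is at most maxErrors.
    public static List<int> FindWords(IList<string> words, string pattern, int maxErrors)
    {
        var hits = new List<int>();
        string p = pattern.ToLowerInvariant();
        for (int i = 0; i < words.Count; i++)
        {
            if (EditDistance(words[i].ToLowerInvariant(), p) <= maxErrors)
                hits.Add(i);
        }
        return hits;
    }
}
```

With a tolerance of one error, a search for "invoice" would also match a word that OCR misread as "lnvoice", while a genuinely different word such as "involve" (two substitutions away) would be rejected. The tolerance is a trade-off: larger values catch more OCR damage but produce more false hits, especially on short words.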
