OCR

Optical Character Recognition Process Rule

Select PROCESS > OCR > NEW.

Input Tab

  1. This is the first part of the “OCR Process” screen, where you create OCR process rules.
  2. Enter a Name and Description.
  3. Select the “Make OCR Results Searchable” option to create an index of the OCR results and attach it to the file. The attached index is searchable.
  4. Check Retain Original Indices if you want any files generated by the process to be indexed the same as the original file. For example, if this OCR Process outputs a file for each page, the new one-page files will have the same index information as the original input file.
  5. Choose the Language of the files to which you will apply this OCR Process Rule.

  1. Under the Original’s Content section choose the type of document you are expecting: Text and Images, Text Only, or Images Only.
  2. Select either “All pages” to search all pages of the file(s), or “These pages only” and specify a range of pages, separated by commas.
  3. Choose either “Search entire page for all pages in range” or “Search in these zones only”. The second option requires you to specify coordinate zones.
  4. To search in only a specific area, or zone, of a document, select the “Search in these zones only” radio button.
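
The “These pages only” setting can be pictured as a list of page numbers. The sketch below is illustrative only and is not part of DocuBreeze: it assumes a page specification written as single pages and hyphenated ranges separated by commas (for example "1-3, 5"), and the function name parse_page_spec is hypothetical.

```python
def parse_page_spec(spec: str) -> list[int]:
    """Expand a spec such as "1-3, 5" into a sorted list of page numbers.

    Assumed syntax (not confirmed by the product): comma-separated items,
    each either a single page ("5") or an inclusive range ("1-3").
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    if any(page < 1 for page in pages):
        raise ValueError("page numbers start at 1")
    return sorted(pages)


print(parse_page_spec("1-3, 5"))  # [1, 2, 3, 5]
```

Duplicate entries collapse to a single page, so "2, 2, 4" selects pages 2 and 4 once each.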

IMPORTANT: Please note that when you define one or more zones, the page range selection will become disabled. The page range cannot be edited when zones are defined for the OCR process.

  1. Select the size of the pages in the file. Choices include most common paper sizes. IMPORTANT: If you choose Auto as the page size, and the resolution of the image cannot be determined by DocuBreeze, the image will not be processed.
  2. Select the unit of measurement for the zone area. Choices are: pixels, millimeters, or inches.
  3. You may upload a previously defined file to be used as the template for creating the OCR zone. To do this, however, you must choose pixels as the unit of measure. For example, if you want to OCR only the text contained in a letterhead, you can upload a sample of that letterhead to help define the desired region. Click BROWSE, find the file, and click UPLOAD to get your sample file.
  4. To define the zone, click on the ADD button at the bottom of the form. This brings up the Define OCR Zone window.
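
A zone measured in millimeters or inches corresponds to a pixel area only once a resolution is known. The sketch below shows that arithmetic in general terms. It is not DocuBreeze code: the function name to_pixels and the dpi parameter are assumptions made for illustration, and the way DocuBreeze converts units internally is not documented here.

```python
MM_PER_INCH = 25.4


def to_pixels(value: float, unit: str, dpi: int) -> int:
    """Convert a measurement to pixels at the given resolution (dots per inch).

    `unit` uses the three choices offered on the form:
    "pixels", "millimeters", or "inches".
    """
    if unit == "pixels":
        return round(value)
    if unit == "inches":
        return round(value * dpi)
    if unit == "millimeters":
        return round(value / MM_PER_INCH * dpi)
    raise ValueError(f"unknown unit: {unit!r}")


# A Letter-size page is 8.5 inches wide; at 300 dpi that is 2550 pixels.
print(to_pixels(8.5, "inches", 300))  # 2550
```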

  1. Select All Pages or only Specified Pages in the allowable range. The allowable range is defined on the main OCR form.
  2. Set the coordinates of the zone by defining the upper-left and lower-right corners of a rectangle. For example, the coordinates (X1 = 0, Y1 = 0), (X2 = 100, Y2 = 500) will create a zone 100 pixels wide and 500 pixels tall that starts at the upper-left corner of the page.
  3. If you uploaded a document as a sample, the Region Tool will be available. Click on the Region Tool to open the dialog. Use your mouse to physically draw the rectangle on the desired location of the document.
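
To check a zone before entering it, it can help to work out its size from the two corners. The sketch below assumes the usual image convention, with X running left to right and Y running top to bottom from the upper-left corner of the page. The Zone class is a hypothetical helper for illustration, not something DocuBreeze provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """A rectangle given by its upper-left (x1, y1) and lower-right (x2, y2) corners."""
    x1: int
    y1: int
    x2: int
    y2: int

    def __post_init__(self):
        # The lower-right corner must lie to the right of and below the upper-left one.
        if self.x2 <= self.x1 or self.y2 <= self.y1:
            raise ValueError("lower-right corner must be right of and below upper-left")

    @property
    def width(self) -> int:
        return self.x2 - self.x1

    @property
    def height(self) -> int:
        return self.y2 - self.y1


zone = Zone(x1=0, y1=0, x2=100, y2=500)
print(zone.width, zone.height)  # 100 500
```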

IMPORTANT: DocuBreeze, by default, will automatically rotate the input page to a readable form (i.e. words written upright, left to right). The OCR Zone will then be re-applied to match the correct coordinates which you have set up. This means if a portrait document accidentally gets scanned with a landscape orientation, DocuBreeze will correct the image and apply the OCR Zone as intended. It is important to know that changes made to document orientation could override this ability and apply the OCR Zone differently than intended.


OCR Tab

Use this tab to establish the settings for your OCR.

Filters and Options. Check Yes to enable the use of DocuBreeze's OCR filters; descriptions of what the filters do are underneath their names. Select all applicable filters.

Note: OCR filters do not alter the output; they help the OCR engine produce higher-quality, more accurate results. The more filters applied to a process, the more accurate that process will be (and the longer the processing will take).

Apply Dot-Matrix Filter: Filter for documents printed on dot-matrix printers.

Apply Newspaper Filter: Filter for scanned images of newspaper articles.

Deskew: Straightens crooked images in cases where the original document may have been scanned slightly askew.

Document Orientation: The Automatic setting rotates the document so that the majority of the printed text is right-side-up. The Manual settings rotate the document by the specified number of degrees (90, 180, or 270) regardless of the text orientation on the page.

Invert: Filter will read the white portions of the document image as black and the black portions as white. Therefore, the resulting output file will be a black/white inverse of the original.

Photo Mode: Filter will scan for text within images and include that text as part of the OCR process.

Remove All Pictures: Removes images from the document prior to processing, leaving only the document text.

Remove Fax Noise: Removes speckles and other fax-induced artifacts from the document image.

Questionable OCR Characters. This feature lets you determine what kind of character is inserted when DocuBreeze cannot readily recognize a character. This affects text output only.

IMPORTANT: Be sure to define all settings in all tabs before clicking SAVE. If you save the settings before you are actually finished, you can edit the rule later.


Output Tab

  1. Use this tab to establish the output criteria for the files on which you have performed OCR.
  2. OCR Results. Select whether you would like the final file to be an image file or a text file, and then select the file type.
  3. File Naming Options. Check “Yes” to include a prefix or suffix in the file name, and enter a numerical starting point for sequential file naming. Once this is established, file names will contain prefixes or suffixes showing these settings.
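
The sketch below illustrates what sequential naming from a chosen starting number might look like. It rests on several assumptions that are not confirmed by the product: that the sequence number itself is attached as the prefix or suffix, that an underscore separates it from the original name, and that one name is produced per output file. The function name build_output_names is hypothetical.

```python
def build_output_names(stem: str, extension: str, file_count: int,
                       start: int = 1, position: str = "suffix") -> list[str]:
    """Return one name per output file, numbered sequentially from `start`.

    Assumed format: "<number>_<stem>.<ext>" for a prefix,
    "<stem>_<number>.<ext>" for a suffix.
    """
    if position not in ("prefix", "suffix"):
        raise ValueError("position must be 'prefix' or 'suffix'")
    names = []
    for number in range(start, start + file_count):
        if position == "prefix":
            names.append(f"{number}_{stem}.{extension}")
        else:
            names.append(f"{stem}_{number}.{extension}")
    return names


print(build_output_names("invoice", "txt", 3, start=10))
# ['invoice_10.txt', 'invoice_11.txt', 'invoice_12.txt']
```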



Using an OCR Process

IMPORTANT: The DocuBreeze OCR process supports the following file types: GIF, JPEG, TIFF, PDF, PNG. All other file types, when encountered by the OCR process, will simply be transferred to the task destination point.

Tip: To make sure that any file to be handled by DocuBreeze OCR is processed correctly, use a filter on the Collection Point to only accept the file types listed above.
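
If you want to check a folder's contents before pointing a Collection Point at it, a quick script can sort file names into those the OCR process would handle and those it would pass through untouched. The sketch below is not part of DocuBreeze. The extension list is an assumption that maps the five supported types to their common extensions, and the function names are hypothetical.

```python
from pathlib import PurePath

# Common extensions for GIF, JPEG, TIFF, PDF, and PNG (assumed mapping).
SUPPORTED_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".tif", ".tiff", ".pdf", ".png"}


def is_ocr_candidate(filename: str) -> bool:
    """True if the file's extension matches a supported type (case-insensitive)."""
    return PurePath(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def split_by_support(filenames: list[str]) -> tuple[list[str], list[str]]:
    """Split names into (would be OCRed, would be passed through unchanged)."""
    ocr, passthrough = [], []
    for name in filenames:
        (ocr if is_ocr_candidate(name) else passthrough).append(name)
    return ocr, passthrough


print(split_by_support(["a.TIF", "b.docx", "c.pdf"]))
# (['a.TIF', 'c.pdf'], ['b.docx'])
```

The check looks only at the file name, so a mislabeled file would still be sorted by its extension rather than its actual contents.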

  1. To run an OCR task, you must first have created an OCR Process Rule.
  2. Then select TASKS > STANDARD TASK > NEW.
  3. Enter a Name for the OCR Task.
  4. Enter a Description for the OCR Task.
  5. Choose a Collection Group or Point from the lists.
  6. Choose a Process Type.
  7. The drop-down menu will contain the available Process Rules you have created. Choose a Process from this menu, then choose whether to Copy or Move the documents.
  8. Choose a Distribution Group or Point from the list.
  9. Click SAVE to save the task.
  10. To schedule the task so that it will run automatically, select TASKS from the navigation menu, choose SCHEDULER, click the NEW button, and complete the desired schedule.
    To run the task once, select QUICKTASK > NEW.