OCR
Optical Character Recognition Process Rule
Select PROCESS > OCR > NEW.
Input Tab
- This is the first part of the “OCR Process” screen, where you create
OCR process rules.
- Enter a Name and Description.
- Select the “Make OCR Results Searchable” option to create and attach
an index of the OCR results to the file. The index attached to the file is
searchable.
- Check Retain Original Indices if you want any files generated by
the process to be indexed the same as the original file. For example, if this
OCR Process outputs a file for each page, the new one-page files will have
the same index information as the original input file.
- Choose the Language of the files to which you will apply this OCR
Process Rule.
- Under the Original’s Content section choose the type of document you
are expecting: Text and Images, Text Only, or Images Only.
- Select either “All pages” to search all pages of the file(s)
or the “These pages only” option and specify a range or list of pages,
separated by commas.
- Choose either the “Search entire page for all pages in range” option or
the “Search in these zones only” option and specify coordinate zones.
- To search in only a specific area, or zone, of a document, select the "Search
in these zones only" radio button.
IMPORTANT: When you define one or more zones, the page range selection
becomes disabled. The page range cannot be edited while zones are defined
for the OCR process.
- Select the size of the pages in the file. Choices include most common paper
sizes.
IMPORTANT: If you choose Auto as the page size and DocuBreeze cannot
determine the resolution of the image, the image will not be processed.
- Select the unit of measurement for the zone area. Choices are: pixels, millimeters,
or inches.
- You may upload a previously defined file to be used as the template for
creating the OCR zone. To do this, however, you must choose pixels
as the unit of measure. For example, if you want to OCR only the text contained
in a letterhead, you can upload a sample of that letterhead to help define
the desired region. Click BROWSE, find the file, and click UPLOAD to get your
sample file.
- To define the zone, click on the ADD button at the bottom of the form. This
brings up the Define OCR Zone window.
- Select All Pages or only Specified Pages in the allowable
range. The allowable range is defined on the main OCR form.
- Set the coordinates of the zone by defining the upper-left and lower-right
corners of a rectangle. For example, the coordinates (X1 = 0, Y1 = 0), (X2
= 100, Y2 = 500) will create a zone 100 pixels wide and 500 pixels tall that
starts at the upper-left corner.
- If you uploaded a document as a sample, the Region Tool will be available.
Click on the Region Tool to open the dialog. Use your mouse to physically
draw the rectangle on the desired location of the document.
IMPORTANT: DocuBreeze, by default, will automatically
rotate the input page to a readable form (i.e. words written upright, left to
right). The OCR Zone will then be re-applied to match the correct coordinates
which you have set up. This means if a portrait document accidentally gets scanned
with a landscape orientation, DocuBreeze will correct the image and apply the
OCR Zone as intended. It is important to know that changes made to document
orientation could override this ability and apply the OCR Zone differently than
intended.
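The zone geometry described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not DocuBreeze code: the `Zone` class, the `to_pixels` helper, and the 300 DPI resolution are assumptions made for the example.

```python
from dataclasses import dataclass

MM_PER_INCH = 25.4


@dataclass(frozen=True)
class Zone:
    """Rectangle defined by its upper-left (x1, y1) and lower-right (x2, y2) corners."""
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def width(self) -> float:
        # Horizontal extent: difference between the X coordinates.
        return self.x2 - self.x1

    @property
    def height(self) -> float:
        # Vertical extent: difference between the Y coordinates.
        return self.y2 - self.y1


def to_pixels(value: float, unit: str, dpi: int) -> int:
    """Convert a measurement in pixels, millimeters, or inches to whole pixels.

    The dpi argument is the image resolution, which is assumed to be known.
    """
    if unit == "pixels":
        return round(value)
    if unit == "inches":
        return round(value * dpi)
    if unit == "millimeters":
        return round(value / MM_PER_INCH * dpi)
    raise ValueError(f"unsupported unit: {unit}")


# The example from the text: (X1 = 0, Y1 = 0), (X2 = 100, Y2 = 500).
zone = Zone(x1=0, y1=0, x2=100, y2=500)
print(zone.width, zone.height)              # 100 500
print(to_pixels(1, "inches", 300))          # 300
print(to_pixels(25.4, "millimeters", 300))  # 300
```

The conversion also suggests why an undetermined image resolution is a problem: a size given in inches or millimeters cannot be mapped to pixels without it.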
OCR Tab
Use this tab to establish the settings for your OCR.
Filters and Options. Check Yes to enable the use of DocuBreeze's OCR filters;
a description of what each filter does appears beneath its name. Select all
applicable filters.
Note: OCR filters do not alter the output; they assist
the OCR engine in producing higher-quality, more accurate results. The more
filters applied to a process, the more accurate that process will be (and the
longer the processing will take).
Apply Dot-Matrix Filter: Filter for documents printed on dot-matrix
printers.
Apply Newspaper Filter: Filter for scanned images of newspaper articles.
Deskew: Straightens crooked images in cases where the original document
may have scanned slightly askew.
Document Orientation: The Automatic setting rotates the document so
that the majority of the printed text is right-side-up. The Manual settings
rotate the document the specified number of degrees (90, 180, or 270) regardless
of the text orientation on the page.
Invert: Filter will read the white portions of the document image as
black and the black portions as white. Therefore, the resulting output file
will be a black/white inverse of the original.
Photo Mode: Filter will scan for text within images and include that
text as part of the OCR process.
Remove All Pictures: Removes images from the document prior to processing,
leaving only the document text.
Remove Fax Noise: Removes speckles and other fax-induced artifacts from
the document image.
Questionable OCR Characters. This feature lets you determine what kind
of character is inserted when DocuBreeze cannot readily recognize a character.
This affects text output only.
IMPORTANT: Be sure to define all settings in all tabs before clicking SAVE.
If you save the settings before you are finished, you can edit the rule later.
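To see why a manual Document Orientation setting can move the area that an OCR Zone covers, consider what happens to a rectangle when the page is rotated. The sketch below is illustrative only: the function name, the clockwise rotation direction, and the 850 x 1100 pixel page size are assumptions, and it does not describe how DocuBreeze performs rotation internally.

```python
def rotate_zone_clockwise(zone, page_width, page_height, degrees):
    """Return the position of a zone after the page is rotated clockwise.

    zone is (x1, y1, x2, y2) in pixels, with the origin at the upper-left
    corner of the page and Y increasing downward. page_width and page_height
    are the dimensions of the page before rotation.
    """
    x1, y1, x2, y2 = zone
    if degrees == 0:
        return zone
    if degrees == 90:
        # A point (x, y) moves to (page_height - y, x).
        return (page_height - y2, x1, page_height - y1, x2)
    if degrees == 180:
        # A point (x, y) moves to (page_width - x, page_height - y).
        return (page_width - x2, page_height - y2,
                page_width - x1, page_height - y1)
    if degrees == 270:
        # A point (x, y) moves to (y, page_width - x).
        return (y1, page_width - x2, y2, page_width - x1)
    raise ValueError("degrees must be 0, 90, 180, or 270")


# A zone in the upper-left corner of a hypothetical 850 x 1100 pixel page.
zone = (0, 0, 100, 500)
print(rotate_zone_clockwise(zone, 850, 1100, 90))   # (600, 0, 1100, 100)
print(rotate_zone_clockwise(zone, 850, 1100, 180))  # (750, 600, 850, 1100)
```

After a 90-degree rotation, the same content sits in the upper-right corner of the page, so a zone that still uses the original coordinates would cover different content.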
Output Tab
- Use this tab to establish the output criteria for the files on which you
have performed OCR.
- OCR Results. Select whether you would like the final file to be
an image file or a text file, and then select the file type.
- File Naming Options. Check “Yes” to include a prefix or
suffix in the file name, and enter a numerical starting point for sequential
file naming. Once this is established, output file names will reflect
these settings.
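As a rough illustration of sequential file naming with a prefix, a suffix, and a numerical starting point, consider the sketch below. The exact naming pattern DocuBreeze produces is not documented here, so the function name, the underscore separator, and the four-digit padding are all assumptions.

```python
from pathlib import Path


def sequential_names(original, count, start=1, prefix="", suffix="", digits=4):
    """Build `count` output names from one input file name.

    Assumed pattern: <prefix><base name><suffix>_<sequence number><extension>.
    """
    path = Path(original)
    return [
        f"{prefix}{path.stem}{suffix}_{number:0{digits}d}{path.suffix}"
        for number in range(start, start + count)
    ]


# Three one-page outputs from one input file, numbering from 10.
for name in sequential_names("invoice.tif", 3, start=10, prefix="ocr_"):
    print(name)
# ocr_invoice_0010.tif
# ocr_invoice_0011.tif
# ocr_invoice_0012.tif
```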
Using an OCR Process
IMPORTANT: The DocuBreeze OCR process supports the following file types:
GIF, JPEG, TIFF, PDF, PNG. All other file types, when encountered by the OCR
process, will simply be transferred to the task destination point.
Tip: To make sure that any file to be handled by DocuBreeze OCR is processed
correctly, use a filter on the Collection Point to accept only the file types
listed above.
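The effect of such a filter can be sketched as a simple check on the file extension. This is not DocuBreeze's filter syntax; the function names and the mapping from the listed file types to extensions (for example, both .jpg and .jpeg for JPEG) are assumptions made for illustration.

```python
from pathlib import Path

# Assumed extensions for the supported types: GIF, JPEG, TIFF, PDF, PNG.
SUPPORTED_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".tif", ".tiff", ".pdf", ".png"}


def is_ocr_candidate(filename: str) -> bool:
    """Return True if the extension matches a supported type, ignoring case."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def split_by_support(filenames):
    """Separate files the OCR process would handle from files passed through."""
    ocr, passthrough = [], []
    for name in filenames:
        (ocr if is_ocr_candidate(name) else passthrough).append(name)
    return ocr, passthrough


ocr_files, other_files = split_by_support(["scan.TIF", "memo.docx", "page.pdf"])
print(ocr_files)    # ['scan.TIF', 'page.pdf']
print(other_files)  # ['memo.docx']
```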
- To run an OCR task, you must first have created an OCR Process Rule.
- Then select TASKS > STANDARD TASK > NEW.
- Enter a Name for the OCR Task.
- Enter a Description for the OCR Task.
- Choose a Collection Group or Point from the lists.
- Choose a Process Type.
- The drop-down menu will contain the available Process Rules you have created.
Choose a Process from this menu, and select whether you want the documents
to be copied (Copy) or moved (Move).
- Choose a Distribution Group or Point from the list.
- Click SAVE to save the task.
- To schedule the task so that it will run automatically, select TASKS
from the navigation menu. Choose SCHEDULER, click the NEW
button, and complete the desired Schedule.
To run the task once, select QUICKTASK > NEW.