OCR
Function: OCR
This function uses Optical Character Recognition (OCR) to extract text and data from image files or documents. It can process entire documents or specific pages, and can even structure the extracted information into a predefined format if needed.
Input
- File
- Description: The image or document file you want to process. This could be a scanned invoice, a photo of a form, or a PDF document.
- Type: FILE
- Required: Yes
- Pages
- Description: A list of specific page numbers you want to process, separated by commas (e.g., '1,3,5'). If you leave this empty, the function will process all pages in the file.
- Type: STRING
- Required: No
- Table Format
- Description: Choose how you want any detected tables in your document to be formatted in the output.
- Type: SELECT_ONE
- Options: Markdown, HTML
- Default Value: Markdown
- Required: No
- API Token (optional)
- Description: If your company has a specific API key for this service, you can enter it here. Otherwise, leave it blank to use the default key configured for your company.
- Type: STRING
- Required: No
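To illustrate the Pages format, the sketch below shows one way a comma-separated value such as '1,3,5' could be turned into a list of page numbers, with an empty value meaning "all pages". The helper name `parse_pages`, the 1-based numbering, and the validation rules are assumptions made for illustration; how the function actually parses this field is not documented here.

```python
from typing import List, Optional


def parse_pages(pages: str) -> Optional[List[int]]:
    """Hypothetical helper: parse a Pages value such as '1,3,5'.

    Returns None for an empty value (interpreted as "process all pages"),
    otherwise the page numbers in the order given.
    Assumes pages are numbered from 1.
    """
    if not pages or not pages.strip():
        return None
    numbers = []
    for part in pages.split(","):
        part = part.strip()
        # Reject blanks, non-numeric entries, and page 0.
        if not part.isdigit() or int(part) < 1:
            raise ValueError(f"invalid page number: {part!r}")
        numbers.append(int(part))
    return numbers


print(parse_pages("1,3,5"))
print(parse_pages(""))
```

Checking the value this way before running the step can catch a stray character or trailing comma early.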
Output
- Result
- Description: This is the name of the variable where the extracted text or structured data from your file will be stored. You can use this variable in subsequent steps of your application.
- Type: VARIABLE
- Default Value: RESULT
- Response format
- Description: This allows you to define a specific structure (like a template for your data) for the OCR output. If you provide a format, the OCR will try to extract information and fit it into this structure (e.g., a JSON object). If left blank, the output will be plain text or Markdown.
- Type: DATA_FORMAT
Real-Life Examples
Example 1: Extracting Text from a Scanned Invoice
Imagine you receive many scanned invoices as PDF files and need to extract the total amount and vendor name.
- Inputs:
- File: invoice_2023_001.pdf (a scanned PDF invoice)
- Pages: (empty, processes all pages)
- Table Format: Markdown
- API Token (optional): (empty, uses default)
- Response format: A DATA_FORMAT named InvoiceDetails with fields like VendorName (STRING), TotalAmount (DOUBLE), InvoiceDate (DATE).
- Result: The function extracts the vendor name, total amount, and invoice date from the PDF and stores them as a structured object in a variable named RESULT. For example, RESULT might contain:
{
  "VendorName": "Tech Solutions Inc.",
  "TotalAmount": 1250.75,
  "InvoiceDate": "2023-10-26"
}
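If the structured result is consumed outside the platform, it can help to load it into a typed object. The following is a minimal sketch that assumes RESULT is available as JSON text and that dates use the ISO format shown in the example; the class `InvoiceDetails` and the function `invoice_from_result` are hypothetical names, not part of the OCR function.

```python
import json
from dataclasses import dataclass
from datetime import date


@dataclass
class InvoiceDetails:
    """Mirrors the example fields: STRING, DOUBLE, DATE."""
    vendor_name: str
    total_amount: float
    invoice_date: date


def invoice_from_result(result_json: str) -> InvoiceDetails:
    """Hypothetical helper: convert the example RESULT JSON to a typed object."""
    data = json.loads(result_json)
    return InvoiceDetails(
        vendor_name=str(data["VendorName"]),
        total_amount=float(data["TotalAmount"]),
        # Assumes an ISO 8601 date string such as "2023-10-26".
        invoice_date=date.fromisoformat(data["InvoiceDate"]),
    )


# The example object from this section, written as JSON text.
example_result = """
{
  "VendorName": "Tech Solutions Inc.",
  "TotalAmount": 1250.75,
  "InvoiceDate": "2023-10-26"
}
"""

invoice = invoice_from_result(example_result)
print(invoice.vendor_name, invoice.total_amount, invoice.invoice_date.isoformat())
```

A missing field raises `KeyError` and a malformed date raises `ValueError`, which makes incomplete extractions visible instead of silently passing them on.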
Example 2: Getting Specific Pages from a Multi-Page Document
You have a long legal document and only need the text from the introduction and conclusion sections, which are on pages 1 and 10.
- Inputs:
- File: legal_contract.pdf (a multi-page PDF document)
- Pages: 1,10
- Table Format: Markdown
- API Token (optional): (empty, uses default)
- Response format: (empty, for plain text output)
- Result: The function extracts only the text content from pages 1 and 10 of legal_contract.pdf and stores it as a single block of plain text in a variable named RESULT.
Example 3: Converting a Table in an Image to HTML
You have an image of a product catalog with a pricing table, and you want to display this table on a webpage.
- Inputs:
- File: product_pricing.png (an image file containing a table)
- Pages: (empty, processes the single image page)
- Table Format: HTML
- API Token (optional): your_custom_api_key_123
- Response format: (empty, for plain text/HTML output)
- Result: The function extracts the table from the product_pricing.png image and converts it into an HTML table string, storing it in a variable named PRODUCT_TABLE_HTML (the Result output is set to this name here instead of the default RESULT). This HTML string can then be embedded directly into a web page.
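As a rough sketch of the embedding step, the snippet below wraps an HTML table string in a minimal page. The function `build_page` and the sample table are made up for illustration and are not real OCR output; the sketch also assumes the table markup is trusted, since it is inserted without escaping.

```python
import html


def build_page(table_html: str, title: str = "Product Pricing") -> str:
    """Hypothetical helper: wrap an HTML table string in a minimal page.

    The title is escaped; the table markup is inserted as-is,
    so it should come from a trusted source.
    """
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f"<head><title>{html.escape(title)}</title></head>\n"
        f"<body>\n{table_html}\n</body>\n"
        "</html>"
    )


# Placeholder standing in for the value of PRODUCT_TABLE_HTML.
product_table_html = (
    "<table><tr><th>Product</th><th>Price</th></tr>"
    "<tr><td>Widget</td><td>9.99</td></tr></table>"
)

page = build_page(product_table_html)
print(page)
```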