OCR

Function: OCR

This function uses Optical Character Recognition (OCR) to extract text and data from image files or documents. It can process entire documents or specific pages, and can even structure the extracted information into a predefined format if needed.

Input

File
- Description: The image or document file you want to process. This could be a scanned invoice, a photo of a form, or a PDF document.
- Type: FILE
- Required: Yes
Pages
- Description: A list of specific page numbers you want to process, separated by commas (e.g., '1,3,5'). If you leave this empty, the function will process all pages in the file.
- Type: STRING
- Required: No
Table Format
- Description: Choose how you want any detected tables in your document to be formatted in the output.
- Type: SELECT_ONE
- Options: Markdown, HTML
- Default Value: Markdown
- Required: No
API Token (optional)
- Description: If your company has its own API token for this service, you can enter it here. Otherwise, leave it blank to use the default token configured for your company.
- Type: STRING
- Required: No
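
The Pages value is a plain comma-separated string. As a rough illustration of how such a value can be read, the Python sketch below turns it into a list of page numbers and treats an empty value as "all pages". It is not the function's actual implementation: `parse_pages` is a hypothetical helper, and the rule that page numbers start at 1 is an assumption based on the examples in this document.

```python
from typing import List, Optional


def parse_pages(pages: str) -> Optional[List[int]]:
    """Parse a Pages value such as '1,3,5' into a list of page numbers.

    Returns None for an empty value, which stands for "all pages".
    Hypothetical helper: it only illustrates the input format.
    """
    if not pages.strip():
        return None
    numbers = []
    for part in pages.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as '1,,3'
        number = int(part)  # raises ValueError on non-numeric input
        if number < 1:
            # Assumption: pages are numbered starting at 1.
            raise ValueError(f"page numbers start at 1, got {number}")
        numbers.append(number)
    return numbers
```

Checking the value this way before running the step can catch typos such as `1;3` or `one,two` early.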

Output

Result
- Description: This is the name of the variable where the extracted text or structured data from your file will be stored. You can use this variable in subsequent steps of your application.
- Type: VARIABLE
- Default Value: RESULT
Response format
- Description: This allows you to define a specific structure (like a template for your data) for the OCR output. If you provide a format, the OCR will try to extract information and fit it into this structure (e.g., a JSON object). If left blank, the output will be plain text or Markdown.
- Type: DATA_FORMAT

Execution Flow

Real-Life Examples

Example 1: Extracting Text from a Scanned Invoice

Imagine you receive many scanned invoices as PDF files and need to extract the vendor name, total amount, and invoice date from each one.

Inputs:
- File: invoice_2023_001.pdf (a scanned PDF invoice)
- Pages: (empty, processes all pages)
- Table Format: Markdown
- API Token (optional): (empty, uses default)
- Response format: A DATA_FORMAT named InvoiceDetails with fields like VendorName (STRING), TotalAmount (DOUBLE), InvoiceDate (DATE).
Result: The function extracts the vendor name, total amount, and invoice date from the PDF and stores them as a structured object in a variable named RESULT. For example, RESULT might contain:
```
{
  "VendorName": "Tech Solutions Inc.",
  "TotalAmount": 1250.75,
  "InvoiceDate": "2023-10-26"
}
```
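
If the structured result is passed on to code outside the platform as JSON text, it can be loaded into a typed object. The Python sketch below assumes the JSON shape shown above. The `InvoiceDetails` dataclass and the `parse_invoice` helper are hypothetical names used for illustration only, and the platform may expose the value differently.

```python
import json
from dataclasses import dataclass
from datetime import date


@dataclass
class InvoiceDetails:
    """Mirrors the fields of the InvoiceDetails format in this example."""
    vendor_name: str
    total_amount: float
    invoice_date: date


def parse_invoice(raw: str) -> InvoiceDetails:
    """Convert the JSON text of the OCR result into a typed object."""
    data = json.loads(raw)
    return InvoiceDetails(
        vendor_name=data["VendorName"],
        total_amount=float(data["TotalAmount"]),
        # Assumes the date arrives in ISO format (YYYY-MM-DD), as in the example.
        invoice_date=date.fromisoformat(data["InvoiceDate"]),
    )


raw_result = (
    '{"VendorName": "Tech Solutions Inc.", '
    '"TotalAmount": 1250.75, "InvoiceDate": "2023-10-26"}'
)
invoice = parse_invoice(raw_result)
```

A missing field raises `KeyError` here, which makes incomplete extractions visible instead of silently producing empty values.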

Example 2: Getting Specific Pages from a Multi-Page Document

You have a long legal document and only need the text from the introduction and conclusion sections, which are on pages 1 and 10.

Inputs:
- File: legal_contract.pdf (a multi-page PDF document)
- Pages: 1,10
- Table Format: Markdown
- API Token (optional): (empty, uses default)
- Response format: (empty, for plain text output)
Result: The function extracts only the text content from page 1 and page 10 of the legal_contract.pdf and stores it as a single block of plain text in a variable named RESULT.

Example 3: Converting a Table in an Image to HTML

You have an image of a product catalog with a pricing table, and you want to display this table on a webpage.

Inputs:
- File: product_pricing.png (an image file containing a table)
- Pages: (empty, processes the single image page)
- Table Format: HTML
- API Token (optional): your_custom_api_key_123
- Response format: (empty, for plain text/HTML output)
Result: The function extracts the table from the product_pricing.png image and converts it into an HTML table string, storing it in the Result variable, here named PRODUCT_TABLE_HTML instead of the default RESULT. This HTML string can then be directly embedded into a web page.
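
As a minimal sketch of the embedding step, the Python snippet below wraps an HTML table string in a simple page. The page template, the `render_page` helper, and the sample table are all made up for illustration; the real contents of PRODUCT_TABLE_HTML depend on the image.

```python
import html
from string import Template

# Hypothetical page skeleton; $title and $table are filled in below.
PAGE_TEMPLATE = Template(
    "<!DOCTYPE html>\n"
    "<html>\n"
    "<head><title>$title</title></head>\n"
    "<body>\n"
    "<h1>$title</h1>\n"
    "$table\n"
    "</body>\n"
    "</html>\n"
)


def render_page(title: str, table_html: str) -> str:
    """Wrap an HTML table string in a minimal page.

    The title is escaped. The table is inserted as-is because it is
    already HTML, so it should only come from a trusted source.
    """
    return PAGE_TEMPLATE.substitute(title=html.escape(title), table=table_html)


# Made-up stand-in for the contents of PRODUCT_TABLE_HTML.
product_table_html = (
    "<table><tr><th>Product</th><th>Price</th></tr>"
    "<tr><td>Widget</td><td>9.99</td></tr></table>"
)
page = render_page("Pricing", product_table_html)
```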
