Select links from HTML

This action helps you automatically find and extract all the web links (URLs) from any HTML content you provide. You can choose to get all links, only links that stay within the same website (internal), or only links that go to other websites (external). This is useful for tasks like analyzing website content, building sitemaps, or gathering data from web pages.

Input

  • HTML (STRING): The full HTML content (as text) from which you want to extract links. This is a required input.
  • Link type (SELECT_ONE): Choose the type of links you want to find. This is a required input.
    • All: Extracts every link found in the HTML.
    • Internal: Extracts only links that point to pages within the same website or domain.
    • External: Extracts only links that point to pages on different websites or domains.
  • Base domain (URL): The main web address (e.g., www.example.com) of the website the HTML content belongs to. It determines which links count as "Internal" and which as "External", so provide it whenever you select either of those link types. If you leave it blank, the system tries to infer the base domain from the HTML itself.

Output

  • Result (ARRAY): A list of all the URLs (as text) that were found and matched your selected criteria.
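
The inputs and output above can be pictured with a minimal Python sketch. It is not the action's actual implementation: the function name `select_links`, the helper class `_AnchorCollector`, and the choice to compare hostnames exactly are all assumptions made for illustration. Base-domain inference is not sketched here, so this version asks for a base domain whenever "Internal" or "External" is selected.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _AnchorCollector(HTMLParser):
    """Collects the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def select_links(html, link_type="All", base_domain=None):
    """Return the links in `html` that match `link_type` (illustrative only)."""
    if link_type not in ("All", "Internal", "External"):
        raise ValueError("link_type must be All, Internal, or External")
    if link_type != "All" and not base_domain:
        # Assumption: this sketch does not try to guess the base domain.
        raise ValueError("base_domain is needed for Internal/External")

    collector = _AnchorCollector()
    collector.feed(html)

    # Accept "www.example.com" as well as "https://www.example.com".
    base_url = None
    if base_domain:
        base_url = base_domain if "://" in base_domain else "https://" + base_domain
    base_host = urlparse(base_url).hostname if base_url else None

    results = []
    for href in collector.hrefs:
        # Relative links become absolute when a base is known.
        url = urljoin(base_url, href) if base_url else href
        if link_type == "All":
            results.append(url)
            continue
        internal = urlparse(url).hostname == base_host
        if internal == (link_type == "Internal"):
            results.append(url)
    return results
```

Calling `select_links(html, "Internal", "www.example.com")` would return only links whose hostname is exactly `www.example.com`. Whether the real action also treats subdomains as internal is not stated here.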

Execution Flow

Real-Life Examples

Example 1: Extracting all links from a product page

You have an HTML snippet of a product description page and want to quickly see every link it contains, regardless of where each one leads.

  • Inputs:
    • HTML: <html><body><h1>Product A</h1><p>Check out our <a href="/features">features</a> or visit <a href="https://partner.com/promo">our partner</a>.</p></body></html>
    • Link type: All
    • Base domain: (Left blank)
  • Result: A list containing:
    • https://yourwebsite.com/features (the relative link /features resolved against the base domain, assuming yourwebsite.com was inferred as the base)
    • https://partner.com/promo
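
The way a relative link such as /features turns into a full URL in the result above can be illustrated with Python's standard `urljoin`. This is only a sketch of the idea: the helper name `resolve_links` is hypothetical, and the base `https://yourwebsite.com` is the same assumption the example makes.

```python
from urllib.parse import urljoin


def resolve_links(base_url, hrefs):
    """Resolve each href against base_url; absolute hrefs pass through unchanged."""
    return [urljoin(base_url, href) for href in hrefs]


# Assumption: the base domain was inferred or supplied as yourwebsite.com.
links = resolve_links(
    "https://yourwebsite.com",
    ["/features", "https://partner.com/promo"],
)
print(links)
# ['https://yourwebsite.com/features', 'https://partner.com/promo']
```

Absolute links already carry their own domain, so only the relative link changes.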

Example 2: Finding internal links on an "About Us" page

You've scraped the HTML content of a company's "About Us" page and want to identify all the links that lead to other pages within the same company website (e.g., "Careers," "Contact Us," "Our Team").

  • Inputs:
    • HTML: (The full HTML content of https://www.example.com/about-us)
    • Link type: Internal
    • Base domain: https://www.example.com
  • Result: A list containing URLs like:
    • https://www.example.com/careers
    • https://www.example.com/contact
    • https://www.example.com/team
    • (Excluding any links to https://www.facebook.com/example or https://blog.anotherdomain.com)
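
One way to picture the internal check in this example is a hostname comparison. The sketch below is an assumption about how such a check could work, not a description of the action itself. The helpers `host_of` and `is_internal` are hypothetical, and the comparison is an exact hostname match, so a subdomain would count as external.

```python
from urllib.parse import urlparse


def host_of(value):
    """Return the lowercase hostname of a URL or of a bare domain like www.example.com."""
    if "://" not in value:
        value = "https://" + value  # let urlparse see a scheme
    return urlparse(value).hostname or ""


def is_internal(url, base_domain):
    """True when url points at the same host as base_domain (exact match)."""
    return host_of(url) == host_of(base_domain)


candidates = [
    "https://www.example.com/careers",
    "https://www.facebook.com/example",
    "https://blog.anotherdomain.com",
]
internal = [u for u in candidates if is_internal(u, "https://www.example.com")]
print(internal)
# ['https://www.example.com/careers']
```

Because `host_of` adds a scheme when one is missing, the base domain can be written either as `www.example.com` or as `https://www.example.com`.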

Example 3: Identifying external resources linked from a blog post

You have the HTML of a blog post and want to find all the links that point to external websites, such as research papers, news articles, or other blogs, to understand the sources or references used.

  • Inputs:
    • HTML: (The full HTML content of a blog post from https://myblog.org/post-title)
    • Link type: External
    • Base domain: https://myblog.org
  • Result: A list containing URLs like:
    • https://www.researchgate.net/publication/123
    • https://www.nytimes.com/article-about-topic
    • https://anotherblog.com/related-post
    • (Excluding any links to https://myblog.org/category/tech or https://myblog.org/author/john-doe)