Select links from HTML

This function extracts the web links from a piece of HTML content. You can return all links, only internal links (those that stay within the same website), or only external links (those that point to other websites). This is useful for tasks like analyzing website content, building sitemaps, or gathering information from web pages.

Input

  • HTML: The full HTML content (as text) from which you want to extract links. This is a required input.
  • Link type: Choose the type of links you want to find. This is a required selection.
    • All: Extracts every link found in the HTML.
    • Internal: Extracts only links that point to pages within the same website.
    • External: Extracts only links that point to pages on different websites.
  • Base domain: (Optional, but recommended for 'Internal' or 'External' link types) The main web address (domain name) of the website the HTML belongs to. For example, www.example.com. If you select 'Internal' or 'External' link types and don't provide this, the system will try to guess the base domain from the HTML, which might not always be accurate.
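The link types above can be sketched in a few lines of Python. This is an illustration only, not the function's actual implementation: the names `select_links`, `_AnchorCollector`, and `_host` are hypothetical, and the sketch assumes that only `<a href>` values count as links, that a leading `www.` is ignored when comparing hosts, and that a base domain given without a scheme defaults to `https`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _AnchorCollector(HTMLParser):
    """Collects the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def _host(url):
    """Lower-cased host name without a leading 'www.' (an assumption)."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def select_links(html, link_type="all", base_domain=""):
    """Return links filtered by link_type: 'all', 'internal', or 'external'."""
    base = ""
    if base_domain:
        # Accept 'www.example.com' as well as 'https://www.example.com'.
        base = base_domain if "://" in base_domain else f"https://{base_domain}"

    collector = _AnchorCollector()
    collector.feed(html)

    links = []
    for href in collector.hrefs:
        parsed = urlparse(href)
        if parsed.scheme not in ("", "http", "https"):
            continue  # skip mailto:, tel:, javascript: and similar
        # Resolve relative links only when a base domain is known.
        absolute = urljoin(base, href) if base else href
        # Relative links stay on the same site; absolute links are internal
        # only if their host matches the base domain. Without a base domain,
        # every absolute link is treated as external in this sketch.
        is_internal = not parsed.netloc or (
            bool(base) and _host(absolute) == _host(base)
        )
        if (
            link_type == "all"
            or (link_type == "internal" and is_internal)
            or (link_type == "external" and not is_internal)
        ):
            links.append(absolute)
    return links


html = '<a href="/contact">Contact</a><a href="https://www.partner.com/solutions">Partner</a>'
print(select_links(html, "internal", "www.mycompany.com"))
# ['https://www.mycompany.com/contact']
```

Comparing host names rather than raw strings is what lets `/contact` and `https://www.mycompany.com/team` both count as internal once a base domain is supplied.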

Output

  • Result: A list of all the extracted web links (URLs) as text. By default, this list will be stored in a variable named RESULT.

Execution Flow

Real-Life Examples

Here are some examples of how you can use the "Select links from HTML" function:

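Example 1: Extracting All Links from a Product Page

Imagine you have the HTML content of a product page and you want a list of all links on that page, including navigation, related items, and social media links. Note that the image in the HTML below does not appear in the result, because it is not a link.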

  • Inputs:
    • HTML: <html><body><h1>Product A</h1><a href="/products/product-b">Related Product</a><img src="/images/product-a.jpg"><a href="https://www.facebook.com/company">Facebook</a></body></html>
    • Link type: All
    • Base domain: (Left blank)
  • Result: The RESULT variable will contain a list like:
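    • https://yourdomain.com/products/product-b (yourdomain.com is a placeholder for the base domain the system infers; provide Base domain if you need to control how relative links are resolved)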
    • https://www.facebook.com/company

Example 2: Auditing Internal Links on an "About Us" Page

You want to analyze the internal navigation structure of your company's "About Us" page so you can check that all internal links are working correctly.

  • Inputs:
    • HTML: <html><head><base href="https://www.mycompany.com/"></head><body><a href="/contact">Contact Us</a><a href="https://www.mycompany.com/team">Our Team</a><a href="https://www.partner.com/solutions">Partner Solutions</a></body></html>
    • Link type: Internal
    • Base domain: https://www.mycompany.com
  • Result: The RESULT variable will contain a list like:
    • https://www.mycompany.com/contact
    • https://www.mycompany.com/team
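The HTML in this example contains a `<base href>` element, which is one plausible signal for guessing the base domain when the Base domain input is left blank. The documentation does not say how the guess is actually made, so the sketch below is an assumption: `guess_base_domain` and `_BaseFinder` are hypothetical names, and reading `<base href>` is only one possible heuristic.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class _BaseFinder(HTMLParser):
    """Remembers the href of the first <base> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.base_href = None

    def handle_starttag(self, tag, attrs):
        if tag == "base" and self.base_href is None:
            self.base_href = dict(attrs).get("href")


def guess_base_domain(html):
    """Return 'scheme://host' from <base href>, or None if it cannot be guessed."""
    finder = _BaseFinder()
    finder.feed(html)
    if not finder.base_href:
        return None
    parsed = urlparse(finder.base_href)
    if not parsed.scheme or not parsed.netloc:
        return None  # a relative or scheme-less <base href> is not enough
    return f"{parsed.scheme}://{parsed.netloc}"


page = '<html><head><base href="https://www.mycompany.com/"></head><body></body></html>'
print(guess_base_domain(page))
# https://www.mycompany.com
```

When the HTML has no usable `<base href>`, this sketch returns `None` instead of guessing, which is why supplying Base domain explicitly is the safer choice for the Internal and External link types.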

Example 3: Identifying External Resources from a Blog Post

You've published a blog post and want to quickly see all the external websites it links to, perhaps to check for broken links or to track references.

  • Inputs:
    • HTML: <html><body><p>Read more on <a href="https://www.researchsite.org/article1">Research Site</a> and check out our <a href="/blog/related-post">related post</a>.</p><a href="https://twitter.com/mycompany">Twitter</a></body></html>
    • Link type: External
    • Base domain: https://www.mycompanyblog.com
  • Result: The RESULT variable will contain a list like:
    • https://www.researchsite.org/article1
    • https://twitter.com/mycompany