Select links from HTML
Function: Select links from HTML
This function helps you extract all the web links from a piece of HTML content. You can choose to get all links, only links that stay within the same website (internal links), or only links that go to other websites (external links). This is useful for tasks like analyzing website content, building sitemaps, or gathering information from web pages.
Input
- HTML: The full HTML content (as text) from which you want to extract links. This is a required input.
- Link type: Choose the type of links you want to find. This is a required selection.
  - All: Extracts every link found in the HTML.
  - Internal: Extracts only links that point to pages within the same website.
  - External: Extracts only links that point to pages on different websites.
- Base domain: (Optional, but recommended for 'Internal' or 'External' link types) The main web address (domain name) of the website the HTML belongs to. For example, www.example.com. If you select 'Internal' or 'External' link types and don't provide this, the system will try to guess the base domain from the HTML, which might not always be accurate.
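To make the three link types concrete, here is a minimal Python sketch of how such a selection could work. It is not the function's actual implementation: the names `AnchorCollector` and `select_links` are hypothetical, only `<a href>` values are collected, and a link is assumed to be internal when its host matches the base domain exactly.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class AnchorCollector(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def select_links(html, link_type="All", base_domain=""):
    """Hypothetical stand-in for the function described above."""
    collector = AnchorCollector()
    collector.feed(html)

    # Accept "www.example.com" as well as "https://www.example.com".
    if base_domain and "://" not in base_domain:
        base_domain = "https://" + base_domain
    base_host = urlparse(base_domain).netloc.lower()

    result = []
    for href in collector.hrefs:
        # Relative links are resolved against the base when one is given.
        url = urljoin(base_domain, href) if base_domain else href
        host = urlparse(url).netloc.lower()
        # Assumption: "internal" means the host equals the base host.
        is_internal = host == base_host
        if (link_type == "All"
                or (link_type == "Internal" and is_internal)
                or (link_type == "External" and not is_internal)):
            result.append(url)
    return result
```

Under these assumptions, calling `select_links(html, "Internal", "www.example.com")` keeps only links whose host is `www.example.com`, with relative links resolved against that domain first.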
Output
- Result: A list of all the extracted web links (URLs) as text. By default, this list will be stored in a variable named RESULT.
Execution Flow
Real-Life Examples
Here are some examples of how you can use the "Select links from HTML" function:
Example 1: Extracting All Links from a Product Page
Imagine you have the HTML content of a product page and you want to get a list of all links on that page, including navigation, product images, and related items.
- Inputs:
  - HTML:
    <html><body><h1>Product A</h1><a href="/products/product-b">Related Product</a><img src="/images/product-a.jpg"><a href="https://www.facebook.com/company">Facebook</a></body></html>
  - Link type: All
  - Base domain: (Left blank)
- Result: The RESULT variable will contain a list like:
  - https://yourdomain.com/products/product-b (assuming yourdomain.com is inferred as base)
  - https://www.facebook.com/company
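The first entry in this result depends on which base is inferred. As a rough illustration, Python's `urllib.parse.urljoin` shows how a relative link turns into a full URL once a base is known. The value `https://yourdomain.com` is the placeholder from the example, not something the function is guaranteed to infer.

```python
from urllib.parse import urljoin

# Placeholder base taken from the example; the real inferred value may differ.
assumed_base = "https://yourdomain.com"

# The two <a href> values from the example HTML.
hrefs = ["/products/product-b", "https://www.facebook.com/company"]

# Relative links are joined to the base; absolute links pass through unchanged.
resolved = [urljoin(assumed_base, href) for href in hrefs]
print(resolved)
# ['https://yourdomain.com/products/product-b', 'https://www.facebook.com/company']
```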
Example 2: Finding Internal Navigation Links on a Company Website
You want to analyze the internal navigation structure of your company's "About Us" page to ensure all internal links are working correctly.
- Inputs:
  - HTML:
    <html><head><base href="https://www.mycompany.com/"></head><body><a href="/contact">Contact Us</a><a href="https://www.mycompany.com/team">Our Team</a><a href="https://www.partner.com/solutions">Partner Solutions</a></body></html>
  - Link type: Internal
  - Base domain: https://www.mycompany.com
- Result: The RESULT variable will contain a list like:
  - https://www.mycompany.com/contact
  - https://www.mycompany.com/team
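The HTML in this example contains a `<base href>` tag, which is one place a base domain could be guessed from when the Base domain input is left blank. The sketch below shows that idea only. How the function actually guesses the base is not documented here, and `BaseTagFinder` and `guess_base_domain` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class BaseTagFinder(HTMLParser):
    """Remembers the href of the first <base> tag, if any."""

    def __init__(self):
        super().__init__()
        self.base_href = None

    def handle_starttag(self, tag, attrs):
        if tag == "base" and self.base_href is None:
            self.base_href = dict(attrs).get("href")


def guess_base_domain(html):
    """Return the host named in <base href>, or None when there is none."""
    finder = BaseTagFinder()
    finder.feed(html)
    if not finder.base_href:
        return None
    return urlparse(finder.base_href).netloc or None
```

With the example HTML above, `guess_base_domain` would return `www.mycompany.com`. Pages without a `<base>` tag give `None`, which is one reason supplying the Base domain explicitly is the safer choice.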
Example 3: Identifying External Resources from a Blog Post
You've published a blog post and want to quickly see all the external websites it links to, perhaps to check for broken links or to track references.
- Inputs:
  - HTML:
    <html><body><p>Read more on <a href="https://www.researchsite.org/article1">Research Site</a> and check out our <a href="/blog/related-post">related post</a>.</p><a href="https://twitter.com/mycompany">Twitter</a></body></html>
  - Link type: External
  - Base domain: https://www.mycompanyblog.com
- Result: The RESULT variable will contain a list like:
  - https://www.researchsite.org/article1
  - https://twitter.com/mycompany
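The Base domain here is written with a scheme (`https://...`), while the input description shows a bare name such as `www.example.com`. A short sketch of how both forms could be reduced to the same host before filtering is given below. The helper `host_of` is hypothetical, and exact host matching is an assumption about how "external" is decided.

```python
from urllib.parse import urlparse


def host_of(value):
    """Return the lowercase host of a URL or of a bare domain name."""
    if "://" not in value:
        value = "//" + value  # lets urlparse read the text as a host
    return urlparse(value).netloc.lower()


base_host = host_of("https://www.mycompanyblog.com")

# The three links from the example, with the relative one already resolved.
candidates = [
    "https://www.researchsite.org/article1",
    "https://www.mycompanyblog.com/blog/related-post",
    "https://twitter.com/mycompany",
]

# Keep only links whose host differs from the base host.
external = [url for url in candidates if host_of(url) != base_host]
print(external)
# ['https://www.researchsite.org/article1', 'https://twitter.com/mycompany']
```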