How do I make a feed from a webpage?
Create feeds from websites that don’t offer RSS by using CSS selectors.
Not every website publishes an RSS feed. Web page feeds let you turn any HTML page into a feed by telling LightWatch where to find the content using CSS selectors.
Webpages use HTML as the structure for their content. CSS selectors provide a way to target specific elements within that structure. The webpage feed feature combines these two concepts to let you pick content out of a webpage and turn it into a feed.
This feature works on static webpages, where the content is already present in the HTML the server sends. It may not be able to extract content from pages that load their content dynamically. If you need to create a feed from a dynamic site and the webpage feed feature doesn't work, see "How do I create my own feed?" for other options.
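To see why dynamically loaded pages can fail, it helps to look at the raw HTML a server sends. The Python sketch below is an illustration only, not LightWatch's implementation. The two page snippets and the count_tags helper are made up for this example. It counts article elements in the raw markup using only the standard library.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how many times a given start tag appears in raw HTML."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.count += 1


def count_tags(html, tag):
    """Hypothetical helper: parse the HTML and return the tag count."""
    counter = TagCounter(tag)
    counter.feed(html)
    counter.close()
    return counter.count


# Static page: the posts are already present in the HTML the server sends.
STATIC_PAGE = """
<main>
  <article><h2>First post</h2></article>
  <article><h2>Second post</h2></article>
</main>
"""

# Dynamic page: the HTML is an empty shell that a script fills in later.
DYNAMIC_PAGE = """
<main id="app"></main>
<script src="/bundle.js"></script>
"""

static_count = count_tags(STATIC_PAGE, "article")
dynamic_count = count_tags(DYNAMIC_PAGE, "article")
print(static_count, dynamic_count)
```

In the second snippet there is nothing for a selector to match until a script runs, which is the situation where a selector-based feed is likely to come up empty.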
Setting up a web page feed
- From the Home tab, tap the More menu in the top right.
- Tap Add feed.
- Enter the page URL and tap Search for Feeds.
- If no RSS feed is found, tap Make feed from webpage.
Visual wizard
The easiest way to get started is the visual wizard. Tap Open Visual Wizard to load the page, then tap elements directly to prefill the selectors. LightWatch will try to figure out the right selectors for you. You can fine-tune them manually afterward.
Configuring selectors manually
The configuration screen has fields for each part of a post that LightWatch needs to extract:
- Post: The repeating element that wraps each item on the page (e.g., article, .post, .card). This is the only required field.
- Title: The element containing the title text within each post. Has optional Attribute, Format, and Titlecase fields (see below).
- Post URL: The element linking to the full post. Defaults to reading the href attribute.
- Date: The element containing the publication date within each post.
- Post Content: The element containing images and videos. Supports comma-separated selectors for content in different containers.
- Post ID: A unique identifier for each post, used for deduplication.
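The fields above can be pictured as a small extraction loop. The following Python sketch is a rough illustration under several assumptions, not LightWatch's actual code: the sample page, the matches, select, and read helpers, and the seen_ids set are all hypothetical, and the matcher understands only tag, .class, and tag.class selectors (real CSS selectors are far richer). It uses the standard library's ElementTree, which requires well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (ElementTree needs valid XML).
PAGE = """
<html><body>
  <article class="post">
    <h2 class="title"><a href="/posts/sunset" data-id="p-101">Sunset over the bay</a></h2>
    <time datetime="2024-05-01">May 1, 2024</time>
    <div class="media"><img src="/img/sunset.jpg" /></div>
  </article>
  <article class="post">
    <h2 class="title"><a href="/posts/harbor" data-id="p-102">Harbor at dawn</a></h2>
    <time datetime="2024-05-03">May 3, 2024</time>
    <div class="gallery"><img src="/img/harbor.jpg" /></div>
  </article>
</body></html>
"""


def matches(element, selector):
    """Match one simple selector: 'tag', '.class', or 'tag.class'."""
    tag, _, css_class = selector.strip().partition(".")
    if tag and element.tag != tag:
        return False
    if css_class and css_class not in element.get("class", "").split():
        return False
    return True


def select(scope, selectors):
    """Return descendants of scope matching any comma-separated selector."""
    parts = selectors.split(",")
    return [el for el in scope.iter()
            if el is not scope and any(matches(el, p) for p in parts)]


def read(scope, selector, attribute=None):
    """Read an attribute of the first match, or its text if none is given."""
    found = select(scope, selector)
    if not found:
        return None
    if attribute:
        return found[0].get(attribute)
    return "".join(found[0].itertext()).strip()


root = ET.fromstring(PAGE)
posts = []
for post in select(root, "article.post"):            # Post
    posts.append({
        "title": read(post, ".title"),                # Title (text content)
        "url": read(post, "a", "href"),               # Post URL
        "date": read(post, "time", "datetime"),       # Date (via Attribute)
        "media": [img.get("src")                      # Post Content
                  for box in select(post, ".media, .gallery")
                  for img in box.iter("img")],
        "id": read(post, "a", "data-id"),             # Post ID
    })

# Deduplication: skip posts whose ID was already seen (assumed state).
seen_ids = {"p-101"}
new_posts = [p for p in posts if p["id"] not in seen_ids]
print([p["title"] for p in new_posts])
```

Every field other than Post is read from inside one matched post element, which is why the Post selector has to match each repeating item rather than the list as a whole.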
Additional fields
Some sections have extra fields for fine-tuning:
- Attribute: Which attribute to read from the matched element. Leave blank to use the element’s text content.
- Format: A regular expression with a single capture group () to extract part of the matched text. Whatever the group captures becomes the final value. For example, ^(.+?) \| on “My Photo | Blog Name” extracts “My Photo”.
- Titlecase: Converts the result to title case.
LightWatch validates your selectors against the actual page and shows match counts as you type.
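To try a Format pattern before entering it, you can test it in any regex tool. The Python sketch below shows one way the Format and Titlecase steps might combine. The function names are hypothetical, the fallback when nothing matches is an assumption, and LightWatch's regex engine and title-casing rules may differ from Python's.

```python
import re


def apply_format(value, pattern):
    """Return what the single capture group matches.

    Returning the value unchanged when the pattern is blank or does not
    match is an assumption made for this sketch.
    """
    if not pattern:
        return value
    match = re.search(pattern, value)
    return match.group(1) if match else value


def clean_title(raw, pattern=None, titlecase=False):
    """Apply the Format step, then the optional Titlecase step."""
    value = apply_format(raw.strip(), pattern)
    if titlecase:
        # str.title() is a simple stand-in that capitalizes every word.
        value = value.title()
    return value


title = clean_title("My Photo | Blog Name", r"^(.+?) \|")
shouting = clean_title("SUNSET OVER THE BAY | BLOG", r"^(.+?) \|", titlecase=True)
untouched = clean_title("Plain title")

print(title)      # My Photo
print(shouting)   # Sunset Over The Bay
print(untouched)  # Plain title
```

The lazy quantifier in (.+?) stops at the first " |", so only the text before the first separator is kept even when the title contains several.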
Article content
If the posts link to pages with additional images or content, turn on Enable article scraping to have LightWatch also visit each linked post and extract media from there. The setup uses the same selector-based approach described here. For details, see getting images from linked posts.
Using an AI agent to find selectors
If you’re not sure what selectors to use, paste the following prompt into an AI agent like ChatGPT or Claude along with the page URL.
I'm configuring an RSS reader called LightWatch to create a feed
from a webpage. Please visit this URL and inspect the HTML to find
the right CSS selectors.
URL: [paste page URL here]
The page has a repeating list of items (posts, articles, photos,
etc). I need values for these specific text inputs in the app.
Leave a field blank if it's not needed.
**Post** (required)
- Selector: CSS selector targeting the repeating element that
wraps each item (e.g. article, .post, .card). Must match
multiple items on the page.
**Title**
- Selector: the element within each post containing the title.
- Attribute: which attribute to read, or blank for text content.
- Format: a regex with one capture group () to extract the clean
title. Whatever the group captures becomes the final value. For
example, ^(.+?) \| on "My Photo | Blog Name | 2024" extracts
"My Photo". Leave blank if the title is already clean.
- Titlecase: yes or no. Use yes if the title is ALL CAPS or all
lowercase.
**Post URL**
- Selector: the element within each post linking to the full
post. The app reads the href attribute by default.
- Attribute: which attribute to read if not href.
**Date**
- Selector: the element within each post containing the date.
- Attribute: which attribute to read, or blank for text content.
- Format: a regex with one capture group () if the date needs
extracting from surrounding text.
**Post Content**
- Selector: the element within each post containing images or
videos. Supports comma-separated selectors if media is in
different containers within the post.
**Post ID**
- Selector: an element with a unique identifier per post, used
for deduplication.
- Attribute: which attribute to read (e.g. id, data-id), or
blank for text content.
Test that the Post selector matches multiple items on the page.
Only include fields where you can find a match.
Limitations
Web page feeds depend on the structure of the source page staying stable. If the site redesigns its HTML, the selectors may stop matching and need updating. When that happens, you receive a status notification each time the feed checks for updates, telling you that it isn’t finding content where it expects to.
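As a rough picture of how a redesign breaks a feed, the Python sketch below checks a class-based Post selector against a page before and after its class names change. It is a hypothetical illustration: the page snippets, the feed_status helper, and the status messages are invented here and do not reflect LightWatch's actual notifications.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts elements whose class attribute includes a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_class(html, class_name):
    counter = ClassCounter(class_name)
    counter.feed(html)
    counter.close()
    return counter.count


def feed_status(html, post_class):
    """Return a made-up status string for one update check."""
    found = count_class(html, post_class)
    if found == 0:
        return "warning: Post selector matched nothing; selectors may need updating"
    return f"ok: {found} posts found"


# The same two posts, before and after the site renames its CSS classes.
BEFORE_REDESIGN = '<div class="post">A</div><div class="post">B</div>'
AFTER_REDESIGN = '<div class="entry-card">A</div><div class="entry-card">B</div>'

status_before = feed_status(BEFORE_REDESIGN, "post")
status_after = feed_status(AFTER_REDESIGN, "post")
print(status_before)
print(status_after)
```

The content is still on the page after the redesign, but the old selector no longer points at it, so updating the Post selector (here, to .entry-card) would restore the feed.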