XML
RSS
XPATH
Not a lot, but a little. When setting up your feed meta data in the program, you will see some standard values already set. These should work for most feeds, but let's look at them to understand what they do.
Find your feed and view its source. You will see it is probably structured like this:
<item>
....details of item1
</item>
<item>
....details of item2
</item>
etc.
So the ITEM meta value is the element that wraps each entry:
<item>
It may also be
<entry>
Then, inside each item, we have the standard elements:
<title>
<link>
<description>
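If you are not sure which element names your feed uses, a few lines of Python can list them for you. This is only a sketch using the standard library: the sample feed and the function name `list_items` are made up for the example, and it is not how the program itself reads feeds.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, shaped like the structure described above.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
      <description>Details of item1</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
      <description>Details of item2</description>
    </item>
  </channel>
</rss>"""


def list_items(feed_xml, item_tag="item"):
    """Return one dict per item, holding its title, link and description text."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter(item_tag):
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "description": item.findtext("description"),
        })
    return items


for entry in list_items(SAMPLE_FEED):
    print(entry["title"], "->", entry["link"])
```

Feeds that use <entry> usually declare an XML namespace, so in this script the tag names would need the namespace added before they match.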
Now images are a bit more complex, because not only do you want the above information, you also want the image URL. This is unlikely to be contained nicely in an element, so we need to scrape it from the page.
There are a few ways to do this:
1) Using XPATH with an element (note you cannot use XPATH in an :encoded element)
<description/img[@id='image_id_1379182']>
<description[@src]>
So if the description looks like this:
<description>
...blah html blah <img src="http://twitterdub.wordpress.com/images/ ... 7490u3.jpg"> blah html
</description>
we can use the srclike token along with a pattern match, like
[srclike=twitterdub.wordpress.com/images]
so our meta value is
<description>[srclike=twitterdub.wordpress.com/images]
Note this token does work with :encoded elements. If your image is within an encoded tag, this works fine:
<content:encoded>[srclike=XXX]
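If you want to check a pattern against your own feed before entering it, something like the following Python sketch can help. It is only an illustration: how the program actually matches is not documented in this post, so the sketch assumes a plain "contains" test on each src value. The function name `first_src_like` and the sample HTML are made up for the example.

```python
import re


def first_src_like(html_text, pattern):
    """Return the first src="..." value that contains pattern, or None."""
    # Assumption: a simple substring test, applied to each src value in order.
    for match in re.finditer(r'src\s*=\s*["\']([^"\']+)["\']', html_text):
        if pattern in match.group(1):
            return match.group(1)
    return None


# Hypothetical description text with two images; only the second one matches.
description = (
    'blah html <img src="http://example.com/ads/banner.gif"> blah '
    '<img src="http://example.com/images/photo.jpg"> blah html'
)

print(first_src_like(description, "example.com/images"))
```

Choosing a pattern that only appears in the image you want (here the /images folder) is what stops an advert or icon earlier in the text from being picked up instead.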
Similar helper functions are
<item>[json=XXX]
This will find the JSON value within the text of the XML element <item>, where the JSON format is
"name":"value"
<item>[urllike=XXX]
Similar to srclike, this will find the first URL that contains the pattern XXX, so you can specify
<item>[urllike=rootjazz.com/videos]
and it will return the first URL within the text of the <item> tag that contains rootjazz.com/videos, i.e.
https://rootjazz.com/videos/a-video-file.mp4
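To get a feel for what the json and urllike helpers described above would pull out of your own feed text, here is a rough Python sketch. It rests on assumptions: that XXX in the json token is the name half of a "name":"value" pair, and that urllike is a plain "contains" test on each URL. The function names and the sample text are hypothetical, and the program's real matching may differ.

```python
import re


def json_value(text, name):
    """Return the value paired with name in a "name":"value" pair, or None."""
    pattern = r'"' + re.escape(name) + r'"\s*:\s*"([^"]*)"'
    match = re.search(pattern, text)
    return match.group(1) if match else None


def first_url_like(text, pattern):
    """Return the first http(s) URL in text that contains pattern, or None."""
    for url in re.findall(r'https?://[^\s"\'<>]+', text):
        if pattern in url:
            return url
    return None


# Hypothetical item text, made up for this example.
item_text = (
    '{"id":"42","image":"http://example.com/pic.jpg"} '
    '<a href="https://rootjazz.com/about">about</a> '
    '<a href="https://rootjazz.com/videos/a-video-file.mp4">video</a>'
)

print(json_value(item_text, "image"))
print(first_url_like(item_text, "rootjazz.com/videos"))
```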
Another way to get at the image is to scrape a page that is linked from the RSS feed, using the token:
[scrapexpath=xpathtocontent]
Someone wanted to set up posting images from a WordPress feed, but the image URL wasn't included in the feed; it was only on the WordPress page.
This is how to do it:
Go to POST IMAGES FROM RSS FEED
Click ASSIGN META
Click CUSTOM
Now we can leave
ITEM
TITLE
etc. as the default, but you may want to change the description; your call.
But we will need to look at
IMAGE URL
The WordPress feed uses the element <link> to specify the page URL.
So we want to go to that URL, then scrape the image URL from the page.
So we enter
<link>[scrapexpath=xpathtocontent]
This tells the program to go to the URL given by <link>, then scrape that page and use XPATH to pull the image URL,
e.g.
<link>[scrapexpath=//meta[@property='og:image']/@content]
Here the XPATH part on its own is
//meta[@property='og:image']/@content
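Before entering that XPATH, it is worth checking that the page really has an og:image meta tag. The Python sketch below does the same lookup on a piece of HTML using only the standard library. It is an illustration, not how the program scrapes: the class and function names and the sample page are made up, and it uses an HTML parser rather than a real XPATH engine.

```python
from html.parser import HTMLParser


class OgImageParser(HTMLParser):
    """Remembers the content attribute of the first og:image meta tag."""

    def __init__(self):
        super().__init__()
        self.image_url = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        is_og_image = tag == "meta" and attributes.get("property") == "og:image"
        if is_og_image and self.image_url is None:
            self.image_url = attributes.get("content")


def og_image_from_html(page_html):
    """Return the og:image URL found in page_html, or None if there is none."""
    parser = OgImageParser()
    parser.feed(page_html)
    return parser.image_url


# Hypothetical page source, standing in for the page behind <link>.
PAGE = """<html><head>
<meta property="og:title" content="A post">
<meta property="og:image" content="http://example.com/images/post.jpg">
</head><body>...</body></html>"""

print(og_image_from_html(PAGE))
```

To try it on a live page you would first download the page source (for example with `urllib.request.urlopen`) and pass the decoded text to `og_image_from_html`. If it returns None, the XPATH above will likely find nothing on that page either.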
Note we can still use the RSS element attribute notation if required (perhaps the URL is not the element text but an attribute):
<link[@href]>[scrapexpath=//meta[@property='og:image']/@content]
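To tell which form your feed uses, look at whether the URL sits between the <link> tags or inside an href attribute. The small Python sketch below shows the difference on two made-up snippets; the function name `link_url` and the sample XML are hypothetical, and the script only illustrates the two cases, it is not part of the program.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry where the URL is an attribute of <link>.
ENTRY_WITH_ATTRIBUTE = """<entry>
  <title>A post</title>
  <link href="http://example.com/a-post" />
</entry>"""

# Hypothetical item where the URL is the text of <link>.
ITEM_WITH_TEXT = """<item>
  <title>Another post</title>
  <link>http://example.com/another-post</link>
</item>"""


def link_url(element_xml):
    """Return the link URL from the href attribute, else from the element text."""
    link = ET.fromstring(element_xml).find("link")
    if link is None:
        return None
    return link.get("href") or (link.text or "").strip() or None


print(link_url(ENTRY_WITH_ATTRIBUTE))  # the <link[@href]> case
print(link_url(ITEM_WITH_TEXT))        # the plain <link> case
```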