RSS Post Tweet Help
Posted: Mon Jun 29, 2015 7:30 pm
In order to set up custom RSS posting, you need to understand
XML
RSS
XPATH
Not a lot, but a little. When setting up your feed meta data in the program, you will see some standard values already set. These should work for most feeds, but let's look at them to understand what they do.
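So find your feed and view the source. You will see it is probably structured like this:
<item>
....details of item1
</item>
<item>
....details of item2
</item>
etc
So we know each item in the feed has the element
<item>
It may also be
<entry>
but usually <item>.
Then we have
<title>
<link>
<description>
All self-explanatory. These can be used for the title / body / media link, depending on whether you are posting a tweet or a media link.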
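If you want to check what your feed contains before filling in the meta values, the structure above can be walked with a few lines of Python. This is only a rough sketch: the sample feed text and the function name list_items are made up for illustration, and it says nothing about how the program itself reads feeds.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, shaped like the structure described above.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
      <description>Body of the first post</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
      <description>Body of the second post</description>
    </item>
  </channel>
</rss>"""


def list_items(feed_xml, item_tag="item"):
    """Return one dict per item with its title, link and description text."""
    root = ET.fromstring(feed_xml)
    items = []
    # iter() walks the whole tree, so <item> is found at any depth.
    for item in root.iter(item_tag):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return items


if __name__ == "__main__":
    for entry in list_items(SAMPLE_FEED):
        print(entry["title"], "->", entry["link"])
```

Note that feeds using <entry> are normally Atom feeds, whose tags carry an XML namespace, so this simple version would need adjusting for them.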
Now images are a bit more complex, because not only do you want the above information, you also want the image URL. This is unlikely to be contained nicely in an element, so we need to scrape it from the page.
There are a few ways to do this
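1) Using XPATH with an element (note you cannot use XPATH in an :encoded element)
<description/img[@id='image_id_1379182']>
Another possibility is that the value you want is an attribute of the RSS element. This can be accessed with the attribute code placed within brackets within the element specification
<description[@src]>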
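2) Using the token [srclike=XXX]. This means that if the element contains an image tag with a src attribute, we can get it without worrying about a complex XPATH statement.
So given
<description>
...blah html blah <img src="http://twitterdub.wordpress.com/images/ ... 7490u3.jpg"> blah html
</description>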
so we can use the token along with a pattern match like
[srclike=twitterdub.wordpress.com/images]
so our meta value is
<description>[srclike=twitterdub.wordpress.com/images]
Note this token does work with :encoded elements, so if your image is within an encoded tag this works fine:
<content:encoded>[srclike=XXX]
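To make it clearer what a srclike match amounts to, here is a small Python sketch of the same idea: look through the HTML inside an element and return the first image src that contains the pattern. The function name src_like, the sample HTML and the file name pic.jpg are made up, and plain substring matching is an assumption; the program's own matching rules may differ.

```python
from html.parser import HTMLParser


class _ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def src_like(html_text, pattern):
    """Return the first <img> src that contains pattern, or None."""
    collector = _ImgSrcCollector()
    collector.feed(html_text)
    collector.close()
    for src in collector.sources:
        if pattern in src:
            return src
    return None


# Hypothetical element text with two images; only the second matches.
SAMPLE_HTML = (
    'blah html <img src="http://example.com/ads/banner.gif"> blah '
    '<img src="http://twitterdub.wordpress.com/images/pic.jpg"> blah html'
)

if __name__ == "__main__":
    print(src_like(SAMPLE_HTML, "twitterdub.wordpress.com/images"))
```

The point of the pattern is to skip images you do not want (ads, icons) and pick the one hosted where your real images live.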
Similar helper functions are
<item>[json=XXX]
This will find the JSON value within the text of the XML element <item>, where the JSON format is
"name":"value"
<item>[urllike=XXX]
Similar to srclike, this will find the first URL that contains the pattern XXX, so you can specify
<item>[urllike=rootjazz.com/videos]
and it will return the first URL within the text of the <item> tag that contains rootjazz.com/videos, e.g.
https://rootjazz.com/videos/a-video-file.mp4
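As a rough illustration of what these two helpers do, here is a Python sketch. The function names url_like and json_value, the URL regular expression and the sample text are all made up, and it assumes that XXX in [json=XXX] is the name whose value you want; the program's actual behaviour may differ.

```python
import re

# Rough URL matcher: scheme, then everything up to whitespace, a quote or an angle bracket.
_URL_RE = re.compile(r"""https?://[^\s"'<>]+""")


def url_like(text, pattern):
    """Return the first URL in text that contains pattern, or None."""
    for match in _URL_RE.finditer(text):
        if pattern in match.group(0):
            return match.group(0)
    return None


def json_value(text, name):
    """Return the value of the first "name":"value" pair for name, or None."""
    pair = re.search(r'"%s"\s*:\s*"([^"]*)"' % re.escape(name), text)
    return pair.group(1) if pair else None


# Hypothetical item text containing two URLs and one JSON pair.
SAMPLE_TEXT = (
    'See https://rootjazz.com/about and '
    '<a href="https://rootjazz.com/videos/a-video-file.mp4">the video</a> '
    '{"thumb":"pic.jpg"}'
)

if __name__ == "__main__":
    print(url_like(SAMPLE_TEXT, "rootjazz.com/videos"))
    print(json_value(SAMPLE_TEXT, "thumb"))
```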
Another way to get at the image is to scrape a page that is linked from the RSS feed, using the token:
[scrapexpath=xpathtocontent]
Someone wanted to set up posting images from a WordPress feed, but the image URL wasn't included in the feed; it was only on the WordPress page.
This is how to do it:
Go to POST IMAGES FROM RSS FEED
Click ASSIGN META
Click CUSTOM
Now we can leave
ITEM
TITLE
etc as the default, but you may want to change the description, your call.
But we will need to look at
IMAGE URL
The WordPress feed uses the <link> element to specify the page URL.
So we want to go to that URL and then scrape the image URL.
So we enter
<link>[scrapexpath=xpathtocontent]
This means:
GOTO the URL noted by the <link>
Then scrape the page and use XPATH to pull the image URL
e.g.
<link>[scrapexpath=//meta[@property='og:image']/@content]
where your XPATH to the image is
//meta[@property='og:image']/@content
Note we can still use the RSS element attribute notation if required (perhaps the URL link is not the element text but an attribute):
<link[@href]>[scrapexpath=//meta[@property='og:image']/@content]
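If you want to test what the og:image XPATH would pull from a page before entering it, the lookup can be imitated in Python. This sketch uses the standard HTML parser as a stand-in for that one XPATH expression rather than a real XPATH engine, and it works on page text you have already downloaded. The function name og_image and the sample page are made up for illustration.

```python
from html.parser import HTMLParser


class _MetaFinder(HTMLParser):
    """Remembers the content attribute of the first matching <meta> tag."""

    def __init__(self, prop):
        super().__init__()
        self.prop = prop
        self.content = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.content is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get("property") == self.prop:
            self.content = attr_map.get("content")


def og_image(page_html):
    """Stand-in for //meta[@property='og:image']/@content on a page's HTML."""
    finder = _MetaFinder("og:image")
    finder.feed(page_html)
    finder.close()
    return finder.content


# Hypothetical page source, as it might look at the URL given by <link>.
SAMPLE_PAGE = """<html><head>
<meta property="og:title" content="A post">
<meta property="og:image" content="http://example.com/uploads/picture.jpg">
</head><body>post text</body></html>"""

if __name__ == "__main__":
    print(og_image(SAMPLE_PAGE))
```

In real use the page text would first be fetched from the <link> URL (for example with urllib.request.urlopen). If this returns nothing for your page, the page probably has no og:image meta tag and you will need a different XPATH.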