Follow action is pausing on bad URLs, and when I tried redoing the process it still paused on the same bad URL.
Is there a way it could skip those bad URLs on subsequent runs?
Follow action & Bad URLs
- martin@rootjazz
- Site Admin
- Posts: 34712
- Joined: Fri Jan 25, 2013 10:06 pm
- Location: The Funk
- Contact:
Re: Follow action & Bad URLs
These are bad URLs; the program is pausing because the error is unexpected. The URL has not been skipped due to previous processing, nor filtered out for any other reason.
I don't think it should be skipped, for the safety of the account.
If you want quicker actions, you will need to filter the list through a 404 checker first. If you do not have a 404 checker, I can add one to the feature suggestions list.
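A pre-filter like this can be sketched in a few lines of Python (a hypothetical standalone script, not part of the program itself; `urls.txt` and `good.txt` are assumed file names):

```python
# Hypothetical standalone 404 pre-filter for a URL list; urls.txt and
# good.txt are assumed file names, not anything the program itself uses.
import concurrent.futures
import urllib.error
import urllib.request

def keep(status):
    """Keep a URL unless its HTTP status is exactly 404 (None = no error)."""
    return status != 404

def status_of(url, timeout=10):
    """Return the HTTP error status for url, or None if the request succeeds."""
    try:
        urllib.request.urlopen(
            urllib.request.Request(url, method="HEAD"), timeout=timeout)
        return None
    except urllib.error.HTTPError as e:
        return e.code
    except OSError:
        return None  # network hiccup: keep the URL and recheck it later

def filter_list(in_path="urls.txt", out_path="good.txt", workers=20):
    with open(in_path) as f:
        urls = [line.strip() for line in f if line.strip()]
    # Check many URLs in parallel; a big list is network-bound, not CPU-bound.
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as ex:
        statuses = ex.map(status_of, urls)
    with open(out_path, "w") as out:
        for url, status in zip(urls, statuses):
            if keep(status):
                out.write(url + "\n")
```

Only a confirmed 404 drops a URL here; transient network errors keep it in the list so a flaky connection doesn't throw away good accounts.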
Re: Follow action & Bad URLs
Which 404 checker do you use?
I am using the free Scrapebox link checker, and it looks like it will take all day or more to check my list, which has over 116,000 blog URLs and is growing.
I think it would be easier if the app could save those bad URLs and not process them on subsequent runs, OR, when scraping users in the SCRAPE tab, there could be an option to filter out bad URLs.
- martin@rootjazz
Re: Follow action & Bad URLs
Moses wrote: Which 404 checker do you use?
I don't; I checked a couple and they were 404s.

Moses wrote: I am using the free Scrapebox link checker and it looks like it will take all day or more to check my list that has over 116,000 blog URLs and growing.
All day to check 116 URLs isn't so bad.

Moses wrote: I think it would be easier if the app could save those bad URLs and not process them on subsequent runs, OR there could be an option in the SCRAPE tab to filter out bad URLs.
Your suggestion has been noted and added to the feature suggestions list. How quickly it gets implemented will depend on demand for the change.
Re: Follow action & Bad URLs
martin@rootjazz wrote: All day to check 116 URLs isn't so bad.
No, I said 116,000 URLs.
The Scrapebox software started checking the URLs this morning, and now it's the afternoon and the current status is 16,860 / 116,744. So I'm guessing it will take the whole day to check all of them.
I scraped these users in the SCRAPER tab and saved them to a list, so I think the app didn't filter out the 404 URLs when scraping them, or perhaps the users deleted their accounts or were banned since the last time I scraped them.
I believe anyone who uses a list of URLs instead of search terms when Following / Liking would have 404 issues, especially with a big list, since users in that list will delete their accounts or get banned.
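The progress figures above give a rough estimate of the total check time. The six-hour elapsed time is an assumption (morning to mid-afternoon), not a figure stated in the thread:

```python
# Back-of-envelope ETA from the progress figures above. The 6-hour
# elapsed time is an assumption (morning to mid-afternoon).
checked, total, hours_elapsed = 16_860, 116_744, 6

rate = checked / (hours_elapsed * 3600)        # URLs checked per second
eta_hours = (total - checked) / rate / 3600    # hours left at that rate

print(f"{rate:.2f} URLs/s, about {eta_hours:.0f} hours remaining")
```

At that rate the remaining ~100,000 URLs would need well over a full day, which matches the "all day or more" estimate.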
- martin@rootjazz
Re: Follow action & Bad URLs
Moses wrote: No, I said 116,000 URLs.
Yes, typo from me; obviously 116 checks isn't going to take all day.

Moses wrote: I scraped these users in the SCRAPER tab and saved them to a list, so I think the app didn't filter out the 404 URLs when scraping them.
The scrape just pulls what it finds; if Tumblr is showing accounts on the page that do not exist, then they will be scraped.

Moses wrote: I believe anyone who uses a list of URLs instead of search terms when Following / Liking would have 404 issues, especially with a big list.
The problem is, if the routine auto-verifies, it is going to take as long as Scrapebox. If Scrapebox is taking 24 hours to verify your list, then TJ will take just as long. Whenever those checks are made, whether in one go or as required, that 24 hours of checks has to be made.
Re: Follow action & Bad URLs
martin@rootjazz wrote: These are bad URLs, the program is pausing because it is unexpected. The URL has not been skipped due to previous processing, nor filtered out for whatever reason. I don't think it should be skipped, for the safety of the account.
What would happen if they were skipped?

martin@rootjazz wrote: Whenever those checks are made, whether it is in one go or as required, that 24 hours of checks has to be made.
That's true, but I think if the bad URLs were saved on each run and then skipped / ignored on subsequent runs, it wouldn't take as much time; they would be treated like already processed URLs.
- martin@rootjazz
Re: Follow action & Bad URLs
Moses wrote: What would happen if they were skipped?
I believe I added a check so that once an error occurs, if it was a 404 it is skipped, as that is a known error and can be skipped safely.

Moses wrote: That's true, but I think if the bad URLs were saved on each run and then skipped / ignored on subsequent runs, it wouldn't take as much time.
As for maintaining lists of all processed items across all actions: I will add it to the features list, but the existing code is not made to store logs of thousands and thousands of items. The whole system would need to be rewritten to use pooling to avoid loading large files directly into memory, which is no small update.
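The persistent skip-list idea can be sketched as follows (a hypothetical illustration, not the app's actual implementation; `urls.txt` and `bad_urls.txt` are assumed file names). Bad URLs found in one run are appended to a sidecar file, and later runs stream the main list line by line, so the full 116,000-line list never has to be loaded into memory at once:

```python
# Sketch of a persistent skip-list (hypothetical file names; not the
# app's real implementation). Bad URLs are appended to a sidecar file,
# and the main list is streamed one line at a time on later runs.

def load_bad(path="bad_urls.txt"):
    """Read the set of known-bad URLs; an absent file means none yet."""
    try:
        with open(path) as f:
            return {line.strip() for line in f if line.strip()}
    except FileNotFoundError:
        return set()

def record_bad(url, path="bad_urls.txt"):
    """Append a newly discovered bad URL to the sidecar file."""
    with open(path, "a") as f:
        f.write(url + "\n")

def pending_urls(list_path="urls.txt", bad_path="bad_urls.txt"):
    """Yield URLs still worth processing, streaming the main list lazily."""
    bad = load_bad(bad_path)
    with open(list_path) as f:  # streamed, one line at a time
        for line in f:
            url = line.strip()
            if url and url not in bad:
                yield url
```

Only the (much smaller) bad-URL set is held in memory; the big list itself is never fully loaded, which is the gist of the pooling approach described above.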
Re: Follow action & Bad URLs
martin@rootjazz wrote: I believe I added a check so that once an error occurs, if it was a 404 it is skipped, as that is a known error and can be skipped safely.
I just ran some Follow / Like runs and noticed there were failures and drop outs at the beginning that were paused on instead of skipped like you said.
The failures were 404 errors on Follow, and the Drop Outs were on Like. When I checked the Drop Out URLs they were all RSS pages (not sure if that's bad or not).
- martin@rootjazz
Re: Follow action & Bad URLs
Ok, re-added.
Today is not my day...