Follow action & Bad URLs

Ask any support / help / issues / problem or question related to TumblingJazz
Moses
Posts: 289
Joined: Fri Feb 28, 2014 11:59 pm

Follow action & Bad URLs

Post by Moses »

The Follow action is pausing on bad URLs, and when I tried redoing the process it still paused on the same bad URL.

Is there a way it could skip those bad URLs on next runs?
Last edited by Moses on Fri May 30, 2014 3:50 pm, edited 1 time in total.
martin@rootjazz
Site Admin
Posts: 34712
Joined: Fri Jan 25, 2013 10:06 pm
Location: The Funk
Contact:

Re: Follow action & Bad URLs

Post by martin@rootjazz »

These are bad URLs; the program is pausing because the error is unexpected. The URL has not been skipped due to previous processing, nor filtered out for any other reason.

I don't think it should be skipped, for the safety of the account.

If you want quicker actions, you will need to filter the list via a 404 checker. If you do not have a 404 checker, I can add that to the feature suggestions list.
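For anyone who wants to pre-filter a list themselves, here is a minimal sketch of such a 404 checker in Python. This is not TumblingJazz or Scrapebox code; the function names are made up for illustration. It sends a HEAD request to each URL concurrently and drops only the ones that come back 404, leaving other errors (timeouts, DNS failures) for the main tool to handle.

```python
# Illustrative 404 pre-filter: keep only URLs that do not return 404.
# Function names (is_live, filter_list) are hypothetical, not part of any tool.
import urllib.request
import urllib.error
from concurrent.futures import ThreadPoolExecutor

def is_live(url, timeout=10):
    """Return False only for a confirmed 404; treat other failures as live
    so the main tool can decide what to do with them."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        urllib.request.urlopen(req, timeout=timeout)
        return True
    except urllib.error.HTTPError as e:
        return e.code != 404
    except urllib.error.URLError:
        return True  # DNS error / timeout: leave the URL in the list

def filter_list(urls, checker=is_live, workers=20):
    """Check URLs concurrently and return the ones that pass, in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        alive = list(pool.map(checker, urls))
    return [u for u, ok in zip(urls, alive) if ok]
```

Running the checks in a thread pool is what makes this faster than checking one URL at a time; raise `workers` with care, since too many parallel requests can get your IP rate-limited.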
Moses
Posts: 289
Joined: Fri Feb 28, 2014 11:59 pm

Re: Follow action & Bad URLs

Post by Moses »

Which 404 checker do you use?

I am using the free Scrapebox link checker, and it looks like it will take all day or more to check my list, which has over 116,000 blog URLs and growing.

I think it would be easier if the app could save those bad URLs and not process them on next runs, OR when scraping users in the SCRAPE tab there should be an option to filter out bad URLs.
martin@rootjazz
Site Admin
Posts: 34712
Joined: Fri Jan 25, 2013 10:06 pm
Location: The Funk
Contact:

Re: Follow action & Bad URLs

Post by martin@rootjazz »

Which 404 checker do you use?
I don't; I checked a couple and they were 404s.
I am using the free Scrapebox link checker, and it looks like it will take all day or more to check my list, which has over 116,000 blog URLs and growing.
all day to check 116 URLs isn't so bad.
I think it would be easier if the app could save those bad URLs and not process them on next runs, OR when scraping users in the SCRAPE tab there should be an option to filter out bad URLs.
Your suggestion has been noted and added to the feature suggestions list. How quickly it gets implemented will depend on demand for the change / feature.
Moses
Posts: 289
Joined: Fri Feb 28, 2014 11:59 pm

Re: Follow action & Bad URLs

Post by Moses »

all day to check 116 URLs isn't so bad.
No, I said 116,000 URLs.

The Scrapebox software started checking the URLs this morning, and now it's the afternoon and the current status is 16,860 / 116,744. So I'm guessing it will take the whole day to check all of them.

I scraped these users in the SCRAPER tab and saved them to a list, so I think the app didn't filter out the 404 URLs when scraping them, or perhaps the users deleted their accounts or were banned since the last time I scraped them.

I believe anyone who uses a list of URLs instead of search terms when Following / Liking will have 404 issues, especially with a big list, since users in that list will delete their accounts or get banned.
martin@rootjazz
Site Admin
Posts: 34712
Joined: Fri Jan 25, 2013 10:06 pm
Location: The Funk
Contact:

Re: Follow action & Bad URLs

Post by martin@rootjazz »

No, i said 116,000 URLs.
Yes, a typo from me; obviously 116 checks isn't going to take all day :)
I scraped these users in the SCRAPER tab and saved them to a list, so I think the app didn't filter out the 404 URLs when scraping them, or perhaps the users deleted their accounts or were banned since the last time I scraped them.
The scrape just pulls what it finds. If Tumblr is showing accounts on the page that do not exist, then they will be scraped.
I believe anyone who uses a list of URLs instead of search terms when Following / Liking will have 404 issues, especially with a big list, since users in that list will delete their accounts or get banned.
The problem is, if the routine auto-verifies, it is going to take as long as Scrapebox. If Scrapebox is taking 24 hours to verify your list, then TJ will take just as long. Whenever those checks are made, whether in one go or as required, those 24 hours of checks have to be made.
Moses
Posts: 289
Joined: Fri Feb 28, 2014 11:59 pm

Re: Follow action & Bad URLs

Post by Moses »

These are bad URLs; the program is pausing because the error is unexpected. The URL has not been skipped due to previous processing, nor filtered out for any other reason.

I don't think it should be skipped, for the safety of the account.
What would happen if they were skipped?
The problem is, if the routine auto-verifies, it is going to take as long as Scrapebox. If Scrapebox is taking 24 hours to verify your list, then TJ will take just as long. Whenever those checks are made, whether in one go or as required, those 24 hours of checks have to be made.
That's true, but I think if the bad URLs were saved on each run and then skipped / ignored on later runs, it wouldn't take as much time; they would be like already-processed URLs.
martin@rootjazz
Site Admin
Posts: 34712
Joined: Fri Jan 25, 2013 10:06 pm
Location: The Funk
Contact:

Re: Follow action & Bad URLs

Post by martin@rootjazz »

Moses wrote:
These are bad URLs; the program is pausing because the error is unexpected. The URL has not been skipped due to previous processing, nor filtered out for any other reason.

I don't think it should be skipped, for the safety of the account.
What would happen if they were skipped?
The problem is, if the routine auto-verifies, it is going to take as long as Scrapebox. If Scrapebox is taking 24 hours to verify your list, then TJ will take just as long. Whenever those checks are made, whether in one go or as required, those 24 hours of checks have to be made.
That's true, but I think if the bad URLs were saved on each run and then skipped / ignored on later runs, it wouldn't take as much time; they would be like already-processed URLs.
I believe I added a check so that once an error occurs, if it was a 404 it is skipped, as that is a known error and can be skipped safely.

As for maintaining lists of all processed items across all actions: I will add it to the features list, but the existing code is not designed to store logs of thousands and thousands of items. The whole system would need to be rewritten to use pooling, to avoid loading large files directly into memory, which is no small update.
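To make the suggestion concrete, here is a minimal sketch of the persist-and-skip idea in Python. This is not TumblingJazz internals; the class and file names are hypothetical. Each 404 is appended to a log file the moment it is found, and the log is re-read line by line (streamed, rather than loaded as one large string) at the start of the next run, so earlier bad URLs can be skipped without re-checking them.

```python
# Hypothetical persistent skip-list for bad URLs, illustrating the
# save-then-skip suggestion from this thread. Names are made up.
import os

class BadUrlLog:
    def __init__(self, path="bad_urls.txt"):
        self.path = path
        self.bad = set()
        if os.path.exists(path):
            # Stream the log one line at a time instead of reading it whole,
            # so a very large log never has to sit in memory as one string.
            with open(path) as f:
                for line in f:
                    self.bad.add(line.strip())

    def record(self, url):
        """Append a newly discovered 404 so future runs will skip it."""
        if url not in self.bad:
            self.bad.add(url)
            with open(self.path, "a") as f:
                f.write(url + "\n")

    def should_skip(self, url):
        return url in self.bad
```

The run loop would then call `should_skip()` before processing each URL and `record()` whenever a 404 comes back; the first pass still pays the full checking cost, but later runs skip the known-dead URLs for free.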
Moses
Posts: 289
Joined: Fri Feb 28, 2014 11:59 pm

Re: Follow action & Bad URLs

Post by Moses »

I believe I added a check so that once an error occurs, if it was a 404 it is skipped, as that is a known error and can be skipped safely.
I just ran some Follow / Like runs and noticed there were failures and drop-outs at the beginning that were paused on instead of skipped as you said.

The failures were 404 errors on Follow, and the drop-outs were on Like. When I checked the drop-out URLs they were all RSS pages (not sure if that's bad or not).

ID: 38320
martin@rootjazz
Site Admin
Posts: 34712
Joined: Fri Jan 25, 2013 10:06 pm
Location: The Funk
Contact:

Re: Follow action & Bad URLs

Post by martin@rootjazz »

Ok, re-added.


Today is not my day...