Scraping notes strange behaviour

Ask any support / help / issues / problem or question related to TumblingJazz
erichon
Posts: 128
Joined: Sun May 01, 2016 10:12 am

Scraping notes strange behaviour

Post by erichon »

Log ID: 76218

Strange behaviour: it does not scrape like before. Instead I get a lot of "Invalid next url" messages and a lot of "Pausing for x secs".

Eric
martin@rootjazz
Site Admin
Posts: 34360
Joined: Fri Jan 25, 2013 10:06 pm
Location: The Funk

Re: Scraping notes strange behaviour

Post by martin@rootjazz »

Your logs are all but empty. Do you have logging turned off by any chance?

Or, if you can describe the search, I can try to replicate it.

Re: Scraping notes strange behaviour

Post by martin@rootjazz »

I just tried scraping NOTES FROM RECENT POSTS from the scrape tab for the keyword "happy" and it worked correctly.

So perhaps you are doing something different. If you can let me know exactly what, cheers.

Re: Scraping notes strange behaviour

Post by erichon »

The scrape files in log ID:57760 are the relevant ones. The search is a combination: Recent posts of + Scrape notes of posts. It worked quite well a few versions ago, but now it takes hours to scrape 50000 notes, even though the user I scrape from has lots of posts and lots of notes on each post.

Re: Scraping notes strange behaviour

Post by martin@rootjazz »

OK, I see there are some pauses in the logs with no other log entries around them, and those are causing the delay. I have no idea what those pauses are doing or why they are there. I will run a test of your scrape now and try to figure it out.

Re: Scraping notes strange behaviour

Post by martin@rootjazz »

OK, found the issue. A protection mechanism was added to delay when filtering the results, but it was incorrectly pausing when no filter was applied.

The next update will fix this. I shall let you know when it is ready.
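
To illustrate the kind of guard described above, here is a minimal sketch. It is not TumblingJazz's actual code: `pause_after_store`, `filter_active`, and the 2 to 9 second range are hypothetical names and values chosen only for the example.

```python
import random
import time


def pause_after_store(filter_active, sleep=time.sleep, min_secs=2, max_secs=9):
    """Pause only when result filtering is active; return the seconds paused.

    Hypothetical helper: the names and the delay range are assumptions.
    """
    if not filter_active:
        # With no filter applied there is nothing to protect against,
        # so skip the delay entirely (the reported bug paused here anyway).
        return 0
    secs = random.randint(min_secs, max_secs)
    sleep(secs)
    return secs
```

Passing `sleep` as a parameter keeps the sketch easy to check without actually waiting.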



Regards,
Martin

Re: Scraping notes strange behaviour

Post by erichon »

Still the same:

Next page: https://www.tumblr.com/svc/tumblelog/xx ... 1393848524
Pull notes url: https://www.tumblr.com/svc/tumblelog/xx ... 1393848524
Total scrape: 556/50000
Next page rel:
* STOPPING: Invalid next url
Scraped: 556 from: http://xxxx.tumblr.com/post/1760533
Total scraped: 556
Results of search: Scrape Notes Of Post with: http://xxxx.tumblr.com/post/1760533
Handle results: 556 nextstep: 2/2
End of chain: Store results: 556
StoreResults: checking: 556
Pausing for 3secs
Paused for 3secs
Pausing for 3secs
Paused for 3secs
Pausing for 9secs
Paused for 9secs
Pausing for 2secs
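
To put a number on how much time pauses like these cost, a small script can add up the completed pauses in a saved log. This is only a sketch: it assumes the log uses exactly the "Paused for Nsecs" wording shown above, and `total_paused_seconds` is a made-up helper name, not part of the application.

```python
import re

# Matches completed pauses only, e.g. "Paused for 3secs".
PAUSE_RE = re.compile(r"^Paused for (\d+)secs$")


def total_paused_seconds(log_text):
    """Sum the seconds from every 'Paused for Nsecs' line in the log text."""
    total = 0
    for line in log_text.splitlines():
        match = PAUSE_RE.match(line.strip())
        if match:
            total += int(match.group(1))
    return total


sample = """StoreResults: checking: 556
Pausing for 3secs
Paused for 3secs
Pausing for 3secs
Paused for 3secs
Pausing for 9secs
Paused for 9secs
Pausing for 2secs
"""

print(total_paused_seconds(sample))  # 15
```

The last "Pausing for 2secs" line is not counted because the excerpt ends before its matching "Paused" line.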

Re: Scraping notes strange behaviour

Post by martin@rootjazz »

Let me check again; maybe I didn't build the installer correctly.

Re: Scraping notes strange behaviour

Post by martin@rootjazz »

My apologies, the fix wasn't pulled into the build. I will update it now.