resume scraping

user6845
Posts: 77
Joined: Sat Sep 23, 2017 10:04 pm

resume scraping

Post by user6845 » Wed Nov 07, 2018 10:43 am

hi

you asked me to open a ticket whenever I see the scraping of account followers stop, so you can see if you can implement a resume

log: 29009
version: 3.4.90

1. I tried to scrape /....isor/ and it stopped because my connection was interrupted
2. as you can see, up to 200k it scraped fast, but after that the scraping became very slow

thank you in advance

user6845
Posts: 77
Joined: Sat Sep 23, 2017 10:04 pm

Re: resume scraping

Post by user6845 » Thu Nov 08, 2018 12:29 am

this was the original topic where you asked me to post once I see that a scraping run has stopped - viewtopic.php?f=28&t=7106&sid=46e6017e9 ... 8&start=20

so you can check if you can implement a resume mechanism

User avatar
martin@rootjazz
Site Admin
Posts: 18126
Joined: Fri Jan 25, 2013 10:06 pm
Location: The Funk
Contact:

Re: resume scraping

Post by martin@rootjazz » Fri Nov 09, 2018 7:01 pm

ok, so from your logs, the last ID that was processed was
QVFDbDg0QVUySHZXZE5hem9RZkxXbUdZekhUdUF1Q1hPdTlxRmdKRjNGMTYyTlF1eXRzRzdDSWplTVQ5TG5BZ1ZjR1ZKbWtNQUZGSXVBUVhOZDlKN0hTRw==
Just need a way of getting that (or any other ID) from you into the program and starting from there.


I'll knock something up for you. It probably won't be *that* pretty, but it should do the trick.
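[Editor's note: what is being described here is cursor-based pagination, where each page of results carries a max_id cursor and resuming simply means starting the loop from a saved cursor instead of from the beginning. A minimal sketch, assuming a hypothetical `fetch_page(cursor)` callable standing in for the real API call; this is not InstaDub's actual code:]

```python
def scrape_followers(fetch_page, resume_max_id=None):
    """Collect every page of results, optionally resuming from a saved cursor.

    fetch_page(cursor) must return (items, next_cursor), where next_cursor
    is None once the last page has been served.
    """
    results = []
    cursor = resume_max_id  # None means start from the first page
    while True:
        items, cursor = fetch_page(cursor)
        results.extend(items)
        if cursor is None:  # no next max_id: we reached the end
            return results
```

With this shape, "resume" costs nothing extra: the program only needs the last max_id written to the log, passed back in as `resume_max_id`.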

User avatar
martin@rootjazz
Site Admin
Posts: 18126
Joined: Fri Jan 25, 2013 10:06 pm
Location: The Funk
Contact:

Re: resume scraping

Post by martin@rootjazz » Fri Nov 09, 2018 7:42 pm

The next update will include this feature. I shall let you know when it is ready.



Regards,
Martin

User avatar
martin@rootjazz
Site Admin
Posts: 18126
Joined: Fri Jan 25, 2013 10:06 pm
Location: The Funk
Contact:

Re: resume scraping

Post by martin@rootjazz » Fri Nov 09, 2018 9:04 pm

https://rootjazz.com/instadub/updatetesting.html

The scraper tab now has a link allowing you to enter the max_id as required.

For multi-step searches, the max_id applies to the first step only.

user6845
Posts: 77
Joined: Sat Sep 23, 2017 10:04 pm

Re: resume scraping

Post by user6845 » Sat Nov 10, 2018 4:47 pm

hi, quick questions

1. by max ID, do you mean: let's say it stopped scraping at user 180250, then I should write 180250 in max ID?
2. why does it start to scrape more slowly after 200,000 scrapes?
3. when I set proxies and threads to 4, I can only scrape 1 account; why is that?
PS. what is the purpose of a thread?

thanks

User avatar
martin@rootjazz
Site Admin
Posts: 18126
Joined: Fri Jan 25, 2013 10:06 pm
Location: The Funk
Contact:

Re: resume scraping

Post by martin@rootjazz » Mon Nov 12, 2018 3:00 pm

user6845 wrote:
Sat Nov 10, 2018 4:47 pm
hi, quick questions

1. by max ID, do you mean: let's say it stopped scraping at user 180250, then I should write 180250 in max ID?
No, you need to pull the ID from the logs; it is what ISG uses to know your place in the search results. You probably need to view the log file:

HELP > LOGS > VIEW

as I don't think the real-time logs include it.

2. why does it start to scrape more slowly after 200,000 scrapes?
no idea.

Maybe ISG slows down?

Check the log file; the lines are timestamped. Does that indicate which line is causing the issue?

3. when I set proxies and threads to 4, I can only scrape 1 account; why is that?
you cannot scrape ISG with proxies; those are only for the email scrape, which uses the website. To scrape ISG you have to use the API, and only accounts can use the API.

user6845
Posts: 77
Joined: Sat Sep 23, 2017 10:04 pm

Re: resume scraping

Post by user6845 » Tue Nov 13, 2018 1:02 am

1. what do you mean by "you cannot scrape ISG with proxies, that is only for email scrape which uses the website. In order to scrape ISG you have to use the API, only accounts can use API"? (what is ISG?)

in the scraping screen I have added proxies
+ why can I still only scrape 1 file for its emails, and not several together?

2. here is an example of a LOG; what shall I put inside MAX_ID?

followers: total scraped: 367064
Scrape page: 1854
Existing: 367064: Scraped new: 197
Total results: 367075
Total results: 367100
Total results: 367125
Total results: 367150
Total results: 367175
Total results: 367200
Total results: 367225
Total results: 367250
followers: total scraped: 367261
Paused for 2secs
Scrape page: 1855
Request failed: (ulRtglxLf) https://i.instagram.com/api/v1/friendsh ... hOczFVSQ== -1

* ERROR: Friendship: Object reference not set to an instance of an object.
Paused for 2secs


3. as for it being slow after 200,000: well, there is no timestamp, it just becomes slower, I am sure.

User avatar
martin@rootjazz
Site Admin
Posts: 18126
Joined: Fri Jan 25, 2013 10:06 pm
Location: The Funk
Contact:

Re: resume scraping

Post by martin@rootjazz » Tue Nov 13, 2018 7:42 pm

user6845 wrote:
Tue Nov 13, 2018 1:02 am
1. what do you mean by "you cannot scrape ISG with proxies, that is only for email scrape which uses the website. In order to scrape ISG you have to use the API, only accounts can use API"? (what is ISG?)
ISG = Instagram

You cannot access ISG data with proxies alone; you have to have a logged-in account to scrape ISG data. So proxies ON THEIR OWN cannot scrape ISG. However, they CAN scrape the website, so the email scrape module allows threading, as you don't need accounts to do it. It just threads through the provided file.

ISG search does not work in a way that allows this.
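[Editor's note: the proxies-plus-threads model described for the website email scrape can be sketched as a thread pool that spreads requests round-robin across the proxy list. This is an illustration under stated assumptions, not InstaDub's implementation; `fetch` here is a hypothetical download function:]

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle

def threaded_scrape(targets, proxies, fetch, threads=4):
    """Fetch each target through a proxy, round-robin, on a thread pool.

    No account is involved: this only works for plain website requests,
    which is why the email scrape module can use it but API scraping cannot.
    """
    jobs = list(zip(targets, cycle(proxies)))  # pair every target with a proxy
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map preserves input order in its results
        return list(pool.map(lambda job: fetch(*job), jobs))
```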


in the scraping screen I have added proxies
+ why can I still only scrape 1 file for its emails, and not several together?
Sorry, I don't understand your question.
If you have added proxies then, as I said, that only works for EMAIL SCRAPE actions.
As to why you scrape one file: one action scrapes to one file.

What do you mean by "several together"? Several what? It really helps me to help you if you can provide as much information as possible.
Please read through this post and submit your issue accordingly. Not everything will be relevant to your issue, but have a read, as it gives an idea of the information you can provide to help me help you as quickly as possible, without delays from having to request additional information :)

https://rootjazz.com/forum/viewtopic.php?f=23&t=1634

2. here is an example of a LOG; what shall I put inside MAX_ID?

followers: total scraped: 367064
Scrape page: 1854
Existing: 367064: Scraped new: 197
Total results: 367075
Total results: 367100
...
Request failed: (ulRtglxLf) https://i.instagram.com/api/v1/friendsh ... 91075483f7&
max_id=QVFCV0Q4SUdWVXd3NWoyc09qWkkxYUFhcmVtR2h1cGJNM2Jpak9RZjdxWXY5eWw5UUJfY1gzajVVSEVGaTAzSWtjVnlUVDdod2dHNGFXQ25oa3hOczFVSQ==

:roll:
the line that says max_id= <this_is_the_max_id>

:D
3. as for it being slow after 200,000: well, there is no timestamp, it just becomes slower, I am sure.
No, you HAVE to use the log file, not the in-app real-time logs. You can find the log files via HELP > LOGS > VIEW
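[Editor's note: pulling the cursor out of a log line like the one quoted above can be sketched with a small regular expression. This is a hypothetical helper for illustration, not part of InstaDub:]

```python
import re

# The cursor is a base64-style token appended as "max_id=<cursor>"
# at the end of the failed request URL in the log file.
MAX_ID_RE = re.compile(r"max_id=([A-Za-z0-9+/=_-]+)")

def extract_max_id(log_line):
    """Return the max_id cursor from a log line, or None if absent."""
    match = MAX_ID_RE.search(log_line)
    return match.group(1) if match else None
```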

user6845
Posts: 77
Joined: Sat Sep 23, 2017 10:04 pm

Re: resume scraping

Post by user6845 » Tue Nov 13, 2018 7:53 pm

according to the log extract I sent in my previous message in this forum thread:

what shall I fill in for the max_id in order for it to start from where it left off?

Post Reply