Filtering scraping via parameters?

chupakabra
Posts: 93
Joined: Sun Feb 09, 2020 3:29 pm

Filtering scraping via parameters?

Post by chupakabra » Sat Jun 27, 2020 5:01 pm

Hey Martin,

I have been thinking about this for a week and decided to contact you about it. Some folks I work with are interested in specific data for a niche. For example, they want ONLY the contact details of users, but right now I scrape all the details: whether the account is verified, whether it has an anonymous profile pic, the profile pic URL, and so on. Recently I asked whether you could tweak the software to get the HD profile pic, and you did: https://rootjazz.com/forum/viewtopic.ph ... 290#p58290 Thanks again!

It is awesome to get all the details, but what if I want to scrape ONLY the URL, ID, and emails? Or only phone numbers, country codes, and usernames? Or only profile pics and nothing else?

^ It is a simple question, which I think would need only a simple solution, correct? Could you add an option in the Scrape User area that asks "choose what you want to scrape", with a button to select everything if you want the whole data set, or a list of check boxes that lets the user pick individual fields? (In my opinion, you could keep full name, username, and URL checked by default so that newbies know these are important, with a note at the top saying: these 3 are checked to help you understand the accounts; feel free to uncheck them as per your requirements!)

pk username full_name is_private profile_pic_url hd_profile_pic_url is_verified has_anonymous_profile_picture media_count geo_media_count follower_count following_count biography external_url usertags_count is_favorite is_business public_email public_phone_number public_phone_country_code contact_phone_number city_id city_name zip address_street direct_messaging category business_contact_method Id ProfileUrl

Benefits of having this:
1. It would save me an enormous amount of time to take EXACTLY the data I need instead of ALL the data.
2. If I want to scrape all the data for my analysis, I will, but since I am an advanced user of Instadub, I don't think it is necessary to extract city, bio, DM, verified, etc. when I only want 1, 2, or maybe 3 things.

Custom Search is already there, so could we also add a scraping filter option to choose which field(s) to scrape?

This would be extremely helpful, Martin. Honest to God, I will publish a blueprint guide for going from 0 to 1000 followers a week using Instadub on BHW within a month, once I can scrape faster than before and get exactly what I need, and once I have sorted this out: https://rootjazz.com/forum/viewtopic.php?f=28&t=9338

Let me know, Martin. Looking forward to it :)

martin@rootjazz
Site Admin
Posts: 24707
Joined: Fri Jan 25, 2013 10:06 pm
Location: The Funk

Re: Filtering scraping via parameters?

Post by martin@rootjazz » Mon Jun 29, 2020 3:15 pm

chupakabra wrote:
Sat Jun 27, 2020 5:01 pm

It is awesome to get all the details, but what if I want to scrape ONLY the URL, ID, and emails? Or only phone numbers, country codes, and usernames? Or only profile pics and nothing else?
Then scrape it all,
open in Excel
delete the columns you don't want
Re-save
:)

^ It is a simple question, which I think would need only a simple solution, correct?
Very simple for me, you do it :)
Could you add an option in the Scrape User area that asks "choose what you want to scrape"
I'll be honest: no. Coding something like that would take a couple of hours, whereas the alternative of just deleting columns in Excel takes seconds. I'll note your interest, but for now it goes on the "suggested features list" with no plans to implement it.
Benefits of having this:
1. It would save me an enormous amount of time to take EXACTLY the data I need instead of ALL the data.
Another quick solution is to write a little script that loads the CSV and removes the unwanted columns. If you know a little programming / scripting, this is straightforward, and much quicker than a custom GUI solution.
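
As a rough illustration of the script approach described above, here is a minimal Python sketch. It assumes the export is a comma-separated file with a header row matching the field names listed earlier in the thread; the function name keep_columns, the WANTED list, and the sample row are made up for the example.

```python
import csv
import io

# Hypothetical selection; the names come from the field list earlier in the thread.
WANTED = ["username", "public_email", "public_phone_number"]


def keep_columns(src, dst, wanted):
    """Copy CSV rows from src to dst, keeping only the wanted columns."""
    reader = csv.DictReader(src)
    # extrasaction="ignore" silently drops every field that is not in `wanted`.
    writer = csv.DictWriter(dst, fieldnames=wanted, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)


# Tiny in-memory demo with invented data, standing in for a real export file.
sample = io.StringIO(
    "pk,username,full_name,public_email,public_phone_number\n"
    "1,alice,Alice A,alice@example.com,555-0100\n"
)
out = io.StringIO()
keep_columns(sample, out, WANTED)
result = out.getvalue()
print(result)
```

To run it on a real export, open the input and output files with newline="" and pass those file objects in place of the in-memory buffers.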

2. If I want to scrape all the data for my analysis, I will, but since I am an advanced user of Instadub, I don't think it is necessary to extract city, bio, DM, verified, etc. when I only want 1, 2, or maybe 3 things.
There is no scraping benefit, as all the data is present.
Custom Search is already there, so could we also add a scraping filter option to choose which field(s) to scrape?
As above: your suggestion has been noted and added to the feature suggestions list. How quickly it gets implemented will depend on the demand for the feature.

Regards,
Martin

Re: Filtering scraping via parameters?

Post by chupakabra » Mon Jun 29, 2020 10:12 pm

Thanks for adding it to your suggestion list.

That's what I do: I delete the columns I don't want. I was talking about saving time here. For example, if I only want emails and phone numbers, I'll check only usernames, phone numbers, and email addresses... don't you think that will scrape faster?

Re: Filtering scraping via parameters?

Post by martin@rootjazz » Tue Jun 30, 2020 9:03 pm

chupakabra wrote:
Mon Jun 29, 2020 10:12 pm
don't you think that will scrape faster?
No, it won't. The same scrape is made whether you want all the data or only some of it. There is no difference whatsoever in speed / efficiency: the exact same number and size of requests will be made.

Re: Filtering scraping via parameters?

Post by chupakabra » Wed Jul 01, 2020 9:48 pm

I see. But hasn't the scraping speed been affected since HD images were added? I feel it has become slower.

Is there any way you can increase the speed?

Re: Filtering scraping via parameters?

Post by martin@rootjazz » Wed Jul 01, 2020 10:50 pm

chupakabra wrote:
Wed Jul 01, 2020 9:48 pm
I see. But hasn't the scraping speed been affected since HD images were added? I feel it has become slower.
No, that data was always present; the program was just pulling the value from a property

media_url

instead of

hd_media.media_url

or properties to that effect. There is no additional scraping and no additional processing to access the data.
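
To illustrate the point above, here is a minimal Python sketch, assuming the scraped record is a nested structure that already carries both URLs. The dictionary layout, the example URLs, and the profile_pic helper are hypothetical; only the property names echo the ones mentioned above, which were themselves given as approximate.

```python
# Hypothetical record: one already-fetched result that contains both URLs.
user = {
    "username": "example_user",
    "media_url": "https://example.com/pic_150.jpg",
    "hd_media": {"media_url": "https://example.com/pic_1080.jpg"},
}


def profile_pic(record, hd=False):
    """Read the standard or HD URL from the same record; no new request is made."""
    if hd:
        return record["hd_media"]["media_url"]
    return record["media_url"]


standard = profile_pic(user)
high_def = profile_pic(user, hd=True)
print(standard)
print(high_def)
```

Either way the lookup is a plain property read on data that is already in memory, which is why switching to the HD value would not be expected to change scraping time.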

Is there any way you can increase the speed?
That depends on why things are running slowly for you, but it isn't due to hd_media_url or to scraping the data itself.

How much slower are things? Is it substantial? Do you have actual data (logs) showing it is slower (this would be ideal if so) or is it just something you feel is running slower?

Re: Filtering scraping via parameters?

Post by chupakabra » Thu Jul 02, 2020 12:07 am

How much slower are things? Is it substantial?
^ Substantial? Absolutely not. But a little bit. I am sure the speed dropped after the 3.805 (11/05/2020) update listed on https://rootjazz.com/instadub/updatetesting.html

Maybe I am wrong. But I used to get 10k records in 6-8 hours, I guess, and since that update it takes 10 hours. So it is now about 1,000 records per hour, and I am talking about scraping the full CSV details.
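
Taking the rough figures above at face value (they are estimates, not logged measurements), the throughput works out as in this small Python sketch. The rows_per_hour helper is just for illustration.

```python
def rows_per_hour(rows, hours):
    """Average throughput for a scrape of `rows` records taking `hours` hours."""
    return rows / hours


before_fast = rows_per_hour(10_000, 6)   # best case before the update
before_slow = rows_per_hour(10_000, 8)   # 1250.0, worst case before the update
after = rows_per_hour(10_000, 10)        # 1000.0, after the update

# Relative drop in throughput against each "before" estimate.
drop_vs_slow = 1 - after / before_slow
drop_vs_fast = 1 - after / before_fast

print(f"before: {before_slow:.0f} to {before_fast:.0f} rows/hour")
print(f"after:  {after:.0f} rows/hour")
print(f"drop:   {drop_vs_slow:.0%} to {drop_vs_fast:.0%}")
```

Since the inputs are guesses, the result is only a ballpark; timestamps from actual runs would be needed to confirm any change.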


Do you have actual data (logs) showing it is slower (this would be ideal if so) or is it just something you feel is running slower?
^ I don't have any logs to prove it. It is just a feeling, that's all.

Re: Filtering scraping via parameters?

Post by martin@rootjazz » Thu Jul 02, 2020 1:30 pm

chupakabra wrote:
Thu Jul 02, 2020 12:07 am

Maybe I am wrong. But I used to get 10k records in 6-8 hours, I guess, and since that update it takes 10 hours. So it is now about 1,000 records per hour, and I am talking about scraping the full CSV details.
Same search?
Unique search results?

With a filter, times will always vary because different results will match / fail the filter. If unique search is used and it is the same search, you will be ignoring already pulled results, so the program has to page through them, which takes time. If it is just the same search, then as above: a different % of results meets your filter.

Re: Filtering scraping via parameters?

Post by chupakabra » Fri Jul 03, 2020 9:49 am

martin@rootjazz wrote:
Thu Jul 02, 2020 1:30 pm
chupakabra wrote:
Thu Jul 02, 2020 12:07 am

Maybe I am wrong. But I used to get 10k records in 6-8 hours, I guess, and since that update it takes 10 hours. So it is now about 1,000 records per hour, and I am talking about scraping the full CSV details.
Same search?
Unique search results?

With a filter, times will always vary because different results will match / fail the filter. If unique search is used and it is the same search, you will be ignoring already pulled results, so the program has to page through them, which takes time. If it is just the same search, then as above: a different % of results meets your filter.
No, I am not talking about the filter. I didn't say anything about it, haha.

I was saying that a simple hashtag scraping or follower scraping task is taking longer than before the update I mentioned.

Re: Filtering scraping via parameters?

Post by martin@rootjazz » Sat Jul 04, 2020 6:05 pm

chupakabra wrote:
Fri Jul 03, 2020 9:49 am
martin@rootjazz wrote:
Thu Jul 02, 2020 1:30 pm
chupakabra wrote:
Thu Jul 02, 2020 12:07 am

Maybe I am wrong. But I used to get 10k records in 6-8 hours, I guess, and since that update it takes 10 hours. So it is now about 1,000 records per hour, and I am talking about scraping the full CSV details.
Same search?
Unique search results?

With a filter, times will always vary because different results will match / fail the filter. If unique search is used and it is the same search, you will be ignoring already pulled results, so the program has to page through them, which takes time. If it is just the same search, then as above: a different % of results meets your filter.
No, I am not talking about the filter. I didn't say anything about it, haha.

I was saying that a simple hashtag scraping or follower scraping task is taking longer than before the update I mentioned.
Yes, I am giving you some reasons why it might take longer, as there is no way the hd_profile_url is the cause.

But unless you have logs of the earlier search actions and actual data I can look at, there isn't much I can do.
