scrape bios and songs?

Discussions about Soundcloud Manager. Do not use this forum for support; use the dedicated support forum for help requests.
martin@rootjazz
Site Admin
Posts: 34674
Joined: Fri Jan 25, 2013 10:06 pm
Location: The Funk

Re: scrape bios and songs?

Post by martin@rootjazz »

dariush90025 wrote:
Thanks for the update. Works fine now.

If we want to scrape like 1000 Soundcloud profile URLs, should we get proxies for that?
Keep in mind we don't use SCM for any kind of activities that involve our Soundcloud account, like commenting and posting and multiple accounts etc. We only use SCM to extract info from Soundcloud.

1,000 is fine; it isn't that many. As for how many is too many: previously (1-2 years ago) I scraped about 400k before I got an IP ban, so that *WAS* too many. What the limit is now, I don't know, but I imagine it is lower. At a *GUESS*, and this is only a guess, I would say 50k is OK. That doesn't mean you can do 50k every day, just that I think a one-off of 50k should be OK.

However, if you have proxies, say 100, the requests are split across them. So if you do 1,000 profiles with 100 proxies, each proxy does 10.
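
To make the arithmetic concrete, here is a minimal sketch of splitting requests evenly across proxies. It assumes a simple round-robin assignment; how SCM actually distributes requests is not stated in this thread, and the URLs and proxy names below are placeholders.

```python
from collections import Counter
from itertools import cycle


def assign_round_robin(urls, proxies):
    """Pair each URL with a proxy, cycling through the proxy list in order."""
    return list(zip(urls, cycle(proxies)))


# Placeholder data for illustration only; these are not real endpoints.
urls = [f"https://soundcloud.com/user-{i}" for i in range(1000)]
proxies = [f"proxy-{i}" for i in range(100)]

assignments = assign_round_robin(urls, proxies)

# Count how many URLs each proxy was given.
per_proxy = Counter(proxy for _, proxy in assignments)
print(min(per_proxy.values()), max(per_proxy.values()))  # prints "10 10"
```

With 1,000 URLs and 100 proxies, every proxy ends up with exactly 10 requests, so the per-IP load is a hundredth of the total.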

Re: scrape bios and songs?

Post by martin@rootjazz »

Due to parallelism, the order is not guaranteed.

Why is order important? What benefits does it give you?
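
As a rough illustration of why parallel work finishes out of order, here is a sketch using Python threads. It is not SCM's code: `fetch_profile` and the delays are stand-ins for real requests, and the URLs are placeholders. Because each result carries its own URL, the input order can be restored afterwards if it is really needed.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time


def fetch_profile(url, delay):
    """Stand-in for a real request: sleeps, then returns a row tagged with its URL."""
    time.sleep(delay)
    return {"url": url, "bio": f"bio for {url}"}


# Hypothetical inputs: the first URL is deliberately the slowest.
jobs = [
    ("https://soundcloud.com/a", 0.3),
    ("https://soundcloud.com/b", 0.2),
    ("https://soundcloud.com/c", 0.1),
]

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(fetch_profile, url, delay) for url, delay in jobs]
    # as_completed yields futures as they finish, not in submission order.
    completed = [future.result() for future in as_completed(futures)]

# Restore the input order by looking up each row's URL.
position = {url: index for index, (url, _) in enumerate(jobs)}
restored = sorted(completed, key=lambda row: position[row["url"]])
print([row["url"] for row in restored])
```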
dariush90025
Posts: 56
Joined: Fri Oct 02, 2015 4:49 pm

Re: scrape bios and songs?

Post by dariush90025 »

martin@rootjazz wrote:
dariush90025 wrote:
Thanks for the update. Works fine now.

If we want to scrape like 1000 Soundcloud profile URLs, should we get proxies for that?
Keep in mind we don't use SCM for any kind of activities that involve our Soundcloud account, like commenting and posting and multiple accounts etc. We only use SCM to extract info from Soundcloud.

1,000 is fine; it isn't that many. As for how many is too many: previously (1-2 years ago) I scraped about 400k before I got an IP ban, so that *WAS* too many. What the limit is now, I don't know, but I imagine it is lower. At a *GUESS*, and this is only a guess, I would say 50k is OK. That doesn't mean you can do 50k every day, just that I think a one-off of 50k should be OK.

However, if you have proxies, say 100, the requests are split across them. So if you do 1,000 profiles with 100 proxies, each proxy does 10.

Thanks for the info. Sounds good to me.

Re: scrape bios and songs?

Post by dariush90025 »

martin@rootjazz wrote: Due to parallelism, the order is not guaranteed.

Why is order important? What benefits does it give you?

We want to list each Soundcloud URL alongside the profile name, one of his/her track names, the bio and the email.

So when the order is off, it will take a lot of time to manually match thousands of URLs with the other info.

Please advise me on that.

Re: scrape bios and songs?

Post by martin@rootjazz »

If you load the file into Excel, you can sort on any column you want.

But if each row has all the information you need, the order does not matter; you just need to extract by column.

Or do you mean you are trying to work across multiple files? If so, again, a bit of Excel should allow you to merge two files on a key you specify, combining the data from both files into one. You are then working with just one file, so the order is not important.
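
If Excel is not convenient, the same merge-on-a-key idea can be sketched in a few lines of Python. This is only an illustration: the two exports, their column names, and the helper `merge_on_key` are made up for the example and are not what SCM produces.

```python
import csv
import io

# Hypothetical exports; the column names are assumptions for illustration.
profiles_csv = (
    "url,name,email\n"
    "https://soundcloud.com/a,Alice,a@example.com\n"
    "https://soundcloud.com/b,Bob,b@example.com\n"
)
tracks_csv = (
    "url,track\n"
    "https://soundcloud.com/b,Song B\n"
    "https://soundcloud.com/a,Song A\n"
)


def merge_on_key(left_text, right_text, key):
    """Join two CSV texts on a shared key column; row order in either file is irrelevant."""
    right_rows = {row[key]: row for row in csv.DictReader(io.StringIO(right_text))}
    merged = []
    for row in csv.DictReader(io.StringIO(left_text)):
        # Rows with no match in the right-hand file keep only their own columns.
        merged.append({**row, **right_rows.get(row[key], {})})
    return merged


rows = merge_on_key(profiles_csv, tracks_csv, "url")
for row in rows:
    print(row["url"], row["name"], row["track"])
```

The second file lists the profiles in a different order, yet each track still lands on the right row because the match is made on the URL, not on the row position.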

Re: scrape bios and songs?

Post by dariush90025 »

Right now, the Soundcloud bios distort the URL rows because some bios contain text that spans several lines. So when I load the file into Excel, the URL column sometimes ends up containing parts of bios. Is it possible to change this so that each bio stays in one column?

Thank you.
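
For reference, two common ways to keep a multi-line field in a single column are sketched below. This is an assumption about how such a problem is usually handled, not a description of SCM's export or of any fix made to it; the row data is invented.

```python
import csv
import io

# Hypothetical row; the bio spans several lines, as some Soundcloud bios do.
rows = [{"url": "https://soundcloud.com/a",
         "bio": "Producer.\nBookings: see website\nLondon"}]

# Option 1: let the csv module quote the field, so the line breaks stay inside one cell.
quoted = io.StringIO(newline="")
writer = csv.DictWriter(quoted, fieldnames=["url", "bio"], quoting=csv.QUOTE_ALL)
writer.writeheader()
writer.writerows(rows)

# Reading it back with a CSV parser yields one record with the bio intact.
quoted.seek(0)
parsed = list(csv.DictReader(quoted))

# Option 2: collapse whitespace runs (including newlines) so each record is one physical line.
flattened = [{**row, "bio": " ".join(row["bio"].split())} for row in rows]

print(len(parsed), parsed[0]["bio"] == rows[0]["bio"])
print(flattened[0]["bio"])
```

Option 1 preserves the original line breaks but relies on the spreadsheet honouring quoted fields; option 2 loses the line breaks but is safe for any tool that reads the file line by line.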

Re: scrape bios and songs?

Post by martin@rootjazz »

Can you send me that file so I can see the exact issue and confirm the fix?

support[at]soundcloudmanager[dot]com

Re: scrape bios and songs?

Post by martin@rootjazz »

Never mind, I think I have it fixed. I will test it and get a new update ready soon.

Re: scrape bios and songs?

Post by dariush90025 »

Great! Looking forward to that update.

Re: scrape bios and songs?

Post by martin@rootjazz »

Sorry, it seems I forgot to post the link here, though I was sure I did.

https://soundcloudmanager.com/updatetesting.html

The update was ready a few days ago.