If we want to scrape around 1000 Soundcloud profile URLs, should we get proxies for that?
Keep in mind we don't use SCM for anything that involves our Soundcloud account, such as commenting, posting, or running multiple accounts. We only use SCM to extract info from Soundcloud.
1000 is fine, it isn't that many. As for how many is too many: previously (1-2 years ago) I scraped about 400k before I got an IP ban, so that *WAS* too many. What the limit is now I don't know, but I imagine it is lower. At a *GUESS*, and this is only a guess, I would say 50k is OK. That doesn't mean you can do 50k every day, just that a one-off of 50k should be OK.
However, if you have proxies, say 100, the requests get split across them. So if you do 1000 profiles with 100 proxies, each proxy handles only 10.
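To make the arithmetic concrete, here is a rough Python sketch of how requests could be dealt out across proxies in turn. This is only an illustration, not how SCM does it internally (I am assuming a simple round-robin); the function name, the profile URLs and the proxy addresses are all made-up placeholders.

```python
from collections import defaultdict
from itertools import cycle


def assign_round_robin(urls, proxies):
    """Map each proxy to the list of URLs it would fetch, dealt out in turn."""
    if not proxies:
        raise ValueError("need at least one proxy")
    assignment = defaultdict(list)
    # cycle() repeats the proxy list forever; zip() stops when the URLs run out.
    for url, proxy in zip(urls, cycle(proxies)):
        assignment[proxy].append(url)
    return dict(assignment)


# Hypothetical placeholder data: 1000 profile URLs and 100 proxies.
urls = [f"https://soundcloud.com/user-{i}" for i in range(1000)]
proxies = [f"proxy-{i}.example:8080" for i in range(100)]

assignment = assign_round_robin(urls, proxies)
# Distinct batch sizes across all proxies.
per_proxy = {len(batch) for batch in assignment.values()}
print(len(assignment), per_proxy)
```

With 1000 URLs and 100 proxies every proxy ends up with a batch of 10, so no single IP sees more than a handful of requests.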
If you load the file into Excel, you can sort on any column you want.
But if each row already has all the information you need, the order does not matter. You just need to extract by column.
Or do you mean you are trying to work across multiple files? If so, again a bit of Excel should let you merge two files: you specify a key column, and the data from both files is merged into one on that key. Then you are working with just one file, so the order is not important.
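If you would rather do the merge outside Excel, the same idea can be sketched in Python with the standard csv module. This is a minimal sketch under assumptions: the column names (url, followers, city) and the sample rows are invented for illustration, and I am assuming the profile URL is the key shared by both files.

```python
import csv
import io


def merge_on_key(rows_a, rows_b, key):
    """Join two lists of dict rows on `key`; the row order of either input is irrelevant."""
    lookup = {row[key]: row for row in rows_b}
    merged = []
    for row in rows_a:
        # Rows with no match in the second file keep only their own columns.
        extra = lookup.get(row[key], {})
        merged.append({**row, **extra})
    return merged


# Hypothetical sample data standing in for two exported files (note the different row order).
file_a = io.StringIO(
    "url,followers\nhttps://soundcloud.com/a,120\nhttps://soundcloud.com/b,45\n"
)
file_b = io.StringIO(
    "url,city\nhttps://soundcloud.com/b,Berlin\nhttps://soundcloud.com/a,Leeds\n"
)

rows_a = list(csv.DictReader(file_a))
rows_b = list(csv.DictReader(file_b))
merged = merge_on_key(rows_a, rows_b, "url")
for row in merged:
    print(row)
```

For real files you would open them with open(path, newline="") instead of io.StringIO. Because rows are matched through the key lookup, it does not matter that the two files list the profiles in a different order.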
Right now, the Soundcloud bios are breaking the URL rows, because some bios contain text that spans a couple of lines. So when I load the file into Excel, the URL column sometimes ends up with pieces of bios in it. Is it possible to change this so that the whole bio stays in one column?