Problem with scraping users from tag search

Partemp55
Posts: 5
Joined: Tue Nov 26, 2013 9:17 am

Problem with scraping users from tag search

Post by Partemp55 »

Hello,
I got Soundcloud Manager a couple of days ago (so many good features!), but I have a problem with scraping users via tracks in tag search. Is it only me? No matter what search I enter (even nonexistent long words), I get around 3200 results, and they are irrelevant to the search words. I also tried v2 scraping, but I get an "unhandled exception" error ("object reference not set to an instance of an object"). I tried a couple of other scrapes and they work well. I have the latest version and I tried from two PCs (Win7 and XP). What is the problem?

Thanks in advance for the help!
martin@rootjazz
Site Admin
Posts: 34375
Joined: Fri Jan 25, 2013 10:06 pm
Location: The Funk
Contact:

Re: Problem with scraping users from tag search

Post by martin@rootjazz »

Let me check it out
Partemp55 wrote: What is the problem?

Sounds like I cocked something up tbh :(

Re: Problem with scraping users from tag search

Post by martin@rootjazz »

Yes, it was a bug. The code was using the text from the wrong text box, so most likely it was doing an empty scrape each time.

Fixed now. The fix will be in update 1.315, which will go out in the next few hours to
http://soundcloudmanager.com/updatetesting.html

Re: Problem with scraping users from tag search

Post by martin@rootjazz »

Partemp55 wrote: I also tried v2 scraping but I get an "unhandled exception" error

Was this also on the SCRAPING tab? If so, I have removed it. It should have been removed a while back, but I must have missed it. All search functionality is now V2, as Soundcloud killed classic search weeks ago.

Re: Problem with scraping users from tag search

Post by Partemp55 »

Hi,
Yeah, it works! Thanks for the quick reply and the update.

It seems that tags with spaces (e.g. "punk rock", "heavy metal") are treated as separate words/tags. Is there any symbol I can use to have them treated as one tag? (I tried +, / etc. but they don't work.)

Also, it's weird: bands that I know exist (and have tracks with relevant tags) don't show up in the results. Some of my scrapes reach around 4000 results and never more, so perhaps I can't find the bands because we hit a maximum allowed number? (I tried a generic "pop" or "rock" tag as a test and the result is around 4000 again; I'm sure there are more.) I also wonder, can Soundcloud tell if we're searching/creating accounts/following from an application and not from a browser?

Thanks for your patience and help!

Re: Problem with scraping users from tag search

Post by martin@rootjazz »

Not sure about tags with spaces in them. I will need to check and get back to you.

As for max results: scraping users from tracks may return fewer than the 8000 result limit set by Soundcloud. The program just scrapes TRACKS via tags and then extracts the artist name, and some tracks may belong to the same artist. So on writing out, all duplicate artists in the list are removed.
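
To illustrate why the user count can end up lower than the track count, here is a minimal Python sketch of the de-duplication step described above. It is not SCM's actual code; the track dictionaries and the `unique_artists` helper are made up for the example.

```python
def unique_artists(tracks):
    """Return artist names in first-seen order, dropping duplicates."""
    seen = set()
    artists = []
    for track in tracks:
        name = track["artist"]
        if name not in seen:
            seen.add(name)
            artists.append(name)
    return artists


# Hypothetical scrape result: three tracks, two of them by the same artist.
tracks = [
    {"title": "Track A", "artist": "band_one"},
    {"title": "Track B", "artist": "band_two"},
    {"title": "Track C", "artist": "band_one"},
]

print(unique_artists(tracks))  # ['band_one', 'band_two']
```

Three scraped tracks yield only two users here, so a scrape that hits the track limit can still write out far fewer users.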

Also, there is a limit on how many pages will be searched, in the SETTINGS TAB > PROXIES / SCRAPING > MAX PAGES TO SCRAPE FOR SEARCH. Try increasing this to get more results.

As for bands not showing: if you do a tag search on the Soundcloud site, do these groups show up? It might just be that Soundcloud is not showing them. Or it could be more bugs.

Partemp55 wrote: I wonder also, does soundcloud understand if we're searching/creating accounts/following from an application and not from a browser?

No, SCM works as if it is a browser. Basically, all your browser does is send text commands to the website; the website processes these commands and sends you back the page source. Your browser then renders the page. So all SCM does is replicate the text commands sent to the site. There is no difference at all in the commands. The reason why SCM is quicker is that it doesn't parse the returned code for images / stylesheets / JS files and then request / download those (which is what takes all the time / bandwidth, not the actual page source).
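
As a rough illustration of what such a "text command" looks like, the Python sketch below builds the plain text of an HTTP GET request without sending it. It is not taken from SCM; the `build_get_request` helper, the header set and the User-Agent value are assumptions chosen for the example.

```python
from urllib.parse import urlsplit


def build_get_request(url, user_agent):
    """Return the plain-text HTTP/1.1 GET request for the given URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {parts.netloc}",
        f"User-Agent: {user_agent}",  # illustrative value, not a real browser string
        "Accept: text/html",
        "Connection: close",
    ]
    # A blank line ends the header section of the request.
    return "\r\n".join(lines) + "\r\n\r\n"


request_text = build_get_request("https://soundcloud.com/tags/tango", "ExampleBrowser/1.0")
print(request_text)
```

A browser and a script that send the same text produce the same request; the browser then goes on to fetch the images, stylesheets and scripts that the returned page references.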

Re: Problem with scraping users from tag search

Post by Partemp55 »

Hi,
Ah, very interesting about the browser, thanks. :)

It's not a problem with the max pages setting (I had searched with the default 0 and also with 10000, and the results were identical, as expected). And since there's an 8000 limit from Soundcloud (I didn't know that), I guess that's why bands are missing. That makes sense.

But the user search doesn't remove duplicates. I checked the txt files from both of my PCs and they contain duplicate users. So perhaps that's one bug (personally, I don't care, I can remove the duplicates in OpenOffice).

About the low numbers: is Soundcloud Manager searching strictly by tag, or by "track search" (as on the Soundcloud site)? In the first case we should, in theory, have 8000 results (since duplicates were included), so perhaps it's a bug. In the second case, it makes sense if there are fewer results. If you search "tango", track search brings up the tracks tagged tango but also tracks containing the word elsewhere, so I guess that SM accepts the first but rejects the second; hence fewer results, and everything is working. But in that case, why reject the ones where tango appears in the title and not in the tag? It would be better if the program kept them in the txt file; anyone who wants to can filter them out with SM's filter feature anyway.

Thanks also for looking into the space issue!

Re: Problem with scraping users from tag search

Post by martin@rootjazz »

Fixed: a unique list of artists is now returned.

Max search pages = 0 means unlimited so that isn't the issue.
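
A minimal Python sketch of how a "0 means unlimited" page setting typically behaves is shown below. This is an assumption about the general pattern, not SCM's code; `scrape_pages` and `fake_fetch` are hypothetical names, and the fake search simply has three pages of results.

```python
from itertools import count


def scrape_pages(fetch_page, max_pages=0):
    """Collect results page by page; max_pages=0 means no page limit."""
    results = []
    for page_number in count(1):
        if max_pages and page_number > max_pages:
            break
        page = fetch_page(page_number)
        if not page:  # an empty page means the search has run out of results
            break
        results.extend(page)
    return results


def fake_fetch(page_number):
    # Hypothetical stand-in for a real search request: 3 pages of 2 results each.
    if page_number <= 3:
        return [f"user{page_number}a", f"user{page_number}b"]
    return []


print(len(scrape_pages(fake_fetch, max_pages=0)))      # 6
print(len(scrape_pages(fake_fetch, max_pages=2)))      # 4
print(len(scrape_pages(fake_fetch, max_pages=10000)))  # 6
```

Under this pattern, 0 and any limit larger than the number of available pages return the same results, which matches the identical counts seen with 0 and 10000.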

Tag search is a different search mechanism than the main search box
https://soundcloud.com/tags/tango

and it does return results with "tango" anywhere. I wasn't aware of that; I thought it was just a tag search. Or maybe it was, and SC have changed it.

Partemp55 wrote: If you search "tango" track search brings the tracks tagged tango but also tracks containing the word elsewhere, so I guess that SM accepts the first but rejects the second, so fewer results, everything is working

No, the program will scrape anything that Soundcloud returns from the tag search; it doesn't check the tags. Good idea, but it does not explain the number of results.

Anyway, they do not include a count of results found, which makes it annoying to test whether the program is working or not >:(


Am now on the space in tags issue so will test the above. Will get back to you in about 10 minutes....

Re: Problem with scraping users from tag search

Post by martin@rootjazz »

Issue fixed with tag search not returning enough results.

Tag search with a space was replacing the space with a - but should have encoded the space as %20. So that should be fixed too.
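
For reference, the difference between the two encodings can be sketched in Python with the standard library's `urllib.parse.quote`. The `tag_url` helper is a made-up name and the URL pattern is taken from the tags link earlier in this thread; this is not SCM's own code.

```python
from urllib.parse import quote


def tag_url(tag):
    """Build a tag search URL with the tag percent-encoded."""
    # safe="" percent-encodes every reserved character, including "/".
    return "https://soundcloud.com/tags/" + quote(tag, safe="")


print(tag_url("punk rock"))           # https://soundcloud.com/tags/punk%20rock
print(tag_url("heavy metal"))         # https://soundcloud.com/tags/heavy%20metal
print("punk rock".replace(" ", "-"))  # punk-rock, the old (incorrect) replacement
```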

updated
http://soundcloudmanager.com/updatetesting.html

1.318