Email scrape function fails to produce name/email matches

Discussions to do with Soundcloud Manager. Do not use for support, use the dedicated support forum for help requests
navigator7477
Posts: 27
Joined: Fri Oct 25, 2013 3:31 am

Email scrape function fails to produce name/email matches

Post by navigator7477 »

I am scraping using a file with a List of Profile URLs. The expected result is to get a txt file with at least the following information: user name + email address posted for that user. Instead, my text results file contains tons of emails and names that are not found anywhere within the profiles of the users whose URLs are included in the list provided. Example (totally random - and the information here is public and doesn't belong to me or to anyone that I love or hate - administrator, feel free to edit/remove characters to make this info private):
One of the Soundcloud URLs included in a profile search is http:[SLASH][SLASH]soundcloud.com[SLASH]patrik-sykorjak - who is some random fellow that I included in the list of users whose URLs I wanted to scrape for emails during this test. This fellow *doesn't* have an email address anywhere in his profile. But the resulting text file included this information about this user (wow):
http:[SLASH][SLASH]soundcloud.com[SLASH]patrik-sykorjak [TAB] Armin_van_Buuren [TAB] arminvanbuuren [TAB] supersonicxproductions[[AT]]gmail.com
Clearly, Armin_van_Buuren isn't following this young man. However, it turns out that this user is following van Buuren. OK. Now, what's the connection between either of them and the mysterious gmail.com address that was scraped from Patrik? None... except that I did a Google search for "supersonicxproductions[[AT]]gmail.com" and it turns out that this email is publicly listed in the profile of the SC user at http:[SLASH][SLASH]soundcloud.com[SLASH]supersonic-9, who also happens to follow van Buuren. So what if they both follow van Buuren? Scraping doesn't mean finding email addresses of a cousin's friend's mother's co-worker. It means scraping a specific group of users and generating a list of names paired with email addresses if they have public email addresses in their profiles. I am totally lost here. Opinions? Help? Anyone? - THANKS -
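For anyone who wants to check their own results file for the same problem, here is a rough Python sketch. It assumes each line holds four tab-separated fields (profile URL, display name, username, email), which is only inferred from the single example row above, and it assumes the username should match the last part of the profile URL. The function names are made up for illustration and are not part of Soundcloud Manager.

```python
from urllib.parse import urlparse


def slug(url):
    """Return the first path segment of a profile URL, lowercased."""
    return urlparse(url).path.strip("/").split("/")[0].lower()


def mismatched_rows(lines):
    """Return (url, username, email) for rows whose username column
    does not match the profile URL it was scraped from.

    Assumed column order: URL, display name, username, email.
    """
    suspicious = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip rows that do not fit the assumed layout
        url, _display_name, username, email = fields[:4]
        if slug(url) != username.lower():
            suspicious.append((url, username, email))
    return suspicious


# Placeholder rows in the assumed layout; the addresses are not real.
rows = [
    "http://soundcloud.com/patrik-sykorjak\tArmin_van_Buuren\tarminvanbuuren\tsomeone@example.com\n",
    "http://soundcloud.com/some-user\tSome User\tsome-user\tsome-user@example.com\n",
]

for url, username, email in mismatched_rows(rows):
    print(url, "->", username, email)
```

Rows flagged this way are ones where the name and email probably came from something else on the page rather than from the profile owner.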
Last edited by navigator7477 on Wed Oct 30, 2013 7:01 pm, edited 1 time in total.
martin@rootjazz
Site Admin
Posts: 34712
Joined: Fri Jan 25, 2013 10:06 pm
Location: The Funk

Re: Email scrape function fails to produce name/email matches

Post by martin@rootjazz »

Something seems to have gone wrong with the email scrape routine.

I will test / confirm / fix today

Re: Email scrape function fails to produce name/email matches

Post by martin@rootjazz »

Tested and all working correctly for me.

Can you send through the list you are using?


Regarding http://soundcloud.com/patrik-sykorjak, if you view source, then the page DOES contain the email address: supersonicxproductions@gmail.com

Code:

<meta content="SuperSonic [Official]" itemprop="byArtist" />
<meta content="///For All Questions Contact-&#x000A;supersonicxproductions@gmail.com&#x000A;&#x000A;SuperSonic is a 5 member Producer/DJ group based out of Marist College in Poughkeepsie NY&#x000A;&#x000A;New and young SuperSonic plans to test the boundaries of the current EDM scene, and create some of the hardest, most unique, festival worthy tracks they can.&#x000A;&#x000A;With a common love and passion for music and alot of hard work and effort, SuperSonic hopes to fulfill their lifelong dream of turning a hobby into a career, and one day become a major act in the EDM industry &#x000A;&#x000A;We appreciate all support and hope you enjoy our music&#x000A;&#x000A;Thank You!&#x000A;&#x000A;&#x000A;&#x000A;&#x000A;&#x000A;&#x000A;&#x000A;" itemprop="description" /></div>
<span class="info"><span>3.</span>
The email scraper scrapes the source code of the whole page, not only the bio area. This user seems to have just created sets out of other people's tracks, so the scrape routines are finding the information attached to those tracks. If you notice, Soundcloud doesn't load this user's /sounds page, but the /sets page. The sets page contains different information to the /sounds page, which is why things are slightly off.
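To illustrate the difference, here is a minimal Python sketch and not the actual scrape routine. It compares running an email regex over the whole page source with running it only over the text of a bio container. The `<div class="bio">` container, the sample page and the function names are all assumptions made up for the example; the real Soundcloud markup will differ.

```python
import re
from html.parser import HTMLParser

# Deliberately simple email pattern, good enough for a demonstration.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical page: the bio has no email, but a track in one of the
# user's sets (by another artist) carries one in its description.
PAGE = """
<div class="bio">Just a listener.</div>
<meta content="Some Artist [Official]" itemprop="byArtist" />
<meta content="For All Questions Contact-&#x000A;someone@example.com" itemprop="description" />
"""


class BioText(HTMLParser):
    """Collects text inside <div class="bio"> (an assumed container)."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # > 0 while we are inside the bio div
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "div":
                self.depth += 1  # track nested divs inside the bio
        elif tag == "div" and ("class", "bio") in attrs:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def emails_in_whole_source(source):
    """Every email-like string anywhere in the page source."""
    return sorted(set(EMAIL_RE.findall(source)))


def emails_in_bio(source):
    """Only email-like strings found in the bio text."""
    parser = BioText()
    parser.feed(source)
    parser.close()
    return sorted(set(EMAIL_RE.findall(" ".join(parser.chunks))))


print(emails_in_whole_source(PAGE))
print(emails_in_bio(PAGE))
```

On this sample page the whole-source search picks up the address from the track description, while the bio-only search finds nothing, which matches what happened with the profile above.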

I will have a think about how best to handle these instances.....

Re: Email scrape function fails to produce name/email matches

Post by navigator7477 »

I will email you the lists shortly... And definitely, there is no point scraping track information when scraping for users because people post random things of random popular artists all the time to look cool. In fact, many bot/fake accounts are set up exactly this way.