I am scraping using a file with a list of profile URLs. The expected result is a txt file with at least the following information: user name + the email address posted by that user. Instead, my results file contains tons of emails and names that are not found anywhere within the profiles of the users whose URLs are in the list provided. Example (totally random - and the information here is public and doesn't belong to me or to anyone that I love or hate - administrator, feel free to edit/remove characters to make this info private):
One of the Soundcloud URLs included in a profile search is http:[SLASH][SLASH]soundcloud.com[SLASH]patrik-sykorjak - some random fellow that I included in the list of users whose URLs I wanted to scrape for emails during this test. This fellow *doesn't* have an email address anywhere in his profile. But the resulting text file included this information about this user (wow):
http:[SLASH][SLASH]soundcloud.com[SLASH]patrik-sykorjak [TAB] Armin_van_Buuren [TAB] arminvanbuuren [TAB] supersonicxproductions[[AT]]gmail.com
Clearly, Armin_van_Buuren isn't following this young man. However, it turns out that this user is following van Buuren. OK. Now, what's the connection between either of them and the mysterious gmail.com address that was scraped from Patrik? None... except that I did a Google search for "supersonicxproductions[[AT]]gmail.com" and it turns out that this email is publicly listed in the profile of a SC user at http:[SLASH][SLASH]soundcloud.com[SLASH]supersonic-9, who also happens to follow van Buuren. So what if they both follow van Buuren?
Scraping doesn't mean finding the email address of a cousin's friend's mother's co-worker. It means scraping a specific group of users and generating a list of names paired with email addresses, if they have public email addresses in their profiles... I am totally lost here. Opinions? Help? Anyone? - THANKS -
Email scrape function fails to produce name/email matches
-
- Posts: 27
- Joined: Fri Oct 25, 2013 3:31 am
Last edited by navigator7477 on Wed Oct 30, 2013 7:01 pm, edited 1 time in total.
- martin@rootjazz
- Site Admin
- Posts: 34712
- Joined: Fri Jan 25, 2013 10:06 pm
- Location: The Funk
- Contact:
Re: Email scrape function fails to produce name/email matches
Something seems to have gone wrong with the email scrape routine.
I will test / confirm / fix today
- martin@rootjazz
- Site Admin
- Posts: 34712
- Joined: Fri Jan 25, 2013 10:06 pm
- Location: The Funk
- Contact:
Re: Email scrape function fails to produce name/email matches
Tested and all working correctly for me.
Can you send through the list you are using?
Regarding http://soundcloud.com/patrik-sykorjak, if you view source, then the page DOES contain the email address: supersonicxproductions@gmail.com
The email scraper scrapes the source code of the whole page, not only the bio area. This user seems to have just created sets out of other people's tracks, so the scrape routines are finding this information. If you notice, soundcloud doesn't load this user's /sounds page, but the /sets page. The sets page contains different information to the /sounds page, which is why things are slightly off.
Code:
<meta content="SuperSonic [Official]" itemprop="byArtist" />
<meta content="///For All Questions Contact-
supersonicxproductions@gmail.com

SuperSonic is a 5 member Producer/DJ group based out of Marist College in Poughkeepsie NY

New and young SuperSonic plans to test the boundaries of the current EDM scene, and create some of the hardest, most unique, festival worthy tracks they can.

With a common love and passion for music and alot of hard work and effort, SuperSonic hopes to fulfill their lifelong dream of turning a hobby into a career, and one day become a major act in the EDM industry 

We appreciate all support and hope you enjoy our music

Thank You!







" itemprop="description" /></div>
<span class="info"><span>3.</span>
I will have a think about how best to handle these instances.....
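To illustrate the difference described above, here is a minimal Python sketch (not the tool's actual code) comparing a whole-source email scan with one that only looks at text nodes and so skips attribute values such as the meta content quoted above. The function names, the sample fragment, and the example.com address are all made up for illustration, and the email pattern is deliberately simple.

```python
import re
from html.parser import HTMLParser

# Deliberately simple pattern for illustration; real email matching is messier.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class TextOnly(HTMLParser):
    """Collects text nodes only; attribute values (e.g. meta content) are skipped."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def emails_in_source(html):
    """Whole-source scan: matches an address anywhere in the markup."""
    return sorted(set(EMAIL_RE.findall(html)))


def emails_in_text(html):
    """Text-node scan: ignores addresses that appear only inside tag attributes."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return sorted(set(EMAIL_RE.findall(" ".join(parser.chunks))))


# Hypothetical page fragment modelled on the snippet quoted above.
SAMPLE = """
<div class="bio">No contact details here.</div>
<meta content="SuperSonic [Official]" itemprop="byArtist" />
<meta content="For All Questions Contact-
someone@example.com" itemprop="description" />
"""

if __name__ == "__main__":
    print(emails_in_source(SAMPLE))
    print(emails_in_text(SAMPLE))
```

The trade-off is that a text-only scan would also miss an address that a user's own profile exposes only through a meta tag, so this is one possible direction rather than a complete fix.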
-
- Posts: 27
- Joined: Fri Oct 25, 2013 3:31 am
Re: Email scrape function fails to produce name/email matches
I will email you the lists shortly... And definitely, there is no point scraping track information when scraping for users because people post random things of random popular artists all the time to look cool. In fact, many bot/fake accounts are set up exactly this way.
- martin@rootjazz
- Site Admin
- Posts: 34712
- Joined: Fri Jan 25, 2013 10:06 pm
- Location: The Funk
- Contact: