Do I need to scrape all lists first?

bananarama
Posts: 60
Joined: Wed Jun 17, 2015 1:33 pm

Do I need to scrape all lists first?

Post by bananarama »

Seems like I'm having a recurring issue where, no matter how lax I am about filtering or how large a list it should be scraping, the bot only tries to follow a much smaller number, then stops well short of the goal. For instance, I got this back after asking it to follow everyone in a large group (tens of thousands of members), filtering only for users active in the past 3 days (and play track first).

Processed: 28
NumSuccess: 28
NumFiltered: 344
NumFailed: 0

There is no "daily limit reached" or other warning.

Correct me if I'm wrong, but it's saying 344 weren't active in the past few days, and 28 were, right? So why did it stop? Daily stats confirm this is happening again and again on all accts. It was only a fifth of the way to its goal.

Do I just need to scrape the entire lists first and feed it the list rather than the URL? If so, are there instructions somewhere on how to do this? Or is there something else I can check as the likely culprit?
martin@rootjazz
Site Admin
Posts: 34712
Joined: Fri Jan 25, 2013 10:06 pm
Location: The Funk

Re: Do I need to scrape all lists first?

Post by martin@rootjazz »

The program works in two stages

1) scrape a list

2) process that list


It is on step 2 that the filtering takes place.


If you only want to follow 10 users, there is no point in the program scraping 8000 users from the group. So it scrapes 10, then passes them to the actioning routine.



So if you want to use filters:

1) scrape a list on the SCRAPE tab
2) feed that list into the action module.

Now you are checking a list of 8000 to find 10 items that pass your filter and can be actioned.
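
To make the difference concrete, here is a minimal Python sketch of the two flows described above. It is not the program's actual code: the function names, the 8000-user list and the "every 10th user is active" filter are all made up for illustration.

```python
def follow_from_url(candidates, want, passes_filter):
    """Flow 1: scrape only `want` items, then filter them."""
    scraped = candidates[:want]          # stage 1: scrape just enough
    return [u for u in scraped if passes_filter(u)]  # stage 2: filter


def follow_from_list(candidates, want, passes_filter):
    """Flow 2: walk a pre-scraped list until `want` items pass the filter."""
    followed = []
    for user in candidates:
        if len(followed) == want:
            break
        if passes_filter(user):
            followed.append(user)
    return followed


# Hypothetical data: 8000 scraped users, every 10th one counts as "active".
users = list(range(8000))
is_active = lambda u: u % 10 == 0

print(len(follow_from_url(users, 100, is_active)))   # 10
print(len(follow_from_list(users, 100, is_active)))  # 100
```

With the same filter, the first flow ends well short of the goal because most of the small scrape is filtered out, while the second keeps going through the long list until the goal is met.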



An update is coming which will improve the searching / filtering / actioning flow, but it is a few weeks off
bananarama
Posts: 60
Joined: Wed Jun 17, 2015 1:33 pm

Re: Do I need to scrape all lists first?

Post by bananarama »

OK. That's pretty much what I was guessing. Will dig into that today and see if any issues pop up. Thx.
bananarama
Posts: 60
Joined: Wed Jun 17, 2015 1:33 pm

Re: Do I need to scrape all lists first?

Post by bananarama »

Ok, that's working better, but a few things popping up so far.

First, it seems to be scraping much lower numbers than the group indicates. For instance, I scrape all users of a group with 17k members, and it returns 8k results. Is this just a matter of dead users or something like that on SC's end, or is there some limit on the maximum number of scraped items in one list, or something else to do with the scraping/database side?

Second, EDIT.... just saw that I missed it copies filepath to clipboard automatically when scraping in ways that don't create processor record. Looks like you were already aware of that potential issue and have it covered.

EDIT AGAIN... nope. Some do, some don't. I just scraped users from a group, and it doesn't create processor record, so I can't copy filepath from there... and popup itself doesn't give full filepath or allow me to copy text. I can't even navigate the filepath manually as there is no "AppData" folder to be found within the user folder. I don't know if it's a hidden folder or something like that, but I do know I can't see a way to copy the filepath so I can paste it into follow action. Am I missing this? One of the other scrapes copied to clipboard for me, but several other types of scrapes do not, and I can't use any of the scraped lists since I can't copy them.

Third, when following a list, I don't see any way to just follow the most recent posts. For instance, I'm looking for currently active mashup artists. I can set USER filter, and uploaded a track within X days, but that's a roundabout and inaccurate way to go. Even if I scrape only the users in a group who contributed a track, and filter for uploaded within x days, they're not connected. In other words, they could have done a mashup 5 years ago, never done one since, but uploaded some other track to some other group within x days, and filter would say they're good to follow.

What I want to do is much more straightforward. I want people who posted recently to this group. Not seeing how to filter for that... posted recently to THIS group. Am I missing it? (Really just want to process sorted by most recent post)
martin@rootjazz
Site Admin
Posts: 34712
Joined: Fri Jan 25, 2013 10:06 pm
Location: The Funk

Re: Do I need to scrape all lists first?

Post by martin@rootjazz »

bananarama wrote: scrape all users of group with 17k, and returns 8k results.

These are the max limits.

If you go to the group and scroll for about an hour, after 8000 results Soundcloud WILL NOT show you any more.

This is true for ALL Soundcloud searches.


bananarama wrote: Second, EDIT.... just saw that I missed it copies filepath to clipboard automatically when scraping in ways that don't create processor record. Looks like you were already aware of that potential issue and have it covered.

:-)

bananarama wrote: EDIT AGAIN... nope. Some do, some don't.

Yes, I know. I added it to some, got bored, stopped, meant to finish off, never got round to it. One day... maybe ;-)

bananarama wrote: I just scraped users from a group, and it doesn't create processor record, so I can't copy filepath from there... and popup itself doesn't give full filepath or allow me to copy text. I can't even navigate the filepath manually as there is no "AppData" folder to be found within the user folder.

HELP > SAVED DATA

But all scrapes will be updated soon, with a major redesign in the works, so these little "niggles" will be ironed out.
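
On the "no AppData folder" point: AppData is a hidden folder in Windows by default, which is why it does not show up when browsing the user folder. The thread does not say where inside AppData the program keeps its scrapes, so the menu route above is still the way to get to the actual files. As a small sketch, assuming a standard Windows setup, this Python snippet prints where the AppData folders live on the current machine (`appdata_dirs` is a hypothetical helper name):

```python
import os
from pathlib import Path


def appdata_dirs(env=os.environ):
    """Return the per-user AppData folders that Windows exposes as environment variables."""
    found = {}
    for var in ("APPDATA", "LOCALAPPDATA"):
        value = env.get(var)
        if value:  # these variables are normally only set on Windows
            found[var] = Path(value)
    return found


for name, path in appdata_dirs().items():
    print(name, path)
```

Typing %APPDATA% into the Explorer address bar opens the same roaming folder without needing to unhide anything.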


bananarama wrote: Third, when following a list, I don't see any way to just follow the most recent posts. For instance, I'm looking for currently active mashup artists. I can set USER filter, and uploaded a track within X days, but that's a roundabout and inaccurate way to go. Even if I scrape only the users in a group who contributed a track, and filter for uploaded within x days, they're not connected. In other words, they could have done a mashup 5 years ago, never done one since, but uploaded some other track to some other group within x days, and filter would say they're good to follow.

Have you checked the FILTER options?

bananarama wrote: What I want to do is much more straightforward. I want people who posted recently to this group. Not seeing how to filter for that... posted recently to THIS group. Am I missing it? (Really just want to process sorted by most recent post)

Shared-to-group filters are lacking. On the list for re-dev. Coming soon(ish), I hope.
bananarama
Posts: 60
Joined: Wed Jun 17, 2015 1:33 pm

Re: Do I need to scrape all lists first?

Post by bananarama »

OK, well if 8k limit is on SC end, and not the bot's...

I know I'm at the learning curve end of this and my workflow will improve, but right now I'm spending hours a day on this with only 3 accounts to manage vs checking in once a week for my TW bot with hundreds of accounts. Really trying to get to that stage quickly.

One major issue though (other than temp bans which are my fault) is that I'm consistently not returning enough items processed (thus this thread). So....

Take mashup groups as an example. What if I were to scrape ALL of the mashup groups, then copy all the txt files into one master txt file with 150k or so users on it, then feed THAT list in, filtered for activity in the past day or uploaded in the last week or something?

That should work, right? No issues on bot's end from longer list? Is there an upper limit? I know some of the twitter bots need really long lists to be split which is why I'm asking.

thx. very helpful
martin@rootjazz
Site Admin
Posts: 34712
Joined: Fri Jan 25, 2013 10:06 pm
Location: The Funk

Re: Do I need to scrape all lists first?

Post by martin@rootjazz »

bananarama wrote: Take mashup groups as an example. What if I were to scrape ALL of the mashup groups, then copy all the txt files into one master txt file with 150k or so users on it, then feed THAT list in, filtered for activity in the past day or uploaded in the last week or something?

Do remember, when filtering you are interacting with SC servers, so trying to process 150k items per day is a lot.


But you could do that
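
As a rough guide to how long a list is actually needed, the figures from the first post in this thread (28 passed, 344 filtered) can be turned into a pass rate. The sketch below is only arithmetic, not anything the program does: `candidates_needed` is a hypothetical helper, and the goal of 140 follows is an assumption based on 28 being "a fifth of the way" there.

```python
def candidates_needed(goal, passed, filtered):
    """Estimate how many list items must be checked to get `goal` passes,
    assuming the pass rate seen so far (passed out of passed + filtered) holds."""
    checked = passed + filtered
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-goal * checked // passed)


# Figures from the first post: 28 succeeded, 344 were filtered out.
print(candidates_needed(140, 28, 344))  # 1860
```

So at that pass rate a list in the low thousands would cover the goal. The real pass rate will vary with the group and the filters chosen.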


You will then need to merge the resulting files: http://stackoverflow.com/questions/6764 ... o-one-file

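Code:

REM Run these in a Command Prompt (cmd.exe)
REM /d also switches drive if the files are not on the current one
cd /d c:\path\to\
REM /b joins the files as raw bytes, so no end-of-file character is appended
copy /b *.txt merged.txt

You would then want to remove duplicate items from the file. For this I would recommend Notepad++ with the TextFX plugin.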
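
If you would rather not install a plugin, merging and de-duplicating can also be done in one go with a short script. This is a sketch in Python, assuming each scraped .txt file holds one item per line. `merge_unique` and the output name `merged_unique.txt` are hypothetical, not part of Soundcloud Manager.

```python
from pathlib import Path


def merge_unique(folder, out_name="merged_unique.txt"):
    """Merge every .txt file in `folder` into one file, keeping the first
    occurrence of each line and dropping blank lines. Returns the item count."""
    folder = Path(folder)
    out_path = folder / out_name
    seen = set()
    unique = []
    for path in sorted(folder.glob("*.txt")):
        if path == out_path:
            continue  # do not feed an earlier merge result back into itself
        text = path.read_text(encoding="utf-8", errors="replace")
        for line in text.splitlines():
            item = line.strip()
            if item and item not in seen:
                seen.add(item)
                unique.append(item)
    out_path.write_text("".join(f"{item}\n" for item in unique), encoding="utf-8")
    return len(unique)

# Example call (adjust the path to wherever the scraped files are):
# merge_unique(r"c:\path\to")
```

Matching is exact, so two lines that differ only in letter case or a trailing slash would both be kept.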