Hello Martin:
Using a specific filter, I scraped a batch of follower names (URLs) for a particular account. Over time, the status of the users in this list may change; for example, some of them may have since followed me. How can I update this list? Is it possible to scrape based on this list, for example to remove users who already follow me or who have been inactive for a long time?
Secondary analysis of the scraping results?
- martin@rootjazz
- Site Admin
- Posts: 34712
- Joined: Fri Jan 25, 2013 10:06 pm
- Location: The Funk
- Contact:
Re: Secondary analysis of the scraping results?
hacking wrote: ↑Sun Nov 19, 2023 3:37 am (quoted above)
You would need to run a filter on the list.
Custom search step:
USER ID URL
then set your filepath.
The filter will be applied to the list as input, and the new output will contain the profiles that meet your filter.
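To make the list-in / filtered-list-out idea concrete, here is a minimal conceptual sketch in Python. This is not the tool's actual code; the function and the `already_following` data are hypothetical placeholders for the kind of secondary filter described above (dropping profiles that already follow you):

```python
# Conceptual sketch only (not the tool's implementation): re-filter a
# previously scraped list of follower profile URLs. `already_following`
# is hypothetical data you would obtain separately.

def refilter(scraped_urls, already_following):
    """Keep only profiles that still meet the filter:
    drop anyone who is already following me."""
    following = set(already_following)  # set lookup is O(1) per URL
    return [url for url in scraped_urls if url not in following]

scraped = [
    "https://twitter.com/user_a",
    "https://twitter.com/user_b",
    "https://twitter.com/user_c",
]
# Suppose user_b has since followed me:
remaining = refilter(scraped, ["https://twitter.com/user_b"])
print(remaining)  # user_a and user_c remain
```

The same pattern extends to any list-based criterion: the scraped file is the input, and the output is whatever subset passes the check.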
Re: Secondary analysis of the scraping results?
Hello Martin,
Following the provided instructions, I attempted to perform the operation, but unfortunately, no new result file was generated. Here is the specific situation:
Initially, I conducted a scrape without any filtering conditions for a particular user, gathering data on 100,000 followers.
Subsequently, following the given instructions, I attempted a secondary scrape. However, the task appears to be continuously running without producing any output, even after waiting for several hours.
I would be grateful for any guidance or solution you can provide to address this matter.
Thank you for your time and assistance.
Best regards
- martin@rootjazz
Re: Secondary analysis of the scraping results?
Look at your logs to see what is happening.
Are all the results being ignored? Then your filter is too strict, or set up wrongly (or there is a bug).
Re: Secondary analysis of the scraping results?
The criteria for the secondary scrape are not stringent; it simply includes users who have been active within the last 100 days.
The log file is: logs_84650
- martin@rootjazz
Re: Secondary analysis of the scraping results?
OK, so the filter on a LIST doesn't save as it processes; it will save at the end of the action. From your logs, the action just hadn't completed.
Regards,
Martin
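The save-at-the-end behaviour described above can be illustrated with a small sketch. This is conceptual only, not the tool's actual implementation; `flush_every` is a made-up parameter contrasting end-of-action saving with periodic flushing:

```python
# Illustration of two saving strategies (conceptual, not the tool's code).

def filter_and_save(profiles, keep, out_path, flush_every=None):
    """Write profiles passing `keep` to out_path.
    flush_every=None -> output only guaranteed on disk when the action ends
    flush_every=10   -> flush to disk every 10 kept results
    """
    kept = []
    with open(out_path, "w") as f:
        for p in profiles:
            if keep(p):
                kept.append(p)
                f.write(p + "\n")
                if flush_every and len(kept) % flush_every == 0:
                    f.flush()  # partial results visible before completion
    return kept

kept = filter_and_save(
    [f"user_{i}" for i in range(25)],
    keep=lambda p: int(p.split("_")[1]) % 2 == 0,
    out_path="filtered.txt",
    flush_every=10,
)
print(len(kept))  # 13 even-numbered users out of 25
```

With end-of-action saving, an interrupted or still-running filter leaves no (complete) output file, which matches the "no new result file was generated" symptom reported earlier in the thread.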
Re: Secondary analysis of the scraping results?
Are the two features ("Global unique results for search regardless of account / filters used" and "Resume search position on repeats / restarts") effective for the secondary scrape? I noticed that it is indeed necessary to wait for the scrape to complete before saving the results. In practice, the initial scrape yields a significant amount of data, often tens of thousands of entries. The secondary scrape of this data takes a considerable amount of time; several days can pass without any progress, which leads me to believe there might be an error in the program.
Explanation for why I am doing this: My scraping operation is, in fact, quite simple, and the filters are not complex. I just want to fetch the followers of a specific account who have been active within the last 100 days (or any other specified period). Originally, this task could be completed in one go. However, I found that when scraping accounts with a large number of followers (several hundred thousand), it is challenging to obtain a clean scrape due to Twitter's daily limits, program interruptions, and the impact of restarting the task. Therefore, I decided to scrape all the follower data for the target account and analyze it as static data gradually.
- martin@rootjazz
Re: Secondary analysis of the scraping results?
I'll have to check, I don't remember. I think it is global per search.
Search position isn't going to work on files; it works with the Twitter ID for the search page.
Did you see the link above? It should save after every 10 results found.
The more data there is to filter, the longer it takes. Look at the logs; they will tell you what's going on.
To find out the active date of an account, the program must make multiple requests per profile to find the most recent tweet / like / retweet (along with user_details). So if you have 100k results, 400k new requests are made. It takes time.
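That arithmetic can be sketched as a back-of-the-envelope estimate. The four-lookups-per-profile figure comes from the explanation above; the requests-per-minute rate is a made-up illustrative assumption, not the tool's or Twitter's real limit:

```python
# Back-of-the-envelope estimate of secondary-filter runtime (illustration).

LOOKUPS_PER_PROFILE = 4  # most recent tweet, like, retweet, plus user_details

def estimated_requests(num_profiles, per_profile=LOOKUPS_PER_PROFILE):
    """Total HTTP requests needed to find each profile's last-active date."""
    return num_profiles * per_profile

def estimated_hours(num_profiles, requests_per_minute=60):
    """Rough wall-clock time; requests_per_minute is an assumed rate,
    not a real rate limit."""
    total = estimated_requests(num_profiles)
    return total / requests_per_minute / 60

print(estimated_requests(100_000))         # 400000 requests
print(round(estimated_hours(100_000), 1))  # ~111.1 hours at 60 req/min
```

Even under a generous assumed rate, 100k profiles translate into days of requests, which is consistent with the multi-day runs reported earlier in the thread.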