Checking for already processed items

Ask any support / help / issues / problem or question related to TumblingJazz
jrichards
Posts: 378
Joined: Tue Sep 22, 2015 11:58 pm

Re: Checking for already processed items

Post by jrichards »

So what do you suggest I do? If it keeps slowing down like this, it will be useless to me in a couple of weeks.
martin@rootjazz
Site Admin
Posts: 34634
Joined: Fri Jan 25, 2013 10:06 pm
Location: The Funk

Re: Checking for already processed items

Post by martin@rootjazz »

I honestly don't see a problem with 4 hours for 12k posts. As I said, that is roughly 1 second per post.
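A quick check of that per-post figure, using only the numbers quoted in this thread (4 hours, 12k posts). The variable names are illustrative:

```python
# Back-of-envelope check using the figures quoted in the thread.
run_hours = 4
posts = 12_000

seconds_per_post = run_hours * 3600 / posts
print(f"{seconds_per_post:.1f} s per post")  # 1.2 s per post
```

So "1 second per post" is a slight round-down of about 1.2 s.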

What is the issue? Just leave it running.

In a few weeks, if it is taking 12 hours or 24 hours then we can look at it again.


Or you can try to convince me why 4 hours is a problem and why you cannot leave the action running, as I don't understand that at the moment.
jrichards

Re: Checking for already processed items

Post by jrichards »

The main problem is that it's taking more time every day (it was more than 4.5 hours today), and it is seriously messing up the timing of all the other actions I need to run on those accounts daily. I have a very tight schedule on every instance and these slowdowns are basically ruining it. It hurts my workflow and my income.

As I said, it used to take about 1.5 to 2 hours, Tumblr was perfectly fine with that, and I don't see a problem with it running faster.

What do you think, is there room for improvement? I don't want to bash you if it's not possible ;)
martin@rootjazz
Site Admin

Re: Checking for already processed items

Post by martin@rootjazz »

But it has to check 12k more items every day, so after 10 days that is 120k, and so on.

So it is going to take more time; you cannot do more checks in zero extra time. If it is a big deal, you could clear your processed logs (not recommended, but possible).
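A rough sketch of where that extra time can come from. It assumes, and the thread does not confirm this, that each run loads the whole processed log into a hash set and then does one lookup per candidate. `filter_unprocessed` and the synthetic post names are hypothetical, not TumblingJazz code; only the 12k-per-day figure comes from the thread.

```python
def filter_unprocessed(processed_log, candidates):
    """Hypothetical helper: return candidates not yet in the processed log.

    Building the set touches every log entry once, so this part grows
    with the log; each candidate then costs one average O(1) lookup.
    """
    seen = set(processed_log)
    return [c for c in candidates if c not in seen]

DAILY_NEW = 12_000  # figure quoted in the thread

for day in (1, 10, 30):
    # Synthetic log: DAILY_NEW entries added per day so far.
    log = [f"post{n}" for n in range(DAILY_NEW * day)]
    # Today's candidates are all new, so none are filtered out.
    todays = [f"post{n}" for n in range(DAILY_NEW * day, DAILY_NEW * (day + 1))]
    fresh = filter_unprocessed(log, todays)
    print(f"day {day}: loaded {len(log)} log entries, "
          f"{len(todays)} lookups, {len(fresh)} new")
```

Under this assumption the number of lookups per run stays the same, while the cost of loading the log grows with every day of history.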
jrichards

Re: Checking for already processed items

Post by jrichards »

Hey Martin, what about this improvement:


1.) action starts
2.) take the list of already processed items and sort it
3.) load the list of blogs from the file and sort it
4.) compare the hashes and remove duplicates from the list
5.) start processing the file
I think this would improve performance, right? Mainly because:
a.) you would be comparing two sorted arrays
b.) there would be no pause between iterations, unlike now, when the check happens during the actual liking process
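The "sort both lists, then compare" idea can be sketched as a two-pointer walk over the two sorted lists. This is a minimal illustration of that technique, not how TumblingJazz works; `remove_processed_sorted` is a hypothetical name.

```python
def remove_processed_sorted(processed: list[str], blogs: list[str]) -> list[str]:
    """Two-pointer walk over two sorted lists; keep blogs not in processed.

    Sorting both inputs costs O(n log n) + O(m log m); the walk is O(n + m).
    """
    processed = sorted(processed)
    blogs = sorted(blogs)
    result = []
    i = 0
    for blog in blogs:
        # Advance the processed pointer past anything smaller than this blog.
        while i < len(processed) and processed[i] < blog:
            i += 1
        if i < len(processed) and processed[i] == blog:
            continue  # already processed: drop it
        result.append(blog)
    return result

print(remove_processed_sorted(["b", "d"], ["d", "a", "c", "b"]))  # ['a', 'c']
```

Two things to note: the sort alone already costs more than one hash lookup per item, and sorting the blog list changes the order in which items get processed.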

What do you think?
martin@rootjazz
Site Admin

Re: Checking for already processed items

Post by martin@rootjazz »

I don't think it will make a difference. The checking is done via a hashset, which is the fastest method of detecting whether itemA is in listB (technically not a list, but I don't want to overcomplicate things).

So if we have 10 items:
item1
...
item10


whether we check items 1-10 all at the start or one at a time as they are used doesn't make a difference: we are still checking 10 items against listB (which isn't a list...).
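To illustrate that point, here is a small sketch that counts membership checks against a Python `set` (standing in for the hashset). `CountingSet` is a hypothetical helper for illustration only; the processed items are made up.

```python
class CountingSet(set):
    """A set that counts membership checks (for illustration only)."""

    def __init__(self, *args):
        super().__init__(*args)
        self.lookups = 0

    def __contains__(self, item):
        self.lookups += 1
        return super().__contains__(item)

items = [f"item{n}" for n in range(1, 11)]

# Upfront: check all 10 items before processing starts.
upfront_set = CountingSet({"item3", "item7"})
upfront = [it for it in items if it not in upfront_set]

# As used: check each item right before processing it.
as_used_set = CountingSet({"item3", "item7"})
as_used = []
for it in items:
    if it in as_used_set:
        continue  # already processed: skip
    as_used.append(it)  # stand-in for actually processing the item

print(upfront_set.lookups, as_used_set.lookups)  # 10 10
```

Both orders give the same remaining items and the same 10 lookups.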

Also, there is the issue that if you have 100 items but only want to process 10, checking them all at the start means checking 100 items, whereas checking one by one as you go would only check 10 (or perhaps 10+x, where x counts failed / skipped / other items). It would be unlikely to check all 100.
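The eager-versus-lazy difference can be sketched with a Python generator. The item names, the two "already processed" entries, and the list-based counter are illustrative assumptions.

```python
from itertools import islice

def unprocessed(items, processed, counter):
    """Yield items not in `processed`; counter[0] tracks membership checks."""
    for it in items:
        counter[0] += 1  # one membership check per item pulled
        if it not in processed:
            yield it

items = [f"item{n}" for n in range(1, 101)]  # 100 candidates
processed = {"item2", "item5"}               # already done

# Eager: filter everything first, then take 10.
eager_count = [0]
eager = list(unprocessed(items, processed, eager_count))[:10]

# Lazy: pull items only until 10 have been produced.
lazy_count = [0]
lazy = list(islice(unprocessed(items, processed, lazy_count), 10))

print(eager_count[0], lazy_count[0])  # 100 12
```

The lazy version stops after 12 checks: the 10 wanted items plus the 2 skipped ones, which is the "10+x" case.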
jrichards

Re: Checking for already processed items

Post by jrichards »

So you are saying that sorting the array makes no difference? That was one of the first things people mentioned when giving advice about optimising comparisons.

But what does make a difference is that right now there is a pause between each comparison. Let's say you set a pause of 10-90s between actions; then a pause from that interval is used between each comparison. If the comparison were done at the start of the action, without pauses, it should save some time, right?
martin@rootjazz
Site Admin

Re: Checking for already processed items

Post by martin@rootjazz »

jrichards wrote (Tue Oct 17, 2017 6:25 pm): "So you are saying that sorting the array makes no difference? That was one of the first things people mentioned when giving advice about optimising comparisons."
It isn't an array, it is a hashset. That data structure is optimised for lookups; it is the fastest method for them. A sorted array only helps with certain lookup methods (such as binary search).
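A minimal comparison of the two approaches, using made-up blog names. A sorted list needs a binary search (Python's `bisect` module) to find an item in O(log n) comparisons. A hash set answers the same question in O(1) on average and needs no sorting at all. The helper names are illustrative.

```python
from bisect import bisect_left

processed_list = sorted(["blog-a", "blog-c", "blog-f"])  # sorted array
processed_set = set(processed_list)                       # hash set

def in_sorted(arr, x):
    """Binary search on a sorted list: O(log n) comparisons."""
    i = bisect_left(arr, x)
    return i < len(arr) and arr[i] == x

def in_hashset(s, x):
    """Hash-set membership: O(1) on average, no ordering required."""
    return x in s

for name in ["blog-a", "blog-b", "blog-f"]:
    # Both methods give the same answer; only the cost differs.
    assert in_sorted(processed_list, name) == in_hashset(processed_set, name)
```

Sorting pays off only if you then search or merge by order. A hash set skips that step entirely.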

jrichards wrote: "But what does make a difference is that right now there is a pause between each comparison. Let's say you set a pause of 10-90s between actions; then a pause from that interval is used between each comparison. If the comparison were done at the start of the action, without pauses, it should save some time, right?"
There "shouldn't" be a pause on a lookup that results in the item being skipped. Do you have a specific log showing this, so I can find the exact matching code?
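What "no pause on a skipped lookup" means can be sketched as an action loop that only sleeps after real work. This is a hypothetical sketch, not the actual TumblingJazz code. `run_action`, `do_action`, and `sleep` are illustrative names, and the 10-90s range is taken from the example in the thread.

```python
import random
import time

def run_action(items, processed, do_action, pause_range=(10, 90), sleep=time.sleep):
    """Hypothetical action loop: pause only after an item is actually processed.

    Note: `processed` is updated in place as items are handled.
    """
    done = []
    for it in items:
        if it in processed:
            continue  # skipped item: no pause
        do_action(it)
        processed.add(it)
        done.append(it)
        sleep(random.uniform(*pause_range))  # pause only after real work
    return done

# Usage: record pauses instead of sleeping, to see when they happen.
pauses = []
done = run_action(["a", "b", "c"], {"b"}, do_action=lambda it: None,
                  sleep=pauses.append)
print(done, len(pauses))  # ['a', 'c'] 2
```

The skipped item "b" triggers no pause, so only the two processed items add waiting time.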