prime.log (forked from PrincetonUniversity/primesim)
Experimental setup
#8 is the new starting point
#8 fixed the persistent queue and extended it to lfqueue
#9 -> changed the parity to 8B
#10 - changed the parity to 32B
Found that the topology is incorrect.
Correct: 32K private L1 and 1MB per slice of shared LLC/L2.
#11 should be: fixed the topology and ran 4B in all benchmarks, which I had never done before.
Shared-llc represents the banked LLC with only a last-level cache. Network = 1 hop, shared-llc access time = 10 cycles.
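A rough sketch of the intended topology parameters from the notes above (struct and field names are illustrative assumptions, not the actual primesim config keys):

    // Illustrative only: values come from the log lines above; names are assumptions.
    struct TopologyParams {
        int l1_size_bytes     = 32 * 1024;       // 32K private L1 per core
        int llc_slice_bytes   = 1024 * 1024;     // 1MB shared LLC/L2 per slice (banked)
        int network_hops      = 1;               // network = 1 hop
        int llc_access_cycles = 10;              // shared-llc access time
    };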
#10 is 32B load
Both regular and balanced, and now the shared-llc setup.
#10 does not include linkedlist
11 is the best so far.
11 was in the ASPLOS paper
#8 - parity load 4 bytes. NO pa
#9 increasing the parity load to 8 bytes. 2 long variables.
#10 reduce it to 32 bit or no
#10 I identified that my shared llc is buggy
#benchmark - 32B (32-byte) load
#11 is the base model of our results
#11 4B, shared-llc for mbank, 64K elements
#12 4B, non-shared-llc as in #8, but with 4B, 64K
#13 again, 2nd run of #11. #11 seems good. 4B, shared-llc mbank, 64K
#14 again, 3rd run of #11. #11 seems good. 4B, shared-llc mbank, 64K
#15 again, 2nd run of #11. #11 seems good. 4B, shared-llc mbank, with 1 million nodes
#16 again, 2nd run of #11. #11 seems good. 4B, shared-llc mbank, with 1 million nodes, #15 again
#17 again, 2nd run of #11. #11 seems good. 4B, shared-llc mbank, with 8K nodes, #15 again
#18 again, 2nd run of #11. #11 seems good. 4B, shared-llc mbank, with 8K nodes, #15 again
#19 again, 2nd run of #11. #11 seems good. 4B, shared-llc mbank, with 4M nodes, #15 again
#Parity load
#20 #11 with 64K and 32B parity load, 32B, shared-llc mbank, with 64K nodes.
#21 #11 with 64K and 32B parity load, 64B, shared-llc mbank, with 64K nodes. Stopped in the middle. Regular is finished.
#22 #11 with 64K and 4B parity load, 4B, shared-llc mbank, with 32K nodes. Only in Regular.
#23 #11 with 64K and 4B parity load, 4B, shared-llc mbank, with 1024 nodes. Only in Regular.
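A minimal sketch of how the parity payload sizes above could be laid out per node (the log only gives the byte sizes; the struct names and the 4-byte reading of "2 long variables" are assumptions):

    #include <cstdint>
    // Illustrative only: extra parity payload written per node, sized per run.
    struct Parity4B  { uint32_t p[1]; };   // 4-byte parity load (#8, #11, #22, #23)
    struct Parity8B  { uint32_t p[2]; };   // 8-byte parity load (#9, "2 long variables")
    struct Parity32B { uint32_t p[8]; };   // 32-byte parity load (#10, #20)
    struct Parity64B { uint32_t p[16]; };  // 64-byte parity load (#21)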
#11-wbfixed: same as #11, running only in balanced mode, after fixing the WB count.
#24 changing L1 cache size to 64K (65536 from 32768)
#Originally the L1 number of ways is 4 (4-way)
#25 - changing L1 cache size to 32K again and to 4 ways
#New runs after the ASPLOS paper
#26 - changing L1 cache size to 32K again and to 8 ways
#26 failed unexpectedly: only 120, 300 regular and balanced
#27 - rerun of #26: changing L1 cache size to 32K again and to 8 ways
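A quick sanity check of the L1 geometries in runs #24-#27, assuming 64-byte lines (the line size is not stated in the log):

    #include <cstdio>
    // Illustrative arithmetic only: sets = size / (ways * line_size).
    int main() {
        const int line = 64;                                    // assumed line size
        printf("32K, 4-way: %d sets\n", 32768 / (4 * line));    // 128 sets (#25)
        printf("64K, 4-way: %d sets\n", 65536 / (4 * line));    // 256 sets (#24)
        printf("32K, 8-way: %d sets\n", 32768 / (8 * line));    //  64 sets (#26/#27)
        return 0;
    }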
TAG:AFTER-ASPLOS
possible changes:
2019/09/25: N
1. NOP needs to be fixed.
2. Release on non-release cacheline conflicts.
3. Number of epochs flushed and the average. Number of writes per epoch flushed.
4. Average writes per epoch flushed and number of flushes in BEP and LEP (see the counter sketch after this list).
5. CLWB needs to be fixed, and so does same-block visibility. May increase the LB++.
6. Need more statistics to understand RP's behaviour. RP's self-evictions are lazy, but how about inter-thread? How many evictions are at the LLC? Sensitivity to the LLC access time?
7. LCLWB: lazy cache-line writebacks need a mechanism in PRiME.
8. Max size of the list with 1:1 read:write ratios.
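A minimal sketch of the epoch-flush counters that items 3 and 4 ask for (names are assumptions, not existing primesim counters):

    #include <cstdint>
    // Illustrative only: per-cache counters for epoch-flush statistics.
    struct EpochFlushStats {
        uint64_t epochs_flushed = 0;   // number of epochs flushed (BEP/LEP)
        uint64_t writes_flushed = 0;   // writes belonging to flushed epochs
        uint64_t flushes        = 0;   // number of flush operations issued
        double avgWritesPerEpoch() const {
            return epochs_flushed ? (double)writes_flushed / epochs_flushed : 0.0;
        }
    };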
2019/09/19:
Issue:
M and E persist-invoke issue: has not affected the design. BEP and RP both handle it.
Solution: change the check to dirty==1 on the invoking cacheline.
Need a test run.
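A minimal sketch of the proposed check (LineStub stands in for the real cacheline type; only the dirty==1 condition comes from the note above):

    // Illustrative only: persist on eviction only when the invoking cacheline is
    // dirty (M), instead of persisting for both M and E.
    struct LineStub { int dirty; };
    bool shouldPersistOnEviction(const LineStub* line_cur) {
        return line_cur->dirty == 1;
    }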
Issue:
M->S transition in LLC in BEP? Data needs to be written back to memory before being shared.
Issue:
Writeback and Inv issue.
Issue: need an additional epoch table for BEP (for proactive flushing).
Keep all the epoch ids and cacheline tags (see the sketch below).
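A minimal sketch of such an epoch table (types and names are assumptions):

    #include <cstdint>
    #include <unordered_map>
    // Illustrative only: map a cacheline tag to its epoch id, so BEP can
    // proactively flush every line that belongs to a completed epoch.
    using EpochTable = std::unordered_map<uint64_t /*cacheline tag*/, uint64_t /*epoch id*/>;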
Issue: BEP. Good for RP. M->S forces the cacheline to write back. To avoid that, it needs to be persisted first, creating more conflicts in the LLC. Inter-thread sharing ---> creates these conflicts in the LLC.
2019/10/03
Things to do:
Check PF.
Integrate PF with and without, because the original BEP is affected by the PF.
Add correct persist counts to PF.
Add epoch meters to RP.
//
34: different Epoch Persistency numbers from 32. 33 was running with PF.
Possible cause: lazy_wb random.
32 was run without PF and lazy_wb. Fixed this issue in 35.
ASPLOS-AFTER: 35
1. Need to fix the M->I and M->S writebacks in LLC for BEP/RP. This may increase the persistency overhead if we want.
2. RP's proactive flushing: how to do that? RP can coalesce writeback cachelines (see the sketch after this list).
3. Linked-list break issue.
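A minimal sketch of coalescing writeback cachelines for RP's proactive flushing, as mentioned in item 2 (entirely an assumption, not existing primesim code):

    #include <cstdint>
    #include <unordered_set>
    // Illustrative only: collect unique dirty line addresses and flush each one
    // once per proactive-flush point, instead of once per store.
    struct CoalescedWB {
        std::unordered_set<uint64_t> pending;            // unique dirty line addresses
        void record(uint64_t line_addr) { pending.insert(line_addr); }
        size_t flushAll() {                              // returns lines flushed here
            size_t n = pending.size();
            pending.clear();
            return n;
        }
    };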
2019/10/04: Friday.
BEP: eviction (M and E issue): change to the M-only model and see. RP is not affected, but both BEP and RP invoke a persist operation for the eviction (M and E).
My assumption fails: in BEP, the persist-epoch time dominates over the writeback of individual cachelines.
Also, the M->S and M->E problem is in both RP and BEP. NOP is fixed.
Serious bug Found:
    else
    if(pmodel == RLSB){
        //printf("Release Persistence \n");
        if(!proactive_flushing)
            delay_tmp = releaseFlush(cache_cur, syncline, line_cur, rel_epoch_id, req_core_id, psrc);
        else
            delay_tmp = releaseFlushWithPF(cache_cur, syncline, line_cur, rel_epoch_id, req_core_id, psrc);
        cache_cur->incPersistDelay(delay_tmp); //Counters
        //delay_tmp = delay_tmp/2;
    }
    if(pmodel == FBRP){ //Epoch Persistency
        //printf("Full barrier Release Persistence \n");
        delay_tmp = releaseFlush(cache_cur, syncline, line_cur, rel_epoch_id, req_core_id, psrc);
        cache_cur->incPersistDelay(delay_tmp); //Counters
        //delay_tmp = delay_tmp/2;
    }
    else if(pmodel == FLLB){
        //printf("Epoch Persistence \n");
        delay_tmp = fullFlush(cache_cur, syncline, line_cur, rel_epoch_id, req_core_id, psrc); //Full barrier
        cache_cur->incPersistDelay(delay_tmp); //Counters
        //delay_tmp = delay_tmp/2;
    }
    else {
        //printf("No Persistence \n");
        //NO need to check syncmap
        delay_tmp = 0;
    }
#ERROR: the final else runs regardless of RLSB.
Changed to: else if(pmodel == FBRP){
Record of the new bug found: delay_tmp is always 0 for RLSB.
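For reference, a sketch of the corrected dispatch chain implied by that change (same identifiers as the snippet above, comments dropped):

    else if(pmodel == RLSB){
        if(!proactive_flushing)
            delay_tmp = releaseFlush(cache_cur, syncline, line_cur, rel_epoch_id, req_core_id, psrc);
        else
            delay_tmp = releaseFlushWithPF(cache_cur, syncline, line_cur, rel_epoch_id, req_core_id, psrc);
        cache_cur->incPersistDelay(delay_tmp);
    }
    else if(pmodel == FBRP){
        delay_tmp = releaseFlush(cache_cur, syncline, line_cur, rel_epoch_id, req_core_id, psrc);
        cache_cur->incPersistDelay(delay_tmp);
    }
    else if(pmodel == FLLB){
        delay_tmp = fullFlush(cache_cur, syncline, line_cur, rel_epoch_id, req_core_id, psrc);
        cache_cur->incPersistDelay(delay_tmp);
    }
    else {
        delay_tmp = 0;
    }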
37: without the fix.
38/39 -> fixed this, but it seems to have less effect. lfqueue acts weird.
40: add updated latencies for RLSB piggybacking, reduced latencies rather than full latencies. Still running.