Prime - Script Running Log
#8 - parity load 4 bytes. NO pa
#9 increasing parity load to 8 bytes. 2 long variables.
#10 reduce it to 32-bit or not
#10 I identified that my shared LLC is buggy
#benchmark - 32B (32-byte) load
#11 is the base model of our results
#11 4B, shared-llc for mbank, 64K
#12 4B, non-shared-llc as in 8, but with 4B, 64K
#13 2nd run of 11; 11 seems good. 4B, shared-llc mbank, 64K
#14 3rd run of 11; 11 seems good. 4B, shared-llc mbank, 64K
#15 another run of 11: 4B, shared-llc mbank, with 1 million nodes
#16 repeat of 15: 4B, shared-llc mbank, with 1 million nodes
#17 same as 15 but with 8K nodes: 4B, shared-llc mbank
#18 repeat of 17: 4B, shared-llc mbank, with 8K nodes
#19 same as 15 but with 4M nodes: 4B, shared-llc mbank
#Parity: load
#20 11 with 64K and 32B parity load, 32B, shared-llc mbank, with 64K nodes.
#21 11 with 64K and 32B parity load, 64B, shared-llc mbank, with 64K nodes. Stopped in the middle; Regular is finished.
#22 11 with 64K and 4B parity load, 4B, shared-llc mbank, with 32K nodes; only in Regular.
#23 11 with 64K and 4B parity load, 4B, shared-llc mbank, with 1024 nodes; only in Regular.
#11-wbfixed: same as 11, running only in balanced mode, after fixing the WB count.
#24 changing L1 cache size to 64K (65536, from 32768)
#Originally L1 has 4 ways
#25 changing L1 cache size back to 32K and to 4 ways
#New runs after the ASPLOS paper
#26 changing L1 cache size to 32K again and to 8 ways
#26 failed unexpectedly: only 120, 300 regular and balanced
#27 rerun of 26: changing L1 cache size to 32K again and to 8 ways
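#Side note on #24-#27: only the L1 size (32K/64K) and associativity (4/8 ways) change. A minimal
#C++ sketch of how the set count follows from those parameters; the 64-byte line size is an
#assumption (not stated in this log), and the names are illustrative, not primesim's config fields.
#include <cstdint>
#include <cstdio>

int main() {
    const uint32_t line_size = 64;  // assumed line size in bytes
    struct { uint32_t size_bytes, ways; } cfgs[] = {
        {32768, 4},   // baseline and #25: 32K, 4-way
        {65536, 4},   // #24: 64K, 4-way
        {32768, 8},   // #26/#27: 32K, 8-way
    };
    for (auto &c : cfgs) {
        uint32_t sets = c.size_bytes / (c.ways * line_size);  // sets = size / (ways * line)
        std::printf("%6u B, %u-way -> %u sets\n", c.size_bytes, c.ways, sets);
    }
    return 0;
}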
#after ASPLOS
#28 balanced and regular with M and E state counters.
#Add writeback for visibility conflicts (CLWB).
#starting to change the cache hierarchy
#29 96B, 64K elements, 32K caches, everything as before; but no persistency: WB with no DRAM access and no invalidations, only M->S.
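#A hypothetical sketch of #29's "no persistency" mode: a dirty line is downgraded M->S on a
#remote read, but the writeback is dropped instead of going to DRAM. MESI state names are
#standard; the types and function are illustrative, not primesim's System code.
#include <cstdint>

enum class State { I, S, E, M };

struct Line {
    uint64_t tag   = 0;
    State    state = State::I;
    bool     dirty = false;
};

struct Stats {
    uint64_t dram_writes       = 0;
    uint64_t downgrades_m_to_s = 0;
};

void on_remote_read(Line &line, Stats &st, bool persistency_enabled) {
    if (line.state == State::M) {
        if (persistency_enabled)
            ++st.dram_writes;      // normal path: dirty data is written back
        line.state = State::S;     // only the M->S transition is modeled
        line.dirty = false;
        ++st.downgrades_m_to_s;
    } else if (line.state == State::E) {
        line.state = State::S;     // clean downgrade, never touches DRAM
    }
}

int main() {
    Line l{0x40, State::M, true};
    Stats st;
    on_remote_read(l, st, /*persistency_enabled=*/false);  // the #29 setting
    return static_cast<int>(st.dram_writes);               // 0: no DRAM access
}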
#30 same as 29, with more counters for the lowest epoch sizes; path for proactive flushing.
#30 has a bug; 31 is a rerun.
#31 had segmentation faults in skiplist and queue (balanced) under BEP, and in queue (regular).
#issue: epoch counter goes beyond 10k
#Seems like we have a problem with the epoch ID: epoch counter > 10000 while operations == 10000.
#L1 cache sizes
#after ASPLOS: with visibility writeback and nop fixed.
########32 is checkpointed:
#32 fixed and run; running OK. Without PF. RP, SB and BB numbers are fine.
#33 BEP + proactive flushing (with lowest epoch ID). BB+PF is working.
#Check: there is a difference between BEP+PF and Epoch Persistency in 33.
#34: Run all benchmarks with all persistency models including BEP+PF=7. Order: 0 3 7 4 6.
#- Original BEP results changed from 32; 32 is without PF.
#34: SB results changed noticeably: lower values.
#35 changes: set lazy_wb=false properly for RP, FB, BEP. I think the problem with SB (FB) is that lazy_wb is set randomly because it is method-local (never initialized).
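#The suspicion in #35 is a classic C++ pitfall: a method-local bool read before being assigned
#has an indeterminate value, so it behaves "randomly" from run to run. A minimal sketch of the
#pattern and the fix; only the name lazy_wb comes from this log, the function is hypothetical.

// Buggy pattern: lazy_wb is indeterminate on any path that does not assign it.
bool should_lazy_wb_buggy(int persistency_model) {
    bool lazy_wb;                  // NOT initialized
    if (persistency_model == 4)    // only one model sets it explicitly
        lazy_wb = true;
    return lazy_wb;                // undefined behavior for the other models
}

// Fixed pattern, as described in #35: default to false explicitly.
bool should_lazy_wb_fixed(int persistency_model) {
    bool lazy_wb = false;          // deterministic default for RP, FB, BEP
    if (persistency_model == 4)
        lazy_wb = true;
    return lazy_wb;
}

int main() {
    return should_lazy_wb_fixed(0) ? 1 : 0;  // always 0 now
}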
#36 - with RP+PF as 8. Order: 0 8 3 7 4 6
#37 fixed the flipped ~ vs ! issue of RP+PF; add all eviction counts to BEP.
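#The "~ to !" fix in #37 is a common C/C++ slip: bitwise NOT (~) of a nonzero int flag is
#still nonzero (truthy), so a test meant to mean "flag not set" always passes; logical NOT (!)
#is what's needed. Illustrative sketch; the flag name is invented.
#include <cstdio>

int main() {
    int pf_enabled = 1;  // flag stored as an int

    if (~pf_enabled)     // buggy: ~1 == -2, nonzero, so this branch always runs
        std::printf("bitwise NOT: branch taken even though the flag is set\n");

    if (!pf_enabled)     // intended: true only when the flag is clear
        std::printf("logical NOT: branch taken only when the flag is clear\n");

    return 0;
}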
#37 had a counter bug in release: writeback added twice; only affects lfqueue RP and RP+PF.
#NOTE: 37 is working fine, but RP and RP+PF have the same results; proactive flushing is in.
#ERROR: BUG FOUND IN RELEASEFLUSH. delay_tmp is always set to 0.
#38 fixed release persistency and reran. An ELSE statement set delay_tmp=0 always. I think LRP will no longer be valid.
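#A hypothetical sketch of the shape of the RELEASEFLUSH bug from #37/#38: an else branch on the
#release path overwrites the accumulated delay with 0, so release flushes are charged nothing.
#Only delay_tmp and the RELEASEFLUSH name come from this log; the rest is illustrative.
#include <cstdint>

uint64_t release_flush_delay_buggy(bool is_release, uint64_t accumulated_delay) {
    uint64_t delay_tmp = accumulated_delay;
    if (!is_release) {
        // ordinary store: nothing extra to charge
    } else {
        delay_tmp = 0;   // bug: the release path resets the delay it should keep
    }
    return delay_tmp;    // symptom reported above: always 0 for releases
}

// Fixed shape (#38): keep the accumulated delay on the release path.
uint64_t release_flush_delay_fixed(bool is_release, uint64_t accumulated_delay) {
    return is_release ? accumulated_delay : 0;
}

int main() {
    return release_flush_delay_fixed(true, 120) == 120 ? 0 : 1;
}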
#39 reran 38 with the number of operations = 2000.
#RP does not gain much. This is the best time to improve the latency of the piggybacking; the current design adds the full latency.
#40 with RP piggyback latency; fixed the rlsb latency issue.
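#The note above says the old design charged the full flush latency even when the writeback is
#piggybacked on an existing message; #40 adds a reduced piggyback cost. A hedged sketch of that
#accounting change; the cycle values and names are assumptions, not primesim's.
#include <cstdint>

constexpr uint64_t kFullFlushLatency      = 100;  // standalone flush cost (assumed)
constexpr uint64_t kPiggybackExtraLatency = 10;   // marginal piggyback cost (assumed)

uint64_t flush_cost_before(bool /*piggybacked*/) {
    return kFullFlushLatency;                     // pre-#40: always the full latency
}

uint64_t flush_cost_after(bool piggybacked) {
    return piggybacked ? kPiggybackExtraLatency   // #40: ride on an existing message
                       : kFullFlushLatency;
}

int main() {
    return flush_cost_after(true) < flush_cost_before(true) ? 0 : 1;
}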
#NOTE: surprisingly, 38, 39 and 40 are not much different from the original buggy 37; lfqueue is affected.
#38: proactive flushing; fixed release delay_tmp; without the piggyback latency update.
#40: proactive flushing; fixed release delay_tmp; WITH the piggyback latency update. I changed the linkedlist load to 32B.
#40 is fine with improved piggybacking.
###################################
#Important: from 40 onward, linkedlist is 32B.
#Hashmap and linkedlist load is 32B
#41: 40 with only hashmap, with hash buckets initial size 10 (rate=10: initial size), assuming it is equal to the linkedlist.
#41_linkedlist: 40 with only linkedlist with 8 threads.
#41_linkedlist16: 40 with only linkedlist with 16 threads.
#Because linkedlist takes so much time to run.
#Also it's good to run linkedlist with 8 or 16 threads.
#41-hashmap: 8 and 16 threads.
###############################
#NEW TESTS - back again to 4B. 41-linkedlist16 is still running. ---------------------------------------------
#42: starting with 4B, but with piggybacking; also need to run the same thing without piggybacking.
#1 - normal, 2 - fixed stat comma, 3 - again, right data2; 3 does not have PF info, so 4.
#43 implement a 75% PF threshold rather than flushing the entire epoch 1.
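#Reading of #43: trigger proactive flushing when the persist buffer reaches 75% occupancy
#instead of draining a whole epoch at once. A minimal sketch under that reading; the buffer
#type and names are hypothetical.
#include <cstddef>

struct PersistBuffer {
    std::size_t entries  = 0;
    std::size_t capacity = 64;   // illustrative capacity
};

bool should_proactive_flush(const PersistBuffer &pb, double threshold = 0.75) {
    return pb.entries >= static_cast<std::size_t>(threshold * pb.capacity);
}

int main() {
    PersistBuffer pb;
    pb.entries = 48;                       // 48/64 = 75%: PF triggers here
    return should_proactive_flush(pb) ? 0 : 1;
}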
#44- with LLC latency=30 cycles.
#44-5 is for all threads, with llc=30.
#realized 42 needs more 300-cycle ones; already done.
#44 allthreads
#44-allthreads: all threads, llc=30.
#42-allthreads with llc=10
# ASPLOS REBUTTAL is over.
# after-rebuttal
#45 same as 42; testing with a different read DRAM access = 0.75 * access latency.
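#Sketch of the #45 experiment: reads are charged 0.75x the DRAM access latency while writes keep
#the full value. The 300-cycle base latency is an assumption made only for the example.
#include <cstdint>

constexpr uint64_t kDramAccessLatency = 300;  // assumed base access latency (cycles)

uint64_t dram_latency(bool is_read) {
    return is_read ? static_cast<uint64_t>(0.75 * kDramAccessLatency)  // #45: cheaper reads
                   : kDramAccessLatency;                               // writes unchanged
}

int main() {
    return dram_latency(true) == 225 ? 0 : 1;  // 0.75 * 300 = 225
}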
#- same as 42, with full data2. --------------------------------------------------------------
#45-1-alternative is similar to the balanced model.
#46 DRAM trace running - only for 8 threads.
#46-1 was run against 1000, 2000, 4000 operations on 8 threads
#Mistake: 46-1 was run with the -A flag, so new run 46-2. BUT 46-1-linkedlist ran with -A, which is correct.
#47 - 32 threads, running with DRAM trace. NOTE: add conflicting natural-wb to systems.
#prime state was updated after running balanced, so the last 4 fields may not be there in some benchmarks.
#47-1.
#47-4 with the last modification to BB: pipelined WB delay model; not sure.
#48 with DRAM trace; timing fixed. BB with. LRP with piggybacking.
#48-1 hashmap is broken; 7 has a segmentation fault, but probably 32 can be used. Fixed system.cpp.
#49 is only 7 (a mistake), and with -A.
#-------------------------------------------------------------------------------------
#50 was right. 50-1, 4B, up to 4000 operations. First success after timing; traces are taken.
#50-2? I have to change the size of the data structure and the alternative option.
######## 1.) regular, 2.) balanced, 3.) alternative
#-----------------------------------------------------------------------------------------
#51 - run without DRAM trace. SYNCBENCH = B256 (8, 32 threads and 1000, 2000 operations).
#51 NOTE: the system.cpp file had been deleted, so I rewrote the changes; timing can therefore vary.
#51 shows low output.
#52 with 128B load. (all persist buffer designs)
#Mistake 51-54: only hashmap is changed to 128B; others remain as B256.
#All pbuffer design is based on the
#54-1: hashmap, linkedlist, and queue1/2 are 128B, the others B256.
#Therefore 54-2 is INVERSED.
#------------------------------------------------------------------------------------------
#55: with iterate