`TMBitset` checklist #245

yakra · 2023-04-10T15:07:20Z

Benchmark:

Try out:

Kinda both:

~~Init TravelerLists at beginning; init segments with size not capacity~~
~~Or better yet, init segments with TravelerList::ids.size()~~
Segments set via TMArray::size, set via TravelerList::ids.size()
Timestamp for first task, with & without. Python Too?
TMArray<TravelerList> means threaded construction in place (via placement new) without separate read_list function.
TMBitset<HGVertex*> #251
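
To make the "construction in place (via placement new)" item concrete, here is a minimal sketch. `Traveler`, `allocate_raw`, `construct_range` and `destroy_all` are hypothetical names for illustration, not the real `TMArray` or `TravelerList` API. The sketch runs the ranges sequentially; the point is only that each slot is constructed independently, which is what would let disjoint ranges go to separate threads.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for TravelerList; the real class does far more work.
struct Traveler {
    std::string name;
    explicit Traveler(std::string n) : name(std::move(n)) {}
};

// Reserve raw, uninitialized storage for n objects: one allocation, no constructors run.
Traveler* allocate_raw(std::size_t n) {
    return static_cast<Traveler*>(::operator new(n * sizeof(Traveler)));
}

// Construct elements [begin, end) in place. Each index touches only its own slot,
// so disjoint ranges could be handed to separate threads without locking.
void construct_range(Traveler* raw, const std::vector<std::string>& names,
                     std::size_t begin, std::size_t end) {
    assert(begin <= end && end <= names.size());
    for (std::size_t i = begin; i < end; ++i)
        new (raw + i) Traveler(names[i]);  // placement new: construct only, no allocation
}

// Run destructors in reverse order, then release the storage.
void destroy_all(Traveler* arr, std::size_t n) {
    while (n > 0) arr[--n].~Traveler();
    ::operator delete(arr);
}
```

Because the objects end up contiguous and are built where they will live, no separate read-then-copy step is needed in this sketch.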

Clean up:


yakra · 2023-04-12T13:31:07Z

find crashes via

```sh
for file in sulogs/2023-04-06/siteupdate_{a423,4448,8fba,8d55,7828,96e9,2d1b,5c76}*; do
  # Print the log name, then its last line with the [timestamp] and dots stripped.
  echo -e "$file\t$(tail -n 1 "$file" \
  | sed 's~\[[0-9.]*\]~~' \
  | tr -d .)"; done \
| grep -v 'Total run time:' \
| sed -r 's~.*siteupdate_(....)-.*~\1~' \
| uniq
```

yakra · 2023-04-16T17:04:40Z

dead branches

SimItOpt2 with sign bit bug before reset:
- dd1b6e0 long
- a8a3db9 int
- 9280ec4 short
  OK, not dead, but deprecated binary & logs. Hard reset here; bugfix commit added
tlist_canon @ d37dfe0979bc5a470dad5851147850503d096bdc

yakra · 2023-04-23T17:55:26Z

commits

| c | incl? | descr | comments |
|---|---|---|---|
| a5a7 | yes | contig RAM | Prerequisite for everything else |
| ecff | yes | TMB 1st draft | The main point of all this :) |
| d704 | yes | t_l via TMB | Small speedup. Saves RAM. Potential for improvements via iteration and `\|=` optimizations. |
| a009 | yes | t_l list->vec | Not slower. The Right Thing To Do. Saves RAM. Locality. Is index used in future commits? Only in `eaa6`, which won't be used as it hurts performance. What if `traveler_lists` is nixed altogether, and we just iterate `traveler_set` one more time at the end? Or, what if, instead of doing reallocations/copies, we just `reserve(TravelerList::allusers.size())` after declaring? Re-examine nixing in future. |
| 91a0 | yes | !extr include | Housekeeping. Rebase earlier? |
| 7ce7 | yes | userlog warn | Unrelated, and doesn't appear to affect performance any, fnord. Rebase earlier? |
| f977 | no | t_l as ref | Pass a bool instead; see `4504`. Nix extraneous traveler_lists creation for master graph. |
| eaa6 | no | nix tl::tn | Hurts performance. See `9fdc` et al instead. |
| 4e05 | no | avoid iterate | Child of `eaa6`. See `96e9` even though it's the same diff. :P |
| 6922 | no | ctor / init | Any changes to performance? See next commit. No: See `7415`. |
| 5c76 | no | integral type | Userlogs slightly slower? Test again once dust settles? No: See `4a64`. |
| b4ab | no | names & order | Arg order just makes sense. Name changes are more related to prev commit. No: See `c896`. |
| e61a | no | ++u | No change to binary. Done on its own in some future commits, including `47a7`. |
| a423 | yes | branching CBT | Faster userlogs. Rebase earlier? Maybe try again once `HighwaySegments` are stored sequentially? |
| 4448 | no | short | `unit` branch; affects `\|=` only. Prefer flavors with iterator optimization. |
| 8fba | no | int | `unit` branch; affects `\|=` only. Prefer flavors with iterator optimization. |
| 8d55 | no | long | `unit` branch; affects `\|=` only. Prefer flavors with iterator optimization. |
| 7828 | no | no-branch CBC | Hurts performance |
| 96e9 | yes | avoid iterate | `4e05` cherrypicked. Reorder additions in `for` loop; make it look pretty. |
| 2d1b | yes | inline mv&e | Slight speedup, simpler prototypes & includes |
| f28c | no | long bugfix | `unit` branch; affects `\|=` only. Prefer flavors with iterator optimization. |
| 953a | no | ! unit param | Not strictly necessary for pointer-punning `\|=`, but cleans things up. No: Different units may be useful for scenarios other than `clinched_by`. E.g., `uint64_t` may work well for `HGVertex` subgraph membership. |
| 03ec | no | punning `\|=` | Evaluate WRT straight-up unit `\|=`. Prefer flavors with iterator optimization; 3 are simplified for 16/32-bit units; see below. |
| 7958 | no | SimItOpt1 | SimItOpt2 outperforms this. |
| 3910 | no | SimItOpt2 | 32 > 16 > 8-bit for iterating `clinched_by`. Additionally, allows for better `\|=` optimization. |
| 1be1 | no | SIO1 short | SimItOpt2 outperforms this. |
| 87e4 | no | SIO1 int | " |
| 2cbb | no | SIO1 long | " |
| 9280 | no | SIO2 short | 32 > 16 > 8-bit for iterating `clinched_by`. Additionally, allows for better `\|=` optimization. |
| 9347 | no | sign bit fix | " |
| acc9 | yes | unsigned int | " |
| 8269 | no | unsigned long | Performs worse than 8/16/32-bit everywhere except epoch. |
| f8cf | no | ulong bugfix | " |
| f749 | no | ComItOpt | Evaluate against SimItOpt. All tasks. ~~Try with per-unit `\|=`.~~ <--Never mind. No: Underperforms SIO2 on all lab machines (but not BiggaTomato & epoch though those aren't the target). KISS principle. |
| 9fdc | no | trav_nums | May not do much on its own. Try again after other speedups are in place. See `8be1`. |
| bc9d | no | TMB it idx | Pro: No arithmetic for clinchedby_code. Con: Still adding b.array+index everywhere else (deref). |
| 0b69 | no | iterator ptr | Pro: No arithmetic for dereferencing. Con: CBC subtracts out. Use `4db4` instead. |
| ^ ^^ |  | ^ ^^ | Both are superior to `9fdc`, which adds for everything, then subtracts out for CBC. Test both sans `9fdc`. Which gets called more? • 69% `clinched_by` in `clinchedby_code` @ 36,193,165, which wants indices. • 31% in everything else @ 16,024,183, which want pointers. (8,000,360 `clinched_by` in stats & SQL + 23,463 `traveler_set` in MV&E) So on paper, `bc9d` wins. But in practice we're within margin of error, with `0b69` faster on the 4 non-Ubuntu (coincidence?) machines. If foregoing or at least deferring `9fdc` because no clear benefit, it makes sense to use `0b69`. |
| ec4e | no | 16-bit it opt | Wins on epoch, BT & lab1; lags behind SimItOpt2 everywhere else, even before accounting for lookup array setup. Constexpr would be expensive for program & code size. Ugly changes to code, a namespace I wouldn't otherwise need. |
| 16bb | no | `[bits >> 1]` | Only needed for ComItOpt, which looks like it's not going to pan out. Only performs better on lab2, so within margin of error at best. |
| 2d94 | no | ternaries | 1st place on lab2 by 0.5 ms & lab3 by 0.7; lags behind SimItOpt2 everywhere else except BT, where it's 2nd place behind `ec4e`, and more arguably epoch, where it's slower than 8 & 16-bit, but faster than 32-bit (which looks like it'll be the selected alternative). But again, BT & epoch aren't target machines. Keep it simple. |
| fc17 | no | punning SIO2 | Bigger is better. |
| 72e2 | no | 64-32-16 pun | " |
| c310 | no | 64-16-bit pun | " |
| 47a7 | yes | 64-32-bit pun | Bigger is better. Fewest branches & loops. 32-bit is fastest for iteration too. |
| 4db4 | yes | iterator ptr | Faster. |
| 4504 | yes | "0" | Rebase earlier |
| 7f87 | yes | trav | Better on paper; != op replaced with just sending an existing variable. Rename variable to `is_traveled` for readability. |
| 05d7 | no | reserve | No definitive diff; no-build may be faster. |
| 8be1 | ... | TL::nums | No apparent diffs. Revisit when more RAM bandwidth optimizations are in place? |
| c896 | yes | arg order | Measures faster, but probably just noise. |
| 7415 | yes | ctor / init | Keep the initialization part; no diffs to binary from that. Array start & data pointer go from 8 to 24 B apart, increasing probability of being on different cache lines. That's got to explain the drop in UserLog performance. Therefore, reorder to get them consecutive again; see `01d6`. |
| 4a64 | yes | integral type | This time, no change to binary from `7415`. |
| 6da9 | no | alignas(32) | Increases `sizeof(HighwaySegment)` from 112 to 128. Does help UserLog performance on all machines, but hurts most other tasks on most machines. Combined, still a slight advantage overall but future commits do better... |
| 01d6 | yes | reorder membr | `start` and `data` adjacent again, restoring old same/different cache line probability. Data occasionally on different cache lines appears to be outweighed by not increasing `sizeof(HighwaySegment)`, except for bandwidth-constrained machines BiggaTomato, lab1 & bsdlab. `01e7` makes this a non-issue... |
| 01e7 | yes | cbt_index | Not only avoids reading `start` (potentially from another cache line), avoids redundant recalculation of the index. A clear advantage over all prev commits on all Linux boxes. |
| 48c8 | yes | part add_idx | `add_clinched_by` takes index, calculated once per chopped route, in `store_traveled_segments`. |
| 28af | no | local_index | No clear benefit; more diffs. Keep it simple. |
| c8b9 | yes | const fold | readability |
| adcf | no | 1 index/trav | No clear benefit; more diffs. Keep it simple. |
| 6634 | yes | whitespace |  |
| 9536 | yes | constexpr | no diff to binary on BiggaTomato |
| ecef | yes | ST aug index |  |
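
The layout remarks for `7415`, `6da9` and `01d6` can be checked with a small sketch. The structs below are hypothetical stand-ins, not the real `HighwaySegment` or `TMBitset` members, and the 64-byte cache line is an assumption about the target machines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 112-byte payload standing in for HighwaySegment.
struct Seg { std::uint64_t words[14]; };
struct alignas(32) SegAligned { std::uint64_t words[14]; };

static_assert(sizeof(Seg) == 112, "14 * 8 bytes, no padding");
static_assert(sizeof(SegAligned) == 128, "sizeof is rounded up to a multiple of the alignment");

// Same four 8-byte members in two orders (names are illustrative only).
struct Apart    { std::uint64_t start, a, b, data; };
struct Adjacent { std::uint64_t start, data, a, b; };

constexpr std::size_t gap_apart    = offsetof(Apart, data)    - offsetof(Apart, start);
constexpr std::size_t gap_adjacent = offsetof(Adjacent, data) - offsetof(Adjacent, start);

// True if two byte addresses fall on the same cache line (64 bytes assumed).
inline bool same_cache_line(std::uintptr_t p, std::uintptr_t q, std::size_t line = 64) {
    assert(line > 0);
    return p / line == q / line;
}
```

With members 24 bytes apart, more starting offsets put the two on different lines than with members 8 bytes apart, which is consistent with the reasoning given for `7415` and `01d6`.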

yakra · 2023-05-01T03:11:31Z

Functions, operators, etc.

insert
No real changes, because constant folding
all via HighwaySegment::add_clinched_by:
- ConcAug
- ReadList via store_traveled_segments
operator ()
No real changes, because constant folding
- UserLog: Route::clinched_by_traveler
operator !=
Only changes in 953a (nix unit parameter) & children (ComItOpt et al): multiply by sizeof(unit) before memcmp
- Graphs: edge compression
operator |=
Type-punning variants
- Graphs: compiling traveler lists
iteration
Simple, 8 & 16-bit complex (and variants), unit choice, store index or pointer
- CompStats
- Graphs:
  - creating traveler_lists vector, or getting item count & writing traveler names to last .tmg line
  - HighwaySegment::clinchedby_code
- SQL: segments table

yakra · 2023-05-14T13:43:33Z

HTML

TMB_CBC.html
4db4 store pointer, 2eff inline, 4504 "0", 7f87 trav.
Yes.
TMB_TravNums.html
No-build, contig TLs, t_l as ref, 9fdc trav_nums, bc9d TMB it idx, 0b69 iterator ptr.
0b69 cherrypicked -> 4db4.
TBD: Unsure what to do WRT trav_nums yet.
TMB_inline.html
f977 t_l as ref, 2d1b inline mv&e, 4db4 store pointer, 2eff merge, d0fa nixtl branch, 13b9 merge.
Yes.
TMB_nixtl.html
4db4 store pointer, edb4 nix t_l, aecc "0", d0fa notrav, 13b9 inline
Yes? Contradicts nixtl2.
TMB_nixtl2.html
4504 "0", 13b9 nixtl, 7f87 trav (PreAlpha), 7f7d trav (nixtl)
No (Contradicts nixtl). May re-examine in future.

yakra · 2023-05-15T13:35:32Z

hidden/deleted branches

| comm | branch | DoWhat | comment |
|---|---|---|---|
| ecef | ConstFold | delete | TMBPreAlpha FFWDed |
| adcf | ElStacko | hidden |  |
| 28af | add_index | hidden |  |
| 89bd | cbt_index | delete | TMBPreAlpha FFWDed |
| 01d6 | reorder | delete | merged |
| 6da9 | alignas | hidden |  |
| 8be1 | thread_local | hidden | re-examine in future |
| 05d7 | reserve | hidden |  |
| 660c | [stash] | hidden | meta-iteration (CompStats) |
| 6a83 | [stash] | hidden | meta-iteration (traveler names) |
| 7f7d | nixtl | hidden | re-examine in future |
| 47a7 | 64-32 | delete | parent of TMBPreAlpha & nixtl |
| c310 | 64-16 | hidden |  |
| 72e2 | 64-32-16 | delete | parent of 64-16 |
| fc17 | SIO2pun | hidden |  |
| 2d94 | ternaries | hidden |  |
| 16bb | bit1 | delete | parent of ternaries |
| ec4e | ComItOpt16 | delete | parent of ternaries |
| 0b69 | return_p | hidden |  |
| bc9d | idx | hidden |  |
| 9fdc | nixon | delete | parent of return_p |
| f749 | ComItOpt | delete | parent of ternaries |
| f8cf | SimItOpt2 | hidden |  |
| 2cbb | SimItOpt | hidden |  |
| 03ec | pun | delete | parent of ternaries |
| 54fb | issue588 | hidden | use this |
| f28c | unit | hidden |  |
| 2d1b | inline | delete | merged |
| 96e9 | switch | delete | merged |
| 7828 | cbycode | hidden |  |
| a423 | cbt_br | delete | merged |
| e61a | TMBitset | hidden | Nix TL::tn dropped; the rest redone manually downstream |

yakra · 2023-05-29T02:37:51Z

Alpha ToDo list

Solo items

| C | Descr | Comments |
|---|---|---|
| `54fb` | issue 588 |  |
| `91a0` | !extr include |  |
| `7ce7` | userlog warn |  |
| `96e9` | avoid iterate |  |
| ToDo | Timestamp for first task (Get list of travelers in the system). Python Too. |  |
| ToDo | Constexpr the static edge format constants. |  |
| v | Extraneous else after continue; parens can be clarified |  |
| ^ | extraneous visibility == 1 check just before that |  |

traveled_tmg_line & traveler_lists cleanup

| C | Descr | Comments |
|---|---|---|
| `4504` | "0" | HGEdge, HighwaySegment. HG.cpp: 2 tmg_line, 3 size->travnum |
| `7f87` | trav |  |

contig RAM group

| C | Descr | Comments |
|---|---|---|
| ToDo | Get TMArray sorted. | TMArray means threaded construction in place (via placement new) without separate read_list function. |
| `a5a7` | contig RAM |  |
| ToDo | fix comment |  |
| ToDo | TravelerList dtor, to delete traveler_num |  |

TMB group

| C | Descr | Comments |
|---|---|---|
| `ecff` | TMB 1st draft | Does ctor init vars in order? Don't include siteupdate.cpp changes; see below. |
| Nope | ~~init segments with TravelerList::ids.size()~~ | Never mind. `TMArray<TravelerList>::size`. |
| `a423` | branching CBT |  |
| `acc9` | unsigned int |  |
| `47a7` | 64-32-bit pun |  |
| `4db4` | iterator ptr |  |
| `c896` | arg order |  |
| `7415` | ctor / init |  |
| `4a64` | integral type |  |
| `01d6` | reorder membr |  |
| `01e7` | cbt_index |  |
| `48c8` | part add_idx |  |
| `c8b9` | const fold |  |
| `6634` | whitespace |  |
| `9536` | constexpr |  |
| `ecef` | ST aug index |  |
| done | ~~nix () and add_value until needed~~ | Done as part of c8b9 |
| ToDo | Cast (unit)1 before << | ...lest the unsigned long bug make a reappearance. See f8cf on SimItOpt2 branch. |
| ToDo | Review TMBitset variable names | Also reduce arithmetic in iterator construction. |
| ToDo | Comment for != and \|= operators |  |
| ToDo | segments SQL table: only iterate clinched_by for active/preview systems |  |
| ToDo | uint8_t etc. |  |
| ToDo | As of ecff, cmath no longer explicitly needed in HighwayGraph.cpp |  |
| ToDo | Template specialization |  |
| ToDo | Explicit instantiation? |  |
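
The "Cast (unit)1 before <<" item is presumably guarding against the following kind of problem: a bare `1` is an `int`, so shifting it is done in `int` regardless of the unit type. A minimal sketch, with `bit_mask` and `set_bit` as hypothetical helpers rather than anything in `TMBitset`:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: a one-bit mask computed in the unit's own type.
// Valid for pos < 8 * sizeof(unit).
template <class unit>
constexpr unit bit_mask(unsigned pos) {
    return static_cast<unit>(static_cast<unit>(1) << pos);
}

// Hypothetical helper: set one bit of a unit in place.
template <class unit>
void set_bit(unit& word, unsigned pos) {
    assert(pos < 8 * sizeof(unit));
    word |= bit_mask<unit>(pos);
}

// With a plain int literal, `1 << 31` lands on the sign bit, and widening that
// int to 64 bits copies the sign into the upper half. The conversion below is
// the well-defined equivalent of that widening.
constexpr std::uint64_t widened_sign_bit = static_cast<std::uint64_t>(INT32_MIN);
static_assert(widened_sign_bit == 0xFFFFFFFF80000000ULL, "upper 32 bits are all set");

// Casting first keeps the whole shift in the unit type.
static_assert(bit_mask<std::uint64_t>(31) == 0x0000000080000000ULL, "only bit 31 set");
static_assert(bit_mask<std::uint64_t>(40) == (std::uint64_t{1} << 40), "bits past 31 work");
static_assert(bit_mask<std::uint16_t>(15) == 0x8000, "narrow units work too");
```

Shifting an `int` by 32 or more is undefined, so for 64-bit units the cast is not optional for the upper half of the word.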

t_l via TMB group

| C | Descr | Comments |
|---|---|---|
| `d704` | t_l via TMB |  |
| `2d1b` | inline mv&e | Delete that extra line of whitespace from HighwayGraph.h as in `4504` |
| ToDo | Don't include TMBitset |  |
| `4504` | "0" | GetSubData. HG.cpp: assign traveler numbers |
| `a009` | t_l list->vec |  |

yakra · 2023-06-29T02:23:56Z

partial specialization experiments
deleted stash @ 484be5074e68f31db790cc53b34cf2494f6ff114

yakra · 2023-07-09T14:40:14Z

Benchmark commits:

WIP:
HD @ 59afa75
UD @ 2126e76
Final:
HD @ 40c2356
UD @ 5b071e5

yakra · 2023-07-09T14:58:00Z

yakra · 2024-05-20T00:52:01Z

deleted branches (BiggaTomato)

g_db comments out DB here and here.
g_trav comments out everything specific to traveled graphs in mv&e, write_master and write_sub, except for assigning traveler numbers. HighwayGraph.cpp only.
g_cb comments out traveler_lists population
990a288 adds {}// at the beginning of this line

yakra added code organization list C++ speed RAM labels Apr 10, 2023


yakra mentioned this issue May 27, 2023: TMBitset<HGVertex*> #251 (Closed, 17 tasks)

yakra mentioned this issue Jul 9, 2023: TMArray, TMBitset, etc. TravelMapping/DataProcessing#592 (Merged)

jteresco closed this as completed in TravelMapping/DataProcessing#592 Jul 9, 2023

yakra mentioned this issue Jul 9, 2023: TMBitset leftovers #252 (Open, 3 tasks)
