Done:
+ new task format
+ widths visitor
+ widths logging
+ CNN builder (and consolidate CNN building functions)
+ update growth visitor (add CNNs)
+ add run_id generation somewhere
+ CNN dataset refactor
+ CNN loader
---
Current:
+ saving pre-training metrics, loss
+ saving post-training train metrics
+ dataset-based fixed train-test split method
+ task schema redesign
+ task migration script
+ new run schema design
+ migrate task parameter connector from '.' to '_'
+ experiment with parquet compression
+ run converter
+ update postgres result logger
+ new summary schema
+ new materialization script
+ new query script
+ subprocess runner method
+ add system_name, queue_id, job_name to worker info
+ save time at end of epoch
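The helpers below are a minimal sketch of the '.' to '_' connector migration. They assume task parameters are nested config dicts that get flattened into single-level names by joining the keys with the connector. flatten_parameters and migrate_parameter_names are hypothetical helpers, not existing project code.

```python
from typing import Any, Dict, Mapping


def flatten_parameters(
    config: Mapping[str, Any], connector: str = "_", prefix: str = ""
) -> Dict[str, Any]:
    """Flatten a nested task config into one level of parameter names.

    Hypothetical helper: nested keys are joined with `connector`, so
    {"optimizer": {"lr": 0.1}} becomes {"optimizer_lr": 0.1} for connector "_".
    """
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        name = f"{prefix}{connector}{key}" if prefix else str(key)
        if isinstance(value, Mapping):
            # Recurse into the sub-config, extending the key path.
            flat.update(flatten_parameters(value, connector, name))
        else:
            flat[name] = value
    return flat


def migrate_parameter_names(
    params: Mapping[str, Any], old: str = ".", new: str = "_"
) -> Dict[str, Any]:
    """Rename already-flattened parameter names from the old connector to the new one."""
    migrated: Dict[str, Any] = {}
    for name, value in params.items():
        new_name = name.replace(old, new)
        if new_name in migrated:
            # Two distinct old names map to the same new name; refuse to merge them silently.
            raise ValueError(f"connector migration collision on {new_name!r}")
        migrated[new_name] = value
    return migrated
```

Ordinary key names such as batch_size already contain '_', so names joined with '_' cannot be split back into nested keys unambiguously. For that reason, the migration raises an error on collisions instead of silently overwriting a parameter.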
---
Later:
+ better per-epoch histories:
  + get zero-epoch values
  + record post-epoch training loss
  + might need to adjust existing results to include this
+ filtering kwargs?
+ Do CNN merge
  + convert CNN builder code to use Layers
  + make CNN task (or figure out how to specify it)
    + might need to rename parameters in DB
    + might need to remap Tasks in DB
  + make CNN logging work properly (see below)
  + make growth visitor compatible with CNNs
+ Test/fix revised aspect and growth experiments
+ test parquet DB storage
  + possibly convert to parquet
  + update result logger
  + update summary materialization script
+ make worker run script?
  + how to pop from the queue efficiently?
    + worker just pops one job?
  + pass worker a max wait time?
  + exit codes to indicate worker status?
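The function below sketches the "worker just pops one job" idea, with a max wait time and exit codes. pop_job and run_job are hypothetical callables that stand in for the real queue and task runner. The exit code values are arbitrary placeholders. If the queue lives in Postgres, one common way to let many workers each claim a single job without double-claiming is a SELECT ... FOR UPDATE SKIP LOCKED query (PostgreSQL 9.5+) inside pop_job.

```python
import time
from typing import Callable, Optional

# Hypothetical exit codes; the todo only asks whether exit codes should signal worker status.
EXIT_OK = 0          # ran at least one job, then the queue stayed empty for max_wait
EXIT_IDLE = 3        # never found a job before max_wait expired
EXIT_JOB_FAILED = 4  # a job raised an exception


def run_worker(
    pop_job: Callable[[], Optional[dict]],
    run_job: Callable[[dict], None],
    max_wait: float = 60.0,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Pop and run one job at a time until the queue stays empty for max_wait seconds."""
    jobs_done = 0
    idle_since = clock()
    while True:
        job = pop_job()  # claims exactly one job, or returns None if the queue is empty
        if job is None:
            if clock() - idle_since >= max_wait:
                return EXIT_OK if jobs_done else EXIT_IDLE
            sleep(poll_interval)
            continue
        try:
            run_job(job)
        except Exception:
            return EXIT_JOB_FAILED
        jobs_done += 1
        idle_since = clock()  # restart the idle timer after each completed job
```

A worker run script could end with sys.exit(run_worker(pop_job, run_job, max_wait=...)). A Slurm wrapper could then tell an idle exit apart from a failure. Injecting clock and sleep keeps the loop testable without real waiting.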
------------
+ Refine CNN task specification
  + Micro vs macro
  + defined by simple 'cell type' or 'shape'?
    + cell type
    + cell depth (# cells between downsample stages)
    + downsamples (# downsampling stages)
    + cell widths (# channels/filters at each level/depth)
    -> alternative:
      + cell type
      + depth (in terms of levels (cells + downsamples))
      + num_downsamples
        + evenly distributed; remaining levels are normal cells
      + shape, size
  + dense output section?
    + depth
    + size (output width determined by dataset/task)
    + width(s) (maybe uniform/rectangular?)
    + shape (maybe all rectangular?)
    + num dense parameters
  -> also log:
    + widths (# channels/filters at each level)
    + total num layers
    + num cell layers
    + total num levels
    + total num downsamples
    + total num cells
    + level types array (like widths but for levels not layers)
    + log cell structure?
      -> experiment data like widths
      -> is it forced to be unique?
    + possibly log cell structure to use with microarch searches?
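The functions below sketch the 'alternative' CNN spec above. Given a depth in levels and num_downsamples, they spread the downsamples evenly and fill the remaining levels with normal cells. The result is the level types array plus some of the totals listed under 'also log'. The stage-balancing rule (extra cells go to earlier stages) and the function names are assumptions.

```python
from typing import Dict, List


def level_types(depth: int, num_downsamples: int) -> List[str]:
    """Lay out `depth` levels with `num_downsamples` evenly spaced downsample levels.

    Assumption: the downsamples split the normal cells into num_downsamples + 1
    stages whose sizes differ by at most one, with extra cells in earlier stages.
    """
    if not 0 <= num_downsamples <= depth:
        raise ValueError("need 0 <= num_downsamples <= depth")
    num_cells = depth - num_downsamples
    num_stages = num_downsamples + 1
    base, extra = divmod(num_cells, num_stages)
    types: List[str] = []
    for stage in range(num_stages):
        types.extend(["cell"] * (base + (1 if stage < extra else 0)))
        if stage < num_downsamples:
            types.append("downsample")  # no downsample after the last stage
    return types


def level_summary(depth: int, num_downsamples: int) -> Dict[str, object]:
    """Level-based quantities from the 'also log' list (hypothetical key names)."""
    types = level_types(depth, num_downsamples)
    return {
        "total_num_levels": len(types),
        "total_num_downsamples": types.count("downsample"),
        "total_num_cells": types.count("cell"),
        "level_types": types,
    }
```

For example, level_types(6, 2) gives two cells, a downsample, one cell, a downsample, and one cell. Layer-based counts such as total num layers would also depend on the cell type, so they are left out here.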
+ Data/DB storage improvement?
  + convert network module structures into layer structures
  + store run data as single blob or byte array?
    + could be a parquet encoding -> perhaps a single-row table?
      + wrap Python bytes in BytesIO (https://docs.python.org/3/library/io.html#io.BytesIO)
      + wrap BytesIO in pyarrow.PythonFile (https://arrow.apache.org/docs/python/generated/pyarrow.PythonFile.html#pyarrow.PythonFile)
      + store as parquet table: https://arrow.apache.org/docs/python/parquet.html
    - Need a script to link runs to experiments
      - or, we need to keep using the experiment table when storing a result
    - Need a worker script to do aggregation and materialization
      - can't do work in DB query alone for this one
      - load runs for experiment, compute aggregation, store in summary table
    + Easier to extract subsets into parquet files
      + smaller, possibly faster to deal with (mainly because smaller)
    + simpler in some ways (single blob to move around)
    + might reduce chances of issues with Yuma
+ add initializer config?
+ organize aspect test utils, etc
+ rename 'type' to 'name' in configs?
+ Rename AspectTestTask to TrainingExperiment?
  + could probably get away with only renaming parameters
+ Rename some parameters? (esp. based on CNN merge)
  + must rename command for pending tasks as well as parameter table entries
+ Slurm re-queueing script for Vermilion, etc.?