Exploring the OPRO paper (https://arxiv.org/pdf/2309.03409): using Gemini 2.0 Flash as the optimizer to train a linear regression model, with SGD baselines for comparison.
```bash
conda create -n gemini-opro python=3.10
conda activate gemini-opro
pip install -r requirements.txt
```
Rename `.env-example` to `.env` and add your Gemini API key. Set the `w_true` and `b_true` values inside `opro.py` to generate the data points for the experiments; the other experiment parameters are tunable in the `settings.py` file.
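The exact parameter names in `settings.py` are repo-specific; as a rough sketch, the tunables used in the experiments below might look like this (all identifiers here are illustrative assumptions, not the repo's actual names):

```python
# settings.py -- illustrative sketch only; the repo's actual identifiers may differ
TEMPERATURE = 1.0              # sampling temperature for Gemini 2.0 Flash
NUM_POINTS = 50                # (x, y) data points generated from w_true and b_true
MAX_STEPS = 500                # optimization steps before giving up
NUM_REPS = 5                   # independent repetitions per (w_true, b_true) target
NUM_GENERATED_PER_STEP = 8     # candidate (w, b) pairs requested each step
MAX_NUM_PAIRS = 35             # history pairs shown in the meta-prompt (Experiment 3 value)
SGD_TOLERANCE = 0.1            # loss threshold at which SGD is considered converged
SGD_LEARNING_RATE = 1e-6
```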
Then run the optimizer:
```bash
python opro_optimizer/opro.py
```
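At its core, the OPRO loop from the paper keeps a history of (w, b, loss) triples, renders the lowest-loss ones into a meta-prompt, and asks the model for new candidates. A minimal sketch, assuming a hypothetical `query_llm` wrapper around the Gemini call and illustrative prompt wording (not the repo's actual meta-prompt):

```python
import numpy as np

def loss(w, b, x, y):
    """Mean squared error of the line y = w * x + b over the dataset."""
    return float(np.mean((w * x + b - y) ** 2))

def build_meta_prompt(history, num_candidates):
    """Render past (w, b, loss) triples into an OPRO-style meta-prompt."""
    lines = [f"w={w}, b={b}, loss={l:.3f}" for w, b, l in history]
    return (
        "Below are (w, b) pairs with their losses; lower is better:\n"
        + "\n".join(lines)
        + f"\nPropose {num_candidates} new (w, b) pairs with lower loss."
    )

def opro_step(history, x, y, num_candidates=8, max_num_pairs=35):
    """One step: show the lowest-loss pairs so far, then score the model's proposals."""
    best = sorted(history, key=lambda t: t[2])[:max_num_pairs]
    prompt = build_meta_prompt(best, num_candidates)
    for w, b in query_llm(prompt):  # query_llm is a hypothetical LLM wrapper, not a real API
        history.append((w, b, loss(w, b, x, y)))
    return history
```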
Running the SGD baselines for the same `w_true` and `b_true` values:
```bash
python opro_optimizer/sgd.py
```
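The baseline is plain gradient descent on the MSE loss until it falls below the tolerance. A minimal sketch under the Experiment 1 settings (learning rate 1e-6, tolerance 0.1); the function and its loop structure are an assumption, not the repo's actual `sgd.py`:

```python
import numpy as np

def sgd_baseline(x, y, lr=1e-6, tol=0.1, max_steps=1_000_000, seed=0):
    """Gradient descent on MSE for y = w * x + b; returns (w, b, steps)."""
    rng = np.random.default_rng(seed)
    # Random init in the 10-20 range noted in the observations (uniform here for simplicity).
    w, b = rng.uniform(10, 20, size=2)
    for step in range(1, max_steps + 1):
        residual = w * x + b - y
        if np.mean(residual ** 2) < tol:
            return w, b, step
        w -= lr * 2 * np.mean(residual * x)  # d(MSE)/dw
        b -= lr * 2 * np.mean(residual)      # d(MSE)/db
    return w, b, max_steps
```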
Generating the results and metrics:
```bash
python eval.py
```
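The tables below report means and standard deviations over the repetitions. As a rough sketch of that aggregation (the record format here is an assumption; the actual `eval.py` may differ):

```python
import numpy as np

def summarize(runs):
    """Aggregate per-repetition records like {"steps": 7, "unique_pairs": 24}
    into the mean/std columns reported in the tables below."""
    steps = np.array([r["steps"] for r in runs])
    unique = np.array([r["unique_pairs"] for r in runs])
    return {
        "steps_mean": steps.mean(), "steps_std": steps.std(),
        "unique_pairs_mean": unique.mean(), "unique_pairs_std": unique.std(),
    }
```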
- The temperature, the number of generated points per step, and the number of pairs shown in the meta-prompt act together as exploration-exploitation controls for the language model.
- The number of pairs given in the meta-prompt correlates with model performance in the extreme cases (e.g., the (2, 30) and (36, -1) targets). If too few pairs are shown, the model settles and hovers around a local minimum.
- Exp 1 - The low step count for (15, 14) can be attributed to the Gaussian random initialization of the (w, b) pair between 10 and 20, which starts the search close to the target.
- Exp 2 - Hypothesis: the temperature parameter alone does not promote exploration of the learnable parameters by the language model; the (2, 30) and (36, -1) cases still fail to converge at temperature 1.3.
- Exp 4 - As the results show, performance is unaffected by the number of data points, since the data itself is never fed into the language model, only the current weight, bias, and loss values.
- Structured outputs work better when a 'reasoning' key is added to the output schema. The model seems to steer toward optimal values when its own reasoning tokens are added to its context (see the sketch after this list).
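To illustrate that last point, a structured output with a 'reasoning' field might be requested and parsed as below; the key names and prompt wording are assumptions for illustration, not the repo's actual schema:

```python
import json

# Assumed response format: the model explains itself before proposing pairs.
OUTPUT_FORMAT = (
    "Reply with JSON only, in the form:\n"
    '{"reasoning": "<why these pairs should lower the loss>",\n'
    ' "pairs": [{"w": <number>, "b": <number>}, ...]}'
)

def parse_response(text):
    """Extract (w, b) tuples from the model's JSON reply. The reasoning is
    discarded here, but generating it first seems to steer the model toward
    better proposals (per the observation above)."""
    data = json.loads(text)
    return [(p["w"], p["b"]) for p in data["pairs"]]
```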
Experiment 1 settings:
- Temperature: 1
- Num points: 50
- Max steps: 500
- Num reps: 5
- Num generated points per step: 8
- SGD tolerance: 0.1
- SGD learning rate: 1e-6
| w_true | b_true | Gemini 2.0 Flash steps (mean) | Gemini 2.0 Flash steps (std) | SGD steps (count) | Unique (w, b) pairs explored (mean) | Unique (w, b) pairs explored (std) |
|---|---|---|---|---|---|---|
| 15 | 14 | 1.6 | 0.49 | 121 | 10.8 | 1.72 |
| 17 | 17 | 6.8 | 0.75 | 118903 | 23.4 | 4.22 |
| 16 | 10 | 5.6 | 0.8 | 104577 | 19.8 | 3.19 |
| 3 | 5 | 14 | 1.9 | 178500 | 37.4 | 9.7 |
| 25 | 23 | 10.67 | 4.5 | 195228 | 43.67 | 17.21 |
| 2 | 30 | did not converge; hovers around w=4 and b=5 even after ~60 iterations | - | 243943 | - | - |
| 36 | -1 | did not converge; hovers around w=34 and b=25 even after ~40 iterations | - | 231261 | - | - |
Experiment 2 settings (changed from Experiment 1):
- Temperature: 1.3
- Num reps: 1
| w_true | b_true | Gemini 2.0 Flash steps (mean) | Gemini 2.0 Flash steps (std) | Unique (w, b) pairs explored (mean) | Unique (w, b) pairs explored (std) |
|---|---|---|---|---|---|
| 2 | 30 | does not converge | - | - | - |
| 36 | -1 | does not converge | - | - | - |
Experiment 3 settings (changed from Experiment 1):
- Temperature: 1.5
- Max num pairs: 35
- Num reps: 1
- Num generated points per step: 15
| w_true | b_true | Gemini 2.0 Flash steps (mean) | Gemini 2.0 Flash steps (std) | Unique (w, b) pairs explored (mean) | Unique (w, b) pairs explored (std) |
|---|---|---|---|---|---|
| 2 | 30 | 24 | 0 | 168 | 0 |
| 36 | -1 | 20 | 0 | 137 | 0 |
Experiment 4 settings (changed from Experiment 1):
- Num points: 100
- Temperature: 1.5
- Max num pairs: 35
- Num reps: 1
- Num generated points per step: 15
| w_true | b_true | Gemini 2.0 Flash steps (mean) | Gemini 2.0 Flash steps (std) | Unique (w, b) pairs explored (mean) | Unique (w, b) pairs explored (std) |
|---|---|---|---|---|---|
| 2 | 30 | 19 | 0 | 107 | 0 |
| 36 | -1 | 19 | 0 | 107 | 0 |
- Run baselines on the Gemini models
- Try structured JSON outputs
- Train with the SGD baseline
- Train a linear regression model with a varying number of data points
- Rerun experiments 3 and 4 with more repetitions
- Train a neural network on the same 'linear' data
- Fit a sine curve using the LLM optimizer
- Fit data points with decimal values
References: Google DeepMind. (n.d.). google-deepmind/opro: Official code for "Large Language Models as Optimizers." GitHub. https://github.com/google-deepmind/opro