gemini-optimizer

Exploring the OPRO paper (https://arxiv.org/pdf/2309.03409): using Gemini 2.0 Flash as the optimizer to fit a linear regression model, with SGD baselines for comparison.
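For orientation, a minimal sketch of the OPRO-style loop, where `call_llm` is a hypothetical wrapper around the Gemini API and the prompt format is illustrative; the actual loop lives in `opro_optimizer/opro.py`:

```python
import numpy as np

def mse_loss(w, b, x, y):
    """Mean squared error of the line y = w*x + b on data (x, y)."""
    return float(np.mean((w * x + b - y) ** 2))

def opro_step(history, call_llm, num_generated=8):
    """One OPRO step: show past (w, b, loss) triples, ask for better pairs.

    history:  list of (w, b, loss) tuples seen so far;
    call_llm: hypothetical wrapper around Gemini 2.0 Flash that takes a
              prompt string and returns the model's text reply.
    """
    lines = [f"w={w:.2f}, b={b:.2f}, loss={loss:.4f}" for w, b, loss in history]
    prompt = (
        "Below are (w, b) pairs for a linear model and their losses:\n"
        + "\n".join(lines)
        + f"\nPropose {num_generated} new (w, b) pairs with lower loss,"
        " one per line, formatted as 'w,b'."
    )
    reply = call_llm(prompt)
    candidates = []
    for line in reply.splitlines():
        try:
            w, b = (float(part) for part in line.split(","))
        except ValueError:
            continue  # skip malformed lines from the model
        candidates.append((w, b))
    return candidates
```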

Setup

```
conda create -n gemini-opro python=3.10
conda activate gemini-opro
pip install -r requirements.txt
```

Rename `.env-example` to `.env` and add your Gemini API key.
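For example, `.env` would contain a single line along these lines (the variable name `GEMINI_API_KEY` is an assumption; match whatever `.env-example` defines):

```
GEMINI_API_KEY=your-api-key-here
```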

Running the experiments

Set the `w_true` and `b_true` values inside `opro.py` to generate data points for the experiments. Other experiment parameters are tunable in `settings.py`.

```
python opro_optimizer/opro.py
```
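The script fits noisy points on the line y = w_true * x + b_true; a sketch of that data-generation step (the sampling range and noise scale here are assumptions, not the repo's exact values):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
w_true, b_true = 15, 14                  # ground truth, set inside opro.py
x = rng.uniform(-1, 1, size=50)          # Num points: 50 in Experiment 1
noise = rng.normal(0, 1, size=x.shape)   # additive Gaussian noise (assumed)
y = w_true * x + b_true + noise
```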

To run the SGD baselines for the same `w_true` and `b_true` values:

```
python opro_optimizer/sgd.py
```
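For reference, a minimal sketch of such a baseline, using full-batch gradient descent on MSE with the Experiment 1 learning rate and tolerance (whether `sgd.py` batches the data differently is an assumption not covered here):

```python
import numpy as np

def sgd_fit(x, y, lr=1e-6, tol=0.1, max_steps=1_000_000):
    """Gradient descent on MSE; returns (w, b, steps) once loss < tol."""
    w, b = 0.0, 0.0
    for step in range(1, max_steps + 1):
        err = w * x + b - y
        if float(np.mean(err ** 2)) < tol:
            return w, b, step
        # Gradients of mean((w*x + b - y)^2) w.r.t. w and b.
        w -= lr * 2.0 * np.mean(err * x)
        b -= lr * 2.0 * np.mean(err)
    return w, b, max_steps
```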

Generating results and metrics

```
python eval.py
```
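The two metrics in the tables below are steps to convergence and unique (w, b) pairs explored, aggregated over repetitions; a sketch of that summary (the log format `eval.py` actually reads is an assumption):

```python
import numpy as np

def summarize(runs):
    """runs: one list of proposed (w, b) pairs per repetition, in order.

    Returns (mean, std) of step counts and of unique-pair counts across
    repetitions, matching the columns in the tables below.
    """
    steps = [len(run) for run in runs]
    unique = [len(set(run)) for run in runs]
    return (np.mean(steps), np.std(steps)), (np.mean(unique), np.std(unique))
```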

Observations and Results

  • The temperature, the number of generated points per step, and the number of pairs in the meta-prompt act as exploration-exploitation controls for the language model.
  • The number of pairs given in the meta-prompt correlates with model performance in extreme cases (e.g. the (2, 30) and (36, -1) pairs). If the pair count is too low, the model settles and hovers around a local minimum.
  • Exp 1: the low number of steps for (15, 14) can be attributed to the Gaussian random initialization of the (w, b) pair between 10 and 20.
  • Exp 2 (hypothesis): the temperature parameter alone does not promote exploration over the learnable parameters.
  • Exp 4: as the results show, the model is unaffected by the larger number of data points, since the data itself is never fed to the language model, only the weight, bias, and loss values.
  • Structured outputs work better with a 'reasoning' key in the output; the model seems to steer toward optimal values when reasoning tokens are added to its context (see the schema sketch after this list).
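Regarding that last observation, the structured output with a 'reasoning' key might look like the following schema (field names are illustrative, not necessarily the repo's exact format):

```python
# Illustrative JSON schema for one optimizer step: the model writes its
# reasoning first, then proposes candidate (w, b) pairs, so the reasoning
# tokens sit in context before the proposals are generated.
response_schema = {
    "type": "object",
    "properties": {
        "reasoning": {"type": "string"},
        "pairs": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "w": {"type": "number"},
                    "b": {"type": "number"},
                },
                "required": ["w", "b"],
            },
        },
    },
    "required": ["reasoning", "pairs"],
}
```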

Experiment 1 - Initial Baselines

Temperature: 1
Num points: 50
Max Steps: 500
Num Reps: 5
Num generated points per step: 8
SGD Tolerance: 0.1
SGD Learning Rate: 1e-6

| w_true | b_true | Steps, Gemini 2.0 Flash (mean) | Steps, Gemini 2.0 Flash (std) | Steps, SGD | Unique (w, b) pairs, Gemini (mean) | Unique (w, b) pairs, Gemini (std) |
|---|---|---|---|---|---|---|
| 15 | 14 | 1.6 | 0.49 | 121 | 10.8 | 1.72 |
| 17 | 17 | 6.8 | 0.75 | 118903 | 23.4 | 4.22 |
| 16 | 10 | 5.6 | 0.8 | 104577 | 19.8 | 3.19 |
| 3 | 5 | 14 | 1.9 | 178500 | 37.4 | 9.7 |
| 25 | 23 | 10.67 | 4.5 | 195228 | 43.67 | 17.21 |
| 2 | 30 | did not converge; hovers around w=4, b=5 even after ~60 iterations | - | 243943 | - | - |
| 36 | -1 | did not converge; hovers around w=34, b=25 even after ~40 iterations | - | 231261 | - | - |

Experiment 2 - An attempt to converge the extreme points

Temperature: 1.3
Num Reps: 1

| w_true | b_true | Steps, Gemini 2.0 Flash (mean) | Steps (std) | Unique (w, b) pairs (mean) | Unique (w, b) pairs (std) |
|---|---|---|---|---|---|
| 2 | 30 | does not converge | - | - | - |
| 36 | -1 | does not converge | - | - | - |

Experiment 3 - Finally converged

Temperature: 1.5
Max num pairs: 35
Num Reps: 1
Num generated points per step: 15

| w_true | b_true | Steps, Gemini 2.0 Flash (mean) | Steps (std) | Unique (w, b) pairs (mean) | Unique (w, b) pairs (std) |
|---|---|---|---|---|---|
| 2 | 30 | 24 | 0 | 168 | 0 |
| 36 | -1 | 20 | 0 | 137 | 0 |

Experiment 4 - Trying to converge with a higher number of data points

Num Points: 100
Temperature: 1.5
Max num pairs: 35
Num Reps: 1
Num generated points per step: 15

| w_true | b_true | Steps, Gemini 2.0 Flash (mean) | Steps (std) | Unique (w, b) pairs (mean) | Unique (w, b) pairs (std) |
|---|---|---|---|---|---|
| 2 | 30 | 19 | 0 | 107 | 0 |
| 36 | -1 | 19 | 0 | 107 | 0 |

TODO:

  • Run baselines on the Gemini models
  • Try structured JSON outputs
  • Train with an SGD model
  • Train a linear regression model with varying number of data points
  • Rerun experiments 3 and 4 with higher repetitions
  • Train a neural network model on the same ‘linear’ data
  • Fit a sine curve using LLM optimizer
  • Fit data points with decimal values

References:

  • Yang, C., Wang, X., Lu, Y., Liu, H., Le, Q. V., Zhou, D., & Chen, X. (2023). Large Language Models as Optimizers. arXiv:2309.03409. https://arxiv.org/pdf/2309.03409
  • Google DeepMind. google-deepmind/opro: Official code for "Large Language Models as Optimizers." GitHub. https://github.com/google-deepmind/opro
