I wanted to test Zig for simple computational algorithms after going through the "ziglings" exercises.
I looked into genetic algorithms because these algorithms are simple. A genetic algorithm is an optimisation algorithm: instead of a "standard" method, say gradient descent, you repeatedly sort a population of candidate solutions, apply elitism, and mutate it.
What is this challenge?
You are given a phrase (a list of characters) and the program must find it. 🤔
Why?
You can use genetic algorithms for this kind of task: they are conceptually simple to program, and they perform well here because this problem 1) is simple (it is a kind of sorting problem) and 2) has a "global minimum" (only one possible solution).
Genetic algorithms are not a silver bullet. Read the "Limitations" section on Wikipedia.
How?
You start with a list (or "population") of random phrases (or "chromosomes"). Each phrase is made of letters (or "genes"). The length of a phrase is of course the same length as your target phrase.
You will then produce a new generation from the old one. Concretely, you modify your list of genes at each step. This is a convergent process.
Convergence measure
We of course need a metric. The measure of convergence - also known as "fitness" - is simply the number of letters ("genes") in your guess (phrase) that differ from the target at the corresponding position, so a fitness of 0 means the phrase is found: if "I love you" is your target, you produce chromosomes of random genes of length 10 and count the mismatches.
Note that the implementation of a chromosome depends upon your language. In Node.js, I used a string, a concatenation of characters. With Zig, an ArrayList, a dynamic array with memory allocation.
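As a minimal standalone sketch (independent of the reference code below), the fitness of a guess can be computed by counting mismatching positions, so lower is better:

```javascript
// Fitness sketch: count the positions where the guess differs from
// the target. Lower is better; 0 means an exact match.
function fitness(guess, target) {
  let diff = 0;
  for (let i = 0; i < target.length; i++) {
    if (guess[i] !== target[i]) diff++;
  }
  return diff;
}

console.log(fitness("I love you", "I love you")); // 0: exact match
console.log(fitness("I lxve yoz", "I love you")); // 2 mismatches
```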
The algorithm
Your first generation is an initial random "population" of guesses. This is done by taking random letters (or "genes") from a given pool (all a..zA..Z0..9 plus special characters) and building a "chromosome" (a string in the Node.js implementation below) of the same length as your target string.
You sort your population of "chromosomes" (strings in fact here) with your previously defined "fitness" measure.
You apply your genetic operators to produce a genetic admixture from this population for your next generation:
you keep the e% best candidates, the "elitism" parameter,
for the rest of your population, you take 45% of the genes from one candidate (or "parent") and 45% from another one, plus 10% random genes, and build a new candidate from them.
Node.js reference code
Node.js: Find my phrase
// Number of individuals in each generation
const POPULATION_SIZE = 100;

// Valid genes
const GENES =
  "abcdefghijklmnopqrstuvwxyz" +
  "ABCDEFGHIJKLMNOPQRSTUVWXYZ " +
  "1234567890" +
  ", .-;:_!#%&/()=?@${[]}";
const GLEN = GENES.length;

// Target string to be generated
const TARGET = "I love you baby!";
const TLen = TARGET.length;
const ELITISM = 10;

// Generate a random integer in the given range (inclusive)
function RandomNum(start, end) {
  return Math.floor(Math.random() * (end - start + 1)) + start;
}

// Pick a random gene from the pool (used for mutation)
function TakeRandomGene() {
  let r = RandomNum(0, GLEN - 1);
  return GENES.charAt(r);
}

// Create a chromosome: a string of random genes
function CreateGnome() {
  let gnome = "";
  for (let i = 0; i < TLen; i++) {
    gnome += TakeRandomGene();
  }
  return gnome;
}

// Class representing an individual in the population
class Individual {
  constructor(chromosome) {
    this.Chromosome = chromosome;
    this.Fitness = this.CalculateFitness();
  }

  // Calculate the fitness score: the number of characters
  // in the string which differ from the target string
  CalculateFitness() {
    let fitness = 0;
    for (let i = 0; i < this.Chromosome.length; i++) {
      if (this.Chromosome[i] !== TARGET[i]) {
        fitness++;
      }
    }
    return fitness;
  }

  // Perform mating and produce a new offspring:
  // take 45% of the genes from parent 1, 45% of the genes
  // from parent 2, and the rest from a random gene
  Mate(parent2) {
    let childChromosome = "";
    for (let i = 0; i < this.Chromosome.length; i++) {
      // get a random number between 0 and 1
      let p = Math.random();
      if (p < 0.45) childChromosome += this.Chromosome[i];
      else if (p < 0.9) childChromosome += parent2.Chromosome[i];
      else childChromosome += TakeRandomGene();
    }
    return new Individual(childChromosome);
  }
}

// Comparator: sort by increasing fitness
class FitnessComparer {
  static Compare(ind1, ind2) {
    return ind1.Fitness - ind2.Fitness;
  }
}

// Driver code
function Main() {
  // current generation
  let generation = 0;
  let population = [];
  let found = false;

  // create the initial population
  for (let i = 0; i < POPULATION_SIZE; i++) {
    let gnome = CreateGnome();
    population.push(new Individual(gnome));
  }

  while (!found) {
    // sort the population in increasing order of fitness score
    population.sort((a, b) => FitnessComparer.Compare(a, b));

    // if the fittest individual has a fitness score of 0,
    // we have reached the target: break the loop
    if (population[0].Fitness === 0) {
      found = true;
      break;
    }

    // otherwise generate new offspring for the next generation
    let newGeneration = [];

    // Perform elitism: the fittest ELITISM% of the population
    // goes straight to the next generation
    let s = Math.floor((ELITISM * POPULATION_SIZE) / 100);
    for (let i = 0; i < s; i++) newGeneration.push(population[i]);

    // The rest of the next generation is produced by mating
    // individuals drawn from the 50% fittest
    s = POPULATION_SIZE - s;
    for (let i = 0; i < s; i++) {
      let r = RandomNum(0, 50);
      let parent1 = population[r];
      r = RandomNum(0, 50);
      let parent2 = population[r];
      let offspring = parent1.Mate(parent2);
      newGeneration.push(offspring);
    }
    population = newGeneration;

    console.log(
      "Generation: " + generation + "\t" +
      "String: " + population[0].Chromosome + "\t" +
      "Fitness: " + population[0].Fitness
    );
    generation++;
  }

  console.log(
    "Generation: " + generation + "\t" +
    "String: " + population[0].Chromosome + "\t" +
    "Fitness: " + population[0].Fitness
  );
}

// Execute the main function
Main();
Results: plotting the influence of "elitism" on the number of runs to find the solution
For each elitism parameter, I ran the algorithm 1000 times, so for each elitism parameter I have 1000 counts of the generations needed to find the target. From these, I extract the mean, the standard deviation, the min and the max.
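The mean and standard deviation over many runs can be accumulated in a single pass with Welford's online algorithm, without storing the 1000 samples; a minimal sketch (the function names here are illustrative, not from the original code):

```javascript
// Welford's online algorithm: one-pass running mean and variance,
// plus min/max tracking, without storing all the samples.
function makeStats() {
  return { count: 0, mean: 0, m2: 0, min: Infinity, max: -Infinity };
}

function update(s, x) {
  s.count++;
  const delta = x - s.mean;
  s.mean += delta / s.count;
  s.m2 += delta * (x - s.mean); // note: uses the *updated* mean
  if (x < s.min) s.min = x;
  if (x > s.max) s.max = x;
}

function stddev(s) {
  // sample standard deviation (n - 1 in the denominator)
  return s.count < 2 ? 0 : Math.sqrt(s.m2 / (s.count - 1));
}

const s = makeStats();
for (const x of [80, 90, 100]) update(s, x);
console.log(s.mean);       // 90
console.log(stddev(s));    // 10
console.log(s.min, s.max); // 80 100
```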
The x-axis is the "elitism" factor: how much of the best population you keep at each generation.
The upper points are the mean. On average, the minimum is about 80 generations, reached at approximately 20% elitism. It is then surprising to see that beyond 25% elitism, the count goes up quickly.
The middle points are the standard deviation. Above 25% elitism, it increases sharply. Note that the coefficient of variation (stddev/mean) is about 50%, which is high.
The lower curve is the minimum number of generations needed to find the target phrase. It is rather flat between 0 and 40%.
An Elixir Livebook to explore a CSV file produced by the Zig code
This graph has been produced by importing the CSV file into a Livebook, and using Explorer to convert the CSV into a dataframe, and using Vegalite to plot the data.
I reproduced this code in Zig but made it a bit more sophisticated to get the data above:
run the algorithm, say, X times and compute the mean, standard deviation, min and max,
then increment the "elitism" parameter,
and finally save the results into a CSV file
so that I can use Explorer in a Livebook and chart the data.
The Zig code. It takes quite a long time to run (with the parameters below) and consumes 2MB of memory at peak (really low). This is because I write to disk and output to stdout.
It uses a number of hardcoded parameters, and there is no check on them. For example, I used GENE_MIX=0.45 as the share of genes each parent contributes to a child; anything above 0.50 breaks the run, because the two parent branches then cover the whole probability range and no random mutation can ever be introduced.
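The effect of GENE_MIX on the three branches of mate() can be checked with a quick sketch (a hypothetical helper, not part of the original code): with GENE_MIX = 0.45 the mutation branch keeps roughly a 10% share, while at 0.50 or above it vanishes.

```javascript
// For a uniform p in [0, 1), the mate() branches split as:
//   p < geneMix       -> gene from parent 1
//   p < geneMix * 2   -> gene from parent 2
//   otherwise         -> random gene (mutation)
function branchShares(geneMix) {
  const parent1 = geneMix;
  const parent2 = Math.min(geneMix * 2, 1) - geneMix;
  const mutation = 1 - parent1 - parent2;
  return { parent1, parent2, mutation };
}

console.log(branchShares(0.45)); // mutation share is ~0.10
console.log(branchShares(0.5).mutation); // 0: the random-gene branch is never taken
```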
const std = @import("std");
const math = @import("std").math;
const print = std.debug.print;
const ArrayList = std.ArrayList;
const Random = std.Random;
const ArenaAllocator = std.heap.ArenaAllocator;

const POPULATION_SIZE: comptime_int = 100;
const GENES = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ 1234567890, .-;:_!\"#%&/()=?@${[]}";
const MAX_TARGET_LENGTH: comptime_int = 100;
const MAX_GENERATIONS: comptime_int = 10_000; // Prevent infinite loops
const MAX_TRIALS: comptime_int = 2_000;
const MAX_ELITISM: comptime_int = 75;
const ELITISM_STEP: comptime_int = 1;
const ELITISM_START: comptime_int = 2;
const GENE_MIX: comptime_float = 0.45;
const TARGET = "I love you baby!";
const Individual = struct {
    string: []u8,
    fitness: usize,

    fn init(allocator: std.mem.Allocator, string: []const u8) !Individual {
        var self = Individual{
            .string = try allocator.dupe(u8, string),
            .fitness = 0,
        };
        self.calculateFitness();
        return self;
    }

    // mutate the Individual struct and compute the fitness:
    // it is the number of differences between the genes and the target
    fn calculateFitness(self: *Individual) void {
        self.fitness = 0;
        for (self.string, 0..) |char, i| {
            if (char != TARGET[i]) {
                self.fitness += 1;
            }
        }
    }

    fn mate(self: *const Individual, allocator: std.mem.Allocator, parent2: *const Individual, rng: Random) !Individual {
        const child_string = try allocator.alloc(u8, TARGET.len);
        for (child_string, 0..) |*gene, i| {
            const p = rng.float(f32);
            if (p < GENE_MIX) {
                gene.* = self.string[i];
            } else if (p < GENE_MIX * 2) {
                gene.* = parent2.string[i];
            } else {
                gene.* = GENES[rng.intRangeAtMost(usize, 0, GENES.len - 1)];
            }
        }
        return try Individual.init(allocator, child_string);
    }
};
fn createGnome(allocator: std.mem.Allocator, ran: Random) ![]u8 {
    const gnome = try allocator.alloc(u8, TARGET.len);
    for (gnome) |*gene| {
        gene.* = GENES[ran.intRangeAtMost(usize, 0, GENES.len - 1)];
    }
    return gnome;
}

fn individualLessThan(context: void, a: Individual, b: Individual) bool {
    _ = context;
    return a.fitness < b.fitness;
}
fn runGeneticAlgorithm(allocator: std.mem.Allocator, elitism: usize, ran: Random) !usize {
    var population = ArrayList(Individual).init(allocator);
    defer population.deinit();

    // Create initial population
    for (0..POPULATION_SIZE) |_| {
        const gnome = try createGnome(allocator, ran);
        const individual = try Individual.init(allocator, gnome);
        try population.append(individual);
    }

    var generation: usize = 0;
    while (generation < MAX_GENERATIONS) : (generation += 1) {
        // Sort population by fitness
        std.mem.sort(Individual, population.items, {}, individualLessThan);

        // Check if we've found the target
        if (population.items[0].fitness == 0) {
            return generation;
        }

        // Generate new population
        var new_generation = ArrayList(Individual).init(allocator);

        // Elitism
        const elitism_count = @as(usize, (elitism * POPULATION_SIZE) / 100);
        for (population.items[0..elitism_count]) |individual| {
            try new_generation.append(try Individual.init(allocator, individual.string));
        }

        // Mating
        const mating_count = POPULATION_SIZE - elitism_count;
        for (0..mating_count) |_| {
            const parent1 = &population.items[ran.intRangeAtMost(usize, 0, 50)];
            const parent2 = &population.items[ran.intRangeAtMost(usize, 0, 50)];
            const offspring = try parent1.mate(allocator, parent2, ran);
            try new_generation.append(offspring);
        }

        // Replace old population
        population.deinit();
        population = new_generation;
    }
    return MAX_GENERATIONS; // If solution not found
}
const RunningStats = struct {
    count: usize,
    mean: f64,
    m2: f64,
    min: f64,
    max: f64,

    fn init() RunningStats {
        return RunningStats{
            .count = 0,
            .mean = 0,
            .m2 = 0,
            .min = 0,
            .max = 0,
        };
    }

    // Welford's online update: accumulate the running mean and m2
    // (the sum of squared deviations), then track min/max
    fn update(self: *RunningStats, value: usize) void {
        self.count += 1;
        const delta = @as(f64, @floatFromInt(value)) - self.mean;
        self.mean += delta / @as(f64, @floatFromInt(self.count));
        const delta2 = @as(f64, @floatFromInt(value)) - self.mean;
        self.m2 += delta * delta2;
        self.mixMax(value);
    }

    fn getStandardDeviation(self: *const RunningStats) f64 {
        if (self.count < 2) return 0;
        return std.math.sqrt(self.m2 / @as(f64, @floatFromInt(self.count - 1)));
    }

    fn mixMax(self: *RunningStats, value: usize) void {
        const v = @as(f64, @floatFromInt(value));
        if (self.min == 0) self.min = v;
        if (v > self.max) {
            self.max = v;
        }
        if (v < self.min) {
            self.min = v;
        }
    }
};
pub fn main() !void {
    var arena = ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();
    const allocator = arena.allocator();

    var rand = Random.DefaultPrng.init(@intCast(std.time.timestamp()));
    const ran = rand.random();

    const file = try std.fs.cwd().createFile("genetic_algorithm_results2.csv", .{});
    defer file.close();
    const writer = file.writer();

    // Write CSV header
    try writer.writeAll("Elitism,Avg,StdDev,Min,Max\n");

    var elitism: usize = ELITISM_START;
    while (elitism <= MAX_ELITISM) : (elitism += ELITISM_STEP) {
        var stats = RunningStats.init();
        for (0..MAX_TRIALS) |_| {
            const generations = try runGeneticAlgorithm(allocator, elitism, ran);
            stats.update(generations);
            // Free memory after each trial
            arena.deinit();
            arena = ArenaAllocator.init(std.heap.page_allocator);
        }
        // Write summary statistics to CSV
        try writer.print("{d},{d:.2},{d:.2},{d},{d}\n", .{ elitism, stats.mean, stats.getStandardDeviation(), stats.min, stats.max });
        // Print to console for immediate feedback
        std.debug.print("Elitism: {d}, Avg: {d:.2}, StdDev: {d:.2}, min: {d}, max: {d}\n", .{ elitism, stats.mean, stats.getStandardDeviation(), stats.min, stats.max });
    }
    // Ensure all data is written to the file
    try file.sync();
    std.debug.print("The results have been saved to 'genetic_algorithm_results2.csv'\n", .{});
}
First conclusion:
The ziglings exercises helped a lot, but this was still quite challenging (for me). I ran into plenty of errors. In particular, you have to be careful with the types (ints vs floats, with quite a few casts since functions take one or the other), you must distinguish runtime code from comptime code, and you must manage the memory yourself.
I also cheated and used Claude.ai for a few things: how to export a CSV file, how to generate random numbers in Zig, how to sort a list with the sorting function.
Next step: use Elixir and its easy parallelism to check if I can speed up the computations.