CapacityPlanning
Recently (2011) I learned that Fityk is described in the book The Art of Capacity Planning by John Allspaw, published by O'Reilly in 2008. Page 77:
An open source program called fityk does a great job of curve-fitting equations to arbitrary data [...]. For our purposes, the full curve-fitting abilities of fityk are a distinct overkill. It was created for analyzing scientific data that can represent wildly dynamic datasets, not just growing and decaying data. While fityk is primarily a GUI-based application, a command-line version is also available, called cfityk. This version accepts commands that mimic what would have been done with the GUI, so it can be used to automate the curve fitting and forecasting.
The command file used by cfityk is nothing more than a script of actions you can write using the GUI version. Once you have the procedure choreographed in the GUI, you’ll be able to replay the sequence with different data via the command-line tool.
If you have a carriage return–delimited file of x-y data, you can feed it into a command script that can be processed by cfityk. The syntax of the command file is relatively straightforward, particularly for our simple case. Let’s go back to our storage consumption data for an example.
In the code example that follows, we have disk consumption data for a 15-day period, presented in increments of one data point per day. This data is in a file called storageconsumption.xy, and appears as displayed here:

    1 14321.83119
    2 14452.60193
    3 14586.54003
    4 14700.89417
    5 14845.72223
    6 15063.99681
    7 15250.21164
    8 15403.82607
    9 15558.81815
    10 15702.35007
    11 15835.76298
    12 15986.55395
    13 16189.27423
    14 16367.88211
    15 16519.57105

The cfityk command file containing our sequence of actions to run a fit (generated using the GUI) is called fit-storage.fit, and appears as shown below:

    @0 < '/home/jallspaw/storage-consumption.xy'
    guess Quadratic
    fit
    info formula # changed, see the notes below
    quit

This script imports our x-y data file, sets the equation type to a second-order polynomial (quadratic equation), fits the data, and then returns back information about the fit, such as the formula used [...]
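The quoted passage mentions replaying the same sequence with different data. One way to do that is to generate the command file from a template, as in the rough Python sketch below. The helper names `make_fit_script` and `write_fit_script` are made up for illustration and are not part of fityk. The commands are copied from the script quoted above, and only the data path changes. How the resulting file is passed to cfityk is not shown here.

```python
from pathlib import Path

# Command sequence copied from the script quoted above.
# Only the data file path varies between runs.
SCRIPT_TEMPLATE = """@0 < '{data_path}'
guess Quadratic
fit
info formula
quit
"""


def make_fit_script(data_path):
    """Return command-file text that fits a quadratic to the given x-y file."""
    data_path = str(data_path)
    if "'" in data_path:
        # The path is wrapped in single quotes in the script, so a quote
        # inside it would break the command. Escaping is not handled here.
        raise ValueError("single quote in data path is not supported")
    return SCRIPT_TEMPLATE.format(data_path=data_path)


def write_fit_script(data_path, script_path):
    """Write the command file for data_path and return the script's path."""
    script_path = Path(script_path)
    script_path.write_text(make_fit_script(data_path))
    return script_path
```

For example, `write_fit_script("/home/jallspaw/storage-consumption.xy", "fit-storage.fit")` would recreate the command file shown above, without the trailing comment.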
I haven't read this book, but it has very good reviews and if you do capacity planning, go buy it!
I have two notes regarding this description.
In version 0.9.5 (after the book was released) the syntax was changed -- I updated one line in the script above (removing `in @0`).
The book does not explicitly say (or I missed it) why the results from fityk and Excel differ. The reason is that fityk uses weighted least-squares regression. By default the standard deviation of each point is set to sqrt(y), so each point is weighted by 1/y, which is theoretically justified when y is a count of independent events. If we set all weights to be equal:
S = 1
we get exactly the same results as from Excel.
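To see the effect of the weighting outside fityk, here is a small pure-Python sketch that fits a quadratic to the storage data from the quoted example, once with equal weights and once with weights derived from sqrt(y). It assumes that the default corresponds to a per-point standard deviation of sqrt(y), so that the minimized quantity is the sum of ((y - f(x)) / sigma)^2. The function names are illustrative only, and the sketch solves the normal equations directly rather than reproducing fityk's own fitting method.

```python
import math

# Storage consumption data from the quoted example (day, usage).
DAYS = list(range(1, 16))
USAGE = [
    14321.83119, 14452.60193, 14586.54003, 14700.89417, 14845.72223,
    15063.99681, 15250.21164, 15403.82607, 15558.81815, 15702.35007,
    15835.76298, 15986.55395, 16189.27423, 16367.88211, 16519.57105,
]


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    n = 3
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (m[r][n] - s) / m[r][r]
    return sol


def fit_quadratic(xs, ys, sigmas=None):
    """Return [a0, a1, a2] minimizing sum(((y - a0 - a1*x - a2*x**2) / sigma)**2).

    sigmas=None means every point gets the same weight (sigma = 1).
    """
    if sigmas is None:
        sigmas = [1.0] * len(xs)
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    # Accumulate the weighted normal equations, with weight w = 1 / sigma**2.
    for x, y, s in zip(xs, ys, sigmas):
        w = 1.0 / (s * s)
        basis = (1.0, x, x * x)
        for i in range(3):
            b[i] += w * basis[i] * y
            for j in range(3):
                a[i][j] += w * basis[i] * basis[j]
    return solve3(a, b)


# Assumed analogue of the default weighting: sigma = sqrt(y).
weighted = fit_quadratic(DAYS, USAGE, [math.sqrt(v) for v in USAGE])
# Analogue of S = 1: all points weighted equally (ordinary least squares).
equal = fit_quadratic(DAYS, USAGE)

print("sigma = sqrt(y):", weighted)
print("equal weights:  ", equal)
```

The two coefficient sets differ slightly because the sqrt(y) weighting gives the smaller early values somewhat more influence than the larger late ones.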