QMiner Change Log

20 April 2018

Version 9.2.2

Non-breaking with bug fixes

Bugfix:

Queries with joins sometimes crashed due to unsorted intermediate record set (issue #606)
Added clarification to functions recordSet.each and recordSet.map and fs.readCsvLines
C++ work again!
TMem got slimmer (24 bytes instead of 32 bytes!)
Fixed how exceptions are handled in fs.readLines

13 April 2018

Verstion 9.2.1

Non-breaking with new feature and bug fixes

Feature:

JavaScript APIs cleaned for quantile estimators, moved to analytics.quantiles.

Bugfix:

Typo bug in Json UTF-16 handling
Added implementation for JSON parsing in newRecord
Fixed issue 606 on sorted joins

5 January 2018

Version 9.1.2

Non-breaking with bug fixes

Bugfix:

Added UTF-16 surrogate pair encoding for seralization of non-BMP characters

8 December 2017

Version 9.1.1

Non-breaking with bug fixes

Bugfix:

Replaced deprecated v8 API usages
Assert output stream not closed before write or close

1 December 2017

Version 9.1.0

Non-breaking with new feature and a bug fix

Feature:

TNodeTask aware if executed in async mode

Bugfix:

Removed TQm dependancy from glib/mine/svm

27 October 2017

Version 9.0.0

Breaking with new features

Breaking changes:

SVC and SVR models are backward binary incompatible

Features:

LIBSVM nonlinear classification supported (previously only linear models were wrapped)
Active learning (javascript implementation in analytics module): uses SVC (preferably LIBSVC) and maximum uncertainty criterion for semisupervised classification
Json parsing extended to support scalar values for strings and floats
Faster index joins to go over hash tables and not binary tree

Bugfix:

Lock gets deleted when creating a base with createClean
Added a fix for the text positional index
Fixed division by zero in zscore
Fixed unit test silent failing
Fixed stream aggregate example unit test that was halting
Fixed when parsing string literal with escaped null character
Fixed resize for TCHa larger then 1GB
When parsing json-s we ignore invalid escape characters
Fixed TDir::Exists for Linux
Fixed TVec::Resize to not crash in case of 64bit index and size being TInt::Mx - 1024

Other:

Fixed V8 API deprecated warnings
C++ unit tests for TJsonVal and TDir
Added performance tests for memory allocations

8 September 2017

Version 8.6.1

Non-breaking with no new features

Bugfix:

Analytics PNotify segfault bug fixed (smartpoitners to static notify objects changed to weak pointers)
Unit testing silent failing fix + improved example unit test generation
Several unit tests fixed that were failing
Division by zero fixed in bag-of-words feature extractor (unknown words + IDF weighting bug)
Fixed #550: frequency is computed correctly for tiny joins
Fixed #459 : PJsonVal returning temporary strings given as default value by reference
Fixed #446 : Added timeout parameter to TStore::GarbageCollect()
Fixes #455 : Fixed documentation and example for fs.readLines(...)

1 September 2017

Version 8.6.0

Non-breaking with new features

Features:

Greenwald-Khanna algorithm for online quantile estimation exposed in nodejs (analytics.Gk).
CKMS algorithm for onine biased quantie estimation exposed in nodejs (analytics.BiasdeGk). This algorithm is more accurate on extreme values (example q=0.0001).
TimeWindowGk algorithm accuracy and speed optimized.
external stream aggregate extensions
external qm nodejs extensions
Added a configuration hash table that can be used to specify custom sizes for cache for different index types (storage memory improvement).
histogramAD API extended to return largest normal bin/value

Bugfix:

LinAlgStat::Mean index out of bounds fix (sparse vector case)
Bugfix for TNodeJsRecSet::getVector (doesn't crash on null fields)
LIBSVM j parameter supported (was ignored before)

Documentation:

histogram anomaly detection updated
Greenwald-Khanna and CKMS (analytics module)

21 July 2017

Version 8.5.0

Non-breaking with new features

Features:

DpMeans algorithm in nodejs API. The algorithm fixes the radius of each cluster and the number of clusters is variable
Clustering quality measure for kmeans and dpmeans

Bugfix:

Inplace sparse vector linear combination assertion

14 July 2017

Version 8.4.0

Non-breaking with new features

Features:

Added support for Node.JS 8
Improved positional text indexing. Uses modulo 1024 instead of 256 (less false positives), stores all mentions of words in a document (before limited to 8 occurrences) and uses about 20% less space for the index.
Implementation of a windowed quantile estimation algorithm on streams based on [1].
Quantile estimation Node.js analytics models analytics.TimeWindowGk (timestamp based window) and analytics.CountWindowGk (count based window). Implements the standard analytics module API (partialFit,predict, save).
Quantile estimation stream aggregate with type windowQuantiles, reads from a window buffer and updates the statistics when data enters the buffer or when data leaves the buffer.

[1] http://infolab.stanford.edu/~datar/courses/cs361a/papers/quantiles.pdf

16 June 2017

Version 8.3.0

Non-breaking with new features

Features:

added TWPt serialization as in TPt
updated TStrHash.GetMemUsed() which used the nonavailable GetMemUsedDeep() method
added filtering classes TRecFilterByFieldByteSet, TRecFilterByFieldUIntSet, TRecFilterByFieldUInt64Set

26 May 2017

Version 8.2.1

Non-breaking with bug fixes

Bug fixes:

Positional index: items are not necessarily sorted
Positional index: Def() has to be called in case some items were deleted

5 May 2017

Version 8.2.0

Non-breaking with a new feature

New features:

qm.flags includes compiler version and sizeof information

21 April 2017

Version 8.1.0

Non-breaking with a new feature and bugfix

New features:

JsonVector (new JSON type supported for vectors)

Bug fixes:

assert the base create mode is valid
fixed error messages

31 March 2017

Version 8.0.0

Breaking with new features and bugfixes

Breaking:

binary compatibility of GIX

New features:

introduced tiny gix index in TIndex which does not store any frequency information
introduced position index which can index words and their position in a string
search over phrases with gaps based on the position index

Bug fixes:

fixed GetMemUsed() in THashSetKey

24 March 2017

Version 7.11.1

Non-breaking with a bugfix

Bug fixes:

Gix memory usage overflow fix

3 March 2017

Version 7.11.0

Non-brekaing with new features

New features:

Reimplemented online linear regression with more predictable influence of regaluarization and forgeting factor parameters.
Added TStrUtil::GetStr(int), formats number 1234567 as "1,234,567"
Added NotifyInfoFmt, NotifyWarnFmt, NotifyErrFmt to the TLogger

Bug fixes:

PartialFlush update should fix the problem caused due to using of some deleted itemsets

Other:

Cleaned up duplicate code introduced whith GixSmall
Added Windows pre-gyp for Node 7

24 February 2017

Version 7.10.1

Non-breaking with a bug fix

Bug fix:

Fixed linked list memory computation

10 February 2017

Version 7.10.0

Non-breaking with new features

New features:

tdigest wrapped as an analytics model, used to approximately track quantiles on streams
hashtable key id is exposed in ht module

Bug fix:

javascript feature extractor that returns dense vectors crash fixed

Other:

Documentation fixes and added examples

27 January 2017

Version 7.9.0

Non-breaking with new features

New features:

created common API for calculating memory usage in containers
calculating stream aggregates memory footprint
type trait API for detecting shallow types and containers
optimized TVec memory footprint calculation using type traits (only for C++11)
new API for multinomial feature extractor transformation

13 January 2017

Version 7.8.1

Bug fix:

Fixed FilterByFq on TRecSet

6 January 2017

Version 7.8.0

Non-breaking with new features

New features:

Exposed keyid for hashtables in javascript
Stay-Point-Detector aggregate (third party) that aggregates GPS time-series
loadStateJson and saveStateJson added to stream aggregates (alternative to binary save and load)

30 December 2016

Version 7.7.0

Non-breaking with new features and bug fixes

New features:

Added log transform to multinomial feature extractor
Extended filter options for DMoz classifier (wildcard supported)
Node 7 supported

23 December 2016

Version 7.6.0

Non-breaking with new feature and a bug fix

New feature:

Store that only holds schema and has no disk footprint: TStoreEmpty

Bug fix:

TJsonVal can be parsed from string in multiple threads

16 December 2016

Version 7.5.0

Non-breaking with new feature and big fixes

New feature:

Measuring stream aggregate performance. Exposed through TBase::GetStreamAggrStats()

Bug fixes:

Fixed silent exceptions in JavaScript stream aggregate
Javascript serialization of stream aggregate prohibited output stream to close properly
Nearest neighbor anomaly detector did not output complete explanation

Other:

Fixed out-of-sync example timeseries

2 December 2016

Version 7.4.0

Non-breaking with new features and a bug fix

New features:

Record switch aggregate (TRecSwitchAggr) reads strings from records and triggers other aggregates based on an internal hash map.
Time series sparse vector tick (TTimeSeriesSparseVectorTick) reads timestamps and sparse vectors from records, implements ISparseVec and ITm interfaces.
Sparse vector circular buffer (TWinBufSpV) reads from TWinBuf and stores the buffer values in memory as a circular buffer.
Stream aggregates can pass the caller when triggering onAdd, onStep and onTime of other aggregates.
TWinMemBuf supports separate aggregate that provides time.

Bug fix:

Fixed parsing dates in JSON objects (providing a Date object for a datetime field of a record now works).

Other:

Renamed TStreamAggr::GetParam -> GetParams (consistency with JS).
Renamed TStreamAggr::SetParam -> SetParams (consistency with JS).

25 November 2016

Version 7.3.0

Non-breaking with new features and bug fixes

New features:

Aggregating resampler (TAggrResampler) The resampler computes aggregates over consecutive equally sized intervals. It implements summing, averaging, max and min.
Added TStorePbBlob::GarbageCollect()
Added TRecFilterByFieldNull

Bug fixes:

When calling saveJson() on uninitalized TOnlineHistrogram, an exception was created. Now it serializes the current state.
Fix for deleting blobs; freed space from older blob files gets reused on following inserts. Now we always correctly know which file has free space where to put the new buffer. Had to extend the TBlobBs file with a parameter ReleasedSize that returns a value if the blob is moved and the previous buffer is released. Needed for monitoring which places in the files are free.
TRecSet::DoJoin fixed when using types onther than uint64 for field join
When deleting records, we need to call DelJoin without the freqency parameter. Otherwise we might keep some joins to deleted records.
TStorePbBlob::IsRecId did not work if all data for store was in memory

4 November 2016

Version 7.2.0

Non-breaking with new feature

New Feature:

Graph cascades expose topological order in JavaScript API

28 October 2016

Version 7.1.0

Non-breaking with new features and bug fixes

New feature:

graph cascades model (modeling times of visiting nodes for directed acyclic graph sweeps)

Bug fixes:

Fixed documentation (broken links in nnets)
TSIn, TSBase optimized (does not create redundant strings any more)
TStorePbBlob several fixes
TRecSet::DoJoin optimized
TNNAnomalyAggr initialization fix

21 October 2016

Version: 7.0.2

Non-breaking with bug fixes

Bug Fixes:

Fixed broken links in documentation (#481)
Fixed bug in feature space. Output vector when calling TFtrSpace::GetSpV was not cleared when not empty.

14 October 2016

Version: 7.0.1

Non-breaking with bug fixes

Bug Fixes:

Fixed histogram anomaly detector severity classifier
Fixed bad casts (unsigned)(int64) to (unsigned long long)(int64).
JS stream aggregate exceptions come with stacktraces, not just messages
JS stream aggregate this fixed
base construction with createClean mode made safer

7 October 2016

Version: 7.0.0

Breaking with new features

New Features:

Added getNameInteger and getNameFloat for stream aggregates in JavaScript (INmFlt and INmInt).
Online histogram can resize accordingly to new observed values.

Bug Fix:

BREAKING: stream aggregates return Unix timestamps on JavaScript side and Windows timestamps on C++ side (issue #286)
- new Date(sa.getTimestamp()-11644473600000) => new Date(sa.getTimestamp()) where sa instanceof qm.StreamAggreator.
Cleaned INmFlt and INmInt interfaces.

30 September 2016

Version: 6.5.1

Non-breaking with a bug fix

Bug Fix:

Fixed support for index joins in records by value

23 September 2016

Version: 6.5.0

Non-breaking with new features

New Features:

qm.stats property lists statistics on constructor and destructor calls
Histogram smoothing using kernel density estimation in THistogramToPMFModel
Histogram based anomaly detection stream aggregate
Nearest neighbor anomaly detection stream aggregate
Optimized Record set filter over code book strings

Bug Fixes:

Lots of fixes to PgPage and its associated store
Again compiles under debug mode in Visual Studio

2 September 2016

Version: 6.4.0

Non-breaking with bug fixes

New Features:

Added TStorePbBlob::GetFirstRecId() and GetLastRecId()
Added TVec::GetMnValN and TVec::GetMnVal
Added TInt::GetSepStr() to help formating numbers (1234 -> "1,234")
Modified KMeans.transform to return a matrix of distances to centroids
Added method TLinAlg::GetKernelVec, which returns a vector in the kernel of a matrix
Added new resempler stream aggregate that can read from input stream aggregates and push data to other stream aggregates
Added TStreamAggr::GetParam and TStreamAggr::SetParam to check and modify stream aggregate parameters after construction

Bug Fixes:

Fixed several bugs in TStorePbBlob
Fixed KMeans cosine distance generating NaN
Fixed compile warnings in TGix

19 August 2016

Version: 6.3.1

Non-breaking with bug fix

Bug fix:

TStoreImpl got wrong value for cache size parameter when loading from disk.

12 August 2016

Version: 6.3.0

Non-breaking with new features

New Features:

Gix updated to speed up deletes of records, especially when using FIFO stores
Support methods for byte fields in TStore
Added qm.RecordVector which can hold array of records passed by value. Vector support serialization using QMiner streams.
Standard deviation qm.statistics.std now supports la.Vector as input

5 August 2016

Version: 6.2.0

Non-breaking with new features

New Features:

Speed up of RecSet.SortByField
Added Store.GetFieldNmByte and Store.SetFieldNmByte
Tokenizer uses unicode as default type in constructor

Other:

Updated documentation: added missing types, descriptions, links and methods. Reduced number of clicks to get to specific information.

22 July 2016

Version: 6.1.0

Non-breaking with new features

New Features:

Changed TStreamAggrOut interfaces to be flat
TTDigest Stream aggregate can wait for N elements before it is initialized

Bug fixes:

Calling DelObjKey on key that is first in KeyValH makes following serialization invalid. ObjKeyN starts with 1 which makes invalid json object. Relevant change is in TJsonVal::GetChAFromVal()
TStr::SearchStr return exception when called on empty string
TStrHash created temporary TStrs when computing hash codes creating significant overhead without any good reason
TSAppSrv::OnHttpRq does better check for valid URL
Removed old debug code in TStr
Issue #418: Categorical feature extraction documentation - Removed the 'values' from the construction documentation.
Issue #439: Added the two missing optional parameters in the new KMeans constructor, fitIdx and fitStart. Also fixed the documentation for KMeans constructor parameter and added some new unit tests for KMeans.
Issue #449: Not all methods used for KMeans.fit were implemented when using distanceType: "Cos". Added unit tests for the fit and predict methods in the case of distanceType: "Cos".

Other:

Replaced tabs with spaces in sappsrv.cpp

24 June 2016

Version: 6.0.0

Breaking with new features

New Features:

Removed TStreamAggrBase and introduced TStreamAggrSet instead.
(breaking) Adjusted rest of the codebase to TStreamAggrSet replacing TStreamAggrBase.
Introduced new record filters which now all derive from TRecFilter and most have JSON constructors.
Added TStreamAggrFilter which calls given stream aggregate only when record passes given record filter.
(breaking) Added window buffer stream aggregate that keeps values in memory.
References to store and stream aggregate can be passed as parameters in jsons as object when creating new stream aggregates.

Bug fixes:

Fixed clang warnings in printf for uint64
(breaking) Fixed stream aggregates that worked on window buffer to correctly work in case on OnTime and OnStep triggers.
getSubmatrix can not get the last row and column of a matrix

Other:

Added fs.js to documentation generation
Moved instructions for building OpenBLAS to qminer wiki
Normalized few more files replacing all leading tabs to 4 spaces
Added script for noramlizing tabs to spaces
Cleaned output of example tests
Examples from documentation are executed using async to avoid base colisions

26 May 2016

Version: 5.3.0

Non-breaking with new features

New features:

Non-negative matrix factorization: qm.analytics.nmf(mat, k, json)
Recommender System (using nmf): new qm.analytics.RecommenderSys(params)
added TFtrExt::GetFtrRange() which returns the range of the feature
added method TJsonVal::SetArrVal

Bug fixes:

fixed concurrency bug when executing code from worker thread on the main thread
fixed TNodeJsUtil::GetFldObj and TNodeJsUtil::GetFldFun

Other:

testing non-implemented stuff removed
new API for inverting feature vectors
moved StreamStory to third_party
started cleaning TLinAlg: added some new classes, removed most ifdefs
added macros for TLinAlg templates

6 May 2016

Version: 5.2.0

Non-breaking with new feature

New feature:

Added binary option to multinomial feature extractor: check only for presenc of value and does not weight by count

Bug fix:

TSimpleLinReg::SaveState failed as it saved object and loaded smart-pointer.

15 Apr 2016

Version: 5.1.0

Non-breaking with new features

New features:

TStoreImpl can tell integer ID for codebook strings. TFieldDesc can tell if field is encoded using codebook.

Bug fixes:

Issue 400: RecSet.saveCsv should escape ” using ”” and not \”

Other:

TStr::Empty() uses Assert instead of IAssert to confirm Inner is either NULL or points to nonemtpy string.

8 Apr 2016

New version 5.0.0

Breaking with new features

New features:

Stores from same TBase share same PBlobBs. Speed improvements for 1000 empty stores:
- create: 0.5s vs 21s
- save: 0.6s vs 5s
- load: 0.06s vs 4s
Removed unused flags from blob pointer, freeing 37.5% space per TBlobPt
Changed TBlobPt segment parameter from uchar to uint16, increasing max blob base size to 128TB
KMeans reimplemented in C++: Templated and wrapped Stopar's KMeans, which is implemented clustering.h and clustering.cpp. The javascript wrapper contains the same functions as before.

Constructor parameters are:

name	type	description
iter	number	Number of iterations
k	number	Number of centroids
verbose	boolean	If false, the console output is supressed
centroidType	string	Type of the centroids. Options: "Dense" or "Sparse"
distanceType	string	Distance type used at the calculation. Options: "Euclid" or "Cos"

Properties centroids, medoids, idxv.

Methods getParams, setParams, getModel, fit, fitAsync, predict, transform, permuteCentroids, explain, save

The fit method can be used asynchronously (fitAsync).

var qm = require('qminer');
var params = { iter: 10000, k: 2, verbose: false, distanceType: 'Euclid', centroidType: 'Dense' };
var kmeans = new qm.analytics.KMeans(params);
// create the matrix
var mat = new qm.la.Matrix({ rows: 1000, cols: 300, random: true });

//- Synchronous use of fit
kmeans.fit(mat);

//- Asynchronous use of fit
kmeans.fitAsync(mat, function (err) {
    if (err) { console.log(err); }
    // successful fitting
});

Bug fixes:

fixed clang warnings
changed tabs to four spaces on qminer source files
Fixed issue 398 — move stat.h stuff to xmath.h
Fixed issue 399 - stream aggregate example description

25 Mar 2016

New version 4.10.0

Non-breaking with new features

New features:

TStreamAggrOnAddFilter class can be extended to override CallOnAdd method that takes a record and returns true if the aggregate should process it. Currently we have two filters: default that passes every record and TOnAddSubsampler, which can skip a given amount of records for every time the aggregate is actually updated. TSimpleLinReg is currently the only aggregate that supports filtering. Example:

var linReg = store.addStreamAggr({
    filter: { type: "subsamplingFilter", skip: 3 },
    type: 'simpleLinearRegression',
    inAggrX: winX.name,
    inAggrY: winY.name,
    storeX: "Function",
    storeY: "Function",
    quantiles: [0.25, 0.75]
});

Bug fixes:

#394. Added asserts for invalid record IDs in the buffer which are a result of store window being too short (incompatible with the window aggregate).
#264 (incorrect exception handling) TJsonVal has two new functions: AssertObjKeyStr, AssertObjKeyNum. The functions take a second parameter (function name), where __FUNCTION__ can be used (not standard but works on msvc and gcc). Example: ParamVal->AssertObjKeyStr("timestamp", __FUNCTION__);
#372. all JS vectors have toArray function.
#350. arm publish script added

Other:

deleted example tests (generated by travis)
removed datasets tests (will be moved to qminer-data-loader)
global mocha instalation is not required any more. tests can be run by calling npm test
made tests in store_partial_flush.js silent
updated GitHub Wiki (part of #351)

18 Mar 2016

New vesion 4.9.1

Non-breaking with a big fix

Bug fix:

Primary keys could be set to existing values belonging to other records. Now we throw exception in such cases. Added corresponding tests.

11 Mar 2016

New version: 4.9.0

Non-breaking with new features

New features:

New aggregates:
- TWinBufFltV (type: timeSeriesWinBufVector), connects to timeSeriesWinBuf, implements IFltV (holds a vector of buffer values in memory)
- TSimpleLinReg (type: simpleLinearRegression), connects to two IFltV aggregates, computes a linear fit and (optionally) quantile bands

Bug fixes:

TGix::PartialFlush had a bug when saved item set got assigned a new TBlobPt. Fixed #386
TInMemStorage::DelVals did wrong accounting when deleting bigger chunks of records
SVR using libsvm unallocated memory fix
VS2015 warning fix (StackWalker)
Queue made more efficient, added standard API (Front(), Back()), refactored variable names to be more meaningful and added unit tests.
Removed IOB error compensation in TQQueue::GetSubValV (now throws an exception) and renamed the method to TQQueue::GetSubValVec, so it doesn't compile (behaviour change). Any users should rename GetSubValV to GetSubValVec and check the correctness of indices when calling.
async MDS segfault fix
GUID method changed (can generate more than 10M IDs per second)

Other:

Unit tests, documentation, examples: timeSeriesWinBufVector, simpleLinearRegression

4 Mar 2016

New version: 4.8.0

Non-breaking with new features

New features:

analytics.MDS now has async version
la.svd now has async version

Bug fixes:

Renamed SparseMatrix.submat to SparseMatrix.getColSubmatrix. Fix #402
TQm::TBase now again backwards compatible with respect to loading settings. Fix #401

Other:

BlobBs file size limit extended from 1GB to 2GB
la.svd and la.qr moved from la_structures_nodejs.h

26 Feb 2016

New version: 4.7.0

Non-breaking with new features

New features:

TStrUtil can transform THashSet to string
Async version of analytics.MDS
New stream aggregate threshold, that returns 1 if input number above threshold, 0 otherwise.
Added getColSubmatrix function to SparseMatrix that gets IntVector of column ids and returns sparse matrix constructed from selected columns.
Added clear function to SparseMatrix that clears its content and sets rows to -1.

Bub fixes:

TClust::TAbsKMeans no longer returns empty clusters
Numeric::InvFtr wrongly de-normalized numbers
TVec Move constructor: no need to delete ValT
TVec Move assignment: Delete internal ValT pointer only if you own it.

Other:

Added examples to qm.la module

12 Feb 2016

New version: 4.6.0

Non-breaking with new features

New features:

Added new query aggregate that performs simple counting of records over some datetime column with provided granularity.
Added move constructor and assignment operator to TKeyDat
Added move assignment operator to THashKeyDat

Bug fixes:

Removed clang compile warnings for MDS and TTimeSpan
documentation generation: jsdoc-baseline version fixed

Other:

Faster and more focused CI testing
new branch for releases (ci_matrix) that tests and publishes the full version

5 Feb 2016

New version: 4.5.0

Non-breaking with new features

New features:

Added support for default field-storage location for whole store. Add special tag storage_location in the options node of store schema. In the following example, Here, all fields of the store will be stored in cache as this is now the default for the whole store. Each individual field can still override this setting.

{
    "name": store_name,
    "fields": [
        { "name": "name", "type": "string" },
        { "name": "val", "type": "int" }
    ],
    "options": {
        "storage_location": "cache"
    }
}

Added on onTime(unit64 TmMsec) and onStep() functions that enable updating of aggregates without adding a new record in the store. onTime(unit64 TmMsec) is added to the NodeJs interface, while onStep() is an internal function.

Bug fix:

MDS no longer generates compile warnings

29 Jan 2016

New version: 4.4.0

Non-breaking with new feature

New feature:

Vector.sparse takes an optional integer argument for the sparse vector dimension, which can be set to -1 for unknown

Bug fixes:

SVR debug test fix
MDS no longer returns compile warnings and actually uses selected distance metric.
TTDigest MergeValues() tests no longer break in debug mode (streamaggr.js sequential insert test)

Other:

debug builds are now tested

22 Jan 2016

New version: 4.3.0

Non-breaking with new features

Features:

Location of join fields can be defined in schema (memory or cache).
Nearest neighbor anomaly detector explain exposes first and last record ID
Nearest neighbor anomaly detector accepts vector of rates (as opposed to only single rate). Predict returns position of the rate that is reached starting with 1 (or 0 if none).
Can disable field name validation (scrictNames in base definition)
TLinAlg can solve generalized eigenvalue problems
added TRecSet::TRecFilterByFieldUInt64
added TIndex::HasJoin(const int& JoinKeyId, const uint64& RecId) const
added THash::THash(const TVec<TKeyDat<TKey, TDat> >& KeyDatV) constructor
added TStr::GetNrNumFExt can generate any number of leading zeros

Big fixes:

GYP fixed to make libsvm work.
qm.saveCsv puts headers always in quotes
TZipIn::CreateZipProcess puts filename in quotes
Consolidated all references to records with frequency to Fq.

15 Jan 2016

New version: 4.2.0

Non-breaking with new features

Features:

Nearest neighbor init method exposed
TNodeJsFtrSpace factory constructor added
IFtrSpace interface added, implemented by TWinBufFtrSpVec, exposed in JS
Field-join binary representation: Field-joins can now be stored in more compact way. Develop can specify field-types to be used for storing the field-join's record id and frequency by providing storage tag. Default is uint64-int. It is also possible to set frequency type to empty string, which means that frequency will be always 1 and it wont take any space in the storage. Example: joins: [{ name: 'parent', type: 'field', store: 'People', storage: 'int16-byte' }]
filterByField: Recordset provides utility method filterByField that can now also operate on field joins: recordset.filterByField("parent", parent_id, parent_id);. It currently accepts record ids. Caller can provide a range of IDs (min and max).
added tdigest stream aggregator for estimating any percentile from streaming data

Bug fixes:

Sparse vector normalize fixed
TWinBufFtrSpVec save/load fix
writeJson and readJson do not parse and stringify in C++ but instead use JSON.stringify and JSON.parse
added readString to FIn that complements FOut.writeBinary

Other:

test stream aggregate getFeatureSpace
TWinBufFtrSpVec save/load test

8 Jan 2016

New version: 4.1.0

non-breaking with new features

Features:

Stream aggregate TEmaSpVec - exponential moving average for sparse vectors.

Bug fixes:

TNodeJsFuncStreamAggr supports the IsInit method of the TStreamAggr interface provided by init function. Added unit tests.
TOnlineHistogram supports additional init logic by specifying minimum count (when we have less than the given min, init is false).
Code for new int16 and int64 types was copy-pasted in one place and not fixed.

Other:

Added tests and documentation for Tokenizer and PCA

24 Dec 2015

New version: 4.0.1

patch

Bug fixes:

Fixed reflexion of storage related objects (example: store.allRecords now reports to be a getter instead of a value). API was not changed. This fixes Tonic crashes.

18 Dec 2015

New version: 4.0.0

breaking with new features

Features:

Now works with Node.js 4.x and 5.x
(breaking) QMiner now supports the following new types:
- byte - unsigned value between 0 and 255
- int16 - 16-bit integer
- uint16 - 16-bit unsigned integer
- uint - 32-bit unsigned integer
- int64 - 64-bit integer
- sfloat - single-precision float value (existing type float uses double precision)
- json - arbitrary javascript object. Internally it will be stored as JSON. Automatically (de)serialized.
- blob - binary buffer (uses TMem internally). When new record is created, this field needs to be sent in as base64-encoded string. When the record is accessed, the field is represented and can be manipulated as javascript's Buffer object.
TStreamAggrOut two interfaces are now templated IValTmIO and TValVec
- TFltTmIO is a typedef for IValTmIO<TFlt>
- TFltVec is a typedef for TValVec<TFlt>
TWinBuf is now templated (according to new templated interfaces) and an abstract class. A derived class must implement TVal GetRecVal(const uint64& RecId).
- Two derived classes:
  - the old TWinBuf is implemented as TWinBufFlt : public TWinBuf<TFlt>
  - TWinBufFtrSpVec : TWinBuf<TIntFltKdV> takes a JSON array (or a single JSON) of feature extractor descriptors and computes sparse vectors of type TIntFltKdV on records — Added new interface functions to stream aggregate Node.js API
Added support for JSON argument inputs for: extractVector, extractSparseVector, extractMatrix, updateRecord, updateRecords. The methods that expected a record set can now take an array of JSON objects, where each object respects the store schema.
Added sparse-vector sum aggregate for sparse vectors that maintains centroid vector of sparse vectors coming out of window buffer feature space aggregate.
Introduced qminer-data-loader NPM module to handle datasets for examples.
Node.js TStore implementation that let us wrap external data sources as stores. For now works with feature extractors
Joins have index type by default.

Bug fixes:

MDS documentation fixed.

Other:

Added recordSet.sortByFq to documentation
Added examples to linear algebra
Updated travis and appveyor to test: arch x64/x86 - node 0.12/4/5 - platform win/linux/osx
Made qm structures safe for Tonic notebooks (no crashing due to infinite recursion)

04 Dec 2015

New version: 3.6.0

non-breaking with new features

Features:

Implement full API for MDS in qm.analytics
Record set filter by boolean
FeatureSpace.extractSparseVector can directly accept JSON, no need to do store.newRecord(JSON) before.

Bug fixes:

Assert valid names on stream aggregates
Fixed text query returning non-weighted results bug (issue #176)
Fixed record set weighted sampling to actually work as promised (issue #177)
TStore::GarbageCollect() works well for stores with only in-mem fields (issue #329)
Fixed createExampleTests.js to not remove * from code
Cleaned sparse matrix JS constructor
Optimised dense matrix multiplication for row-major
Propagate LIBSVM error messages (issue #303)
Use TNotify for debug and error messages in LIBSVM (issue #302)

Other:

Added documentation and tests for timeWindow definition on stores (issue #329)
Added documentation and tests for MDS (issue #309)
Removed Eigen from repository, now included as git submodule
Added unit tests for LIBSVM (issue #301)

27 Nov 2015

New version: 3.5.0

non-breaking with new features

Features:

Stream aggregates that get time series on input now support delayed inputs (can get more then one value per iteration): online histogram, window aggregates (sum, max, min, mean, variance)
Time series tick and window buffer can read from numeric fields of type other then double

Bug fixes:

LIBSVM sparse matrix bug-fix when working with sparse vectors
Multinomial fix for sparse vectors (does not store zero elements)
Nearest neighbor anomaly detector explains more in explain
ClassTPE defined reference counter is now protected and not private
Chi2 stream aggregate cleanup (save/load, etc.)
Stream aggregates implemented in JavaScript can (de)serialize their state
Renamed TNodeJsSA->TNodeJsStreamAggr
Renamed TNodeJsStreamAggr -> TNodeJsFuncStreamAggr

20 Nov 2015

New version: 3.4.0

non-breaking with new features

Feature:

Stream aggregates have reset() function that resets their state
Added serialisation to Chi^2 and online histogram
exposed FAcecss (mode in which base is opened) to js side in qm.base.getStats() method
Decision tree: explain for positive examples, correlation between attributes
Support for writing Node.js async code in C++: TNodeTask, macros for defining async functions, callback execution on main thread
Multinomial feature extractor can use numeric field as source for weight
Window stream aggregate:
- possibility of delay before things go into the window
- changed interface: input and output elements both vectors
- does not store windowed elements anymore, keeping only pointers to store

Bug fix:

Replaced nodist with nvmw to prepare binaries for Windows. (nodist started acting funny)
removed automatic closing and flushing file stream in .save(fout) and .load(fin) functions in online regression metric fixed unit tests according to previous commit
bugfix in resampler stream aggregate .load method
Compensation for numerical errors in TSpecFunc::BetaCf in xmath.cpp.

Other:

Tests do not output to console anymore
Renamed TWindowBuffer to TWinAggr

13 Nov 2015

New version: 3.3.0

non-breaking with new features

Feature:

Added LIBSVM (algorithm name "LIBSVM"), currently we have SVC and SVR
Changed chi2 algorithm so it computes a two sample test
multidimensional scaling for data visualization
EIGEN support (gyp updated). EIGEN will be added to qminer repository (third_party)
save and load in TRecBuffer. The buffer now stores record IDs as opposed to records
online regression metrics now have save and load
spread sheet parser TSsParse can take stream as input
added Decision Tree (split: InfoGain, GainRatio, prune: min examples threshold)
async reading of CSV
added record by value vector in qm module for async processing
FeatureSpace.updateRecordsAsync
FeatureSpace.extractMatrixAsync

Bugfixes:

Sort works with multiple threads and is more robust. Sort can take TRnd as argument.
undefined behaviour bug (works different on ARMv7): casting double to uint64 should be: (unsigned)(int64)(double)
portability problem with casting char * to double * (ARMv7 bus errors)

Other:

qminer works on tonic: go to https://tonicdev.com/npm/qminer
qminer win 32bit and linux 32bit binaries are published in the cloud
moved logistic regression classifier to classification.h/cpp

6 Nov 2015

New version: 3.2.0

non-breaking with new features

Feature:

Added build time to flags (require(‘qminer’).flags)

Other:

no longer depends on libuuid on Linux and Mac OSX, now we include sole library to handle this task.
can build on ARM v7 (RaspberryPi 2)

30 Oct 2015

New version: 3.1.0

non-breaking with new features and bug fixes

Features:

TNodeJsUtil::ExecuteJson — execute JS function that returns JSON.
ChiSquare stream aggregator - takes two IFltVec stream aggregates on the input and performs ChiSquare test
A new aggregator TOnlineSlottedHistogram was added that accumulates data from equivalent historical periods. For instance the same hours of the day or the same minutes of the hour.
A new aggregator TVecDiff was added that subtracts one vectors from another. This is useful e.g. when we have histogram for the last 2 hours and another histogram for the whole history. This new aggregate will output the difference in distributions.
JS Sparse matrix can save itself to text format supported by Matlab
Feature space can use only one of the extractors in extract* methods (given as second argument)
OneVsAll can get binary matrix as target label
clustering.h now has agglomerative approach
Added async methods (fitAsnyc on StreamStory)
Helper async methods available in TNodeJsAsyncUtil

Bug fixes:

CI does not fail if same version was already published
Movies example now works
TSparseColMatrix and TSparseRowMatrix now compute dimensions
Bag of words feature generator clears word hash table on Clr()
Added checks for feature space and record compatibility
Fixed issue #187
Undid skipped tests for fresh date, text feature extractor, prop hazards

Other:

Added some C++ unit tests, removed old obsolete files.
Example tests for examples that work over http server (currently only for stream aggregate example)
Removed deasync dependency
Documented analytics.preprocessing

23 Oct 2015

New version: 3.0.0

breaking with new features and bug fixes

Features:

On each release binaries for Windows, Linux and Mac OS are automatically prepared and uploaded to Amazon S3. npm install qminer no longer needs to compile from source. To force recompile use npm install qminer --build-from-source.
TNodeJsUtil::GetArgTmMSecs can extract time from javascript function argument (ISO String, Date or timestamp)

Bug fixes:

Time stamps coming from Node.js now treated as signed integer
Cleanup code for indicators in signalproc.h
Fixed serialization of KMeans in analytics.h (breaking)
Removed redundant constructors in stream aggregators (breaking)
Smaller release binaries on Linux (removed debug symbols)
KMeans in clustering.h did not update centroids, now it does
Tested and removed bugs in metrics.ClassificationScore and metrics.PredictionCurves in analytics.js.

Other:

Increased timeout for tests, needed for slow runs on Travis or Appveyor CI
Documentation and tests for metrics in analytics.js.

16 Oct 2015

**New version: 2.6.0 **

non-breaking with new features and bug fixes

Features:

Online histogram in signalproc.h, which can inc/dec bing counts
Histogram stream aggregate which can attach to tick (no forgetting) or window (forgetting)
FindAll and FindAllSatisfy added to TVec to get all IDs of occurrences of (a) given value, or (b) values that satisfy given function
TSpecFunc::StudentCdf student cumulative density functions
New cluster methods in clustering.h: K-Means, DP-Means
Hierarchical Markov Chain in mc.h working on continuous time
fs.readLines(fin, onLine, onEnd, onError) which iterates over file/inputs stream/node buffer and executes callback for each line
Base.isClosed(), returns true if base closed
Helper functions in TNodeJsUtil for checking if argument is undefined, getting function from argument and getting function, int and float from V8 object

Bug fixes:

Bad parameters when creating stream aggregates in Node.JS do not crash the whole process
Unwrapping has additional checks to prevent crashing
Sort does error and exception checking around JavaScript callbacks
TTm to ISO String fixed to always have 3 digits for milliseconds
Stemmer no longer crashes on strange parameters
SVD works on 1-dimensional inputs- Fixed confusions between C++ and JS timestamps. C++ side now consistently uses Windows timestamps (milliseconds from 1601-01-01) and JS uses milliseconds since 1970-01-01 (same as Date.getTime())
No more skipped tests for stream aggregates and resolved associated issues.
Issues closed: 197, 196, 189, 188, 183, 192, 230, 198

Other:

Improved documentation for base schemas, record sets, etc.

2 Oct 2015

New version: 2.5.0

non-breaking with new features and bug fixes

Features:

added BTree index for efficient numeric range searches. Supported data types: int, float, uint64, datatime
Regression error metrics: batch and online metrics
Recordset.filterByField - Added support for null values for numerics and datetime. Also, added support for datetime-filtering via string or uint64 (Unix msec-epoch).

Bug fixes:

Other:

Unit tests and documentation for NNet
Code cleanup
Documentation generation fixes and enhancements

25 Sep 2015

New version: 2.4.0

non-breaking with new features and bug fixes

Features:

record.setField, store.newRec and recset.filterByField accept unix timestamp as input for datetime fields
fout.writeBinary writes binary serialization of JS strings, numbers or jsons to GLib output stream (TFOut)
k-means has explain method which returns medoid of the cluster new instance is assigned to

Bug fixes:

fixed memory leak when assigning emtpy TVec to another empty TVec
automatic removal of timestamp in generated javascript documentation (jsdoc) to avoid conflicts at merging documentation

18 Sep 2015

New version: 2.3.0

non-breaking with new features

Features:

KMeans can get seed centroids as parameters

Bug fixes:

TFIn reading buffer beyond EOF
TVec::AddSorted made faster
Feature space constructor checkes parameters

Other:

Cleaned and updated SNAP examples and documentation
Added required APIs, documentation and tests for logistic regression, proportional hazards and neural networks

11 Sep 2015

New version: 2.2.1

non-breaking without new features

Bug fixes:

SVC save fixed
twitter example fixed
time series example fixed
TStr::TrimLeft could crash
active learning fixed

Other:

SVC.save (unit test, example)
recursive linear reg tests and documentation
logistic regression API update, doc, example, tests
proportional hazards model API update, doc, example, tests
licenses updated
/src/qminer/gui folder deleted
examples/timeseriesPlot deleted
examples/updatingTimeseriesPlot deleted
src/nodejs/ run.js scripts removed

4 Sep 2015

New version: 2.2.0

non-breaking with new features

Features:

TPath with functions for editing file and path strings
Added compile flags to qm.flags
Added recset.FilterByFieldStr which takes two strings and keeps all that are between. Exposed in JS API

Bug fixes:

TUInt64 now has Mx and Mn fields of type uint64
TInt64 now has Mx and Mn fields of type int64

28 Aug 2015

New version: 2.1.1

non-breaking, no new features

Other:

Added tests for almost all documented classes and methods in analytics.js
All examples from documentation are now also converted to tests
Tests now have 10s to finish
Removed copy-paste from binding.gyp, MS Build toolset now defaults to v120

21 Aug 2015

New version: 2.1.0

non-breaking with new features

New features:

k-Means - can reorder computed centroids (clusters) based on a given map
Nearest Neighbor anomaly detector reimplemented in C++, now much faster. Only works with sparse vectors.

Bug fixes:

TZipIn does not hang when looking for length, on empty files, etc.
Fixed NaN issue in TSigmoid, now works normally

Other:

Removed deprecated TTempIndex from qminer_core
Merged qminer_gs and qminer_pbs into qminer_storage
Moved TOAST functions from qminer_core to qminer_storage

7 Aug 2015

New version: 2.0.0

Breaking changes: binary format of storage changed, JS API modified (analytics cleanup)

New features:

New store implementation using Paged Blob:
TMem and TBase implement C++ move semantics
Optimized DeleteAllRecs on TStoreImpl and TStorePgb
TOAST support in Page Blob
Numeric feature extractor: new option for normalization (standardize)
Circular record buffer in qm.js
Nearest neighbor anomaly detection extended to cover streaming scenarios (partialFit) and serialization
TFIn, TFOut support for (de)-serializing JSON
kmeans: manually set initial centroids

Other:

gix improvements
SVR optimized
unit tests: svc, svr, kmeans

Files

CHANGELOG.md

Latest commit

History

CHANGELOG.md

File metadata and controls

QMiner Change Log

20 April 2018

13 April 2018

5 January 2018

8 December 2017

1 December 2017

27 October 2017

8 September 2017

1 September 2017

21 July 2017

14 July 2017

16 June 2017

26 May 2017

5 May 2017

21 April 2017

31 March 2017

24 March 2017

3 March 2017

24 February 2017

10 February 2017

27 January 2017

13 January 2017

6 January 2017

30 December 2016

23 December 2016

16 December 2016

2 December 2016

25 November 2016

4 November 2016

28 October 2016

21 October 2016

14 October 2016

7 October 2016

30 September 2016

23 September 2016

2 September 2016

19 August 2016

12 August 2016

5 August 2016

22 July 2016

24 June 2016

26 May 2016

6 May 2016

15 Apr 2016

8 Apr 2016

25 Mar 2016

18 Mar 2016

11 Mar 2016

4 Mar 2016

26 Feb 2016

12 Feb 2016

5 Feb 2016

29 Jan 2016

22 Jan 2016

15 Jan 2016

8 Jan 2016

24 Dec 2015

18 Dec 2015

04 Dec 2015

27 Nov 2015

20 Nov 2015

13 Nov 2015

6 Nov 2015

30 Oct 2015

23 Oct 2015

16 Oct 2015

2 Oct 2015

25 Sep 2015

18 Sep 2015

11 Sep 2015

4 Sep 2015

28 Aug 2015

21 Aug 2015

7 Aug 2015