Version 9.2.2
Non-breaking with bug fixes
Bugfix:
- Queries with joins sometimes crashed due to unsorted intermediate record set (issue #606)
- Added clarification to functions `recordSet.each`, `recordSet.map` and `fs.readCsvLines`
- C++ work again!
- TMem got slimmer (24 bytes instead of 32 bytes!)
- Fixed how exceptions are handled in `fs.readLines`
Version 9.2.1
Non-breaking with new feature and bug fixes
Feature:
- JavaScript APIs for quantile estimators cleaned up and moved to `analytics.quantiles`.
Bugfix:
- Typo bug in Json UTF-16 handling
- Added implementation for JSON parsing in `newRecord`
- Fixed issue 606 on sorted joins
Version 9.1.2
Non-breaking with bug fixes
Bugfix:
- Added UTF-16 surrogate pair encoding for serialization of non-BMP characters
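The C++ encoder itself is not shown in this changelog. As an illustration only, the following JavaScript sketch shows the standard surrogate-pair arithmetic a JSON serializer applies to a code point above U+FFFF; `escapeCodePoint` and `escapeNonAscii` are hypothetical helpers, not QMiner functions.

```javascript
// Escape a Unicode code point for a JSON string literal.
// BMP code points become a single \uXXXX escape; code points above U+FFFF
// are split into a UTF-16 surrogate pair, each written as its own \uXXXX.
function escapeCodePoint(cp) {
  const hex4 = (u) => '\\u' + u.toString(16).padStart(4, '0');
  if (cp < 0 || cp > 0x10ffff) {
    throw new RangeError('not a Unicode code point: ' + cp);
  }
  if (cp <= 0xffff) {
    return hex4(cp);
  }
  const offset = cp - 0x10000;           // 20-bit value
  const high = 0xd800 + (offset >> 10);  // top 10 bits -> high surrogate
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits -> low surrogate
  return hex4(high) + hex4(low);
}

// Escape every non-ASCII character of a string (for...of iterates by code point).
function escapeNonAscii(str) {
  let out = '';
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    out += cp < 0x80 ? ch : escapeCodePoint(cp);
  }
  return out;
}

console.log(escapeNonAscii('smile \u{1F600}')); // smile \ud83d\ude00
```

Writing the two surrogates as separate escapes keeps the output pure ASCII while any conforming JSON parser reassembles the original character.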
Version 9.1.1
Non-breaking with bug fixes
Bugfix:
- Replaced deprecated v8 API usages
- Assert output stream not closed before write or close
Version 9.1.0
Non-breaking with new feature and a bug fix
Feature:
- TNodeTask is aware of whether it is executed in async mode
Bugfix:
- Removed `TQm` dependency from glib/mine/svm
Version 9.0.0
Breaking with new features
Breaking changes:
- SVC and SVR models are not backward binary compatible
Features:
- LIBSVM nonlinear classification supported (previously only linear models were wrapped)
- Active learning (javascript implementation in analytics module): uses SVC (preferably LIBSVC) and maximum uncertainty criterion for semisupervised classification
- Json parsing extended to support scalar values for strings and floats
- Faster index joins that go over hash tables instead of a binary tree
Bugfix:
- Lock gets deleted when creating a base with createClean
- Added a fix for the text positional index
- Fixed division by zero in zscore
- Fixed silently failing unit tests
- Fixed stream aggregate example unit test that was halting
- Fixed parsing of string literals with an escaped null character
- Fixed resize for TCHa larger than 1GB
- When parsing JSON, invalid escape characters are ignored
- Fixed TDir::Exists for Linux
- Fixed TVec::Resize to not crash in case of 64bit index and size being TInt::Mx - 1024
Other:
- Fixed V8 API deprecated warnings
- C++ unit tests for TJsonVal and TDir
- Added performance tests for memory allocations
Version 8.6.1
Non-breaking with no new features
Bugfix:
- Analytics `PNotify` segfault bug fixed (smart pointers to static notify objects changed to weak pointers)
- Fixed silently failing unit tests and improved example unit test generation
- Fixed several unit tests that were failing
- Division by zero fixed in bag-of-words feature extractor (unknown words + IDF weighting bug)
- Fixed #550: frequency is computed correctly for tiny joins
- Fixed #459: `PJsonVal` returning temporary strings given as default value by reference
- Fixed #446: Added timeout parameter to `TStore::GarbageCollect()`
- Fixed #455: Fixed documentation and example for `fs.readLines(...)`
Version 8.6.0
Non-breaking with new features
Features:
- Greenwald-Khanna algorithm for online quantile estimation exposed in nodejs (analytics.Gk).
- CKMS algorithm for online biased quantile estimation exposed in nodejs (analytics.BiasedGk). This algorithm is more accurate on extreme values (for example, q=0.0001).
- TimeWindowGk algorithm accuracy and speed optimized.
- external stream aggregate extensions
- external qm nodejs extensions
- Added a configuration hash table that can be used to specify custom sizes for cache for different index types (storage memory improvement).
- histogramAD API extended to return largest normal bin/value
Bugfix:
- LinAlgStat::Mean index out of bounds fix (sparse vector case)
- Bugfix for TNodeJsRecSet::getVector (doesn't crash on null fields)
- LIBSVM j parameter supported (was ignored before)
Documentation:
- histogram anomaly detection updated
- Greenwald-Khanna and CKMS (analytics module)
Version 8.5.0
Non-breaking with new features
Features:
- DpMeans algorithm in nodejs API. The algorithm fixes the radius of each cluster and the number of clusters is variable
- Clustering quality measure for kmeans and dpmeans
Bugfix:
- Inplace sparse vector linear combination assertion
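The entry does not list the parameters of the nodejs DpMeans model. The sketch below is an independent, minimal JavaScript version of DP-means, assuming the fixed cluster "radius" is expressed as a threshold `lambda` on squared Euclidean distance; `dpMeans` is a hypothetical helper and not the QMiner API.

```javascript
// Squared Euclidean distance between two equal-length arrays.
function sqDist(a, b) {
  let s = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    s += d * d;
  }
  return s;
}

// DP-means: like k-means, but a point whose squared distance to every
// centroid exceeds lambda opens a new cluster, so k is not fixed in advance.
function dpMeans(points, lambda, maxIter = 100) {
  const dim = points[0].length;
  let centroids = [points[0].slice()];
  const assign = new Array(points.length).fill(-1);
  for (let iter = 0; iter < maxIter; iter++) {
    let changed = false;
    // Assignment step (may create new clusters).
    for (let i = 0; i < points.length; i++) {
      let best = 0;
      let bestD = Infinity;
      for (let c = 0; c < centroids.length; c++) {
        const d = sqDist(points[i], centroids[c]);
        if (d < bestD) { bestD = d; best = c; }
      }
      if (bestD > lambda) {
        centroids.push(points[i].slice());
        best = centroids.length - 1;
      }
      if (assign[i] !== best) { assign[i] = best; changed = true; }
    }
    // Update step: each centroid becomes the mean of its points
    // (a centroid with no points keeps its previous position).
    const sums = centroids.map(() => new Array(dim).fill(0));
    const counts = new Array(centroids.length).fill(0);
    points.forEach((p, i) => {
      counts[assign[i]]++;
      for (let j = 0; j < dim; j++) sums[assign[i]][j] += p[j];
    });
    centroids = sums.map((s, c) => (counts[c] > 0 ? s.map((v) => v / counts[c]) : centroids[c]));
    if (!changed) break;
  }
  return { centroids, assign };
}
```

Raising `lambda` merges clusters and lowering it splits them, which is the variable number of clusters the entry describes.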
Version 8.4.0
Non-breaking with new features
Features:
- Added support for Node.JS 8
- Improved positional text indexing. Uses modulo 1024 instead of 256 (fewer false positives), stores all mentions of words in a document (before limited to 8 occurrences) and uses about 20% less space for the index.
- Implementation of a windowed quantile estimation algorithm on streams based on [1].
- Quantile estimation Node.js analytics models `analytics.TimeWindowGk` (timestamp-based window) and `analytics.CountWindowGk` (count-based window). They implement the standard analytics module API (`partialFit`, `predict`, `save`).
- Quantile estimation stream aggregate with type `windowQuantiles`: reads from a window buffer and updates the statistics when data enters or leaves the buffer.
[1] http://infolab.stanford.edu/~datar/courses/cs361a/papers/quantiles.pdf
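The Greenwald-Khanna based models above trade exactness for bounded memory. For comparison, here is a hypothetical exact baseline for a count-based window, written in JavaScript; `CountWindowQuantiles` is not `analytics.CountWindowGk`, and only its `partialFit`/`predict` method names mirror the API listed above. It stores the whole window, so memory grows with the window size.

```javascript
// Exact quantiles over a count-based sliding window (the last `size` values).
// Keeps the window in arrival order plus a sorted copy; O(size) per update.
class CountWindowQuantiles {
  constructor(size) {
    this.size = size;
    this.fifo = [];   // values in arrival order
    this.sorted = []; // same values, ascending
  }

  // Index of the first element >= x in the sorted array (binary search).
  lowerBound(x) {
    let lo = 0;
    let hi = this.sorted.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (this.sorted[mid] < x) lo = mid + 1; else hi = mid;
    }
    return lo;
  }

  partialFit(x) {
    this.fifo.push(x);
    this.sorted.splice(this.lowerBound(x), 0, x);
    if (this.fifo.length > this.size) {
      const old = this.fifo.shift(); // value leaving the window
      this.sorted.splice(this.lowerBound(old), 1);
    }
  }

  // Nearest-rank quantile, q in [0, 1].
  predict(q) {
    const n = this.sorted.length;
    if (n === 0) throw new Error('empty window');
    const rank = Math.min(n - 1, Math.max(0, Math.ceil(q * n) - 1));
    return this.sorted[rank];
  }
}
```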
Version 8.3.0
Non-breaking with new features
Features:
- added TWPt serialization as in TPt
- updated TStrHash.GetMemUsed() which used the unavailable GetMemUsedDeep() method
- added filtering classes TRecFilterByFieldByteSet, TRecFilterByFieldUIntSet, TRecFilterByFieldUInt64Set
Version 8.2.1
Non-breaking with bug fixes
Bug fixes:
- Positional index: items are not necessarily sorted
- Positional index: Def() has to be called in case some items were deleted
Version 8.2.0
Non-breaking with a new feature
New features:
- qm.flags includes compiler version and sizeof information
Version 8.1.0
Non-breaking with a new feature and bugfix
New features:
- JsonVector (new JSON type supported for vectors)
Bug fixes:
- assert the base create mode is valid
- fixed error messages
Version 8.0.0
Breaking with new features and bugfixes
Breaking:
- binary compatibility of GIX
New features:
- introduced tiny gix index in TIndex which does not store any frequency information
- introduced position index which can index words and their position in a string
- search over phrases with gaps based on the position index
Bug fixes:
- fixed GetMemUsed() in THashSetKey
Version 7.11.1
Non-breaking with a bugfix
Bug fixes:
- Gix memory usage overflow fix
Version 7.11.0
Non-breaking with new features
New features:
- Reimplemented online linear regression with more predictable influence of regularization and forgetting factor parameters.
- Added `TStrUtil::GetStr(int)`, which formats the number `1234567` as `"1,234,567"`
- Added `NotifyInfoFmt`, `NotifyWarnFmt`, `NotifyErrFmt` to the `TLogger`
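A minimal JavaScript sketch of the same formatting rule, for readers who need it on the JavaScript side; `formatThousands` is a hypothetical helper and not a binding of `TStrUtil::GetStr`. (In Node.js, `(1234567).toLocaleString('en-US')` gives the same result.)

```javascript
// Format an integer with a comma every three digits, e.g. 1234567 -> "1,234,567".
function formatThousands(n) {
  if (!Number.isInteger(n)) throw new TypeError('integer expected');
  const sign = n < 0 ? '-' : '';
  const digits = String(Math.abs(n));
  let out = '';
  for (let i = 0; i < digits.length; i++) {
    // Insert a separator before every group of three digits counted from the right.
    if (i > 0 && (digits.length - i) % 3 === 0) out += ',';
    out += digits[i];
  }
  return sign + out;
}

console.log(formatThousands(1234567)); // 1,234,567
```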
Bug fixes:
- PartialFlush update should fix the problem caused by using some deleted itemsets
Other:
- Cleaned up duplicate code introduced with `GixSmall`
- Added Windows pre-gyp for Node 7
Version 7.10.1
Non-breaking with a bug fix
Bug fix:
- Fixed linked list memory computation
Version 7.10.0
Non-breaking with new features
New features:
- tdigest wrapped as an analytics model, used to approximately track quantiles on streams
- hashtable key id is exposed in ht module
Bug fix:
- Fixed crash of the JavaScript feature extractor that returns dense vectors
Other:
- Documentation fixes and added examples
Version 7.9.0
Non-breaking with new features
New features:
- created common API for calculating memory usage in containers
- calculating stream aggregates memory footprint
- type trait API for detecting shallow types and containers
- optimized TVec memory footprint calculation using type traits (only for C++11)
- new API for multinomial feature extractor transformation
Version 7.8.1
Bug fix:
- Fixed `FilterByFq` on `TRecSet`
Version 7.8.0
Non-breaking with new features
New features:
- Exposed keyid for hashtables in javascript
- Stay-Point-Detector aggregate (third party) that aggregates GPS time-series
- loadStateJson and saveStateJson added to stream aggregates (alternative to binary save and load)
Version 7.7.0
Non-breaking with new features and bug fixes
New features:
- Added log transform to multinomial feature extractor
- Extended filter options for DMoz classifier (wildcard supported)
- Node 7 supported
Version 7.6.0
Non-breaking with new feature and a bug fix
New feature:
- Store that only holds schema and has no disk footprint: `TStoreEmpty`
Bug fix:
- `TJsonVal` can be parsed from string in multiple threads
Version 7.5.0
Non-breaking with new feature and bug fixes
New feature:
- Measuring stream aggregate performance. Exposed through `TBase::GetStreamAggrStats()`
Bug fixes:
- Fixed silent exceptions in JavaScript stream aggregate
- JavaScript serialization of stream aggregates prevented the output stream from closing properly
- Nearest neighbor anomaly detector did not output complete explanation
Other:
- Fixed out-of-sync example timeseries
Version 7.4.0
Non-breaking with new features and a bug fix
New features:
- Record switch aggregate (`TRecSwitchAggr`) reads strings from records and triggers other aggregates based on an internal hash map.
- Time series sparse vector tick (`TTimeSeriesSparseVectorTick`) reads timestamps and sparse vectors from records, implements `ISparseVec` and `ITm` interfaces.
- Sparse vector circular buffer (`TWinBufSpV`) reads from `TWinBuf` and stores the buffer values in memory as a circular buffer.
- Stream aggregates can pass the caller when triggering `onAdd`, `onStep` and `onTime` of other aggregates.
- `TWinMemBuf` supports a separate aggregate that provides time.
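The circular-buffer idea behind `TWinBufSpV` can be sketched in JavaScript as follows. `CircularBuffer` is a hypothetical class; it stores arbitrary values, so a sparse vector could be represented, for example, as an array of `[index, value]` pairs (an assumption, not the QMiner representation).

```javascript
// Fixed-capacity circular buffer: once full, each push overwrites the oldest value.
class CircularBuffer {
  constructor(capacity) {
    this.capacity = capacity;
    this.items = new Array(capacity);
    this.start = 0;  // index of the oldest element
    this.length = 0; // number of stored elements
  }

  push(value) {
    const end = (this.start + this.length) % this.capacity;
    this.items[end] = value;
    if (this.length < this.capacity) {
      this.length++;
    } else {
      // Buffer was full: `end` pointed at the oldest slot, which is now overwritten.
      this.start = (this.start + 1) % this.capacity;
    }
  }

  // Element i counted from the oldest (i = 0).
  get(i) {
    if (i < 0 || i >= this.length) throw new RangeError('index out of range');
    return this.items[(this.start + i) % this.capacity];
  }

  toArray() {
    return Array.from({ length: this.length }, (_, i) => this.get(i));
  }
}
```

Storage never grows past `capacity`, and dropping the oldest value costs only an index increment, which is why a circular layout suits a sliding window kept in memory.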
Bug fix:
- Fixed parsing dates in JSON objects (providing a Date object for a datetime field of a record now works).
Other:
- Renamed `TStreamAggr::GetParam` -> `GetParams` (consistency with JS).
- Renamed `TStreamAggr::SetParam` -> `SetParams` (consistency with JS).
Version 7.3.0
Non-breaking with new features and bug fixes
New features:
- Aggregating resampler (`TAggrResampler`): computes aggregates over consecutive equally sized intervals. It implements summing, averaging, max and min.
- Added `TStorePbBlob::GarbageCollect()`
- Added `TRecFilterByFieldNull`
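A batch sketch in JavaScript of what an aggregating resampler computes; `resample` is a hypothetical helper, the real `TAggrResampler` is a streaming aggregate, and this version simply skips intervals that received no values.

```javascript
// Aggregate [timestamp, value] pairs over consecutive, equally sized intervals
// [origin + k*interval, origin + (k+1)*interval). One row per non-empty interval.
function resample(points, origin, interval) {
  const buckets = new Map();
  for (const [ts, val] of points) {
    const k = Math.floor((ts - origin) / interval);
    let b = buckets.get(k);
    if (!b) {
      b = { start: origin + k * interval, sum: 0, count: 0, max: -Infinity, min: Infinity };
      buckets.set(k, b);
    }
    b.sum += val;
    b.count++;
    b.max = Math.max(b.max, val);
    b.min = Math.min(b.min, val);
  }
  // Emit intervals in time order with sum, average, max and min.
  return [...buckets.keys()].sort((a, b) => a - b).map((k) => {
    const b = buckets.get(k);
    return { start: b.start, sum: b.sum, avg: b.sum / b.count, max: b.max, min: b.min };
  });
}
```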
Bug fixes:
- When calling `saveJson()` on an uninitialized `TOnlineHistogram`, an exception was thrown. Now it serializes the current state.
- Fix for deleting blobs: freed space from older blob files gets reused on following inserts. Now we always correctly know which file has free space for the new buffer. Had to extend the `TBlobBs` file with a parameter `ReleasedSize` that returns a value if the blob is moved and the previous buffer is released. Needed for monitoring which places in the files are free.
- `TRecSet::DoJoin` fixed when using types other than `uint64` for field join
- When deleting records, we need to call `DelJoin` without the frequency parameter. Otherwise we might keep some joins to deleted records.
- `TStorePbBlob::IsRecId` did not work if all data for store was in memory
Version 7.2.0
Non-breaking with new feature
New Feature:
- Graph cascades expose topological order in JavaScript API
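Graph cascades rely on a topological order of a directed acyclic graph. One standard way to compute such an order is Kahn's algorithm, sketched below in JavaScript; `topologicalOrder` is a hypothetical helper and not necessarily how QMiner orders nodes.

```javascript
// Kahn's algorithm: repeatedly emit a node with no remaining incoming edges.
// `edges` is a list of [from, to] pairs; throws if the graph has a cycle.
function topologicalOrder(nodes, edges) {
  const indegree = new Map(nodes.map((n) => [n, 0]));
  const children = new Map(nodes.map((n) => [n, []]));
  for (const [from, to] of edges) {
    children.get(from).push(to);
    indegree.set(to, indegree.get(to) + 1);
  }
  const ready = nodes.filter((n) => indegree.get(n) === 0);
  const order = [];
  while (ready.length > 0) {
    const n = ready.shift();
    order.push(n);
    // Removing n's outgoing edges may free its children.
    for (const c of children.get(n)) {
      indegree.set(c, indegree.get(c) - 1);
      if (indegree.get(c) === 0) ready.push(c);
    }
  }
  if (order.length !== nodes.length) throw new Error('graph has a cycle');
  return order;
}
```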
Version 7.1.0
Non-breaking with new features and bug fixes
New feature:
- graph cascades model (modeling times of visiting nodes for directed acyclic graph sweeps)
Bug fixes:
- Fixed documentation (broken links in nnets)
- TSIn, TSBase optimized (does not create redundant strings any more)
- TStorePbBlob several fixes
- TRecSet::DoJoin optimized
- TNNAnomalyAggr initialization fix
Version: 7.0.2
Non-breaking with bug fixes
Bug Fixes:
- Fixed broken links in documentation (#481)
- Fixed bug in feature space. Output vector when calling `TFtrSpace::GetSpV` was not cleared when not empty.
Version: 7.0.1
Non-breaking with bug fixes
Bug Fixes:
- Fixed histogram anomaly detector severity classifier
- Fixed bad casts: `(unsigned)(int64)` changed to `(unsigned long long)(int64)`.
- JS stream aggregate exceptions come with stacktraces, not just messages
- JS stream aggregate `this` fixed
- base construction with `createClean` mode made safer
Version: 7.0.0
Breaking with new features
New Features:
- Added `getNameInteger` and `getNameFloat` for stream aggregates in JavaScript (`INmFlt` and `INmInt`).
- Online histogram can resize according to newly observed values.
Bug Fix:
- BREAKING: stream aggregates return Unix timestamps on the JavaScript side and Windows timestamps on the C++ side (issue #286): `new Date(sa.getTimestamp()-11644473600000)` => `new Date(sa.getTimestamp())` where `sa instanceof qm.StreamAggreator`.
- Cleaned `INmFlt` and `INmInt` interfaces.
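The offset 11644473600000 in the note above is the number of milliseconds between the Windows epoch (1601-01-01 UTC) and the Unix epoch (1970-01-01 UTC). The JavaScript sketch below shows both conversions; the helper names are hypothetical.

```javascript
// Milliseconds between the Windows epoch (1601-01-01) and the Unix epoch (1970-01-01).
const WINDOWS_TO_UNIX_EPOCH_MS = 11644473600000;

// Convert milliseconds counted from 1601-01-01 to Unix milliseconds.
function windowsMsToUnixMs(windowsMs) {
  return windowsMs - WINDOWS_TO_UNIX_EPOCH_MS;
}

// Convert Unix milliseconds to milliseconds counted from 1601-01-01.
function unixMsToWindowsMs(unixMs) {
  return unixMs + WINDOWS_TO_UNIX_EPOCH_MS;
}

// A value in Windows milliseconds, as JavaScript code received before 7.0.0...
const legacy = unixMsToWindowsMs(Date.UTC(2016, 0, 1));
// ...needed the manual offset to become a valid JavaScript Date.
console.log(new Date(windowsMsToUnixMs(legacy)).toISOString()); // 2016-01-01T00:00:00.000Z
```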
Version: 6.5.1
Non-breaking with a bug fix
Bug Fix:
- Fixed support for index joins in records by value
Version: 6.5.0
Non-breaking with new features
New Features:
- `qm.stats` property lists statistics on constructor and destructor calls
- Histogram smoothing using kernel density estimation in `THistogramToPMFModel`
- Histogram based anomaly detection stream aggregate
- Nearest neighbor anomaly detection stream aggregate
- Optimized Record set filter over code book strings
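The entry does not state which kernel or bandwidth `THistogramToPMFModel` uses. Assuming a Gaussian kernel over bin indices, smoothing a histogram into a probability mass function could look like this JavaScript sketch (`smoothHistogramToPmf` is hypothetical).

```javascript
// Smooth histogram bin counts with a Gaussian kernel and normalize to a PMF.
// `bandwidth` is measured in bins; each output bin is a weighted sum of all input bins.
function smoothHistogramToPmf(counts, bandwidth) {
  if (!(bandwidth > 0)) throw new RangeError('bandwidth must be positive');
  const n = counts.length;
  const smoothed = new Array(n).fill(0);
  for (let i = 0; i < n; i++) {
    for (let j = 0; j < n; j++) {
      const z = (i - j) / bandwidth;
      smoothed[i] += counts[j] * Math.exp(-0.5 * z * z);
    }
  }
  const total = smoothed.reduce((a, b) => a + b, 0);
  if (total === 0) throw new Error('histogram is empty');
  return smoothed.map((v) => v / total);
}
```

Smoothing spreads mass into neighbouring empty bins, so a value that falls next to previously seen ones no longer gets zero probability.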
Bug Fixes:
- Lots of fixes to PgPage and its associated store
- Again compiles under debug mode in Visual Studio
Version: 6.4.0
Non-breaking with bug fixes
New Features:
- Added `TStorePbBlob::GetFirstRecId()` and `GetLastRecId()`
- Added `TVec::GetMnValN` and `TVec::GetMnVal`
- Added `TInt::GetSepStr()` to help formatting numbers (`1234 -> "1,234"`)
- Modified `KMeans.transform` to return a matrix of distances to centroids
- Added method `TLinAlg::GetKernelVec`, which returns a vector in the kernel of a matrix
- Added new resampler stream aggregate that can read from input stream aggregates and push data to other stream aggregates
- Added `TStreamAggr::GetParam` and `TStreamAggr::SetParam` to check and modify stream aggregate parameters after construction
Bug Fixes:
- Fixed several bugs in `TStorePbBlob`
- Fixed KMeans cosine distance generating NaN
- Fixed compile warnings in `TGix`
Version: 6.3.1
Non-breaking with bug fix
Bug fix:
- `TStoreImpl` got wrong value for cache size parameter when loading from disk.
Version: 6.3.0
Non-breaking with new features
New Features:
- Gix updated to speed up deletes of records, especially when using FIFO stores
- Support methods for byte fields in `TStore`
- Added `qm.RecordVector`, which can hold an array of records passed by value. The vector supports serialization using QMiner streams.
- Standard deviation `qm.statistics.std` now supports `la.Vector` as input
Version: 6.2.0
Non-breaking with new features
New Features:
- Speed up of `RecSet.SortByField`
- Added `Store.GetFieldNmByte` and `Store.SetFieldNmByte`
- Tokenizer uses `unicode` as default type in constructor
Other:
- Updated documentation: added missing types, descriptions, links and methods. Reduced number of clicks to get to specific information.
Version: 6.1.0
Non-breaking with new features
New Features:
- Changed `TStreamAggrOut` interfaces to be flat
- `TTDigest` stream aggregate can wait for `N` elements before it is initialized
Bug fixes:
- Calling `DelObjKey` on a key that is first in `KeyValH` makes the following serialization invalid. `ObjKeyN` starts with 1, which makes an invalid JSON object. Relevant change is in `TJsonVal::GetChAFromVal()`
- `TStr::SearchStr` returns exception when called on empty string
- `TStrHash` created temporary `TStr`s when computing hash codes, creating significant overhead without any good reason
- `TSAppSrv::OnHttpRq` does a better check for valid URL
- Removed old debug code in `TStr`
- Issue #418: Categorical feature extraction documentation: removed the 'values' from the construction documentation.
- Issue #439: Added the two missing optional parameters in the new KMeans constructor, `fitIdx` and `fitStart`. Also fixed the documentation for KMeans constructor parameter and added some new unit tests for KMeans.
- Issue #449: Not all methods used for KMeans.fit were implemented when using distanceType: "Cos". Added unit tests for the fit and predict methods in the case of distanceType: "Cos".
Other:
- Replaced tabs with spaces in `sappsrv.cpp`
Version: 6.0.0
Breaking with new features
New Features:
- Removed `TStreamAggrBase` and introduced `TStreamAggrSet` instead.
- (breaking) Adjusted rest of the codebase to `TStreamAggrSet` replacing `TStreamAggrBase`.
- Introduced new record filters which now all derive from `TRecFilter` and most have JSON constructors.
- Added `TStreamAggrFilter` which calls given stream aggregate only when record passes given record filter.
- (breaking) Added window buffer stream aggregate that keeps values in memory.
- References to store and stream aggregate can be passed as parameters in jsons as object when creating new stream aggregates.
Bug fixes:
- Fixed clang warnings in `printf` for `uint64`
- (breaking) Fixed stream aggregates that worked on window buffer to work correctly in case of `OnTime` and `OnStep` triggers.
- `getSubmatrix` could not get the last row and column of a matrix
Other:
- Added `fs.js` to documentation generation
- Moved instructions for building OpenBLAS to qminer wiki
- Normalized a few more files, replacing all leading tabs with 4 spaces
- Added script for normalizing tabs to spaces
- Cleaned output of example tests
- Examples from documentation are executed using `async` to avoid base collisions
Version: 5.3.0
Non-breaking with new features
New features:
- Non-negative matrix factorization: qm.analytics.nmf(mat, k, json)
- Recommender System (using nmf): new qm.analytics.RecommenderSys(params)
- added TFtrExt::GetFtrRange() which returns the range of the feature
- added method TJsonVal::SetArrVal
Bug fixes:
- fixed concurrency bug when executing code from worker thread on the main thread
- fixed TNodeJsUtil::GetFldObj and TNodeJsUtil::GetFldFun
Other:
- testing non-implemented stuff removed
- new API for inverting feature vectors
- moved StreamStory to third_party
- started cleaning TLinAlg: added some new classes, removed most ifdefs
- added macros for TLinAlg templates
Version: 5.2.0
Non-breaking with new feature
New feature:
- Added binary option to multinomial feature extractor: checks only for presence of a value and does not weight by count
Bug fix:
- `TSimpleLinReg::SaveState` failed as it saved an object and loaded a smart pointer.
Version: 5.1.0
Non-breaking with new features
New features:
- `TStoreImpl` can tell integer ID for codebook strings.
- `TFieldDesc` can tell if field is encoded using codebook.
Bug fixes:
- Issue 400: `RecSet.saveCsv` should escape `"` using `""` and not `\"`
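The escaping rule from issue 400 is the one RFC 4180 uses. A minimal JavaScript sketch, assuming every field is quoted (the quoting policy of `RecSet.saveCsv` is not described here); `csvField` and `csvRow` are hypothetical helpers.

```javascript
// Quote a CSV field as RFC 4180 describes: wrap it in double quotes and
// escape each embedded double quote by doubling it ("" rather than \").
function csvField(value) {
  const s = String(value);
  return '"' + s.replace(/"/g, '""') + '"';
}

// Join quoted fields into one CSV line.
function csvRow(values) {
  return values.map(csvField).join(',');
}

console.log(csvRow(['say "hi"', 42])); // "say ""hi""","42"
```

A reader that follows RFC 4180 (including spreadsheet tools) interprets `\"` as a literal backslash followed by a closing quote, which is why doubling is the portable choice.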
Other:
- `TStr::Empty()` uses `Assert` instead of `IAssert` to confirm `Inner` is either `NULL` or points to a nonempty string.
New version 5.0.0
Breaking with new features
New features:
- Stores from same `TBase` share same `PBlobBs`. Speed improvements for 1000 empty stores:
  - create: 0.5s vs 21s
  - save: 0.6s vs 5s
  - load: 0.06s vs 4s
- Removed unused flags from blob pointer, freeing 37.5% space per `TBlobPt`
- Changed `TBlobPt` segment parameter from `uchar` to `uint16`, increasing max blob base size to 128TB
- KMeans reimplemented in C++: templated and wrapped Stopar's KMeans, which is implemented in `clustering.h` and `clustering.cpp`. The JavaScript wrapper contains the same functions as before.
Constructor parameters are:
name | type | description |
---|---|---|
iter | number | Number of iterations |
k | number | Number of centroids |
verbose | boolean | If false, the console output is suppressed |
centroidType | string | Type of the centroids. Options: "Dense" or "Sparse" |
distanceType | string | Distance type used at the calculation. Options: "Euclid" or "Cos" |
Properties: `centroids`, `medoids`, `idxv`.
Methods: `getParams`, `setParams`, `getModel`, `fit`, `fitAsync`, `predict`, `transform`, `permuteCentroids`, `explain`, `save`.
The `fit` method can be used asynchronously (`fitAsync`).
```javascript
var qm = require('qminer');
var params = { iter: 10000, k: 2, verbose: false, distanceType: 'Euclid', centroidType: 'Dense' };
var kmeans = new qm.analytics.KMeans(params);
// create a random 1000 x 300 matrix
var mat = new qm.la.Matrix({ rows: 1000, cols: 300, random: true });
// synchronous use of fit
kmeans.fit(mat);
// asynchronous use of fit
kmeans.fitAsync(mat, function (err) {
    if (err) { console.log(err); return; }
    // successful fitting
});
```
Bug fixes:
- fixed clang warnings
- changed tabs to four spaces on qminer source files
- Fixed issue 398: move `stat.h` stuff to `xmath.h`
- Fixed issue 399 - stream aggregate example description
New version 4.10.0
Non-breaking with new features
New features:
- `TStreamAggrOnAddFilter` class can be extended to override the `CallOnAdd` method, which takes a record and returns true if the aggregate should process it. Currently we have two filters: the default, which passes every record, and `TOnAddSubsampler`, which can skip a given number of records for every time the aggregate is actually updated. `TSimpleLinReg` is currently the only aggregate that supports filtering. Example:
```javascript
// assumes `store`, `winX` and `winY` were created earlier
var linReg = store.addStreamAggr({
    filter: { type: "subsamplingFilter", skip: 3 },
    type: 'simpleLinearRegression',
    inAggrX: winX.name,
    inAggrY: winY.name,
    storeX: "Function",
    storeY: "Function",
    quantiles: [0.25, 0.75]
});
```
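The entry does not pin down exactly which records `skip: 3` lets through. Assuming it means "process one record, then skip the next three", the filter logic reduces to a modulo counter, sketched in JavaScript below; `makeSubsampler` is hypothetical and is not `TOnAddSubsampler`.

```javascript
// Subsampling filter: lets one record through, then skips the next `skip` records.
function makeSubsampler(skip) {
  let counter = 0;
  return function shouldProcess(_record) {
    const pass = counter === 0;
    counter = (counter + 1) % (skip + 1);
    return pass;
  };
}

const filter = makeSubsampler(3);
const processed = [1, 2, 3, 4, 5, 6, 7, 8, 9].filter((r) => filter(r));
console.log(processed); // [ 1, 5, 9 ]
```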
Bug fixes:
- #394. Added asserts for invalid record IDs in the buffer which are a result of store window being too short (incompatible with the window aggregate).
- #264 (incorrect exception handling): `TJsonVal` has two new functions, `AssertObjKeyStr` and `AssertObjKeyNum`. The functions take a second parameter (function name), where `__FUNCTION__` can be used (not standard but works on msvc and gcc). Example: `ParamVal->AssertObjKeyStr("timestamp", __FUNCTION__);`
- #372. All JS vectors have a `toArray` function.
- #350. ARM publish script added
Other:
- deleted example tests (generated by travis)
- removed datasets tests (will be moved to `qminer-data-loader`)
- global mocha installation is not required any more; tests can be run by calling `npm test`
- made tests in `store_partial_flush.js` silent
- updated GitHub Wiki (part of #351)
New version 4.9.1
Non-breaking with a bug fix
Bug fix:
- Primary keys could be set to existing values belonging to other records. Now we throw exception in such cases. Added corresponding tests.
New version: 4.9.0
Non-breaking with new features
New features:
- New aggregates:
  - `TWinBufFltV` (type: `timeSeriesWinBufVector`), connects to `timeSeriesWinBuf`, implements `IFltV` (holds a vector of buffer values in memory)
  - `TSimpleLinReg` (type: `simpleLinearRegression`), connects to two `IFltV` aggregates, computes a linear fit and (optionally) quantile bands
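Assuming the linear fit of `simpleLinearRegression` is an ordinary least squares fit over the two input vectors, its core reduces to the JavaScript sketch below (quantile bands omitted); the function here is a hypothetical helper that shares the aggregate's type name, not the stream aggregate itself.

```javascript
// Ordinary least squares fit of y = intercept + slope * x.
function simpleLinearRegression(xs, ys) {
  const n = xs.length;
  if (n !== ys.length || n < 2) throw new Error('need two equal-length arrays with n >= 2');
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let sxy = 0;
  let sxx = 0;
  for (let i = 0; i < n; i++) {
    sxy += (xs[i] - meanX) * (ys[i] - meanY);
    sxx += (xs[i] - meanX) ** 2;
  }
  if (sxx === 0) throw new Error('x values are all equal');
  const slope = sxy / sxx;
  return { slope, intercept: meanY - slope * meanX };
}
```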
Bug fixes:
- `TGix::PartialFlush` had a bug when saved item set got assigned a new `TBlobPt`. Fixed #386
- `TInMemStorage::DelVals` did wrong accounting when deleting bigger chunks of records
- SVR using libsvm unallocated memory fix
- VS2015 warning fix (`StackWalker`)
- Queue made more efficient, added standard API (`Front()`, `Back()`), refactored variable names to be more meaningful and added unit tests.
- Removed IOB error compensation in `TQQueue::GetSubValV` (now throws an exception) and renamed the method to `TQQueue::GetSubValVec`, so existing calls no longer compile (behaviour change). Any users should rename `GetSubValV` to `GetSubValVec` and check the correctness of indices when calling.
- async MDS segfault fix
- GUID method changed (can generate more than 10M IDs per second)
Other:
- Unit tests, documentation, examples: `timeSeriesWinBufVector`, `simpleLinearRegression`
New version: 4.8.0
Non-breaking with new features
New features:
- `analytics.MDS` now has an async version
- `la.svd` now has an async version
Bug fixes:
- Renamed SparseMatrix.submat to SparseMatrix.getColSubmatrix. Fix #402
- `TQm::TBase` is again backwards compatible with respect to loading settings. Fix #401
Other:
- BlobBs file size limit extended from 1GB to 2GB
- `la.svd` and `la.qr` moved from `la_structures_nodejs.h`
New version: 4.7.0
Non-breaking with new features
New features:
- `TStrUtil` can transform `THashSet` to string
- Async version of `analytics.MDS`
- New stream aggregate `threshold`, which returns 1 if the input number is above the threshold and 0 otherwise.
- Added `getColSubmatrix` function to `SparseMatrix` that gets an `IntVector` of column ids and returns a sparse matrix constructed from the selected columns.
- Added `clear` function to `SparseMatrix` that clears its content and sets rows to -1.
Bug fixes:
- `TClust::TAbsKMeans` no longer returns empty clusters
- `Numeric::InvFtr` wrongly de-normalized numbers
- TVec move constructor: no need to delete ValT
- TVec move assignment: delete internal ValT pointer only if you own it.
Other:
- Added examples to `qm.la` module
New version: 4.6.0
Non-breaking with new features
New features:
- Added new query aggregate that performs simple counting of records over some datetime column with provided granularity.
- Added move constructor and assignment operator to `TKeyDat`
- Added move assignment operator to `THashKeyDat`
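The new query aggregate's parameters are not spelled out here. As an illustration of counting records by datetime granularity, the JavaScript sketch below buckets timestamps by UTC day or hour; `countByGranularity` and the granularity names are hypothetical.

```javascript
// Count timestamps per calendar bucket (UTC). Supported granularities: 'day', 'hour'.
// Keys of the returned Map are ISO strings of each bucket's start.
function countByGranularity(timestamps, granularity) {
  const truncate = {
    day: (d) => Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()),
    hour: (d) => Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate(), d.getUTCHours()),
  }[granularity];
  if (!truncate) throw new Error('unsupported granularity: ' + granularity);
  const counts = new Map();
  for (const ts of timestamps) {
    const key = new Date(truncate(new Date(ts))).toISOString();
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}
```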
Bug fixes:
- Removed clang compile warnings for MDS and TTimeSpan
- documentation generation: jsdoc-baseline version fixed
Other:
- Faster and more focused CI testing
- new branch for releases (ci_matrix) that tests and publishes the full version
New version: 4.5.0
Non-breaking with new features
New features:
- Added support for a default field-storage location for the whole store. Add the special tag `storage_location` in the `options` node of the store schema. In the following example, all fields of the store will be stored in cache, as this is now the default for the whole store. Each individual field can still override this setting.
```javascript
{
    "name": store_name,
    "fields": [
        { "name": "name", "type": "string" },
        { "name": "val", "type": "int" }
    ],
    "options": {
        "storage_location": "cache"
    }
}
```
- Added `onTime(uint64 TmMsec)` and `onStep()` functions that enable updating of aggregates without adding a new record in the store. `onTime(uint64 TmMsec)` is added to the NodeJs interface, while `onStep()` is an internal function.
Bug fix:
- MDS no longer generates compile warnings
New version: 4.4.0
Non-breaking with new feature
New feature:
- `Vector.sparse` takes an optional integer argument for the sparse vector dimension, which can be set to -1 for unknown
Bug fixes:
- SVR debug test fix
- MDS no longer returns compile warnings and actually uses selected distance metric.
- `TTDigest MergeValues()` tests no longer break in debug mode (`streamaggr.js` sequential insert test)
Other:
- debug builds are now tested
New version: 4.3.0
Non-breaking with new features
Features:
- Location of join fields can be defined in schema (memory or cache).
- Nearest neighbor anomaly detector explain exposes first and last record ID
- Nearest neighbor anomaly detector accepts vector of rates (as opposed to only single rate). Predict returns position of the rate that is reached starting with 1 (or 0 if none).
- Can disable field name validation (`strictNames` in base definition)
- TLinAlg can solve generalized eigenvalue problems
- added `TRecSet::TRecFilterByFieldUInt64`
- added `TIndex::HasJoin(const int& JoinKeyId, const uint64& RecId) const`
- added `THash::THash(const TVec<TKeyDat<TKey, TDat> >& KeyDatV)` constructor
- `TStr::GetNrNumFExt` can generate any number of leading zeros
Bug fixes:
- GYP fixed to make `libsvm` work.
- `qm.saveCsv` always puts headers in quotes
- TZipIn::CreateZipProcess puts filename in quotes
- Consolidated all references to records with frequency to Fq.
New version: 4.2.0
Non-breaking with new features
Features:
- Nearest neighbor init method exposed
- `TNodeJsFtrSpace` factory constructor added
- `IFtrSpace` interface added, implemented by `TWinBufFtrSpVec`, exposed in JS
- Field-join binary representation: field-joins can now be stored in a more compact way. Developers can specify field types to be used for storing the field-join's record id and frequency by providing the `storage` tag. Default is `uint64-int`. It is also possible to set the frequency type to an empty string, which means that frequency will always be 1 and it won't take any space in the storage. Example: `joins: [{ name: 'parent', type: 'field', store: 'People', storage: 'int16-byte' }]`
- `filterByField`: Recordset provides utility method `filterByField` that can now also operate on field joins: `recordset.filterByField("parent", parent_id, parent_id);`. It currently accepts record ids. Caller can provide a range of IDs (min and max).
- added tdigest stream aggregator for estimating any percentile from streaming data
Bug fixes:
- Sparse vector normalize fixed
- `TWinBufFtrSpVec` save/load fix
- `writeJson` and `readJson` do not parse and stringify in C++ but instead use `JSON.stringify` and `JSON.parse`
- added `readString` to `FIn` that complements `FOut.writeBinary`
Other:
- test stream aggregate `getFeatureSpace`
- `TWinBufFtrSpVec` save/load test
New version: 4.1.0
non-breaking with new features
Features:
- Stream aggregate `TEmaSpVec`: exponential moving average for sparse vectors.
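A minimal JavaScript sketch of an exponential moving average over sparse vectors, assuming a fixed smoothing factor `alpha` (QMiner's `TEmaSpVec` may instead derive the weight from time intervals); `SparseEma` is hypothetical and represents a sparse vector as a `Map` from index to value.

```javascript
// Exponential moving average over sparse vectors stored as Map<index, value>.
// Each update computes ema = alpha * x + (1 - alpha) * ema over the union of
// indices in the current average and the new vector; entries that decay
// below `epsilon` are dropped to keep the average sparse.
class SparseEma {
  constructor(alpha, epsilon = 1e-12) {
    this.alpha = alpha;
    this.epsilon = epsilon;
    this.ema = null; // the first update initializes the average
  }

  update(x) {
    if (this.ema === null) {
      this.ema = new Map(x);
      return this.ema;
    }
    const next = new Map();
    for (const [i, v] of this.ema) next.set(i, (1 - this.alpha) * v);
    for (const [i, v] of x) next.set(i, (next.get(i) || 0) + this.alpha * v);
    for (const [i, v] of next) {
      if (Math.abs(v) < this.epsilon) next.delete(i);
    }
    this.ema = next;
    return this.ema;
  }
}
```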
Bug fixes:
- `TNodeJsFuncStreamAggr` supports the `IsInit` method of the `TStreamAggr` interface provided by `init` function. Added unit tests.
- `TOnlineHistogram` supports additional init logic by specifying minimum count (when we have less than the given min, init is false).
- Code for new `int16` and `int64` types was copy-pasted in one place and not fixed.
Other:
- Added tests and documentation for Tokenizer and PCA
New version: 4.0.1
patch
Bug fixes:
- Fixed reflection of storage related objects (example: store.allRecords now reports to be a getter instead of a value). API was not changed. This fixes Tonic crashes.
New version: 4.0.0
breaking with new features
Features:
- Now works with Node.js 4.x and 5.x
- (breaking) `QMiner` now supports the following new types:
  - `byte` - unsigned value between 0 and 255
  - `int16` - 16-bit integer
  - `uint16` - 16-bit unsigned integer
  - `uint` - 32-bit unsigned integer
  - `int64` - 64-bit integer
  - `sfloat` - single-precision float value (existing type `float` uses double precision)
  - `json` - arbitrary JavaScript object. Internally it will be stored as JSON. Automatically (de)serialized.
  - `blob` - binary buffer (uses `TMem` internally). When a new record is created, this field needs to be sent in as a base64-encoded string. When the record is accessed, the field is represented and can be manipulated as JavaScript's `Buffer` object.
- `TStreamAggrOut`: two interfaces are now templated, `IValTmIO` and `TValVec`
  - `TFltTmIO` is a typedef for `IValTmIO<TFlt>`
  - `TFltVec` is a typedef for `TValVec<TFlt>`
- `TWinBuf` is now templated (according to the new templated interfaces) and an abstract class. A derived class must implement `TVal GetRecVal(const uint64& RecId)`.
  - Two derived classes:
    - the old `TWinBuf` is implemented as `TWinBufFlt : public TWinBuf<TFlt>`
    - `TWinBufFtrSpVec : TWinBuf<TIntFltKdV>` takes a JSON array (or a single JSON) of feature extractor descriptors and computes sparse vectors of type `TIntFltKdV` on records
- Added new interface functions to stream aggregate Node.js API
- Added support for JSON argument inputs for: `extractVector`, `extractSparseVector`, `extractMatrix`, `updateRecord`, `updateRecords`. The methods that expected a record set can now take an array of JSON objects, where each object respects the store schema.
- Added sparse-vector sum aggregate for sparse vectors that maintains centroid vector of sparse vectors coming out of window buffer feature space aggregate.
- Introduced `qminer-data-loader` NPM module to handle datasets for examples.
- Node.js `TStore` implementation that lets us wrap external data sources as stores. For now it works with feature extractors.
- Joins have index type by default.
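The `blob` round trip described above can be illustrated with Node.js's built-in `Buffer` alone; this sketch does not call QMiner and only shows the base64 encoding a record would carry.

```javascript
// A blob field travels into a record as a base64 string and comes back as a Buffer.
const original = Buffer.from([0x00, 0xff, 0x10, 0x80]);

// What a caller would pass when creating the record:
const encoded = original.toString('base64');

// Decoding that string yields the same bytes as a Buffer:
const decoded = Buffer.from(encoded, 'base64');

console.log(encoded);                  // AP8QgA==
console.log(decoded.equals(original)); // true
```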
Bug fixes:
- MDS documentation fixed.
Other:
- Added recordSet.sortByFq to documentation
- Added examples to linear algebra
- Updated travis and appveyor to test: arch x64/x86 - node 0.12/4/5 - platform win/linux/osx
- Made qm structures safe for Tonic notebooks (no crashing due to infinite recursion)
New version: 3.6.0
non-breaking with new features
Features:
- Implemented full API for MDS in `qm.analytics`
- Record set filter by boolean
- `FeatureSpace.extractSparseVector` can directly accept JSON; there is no need to call `store.newRecord(JSON)` first.
Bug fixes:
- Assert valid names on stream aggregates
- Fixed text query returning non-weighted results bug (issue #176)
- Fixed record set weighted sampling to actually work as promised (issue #177)
- `TStore::GarbageCollect()` works well for stores with only in-mem fields (issue #329)
- Fixed `createExampleTests.js` to not remove `*` from code
- Cleaned sparse matrix JS constructor
- Optimised dense matrix multiplication for row-major
- Propagate LIBSVM error messages (issue #303)
- Use TNotify for debug and error messages in LIBSVM (issue #302)
Other:
- Added documentation and tests for timeWindow definition on stores (issue #329)
- Added documentation and tests for MDS (issue #309)
- Removed Eigen from repository, now included as git submodule
- Added unit tests for LIBSVM (issue #301)
New version: 3.5.0
non-breaking with new features
Features:
- Stream aggregates that get time series on input now support delayed inputs (can get more than one value per iteration): online histogram, window aggregates (sum, max, min, mean, variance)
- Time series tick and window buffer can read from numeric fields of types other than double
Bug fixes:
- LIBSVM sparse matrix bug-fix when working with sparse vectors
- Multinomial fix for sparse vectors (does not store zero elements)
- Nearest neighbor anomaly detector returns more detail from its explain method
- ClassTPE defined reference counter is now protected and not private
- Chi2 stream aggregate cleanup (save/load, etc.)
- Stream aggregates implemented in JavaScript can (de)serialize their state
- Renamed TNodeJsSA->TNodeJsStreamAggr
- Renamed TNodeJsStreamAggr -> TNodeJsFuncStreamAggr
New version: 3.4.0
non-breaking with new features
Feature:
- Stream aggregates have a reset() function that resets their state
- Added serialisation to Chi^2 and online histogram
- Exposed FAccess (the mode in which the base is opened) to the JS side in the qm.base.getStats() method
- Decision tree: explain for positive examples, correlation between attributes
- Support for writing Node.js async code in C++: TNodeTask, macros for defining async functions, callback execution on the main thread
- Multinomial feature extractor can use a numeric field as the source for weights
- Window stream aggregate:
- possibility of delay before things go into the window
- changed interface: input and output elements both vectors
- does not store windowed elements anymore, keeping only pointers to store
Bug fix:
- Replaced nodist with nvmw to prepare binaries for Windows (nodist started acting funny)
- Removed automatic closing and flushing of the file stream in the .save(fout) and .load(fin) functions in online regression metrics
- Fixed unit tests according to the previous commit
- Bug fix in the resampler stream aggregate .load method
- Compensation for numerical errors in TSpecFunc::BetaCf in xmath.cpp
Other:
- Tests do not output to console anymore
- Renamed TWindowBuffer to TWinAggr
New version: 3.3.0
non-breaking with new features
Feature:
- Added LIBSVM (algorithm name "LIBSVM"), currently we have SVC and SVR
- Changed chi2 algorithm so it computes a two sample test
- multidimensional scaling for data visualization
- EIGEN support (gyp updated). EIGEN will be added to qminer repository (third_party)
- save and load in TRecBuffer. The buffer now stores record IDs as opposed to records
- online regression metrics now have save and load
- spreadsheet parser TSsParse can take a stream as input
- added Decision Tree (split: InfoGain, GainRatio, prune: min examples threshold)
- async reading of CSV
- added record by value vector in qm module for async processing
- FeatureSpace.updateRecordsAsync
- FeatureSpace.extractMatrixAsync
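The two-sample variant of the chi-square test compares two binned distributions, for example two histograms. The sketch below computes the textbook statistic for bin counts whose totals may differ (the form given in Numerical Recipes); it illustrates the technique under that assumption and is not QMiner's chi2 code. The function name twoSampleChi2 is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: two-sample chi-square statistic for binned counts r and s.
//   chi2 = sum_i (sqrt(S/R) * r_i - sqrt(R/S) * s_i)^2 / (r_i + s_i)
// where R and S are the totals of r and s. Bins empty in both samples are skipped.
double twoSampleChi2(const std::vector<double>& r, const std::vector<double>& s) {
    assert(r.size() == s.size());
    double totalR = 0.0, totalS = 0.0;
    for (std::size_t i = 0; i < r.size(); ++i) {
        totalR += r[i];
        totalS += s[i];
    }
    assert(totalR > 0.0 && totalS > 0.0);
    const double kr = std::sqrt(totalS / totalR);  // rescales r toward s's total
    const double ks = std::sqrt(totalR / totalS);  // rescales s toward r's total
    double chi2 = 0.0;
    for (std::size_t i = 0; i < r.size(); ++i) {
        const double denom = r[i] + s[i];
        if (denom == 0.0) continue;  // bin carries no information
        const double diff = kr * r[i] - ks * s[i];
        chi2 += diff * diff / denom;
    }
    return chi2;
}
```

Identical or proportional histograms give 0, and larger values make it less plausible that both samples come from the same distribution. Turning the statistic into a p-value additionally requires the chi-square CDF with the appropriate degrees of freedom, which is left out here.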
Bugfixes:
- Sort works with multiple threads and is more robust. Sort can take TRnd as argument.
- undefined behaviour bug (works differently on ARMv7): casting double to uint64 should be done as (unsigned)(int64)(double)
- portability problem with casting char * to double * (ARMv7 bus errors)
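In C++, converting a double straight to an unsigned 64-bit integer is undefined behaviour when the truncated value is not representable (a negative number, for instance), which is why x86 and ARMv7 can disagree. The sketch below shows the two-step pattern described above with a 64-bit unsigned target, going through a signed 64-bit integer first. The helper name doubleToUInt64 is hypothetical, and the input must still fit in the int64 range.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper illustrating the (unsigned)(int64)(double) pattern.
// Step 1: double -> int64_t is well defined while the truncated value fits in
//         int64_t, including negative values.
// Step 2: int64_t -> uint64_t is always well defined (wraps modulo 2^64).
std::uint64_t doubleToUInt64(double value) {
    // int64_t covers [-2^63, 2^63); both bounds are exact doubles.
    assert(value >= -9223372036854775808.0 && value < 9223372036854775808.0);
    return static_cast<std::uint64_t>(static_cast<std::int64_t>(value));
}
```

For example, doubleToUInt64(-1.0) yields 2^64 - 1 on every conforming compiler, whereas static_cast<std::uint64_t>(-1.0) is undefined and may differ between platforms.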
Other:
- qminer works on tonic: go to https://tonicdev.com/npm/qminer
- qminer win 32bit and linux 32bit binaries are published in the cloud
- moved logistic regression classifier to classification.h/cpp
New version: 3.2.0
non-breaking with new features
Feature:
- Added build time to flags (require('qminer').flags)
Other:
- no longer depends on libuuid on Linux and Mac OS X; we now include the sole library to handle this task
- can build on ARM v7 (Raspberry Pi 2)
New version: 3.1.0
non-breaking with new features and bug fixes
Features:
- TNodeJsUtil::ExecuteJson: executes a JS function that returns JSON
- ChiSquare stream aggregator: takes two IFltVec stream aggregates on the input and performs a chi-square test
- A new aggregator TOnlineSlottedHistogram was added that accumulates data from equivalent historical periods, for instance the same hours of the day or the same minutes of the hour.
- A new aggregator TVecDiff was added that subtracts one vector from another. This is useful e.g. when we have a histogram for the last 2 hours and another histogram for the whole history; this new aggregate outputs the difference in distributions.
- JS sparse matrix can save itself to a text format supported by Matlab
- Feature space can use only one of the extractors in extract* methods (given as second argument)
- OneVsAll can get a binary matrix as target label
- clustering.h now has an agglomerative approach
- Added async methods (fitAsync on StreamStory)
- Helper async methods available in TNodeJsAsyncUtil
Bug fixes:
- CI does not fail if same version was already published
- Movies example now works
- TSparseColMatrix and TSparseRowMatrix now compute dimensions
- Bag of words feature generator clears word hash table on Clr()
- Added checks for feature space and record compatibility
- Fixed issue #187
- Re-enabled previously skipped tests for fresh date, text feature extractor, prop hazards
Other:
- Added some C++ unit tests, removed old obsolete files.
- Example tests for examples that work over http server (currently only for stream aggregate example)
- Removed deasync dependency
- Documented analytics.preprocessing
New version: 3.0.0
breaking with new features and bug fixes
Features:
- On each release, binaries for Windows, Linux and Mac OS are automatically prepared and uploaded to Amazon S3. npm install qminer no longer needs to compile from source. To force recompilation, use npm install qminer --build-from-source.
- TNodeJsUtil::GetArgTmMSecs can extract time from a JavaScript function argument (ISO string, Date or timestamp)
Bug fixes:
- Time stamps coming from Node.js now treated as signed integer
- Cleanup code for indicators in signalproc.h
- Fixed serialization of KMeans in analytics.h (breaking)
- Removed redundant constructors in stream aggregators (breaking)
- Smaller release binaries on Linux (removed debug symbols)
- KMeans in clustering.h did not update centroids, now it does
- Tested and removed bugs in metrics.ClassificationScore and metrics.PredictionCurves in analytics.js
Other:
- Increased timeout for tests, needed for slow runs on Travis or Appveyor CI
- Documentation and tests for metrics in analytics.js
New version: 2.6.0
non-breaking with new features and bug fixes
Features:
- Online histogram in signalproc.h, which can increment/decrement bin counts
- Histogram stream aggregate which can attach to tick (no forgetting) or window (forgetting)
- FindAll and FindAllSatisfy added to TVec to get all IDs of occurrences of (a) a given value, or (b) values that satisfy a given function
- TSpecFunc::StudentCdf: Student's t cumulative distribution function
- New clustering methods in clustering.h: K-Means, DP-Means
- Hierarchical Markov chain in mc.h working on continuous time
- fs.readLines(fin, onLine, onEnd, onError), which iterates over a file/input stream/Node buffer and executes a callback for each line
- Base.isClosed(), returns true if the base is closed
- Helper functions in TNodeJsUtil for checking if an argument is undefined, getting a function from an argument, and getting a function, int and float from a V8 object
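Being able to both increment and decrement bin counts is what lets a histogram attached to a window forget values as they leave it, while one attached to a tick only ever increments. The sketch below illustrates the idea with equal-width bins over [lo, hi), clamping out-of-range values into the edge bins; the class name OnlineHistogram and the clamping policy are assumptions, not the signalproc.h implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical equal-width online histogram over [lo, hi).
class OnlineHistogram {
public:
    OnlineHistogram(double lo, double hi, std::size_t bins)
        : lo_(lo), hi_(hi), counts_(bins, 0.0) {
        assert(hi > lo && bins > 0);
    }
    // Called when a value arrives (a new tick, or a value entering a window).
    void increment(double value) { counts_[binOf(value)] += 1.0; }
    // Called when a value leaves the window, so the histogram "forgets" it.
    void decrement(double value) {
        double& count = counts_[binOf(value)];
        assert(count > 0.0);
        count -= 1.0;
    }
    const std::vector<double>& counts() const { return counts_; }

private:
    std::size_t binOf(double value) const {
        if (value <= lo_) return 0;                   // clamp below the range
        if (value >= hi_) return counts_.size() - 1;  // clamp above the range
        const double width = (hi_ - lo_) / static_cast<double>(counts_.size());
        const std::size_t bin = static_cast<std::size_t>((value - lo_) / width);
        return bin < counts_.size() ? bin : counts_.size() - 1;  // guard rounding
    }
    double lo_, hi_;
    std::vector<double> counts_;
};
```

A window-backed aggregate would call increment for each value entering the window and decrement for each value dropping out, so counts() always reflects only the current window contents.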
Bug fixes:
- Bad parameters when creating stream aggregates in Node.JS do not crash the whole process
- Unwrapping has additional checks to prevent crashing
- Sort does error and exception checking around JavaScript callbacks
- TTm to ISO String fixed to always have 3 digits for milliseconds
- Stemmer no longer crashes on strange parameters
- SVD works on 1-dimensional inputs
- Fixed confusions between C++ and JS timestamps. C++ side now consistently uses Windows timestamps (milliseconds from 1601-01-01) and JS uses milliseconds since 1970-01-01 (same as Date.getTime())
- No more skipped tests for stream aggregates and resolved associated issues.
- Issues closed: 197, 196, 189, 188, 183, 192, 230, 198
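The timestamp fix above relies on two epochs: milliseconds from 1601-01-01 on the C++ side and milliseconds from 1970-01-01 in JavaScript. They differ by a fixed offset of 11,644,473,600,000 ms (369 years containing 89 leap days, i.e. 134,774 days). The sketch below shows the conversion with hypothetical helper names; it does not reproduce QMiner's own TTm or TNodeJsUtil code.

```cpp
#include <cassert>
#include <cstdint>

// Milliseconds between 1601-01-01 and 1970-01-01 (134,774 days).
constexpr std::int64_t kWinToUnixOffsetMs = 11644473600000LL;
static_assert(134774LL * 86400LL * 1000LL == kWinToUnixOffsetMs,
              "offset must equal 134,774 days in milliseconds");

// Hypothetical helpers. Values are signed so that dates before 1970
// map to negative Unix milliseconds.
constexpr std::int64_t unixMsToWinMs(std::int64_t unixMs) {
    return unixMs + kWinToUnixOffsetMs;
}
constexpr std::int64_t winMsToUnixMs(std::int64_t winMs) {
    return winMs - kWinToUnixOffsetMs;
}
```

For example, a JavaScript Date.getTime() value of 0 (1970-01-01T00:00:00Z) corresponds to 11644473600000 on the 1601-based scale.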
Other:
- Improved documentation for base schemas, record sets, etc.
New version: 2.5.0
non-breaking with new features and bug fixes
Features:
- added BTree index for efficient numeric range searches. Supported data types: int, float, uint64, datetime
- Regression error metrics: batch and online metrics
- Recordset.filterByField: added support for null values for numerics and datetime. Also added support for datetime filtering via string or uint64 (Unix msec epoch).
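An ordered index makes numeric range searches efficient because keys are kept sorted, so a query only has to find the lower bound and scan forward until the upper bound. As a stand-in for a B-tree, the sketch below uses std::multimap (an ordered tree, typically a red-black tree rather than a B-tree) mapping a numeric field value to record IDs. The names RangeIndex and searchRange are hypothetical; this is not QMiner's index code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical ordered index: numeric field value -> record ID (duplicates allowed).
using RangeIndex = std::multimap<double, std::uint64_t>;

// Returns IDs of records whose value lies in the closed range [lo, hi], in key order.
std::vector<std::uint64_t> searchRange(const RangeIndex& index, double lo, double hi) {
    std::vector<std::uint64_t> ids;
    if (lo > hi) return ids;             // empty range; also keeps iterators ordered
    auto first = index.lower_bound(lo);  // first key >= lo
    auto last = index.upper_bound(hi);   // first key > hi
    for (auto it = first; it != last; ++it) ids.push_back(it->second);
    return ids;
}
```

Both bounds are found in logarithmic time, so the cost of a query grows with the number of matching records rather than with the size of the store.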
Bug fixes:
Other:
- Unit tests and documentation for NNet
- Code cleanup
- Documentation generation fixes and enhancements
New version: 2.4.0
non-breaking with new features and bug fixes
Features:
- record.setField, store.newRec and recset.filterByField accept a Unix timestamp as input for datetime fields
- fout.writeBinary writes binary serialization of JS strings, numbers or JSON objects to a GLib output stream (TFOut)
- k-means has an explain method which returns the medoid of the cluster a new instance is assigned to
Bug fixes:
- fixed memory leak when assigning an empty TVec to another empty TVec
- automatic removal of timestamp in generated javascript documentation (jsdoc) to avoid conflicts at merging documentation
New version: 2.3.0
non-breaking with new features
Features:
- KMeans can get seed centroids as parameters
Bug fixes:
- TFIn reading buffer beyond EOF
- TVec::AddSorted made faster
- Feature space constructor checks parameters
Other:
- Cleaned and updated SNAP examples and documentation
- Added required APIs, documentation and tests for logistic regression, proportional hazards and neural networks
New version: 2.2.1
non-breaking without new features
Bug fixes:
- SVC save fixed
- twitter example fixed
- time series example fixed
- TStr::TrimLeft could crash
- active learning fixed
Other:
- SVC.save (unit test, example)
- recursive linear reg tests and documentation
- logistic regression API update, doc, example, tests
- proportional hazards model API update, doc, example, tests
- licenses updated
- /src/qminer/gui folder deleted
- examples/timeseriesPlot deleted
- examples/updatingTimeseriesPlot deleted
- src/nodejs/ run.js scripts removed
New version: 2.2.0
non-breaking with new features
Features:
- TPath with functions for editing file and path strings
- Added compile flags to
qm.flags
- Added recset.FilterByFieldStr, which takes two strings and keeps all records whose field value lies between them. Exposed in JS API
Bug fixes:
- TUInt64 now has Mx and Mn fields of type uint64
- TInt64 now has Mx and Mn fields of type int64
New version: 2.1.1
non-breaking, no new features
Other:
- Added tests for almost all documented classes and methods in analytics.js
- All examples from documentation are now also converted to tests
- Tests now have 10s to finish
- Removed copy-paste from binding.gyp; MS Build toolset now defaults to v120
New version: 2.1.0
non-breaking with new features
New features:
- k-Means - can reorder computed centroids (clusters) based on a given map
- Nearest Neighbor anomaly detector reimplemented in C++, now much faster. Only works with sparse vectors.
Bug fixes:
- TZipIn does not hang when looking for length, on empty files, etc.
- Fixed NaN issue in TSigmoid, now works normally
Other:
- Removed deprecated TTempIndex from qminer_core
- Merged qminer_gs and qminer_pbs into qminer_storage
- Moved TOAST functions from qminer_core to qminer_storage
New version: 2.0.0
Breaking changes: binary format of storage changed, JS API modified (analytics cleanup)
New features:
- New store implementation using Paged Blob:
- TMem and TBase implement C++ move semantics
- Optimized DeleteAllRecs on TStoreImpl and TStorePgb
- TOAST support in Page Blob
- Numeric feature extractor: new option for normalization (standardize)
- Circular record buffer in qm.js
- Nearest neighbor anomaly detection extended to cover streaming scenarios (partialFit) and serialization
- TFIn, TFOut support for (de)-serializing JSON
- kmeans: manually set initial centroids
Other:
- gix improvements
- SVR optimized
- unit tests: svc, svr, kmeans