
Releases: Teradata/teradataml

teradataml 20.00.00.02

03 Sep 11:10
89c3d75

Teradata Python package for Advanced Analytics.

teradataml makes available to Python users a collection of analytic functions that reside on Teradata Vantage. This allows users to perform analytics on Teradata Vantage with no SQL coding. In addition, the teradataml library provides functions for scaling data manipulation and transformation, data filtering and sub-setting, and can be used in conjunction with other open-source python libraries.

For community support, please visit the Teradata Community.

For Teradata customer support, please visit Teradata Support.

Copyright 2024, Teradata. All Rights Reserved.

Table of Contents

Release Notes:

teradataml 20.00.00.02

  • teradataml will no longer be supported with SQLAlchemy < 2.0.

  • teradataml no longer shows the warnings from Vantage by default.

    • Users should set display.suppress_vantage_runtime_warnings to False to display warnings.
  • New Features/Functionality
    • teradataml: SQLE Engine Analytic Functions
      • New Analytics Database Analytic Functions:
        • TFIDF()
        • Pivoting()
        • UnPivoting()
      • New Unbounded Array Framework (UAF) Functions:
        • AutoArima()
        • DWT()
        • DWT2D()
        • FilterFactory1d()
        • IDWT()
        • IDWT2D()
        • IQR()
        • Matrix2Image()
        • SAX()
        • WindowDFFT()
    • teradataml: Functions
      • udf() - Creates a user defined function (UDF) and returns ColumnExpression.
      • set_session_param() is added to set the database session parameters.
      • unset_session_param() is added to unset database session parameters.
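        A minimal usage sketch of the session parameter helpers (the (name, value) calling convention and the
        parameter shown are assumptions for illustration; see the User Guide for the supported parameters):

        from teradataml import set_session_param, unset_session_param

        # Set a database session parameter for the current session.
        set_session_param("timezone", "GMT+1")

        # Revert it when no longer needed.
        unset_session_param("timezone")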
    • teradataml: DataFrame
      • materialize() - Persists DataFrame into database for current session.
      • create_temp_view() - Creates a temporary view for session on the DataFrame.
    • teradataml DataFrameColumn a.k.a. ColumnExpression
      • Date Time Functions
        • DataFrameColumn.to_timestamp() - Converts string or integer value to a TIMESTAMP data type or TIMESTAMP WITH TIME ZONE data type.
        • DataFrameColumn.extract() - Extracts date component to a numeric value.
        • DataFrameColumn.to_interval() - Converts a numeric value or string value into an INTERVAL_DAY_TO_SECOND or INTERVAL_YEAR_TO_MONTH value.
      • String Functions
        • DataFrameColumn.parse_url() - Extracts a part from a URL.
      • Arithmetic Functions
        • DataFrameColumn.log - Returns the logarithm value of the column with respect to 'base'.
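          A combined sketch of the new column functions via DataFrame.assign() (the table and column names,
          the extract field, the URL part token, and the log base are assumptions for illustration):

          from teradataml import DataFrame

          df = DataFrame("web_sales")
          df = df.assign(
              order_ts   = df.order_epoch.to_timestamp(),       # integer (epoch-style) column to TIMESTAMP
              order_year = df.order_date.extract("YEAR"),       # numeric date component
              url_host   = df.page_url.parse_url("HOST"),       # part of a URL
              log_amount = df.amount.log(10)                    # logarithm with respect to base 10
          )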
    • teradataml: AutoML
      • New methods added for AutoML(), AutoRegressor() and AutoClassifier():
        • evaluate() - Performs evaluation on the data using the best model or the model of the user's choice
          from the leaderboard.
        • load() - Loads the saved model from the database.
        • deploy() - Saves the trained model inside the database.
        • remove_saved_model() - Removes the saved model from the database.
        • model_hyperparameters() - Returns the hyperparameters of the fitted or loaded models.
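        A minimal usage sketch of the new persistence methods (the table, column, and model names, and the
        argument shapes passed to deploy()/load()/remove_saved_model(), are assumptions for illustration;
        refer to the User Guide for the exact signatures):

        from teradataml import AutoClassifier, DataFrame

        train = DataFrame("churn_train")                 # hypothetical training table
        clf = AutoClassifier()
        clf.fit(train, train.churn_flag)                 # target column is illustrative

        clf.deploy("churn_best_model")                   # save the trained model in the database
        clf.load("churn_best_model")                     # load it back in a later session
        print(clf.model_hyperparameters())               # hyperparameters of the fitted/loaded model
        clf.evaluate(DataFrame("churn_test"))            # evaluate using the best or chosen model
        clf.remove_saved_model("churn_best_model")       # drop the saved model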
  • Updates
    • teradataml: AutoML
      • AutoML(), AutoRegressor()
        • New performance metrics added for the regression task type, i.e., "MAPE", "MPE", "ME", "EV", "MPD" and "MGD".
      • AutoML(), AutoRegressor() and AutoClassifier()
        • New arguments added: volatile, persist.
        • predict() - Data input is now mandatory for generating predictions. Default model
          evaluation is now removed.
    • DataFrameColumn.cast(): Accepts 2 new arguments format and timezone.

    • DataFrame.assign(): Accepts ColumnExpressions returned by udf().
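      A minimal sketch combining udf() with assign() (the return type, decorator form, column reference
      style, and table name are assumptions for illustration; see the User Guide for the exact usage):

      from teradataml import udf, DataFrame
      from teradatasqlalchemy.types import VARCHAR

      @udf(returns=VARCHAR(100))
      def to_upper(s):
          # Runs per row; None-safe upper-casing.
          return s.upper() if s is not None else None

      df = DataFrame("customers")
      df = df.assign(name_upper=to_upper("name"))   # udf() returns a ColumnExpression accepted by assign()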

    • teradataml: Options
      • set_config_params()
        • The following arguments will be deprecated in a future release:
          • ues_url
          • auth_token
    • Database Utility
      • list_td_reserved_keywords() - Accepts a list of strings as argument.
    • Updates to existing UAF Functions:
      • ACF() - round_results parameter removed as it was used for internal testing.
      • BreuschGodfrey() - Added default_value 0.05 for parameter significance_level.
      • GoldfeldQuandt() -
        • Removed parameters weights and formula.
        • Replaced parameter orig_regr_paramcnt with const_term.
        • Changed description for parameter algorithm. Refer to the documentation for more details.
        • Note: This will break backward compatibility.
      • HoltWintersForecaster() - Default value of parameter seasonal_periods removed.
      • IDFFT2() - Removed parameter output_fmt_row_major as it was used for internal testing.
      • Resample() - Added parameter output_fmt_index_style.
  • Bug Fixes
    • KNN predict() function can now predict on test data that does not contain the target column.
    • Metrics functions are supported on the Lake system.
    • The following OpensourceML functions from different sklearn modules are fixed.
      • sklearn.ensemble:
        • ExtraTreesClassifier - apply()
        • ExtraTreesRegressor - apply()
        • RandomForestClassifier - apply()
        • RandomForestRegressor - apply()
      • sklearn.impute:
        • SimpleImputer - transform(), fit_transform(), inverse_transform()
        • MissingIndicator - transform(), fit_transform()
      • sklearn.kernel_approximation:
        • Nystroem - transform(), fit_transform()
        • PolynomialCountSketch - transform(), fit_transform()
        • RBFSampler - transform(), fit_transform()
      • sklearn.neighbors:
        • KNeighborsTransformer - transform(), fit_transform()
        • RadiusNeighborsTransformer - transform(), fit_transform()
      • sklearn.preprocessing:
        • KernelCenterer - transform()
        • OneHotEncoder - transform(), inverse_transform()
    • OpensourceML returns teradataml objects for model attributes and functions instead of sklearn
      objects, so that the user can perform further operations like score(), predict(), etc. on the
      returned objects.
    • AutoML predict() function now generates correct ROC-AUC value for positive class.
    • deploy() method of Script and Apply classes retries model deployment if there are any
      intermittent network issues.

teradataml 20.00.00.01

11 Jun 16:50
e17a74a

  • teradataml no longer supports Python versions less than 3.8.

  • New Features/Functionality
    • Personal Access Token (PAT) support in teradataml
      • set_auth_token() - teradataml now supports authentication via PAT in addition to
        OAuth 2.0 Device Authorization Grant (formerly known as the Device Flow).
        • It accepts the UES URL, a Personal Access Token (PAT) and a private key file generated from the
          VantageCloud Lake Console, and optional arguments username and expiration_time (in seconds).
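        A minimal sketch of PAT-based authentication (the keyword names pat_token and pem_file, and all
        values shown, are assumptions for illustration; see help(set_auth_token) for the exact argument names):

        from teradataml import set_auth_token

        set_auth_token(
            ues_url="https://<lake-host>/open-analytics",    # UES URL from the VantageCloud Lake Console
            pat_token="<personal-access-token>",
            pem_file="path/to/private_key.pem",
            username="alice",                                # optional
            expiration_time=3600,                            # optional, in seconds
        )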
  • Updates
    • teradataml: SQLE Engine Analytic Functions
      • ANOVA()
        • New arguments added: group_name_column, group_value_name, group_names, num_groups for data containing group values and group names.
      • FTest()
        • New arguments added: sample_name_column, sample_name_value, first_sample_name, second_sample_name.
      • GLM()
        • Supports stepwise regression and accepts new arguments stepwise_direction, max_steps_num and initial_stepwise_columns.
        • New arguments added: attribute_data, parameter_data, iteration_mode and partition_column.
      • GetFutileColumns()
        • Arguments category_summary_column and threshold_value are now optional.
      • KMeans()
        • New argument added: initialcentroids_method.
      • NonLinearCombineFit()
        • Argument result_column is now optional.
      • ROC()
        • Argument positive_class is now optional.
      • SVMPredict()
        • New argument added: model_type.
      • ScaleFit()
        • New arguments added: ignoreinvalid_locationscale, unused_attributes, attribute_name_column, attribute_value_column.
        • Arguments attribute_name_column, attribute_value_column and target_attributes are supported for sparse input.
        • Arguments attribute_data, parameter_data and partition_column are supported for partitioning.
      • ScaleTransform()
        • New arguments attribute_name_column and attribute_value_column added to support sparse input.
      • TDGLMPredict()
        • New arguments added: family and partition_column.
      • XGBoost()
        • New argument base_score is added to set the initial prediction value for all data points.
      • XGBoostPredict()
        • New argument detailed is added to return detailed information for each prediction.
      • ZTest()
        • New arguments added: sample_name_column, sample_value_column, first_sample_name and second_sample_name.
    • teradataml: AutoML
      • AutoML(), AutoRegressor() and AutoClassifier()
        • New argument max_models is added as an early stopping criterion to limit the maximum number of models to be trained.
    • teradataml: DataFrame functions
      • DataFrame.agg()
        • Accepts ColumnExpressions and list of ColumnExpressions as arguments.
    • teradataml: General Functions
      • Data Transfer Utility
        • fastload() - Improved error and warning table handling with the new arguments listed below.
          • err_staging_db
          • err_tbl_name
          • warn_tbl_name
          • err_tbl_1_suffix
          • err_tbl_2_suffix
        • fastload() - Changed behaviour of the save_errors argument.
          When save_errors is set to True, error information is available in two persistent tables, ERR_1 and ERR_2.
          When save_errors is set to False, error information is available in a single pandas DataFrame.
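          A minimal sketch of the new error-handling arguments (table, file, and database names are
          placeholder assumptions):

          import pandas as pd
          from teradataml import fastload

          pdf = pd.read_csv("sales.csv")                   # hypothetical local file

          # With save_errors=True, error rows are kept in two persistent tables.
          fastload(df=pdf,
                   table_name="sales_stage",
                   save_errors=True,
                   err_staging_db="stage_db",              # database for the error/warning tables
                   err_tbl_name="sales_load_err",
                   warn_tbl_name="sales_load_warn",
                   err_tbl_1_suffix="_1",
                   err_tbl_2_suffix="_2")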
      • Garbage collector location is now configurable.
        Users can set configure.local_storage to the desired location.
  • Bug Fixes
    • UAF functions now work if the database name has special characters.
    • OpensourceML can now read and process NULL/nan values.
    • Boolean output values are now returned as a VARBYTE column with 0 or 1 values in OpensourceML.
    • Fixed bug for Apply's deploy().
    • Fixed an issue with volatile table creation so that the table is created in the correct database, i.e., the user's spool space, regardless of the temp database specified.
    • ColumnTransformer function now processes its arguments in the order they are passed.

teradataml 20.00.00.00

01 Apr 14:33
cbf9297
  • New Features/Functionality
    • teradataml OpenML: Run open-source packages through Teradata Vantage

      OpenML dynamically exposes opensource packages through Teradata Vantage. OpenML provides an
      interface object through which exposed classes and functions of opensource packages can be accessed
      with the same syntax and arguments.
      The following functionality is added in the current release:

      • td_sklearn - Interface object to run scikit-learn functions and classes through Teradata Vantage.
        Example usage below:
        from teradataml import td_sklearn, DataFrame
        
        df_train = DataFrame("multi_model_classification")
        
        feature_columns = ["col1", "col2", "col3", "col4"]
        label_columns = ["label"]
        part_columns = ["partition_column_1", "partition_column_2"]
        
        linear_svc = td_sklearn.LinearSVC()
        
      • OpenML is supported in both Teradata Vantage Enterprise and Teradata Vantage Lake.
      • Argument Support:
        • Use of X and y arguments - Scikit-learn users are familiar with using X and y as argument names
          which take data as pandas DataFrames, numpy arrays or lists etc. However, in OpenML, we pass
          teradataml DataFrames for arguments X and y.
          df_x = df_train.select(feature_columns)
          df_y = df_train.select(label_columns)
          
          linear_svc = linear_svc.fit(X=df_x, y=df_y)
          
        • Additional support for data, feature_columns, label_columns and group_columns arguments -
          Apart from traditional arguments, OpenML supports additional arguments - data,
          feature_columns, label_columns and group_columns. These are used as alternatives to X, y
          and groups.
          linear_svc = linear_svc.fit(data=df_train, feature_columns=feature_columns, label_columns=label_columns)
          
      • Support for classification and regression metrics - Metrics functions for classification and
        regression in sklearn.metrics module are supported. Other metrics functions' support will be added
        in future releases.
      • Distributed Modeling and partition_columns argument support - Existing scikit-learn supports
        only single model generation. However, OpenML supports both single model use case and distributed
        (multi) model use case. For this, the user has to additionally pass the partition_columns argument to
        the existing fit(), predict() or any other function to be run. This will generate multiple models
        for multiple partitions, using the data in the corresponding partition.
        df_x_1 = df_train.select(feature_columns + part_columns)
        linear_svc = linear_svc.fit(X=df_x_1, y=df_y, partition_columns=part_columns)      
        
      • Support for load and deploy models - OpenML provides additional support for saving (deploying) the
        trained models. These models can be loaded later to perform operations like prediction, scoring, etc.
        The following functions are provided by OpenML:
        • <obj>.deploy() - Used to deploy/save the model created and/or trained by OpenML.
        • td_sklearn.deploy() - Used to deploy/save the model created and/or trained outside teradataml.
        • td_sklearn.load() - Used to load the saved models.
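        A brief sketch of the save/load flow (the model_name keyword and its value are assumptions for
        illustration; refer to the User Guide for the exact signatures):
        # Save the model trained above through OpenML.
        linear_svc.deploy(model_name="linear_svc_v1")

        # Later, possibly in another session, load it back and keep using it.
        restored = td_sklearn.load(model_name="linear_svc_v1")
        predictions = restored.predict(df_x)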


      Refer to the Teradata Python Package User Guide for more details of this feature: arguments, usage, examples, and supportability in both VantageCloud Enterprise and VantageCloud Lake.

    • teradataml: AutoML - Automated end-to-end Machine Learning flow.

      AutoML is an approach to automate the process of building, training, and validating machine learning models.
      It involves automation of various aspects of the machine learning workflow, such as feature exploration,
      feature engineering, data preparation, model training and evaluation for given dataset.
      teradataml AutoML feature offers best model identification, model leaderboard generation, parallel execution,
      early stopping, model evaluation, model prediction, live logging, and customization of the default process.

      • AutoML
        AutoML is a generic algorithm that supports all three tasks, i.e. 'Regression',
        'Binary Classification' and 'Multiclass Classification'.
        • Methods of AutoML
          • __init__() - Instantiate an object of AutoML with given parameters.
          • fit() - Perform fit on specified data and target column.
          • leaderboard() - Get the leaderboard for the AutoML. Presents diverse models, feature
            selection method, and performance metrics.
          • leader() - Show best performing model and its details such as feature
            selection method, and performance metrics.
          • predict() - Perform prediction on the data using the best model or the model of the user's
            choice from the leaderboard.
          • generate_custom_config() - Generate custom config JSON file required for customized
            run of AutoML.
      • AutoRegressor
        AutoRegressor is a special purpose AutoML feature to run regression specific tasks.
        • Methods of AutoRegressor
          • __init__() - Instantiate an object of AutoRegressor with given parameters.
          • fit() - Perform fit on specified data and target column.
          • leaderboard() - Get the leaderboard for the AutoRegressor. Presents diverse models, feature
            selection method, and performance metrics.
          • leader() - Show best performing model and its details such as feature
            selection method, and performance metrics.
          • predict() - Perform prediction on the data using the best model or the model of the user's
            choice from the leaderboard.
          • generate_custom_config() - Generate custom config JSON file required for customized
            run of AutoRegressor.
      • AutoClassifier
        AutoClassifier is a special purpose AutoML feature to run classification specific tasks.
        • Methods of AutoClassifier
          • __init__() - Instantiate an object of AutoClassifier with given parameters.
          • fit() - Perform fit on specified data and target column.
          • leaderboard() - Get the leaderboard for the AutoClassifier. Presents diverse models, feature
            selection method, and performance metrics.
          • leader() - Show best performing model and its details such as feature
            selection method, and performance metrics.
          • predict() - Perform prediction on the data using the best model or the model of the user's
            choice from the leaderboard.
          • generate_custom_config() - Generate custom config JSON file required for customized
            run of AutoClassifier.
    • teradataml: DataFrame
      • fillna - Replace the null values in a column with the value specified.
      • Data Manipulation
        • cube()- Analyzes data by grouping it into multiple dimensions.
        • rollup() - Analyzes a set of data across a single dimension with more than one level of detail.
        • replace() - Replaces the values for columns.
    • teradataml: Script and Apply
      • deploy() - Deploys the model generated after execute_script() in the database or the user
        environment in Lake. The function is available in both Script and Apply.
    • teradataml: DataFrameColumn
      • fillna - Replaces every occurrence of null value in column with the value specified.
  • teradataml DataFrameColumn a.k.a. ColumnExpression
    • Date Time Functions
      • DataFrameColumn.week_start() - Returns the first date or timestamp of the week that begins immediately before the specified date or timestamp value in a column or as a literal.
      • DataFrameColumn.week_begin() - It is an alias for DataFrameColumn.week_start() function.
      • DataFrameColumn.week_end() - Returns the last date or timestamp of the week that ends immediately after the specified date or timestamp value in a column or as a literal.
      • DataFrameColumn.month_start() - Returns the first date or timestamp of the month that begins immediately before the specified date or timestamp value in a column or as a literal.
      • DataFrameColumn.month_begin() - It is an alias for DataFrameColumn.month_start() function.
      • DataFrameColumn.month_end() - Returns the last date or timestamp of the month that ends immediately after the specified date or timestamp value in a column or as a literal.
      • DataFrameColumn.year_start() - Returns the first date or timestamp of the year that begins immediately before the specified date or timestamp value in a column or as a literal.
      • DataFrameColumn.year_begin() - It is an alias for DataFrameColumn.year_start() function.
      • DataFrameColumn.year_end() - Returns the last date or timestamp of the year that ends immediately after the specified date or timestamp value in a column or as a literal.
      • DataFrameColumn.quarter_start() - Returns the first date or timestamp of the quarter that begins immediately before the specified date or timestamp value in a column or as a literal.
      • DataFrameColumn.quarter_begin() - It is an alias for DataFrameColumn.quarter_start() function.
      • DataFrameColumn.quarter_end() - Returns the last date or timestamp of the quarter that ends immediately after the specified date or timestamp value in a column or as a literal.
      • DataFrameColumn.last_sunday() - Returns the date or timestamp of the Sunday that falls immediately before the specified date or timestamp value in a column or as a literal.
      • DataFrameColumn.last_monday() - Returns the date o...

teradataml 17.20.00.07

27 Feb 04:56
d182a2d
  • New Features/Functionality
    • Open Analytics Framework (OpenAF) APIs:
      • Manage all user environments.
        • create_env():
          • new argument conda_env is added to create a conda environment.
        • list_user_envs():
          • Users can filter conda environments using the new argument conda_env.
      • Conda environments can be managed using the APIs for installing, updating, and removing files/libraries.
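        A minimal sketch of creating and listing a conda environment (the environment name, base version
        name, and description are placeholder assumptions):

        from teradataml import create_env, list_user_envs

        # Create a conda-based user environment.
        env = create_env(env_name="sales_conda_env",
                         base_env="python_3.9",          # base Python version name is an assumption
                         desc="Conda environment for sales models",
                         conda_env=True)

        # List only the conda environments.
        list_user_envs(conda_env=True)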
  • Bug Fixes
    • columns argument for the FillNa function is now optional.

teradataml 17.20.00.06

18 Dec 18:19
d182a2d
Compare
Choose a tag to compare
  • New Features/Functionality
    • teradataml DataFrameColumn a.k.a. ColumnExpression
      • ColumnExpression.nulls_first() - Displays NULL values first.

      • ColumnExpression.nulls_last() - Displays NULL values last.

      • Bit Byte Manipulation Functions

        • DataFrameColumn.bit_and() - Returns the logical AND operation on the bits from
          the column and corresponding bits from the argument.
        • DataFrameColumn.bit_get() - Returns the bit specified by input argument from the column and
          returns either 0 or 1 to indicate the value of that bit.
        • DataFrameColumn.bit_or() - Returns the logical OR operation on the bits from the column and
          corresponding bits from the argument.
        • DataFrameColumn.bit_xor() - Returns the bitwise XOR operation on the binary representation of the
          column and corresponding bits from the argument.
        • DataFrameColumn.bitand() - It is an alias for DataFrameColumn.bit_and() function.
        • DataFrameColumn.bitnot() - Returns a bitwise complement on the binary representation of the column.
        • DataFrameColumn.bitor() - It is an alias for DataFrameColumn.bit_or() function.
        • DataFrameColumn.bitwise_not() - It is an alias for DataFrameColumn.bitnot() function.
        • DataFrameColumn.bitwiseNOT() - It is an alias for DataFrameColumn.bitnot() function.
        • DataFrameColumn.bitxor() - It is an alias for DataFrameColumn.bit_xor() function.
        • DataFrameColumn.countset() - Returns the count of the binary bits within the column that are either set to 1
          or set to 0, depending on the input argument value.
        • DataFrameColumn.getbit() - It is an alias for DataFrameColumn.bit_get() function.
        • DataFrameColumn.rotateleft() - Returns an expression rotated to the left by the specified number of bits,
          with the most significant bits wrapping around to the right.
        • DataFrameColumn.rotateright() - Returns an expression rotated to the right by the specified number of bits,
          with the least significant bits wrapping around to the left.
        • DataFrameColumn.setbit() - Sets the value of the bit specified by input argument to the value
          of column.
        • DataFrameColumn.shiftleft() - Returns the expression when value in column is shifted by the specified
          number of bits to the left.
        • DataFrameColumn.shiftright() - Returns the expression when column expression is shifted by the specified
          number of bits to the right.
        • DataFrameColumn.subbitstr() - Extracts a bit substring from the column expression based on the specified
          bit position.
        • DataFrameColumn.to_byte() - Converts a numeric data type to the Vantage byte representation
          (byte value) of the column expression value.
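          A combined sketch of the bit manipulation functions via assign() (the column name flags and the
          bit positions/masks used are placeholder assumptions):

          df = df.assign(
              masked    = df.flags.bit_and(3),        # AND with a 2-bit mask
              third_bit = df.flags.bit_get(2),        # 0 or 1 for the bit at position 2
              rotated   = df.flags.rotateleft(4)      # rotate the bits 4 positions to the left
          )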
      • Regular Expression Functions

        • DataFrameColumn.regexp_instr() - Searches string value in column for a match to value specified in argument.
        • DataFrameColumn.regexp_replace() - Replaces the portions of the string value in a column that match the
          specified regex string with the replace string.
        • DataFrameColumn.regexp_similar() - Compares value in column to value in argument and returns integer value.
        • DataFrameColumn.regexp_substr() - Extracts a substring from column that matches a regular expression
          specified in the input argument.
    • Open Analytics Framework (OpenAF) APIs:
      • Manage all user environments.
        • create_env():
          • Users can create one or more user environments using the newly added argument template by providing specifications in a template JSON file. This allows users to create a complete user environment, including file and library installation, in a single function call.
      • UserEnv Class – Manage individual user environment.
        • Properties:
          • models - Supports listing of models in user environment.
        • Methods:
          • install_model() - Install a model in user environment.
          • uninstall_model() - Uninstall a model from user environment.
          • snapshot() - Takes a snapshot of the user environment.
    • teradataml: Bring Your Own Model
      • New Functions
        • DataRobotPredict() - Score the data in Vantage using the model trained externally in DataRobot and stored
          in Vantage.
  • Updates
    • DataFrame.describe()
      • Method now accepts an argument statistics, which specifies the aggregate operation to be performed.
    • DataFrame.sort()
      • Method now accepts ColumnExpressions as well.
      • Enables sorting using NULLS FIRST and NULLS LAST.
    • view_log() downloads the Apply query logs based on query id.
    • Arguments that accept floating point numbers now also accept integers for Analytics Database Analytic Functions.
    • Argument ignore_nulls added to DataFrame.plot() to ignore the null values while plotting the data.
    • DataFrame.sample()
      • Method supports column stratification.
  • Bug Fixes
    • DataFrameColumn.cast() accepts all teradatasqlalchemy types.
    • Minor bug fix related to DataFrame.merge().

teradataml 17.20.00.05

26 Oct 18:04
8144c2a
  • New Features/Functionality
    • teradataml: Hyperparameter-Tuning - Technique to identify best model parameters.

      Hyperparameter tuning is an optimization method to determine the optimal set of
      hyperparameters for the given dataset and learning model. teradataml hyperparameter tuning feature
      offers best model identification, parallel execution, early stopping, best data identification,
      model evaluation, model prediction, live logging, input data hyper-parameterization, input data sampling,
      numerous scoring functions, and hyper-parameterization of non-model trainer functions.

      • GridSearch
        GridSearch is an exhaustive search algorithm that covers all possible
        parameter values to identify optimal hyperparameters.
        • Methods of GridSearch
          • __init__() - Instantiate an object of GridSearch for given model function and parameters.
          • evaluate() - Function to perform evaluation on the given teradataml DataFrame using default model.
          • fit() - Function to perform hyperparameter-tuning for given hyperparameters and model on teradataml DataFrame.
          • get_error_log() - Useful to get the error log if model execution failed, using the model identifier.
          • get_input_data() - Useful to get the input data using the data identifier, when input data is also parameterized.
          • get_model() - Returns the trained model for the given model identifier.
          • get_parameter_grid() - Returns the hyperparameter space used for hyperparameter optimization.
          • is_running() - Returns the execution status of hyperparameter tuning.
          • predict() - Function to perform prediction on the given teradataml DataFrame using default model.
          • set_model() - Function to update the default model.
        • Properties of GridSearch
          • best_data_id - Returns the best data identifier used for model training.
          • best_model - Returns the best trained model.
          • best_model_id - Returns the identifier for best model.
          • best_params_ - Returns the best set of hyperparameters.
          • best_sampled_data_ - Returns the best sampled data used to train the best model.
          • best_score_ - Returns the best trained model score.
          • model_stats - Returns the model evaluation reports.
          • models - Returns the metadata of all the models.
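        A minimal GridSearch usage sketch (SVM is just one example trainer function; the parameter names,
        tuple-valued search ranges, and the verbose argument are assumptions for illustration):

        from teradataml import GridSearch, SVM

        params = {"input_columns": ["col1", "col2", "col3"],
                  "response_column": "label",
                  "model_type": "Classification",
                  "batch_size": (50, 75, 100),              # tuple of values to search over
                  "learning_rate": ("invtime", "constant")}

        gs = GridSearch(func=SVM, params=params)
        gs.fit(data=train_df, verbose=1)                    # train_df is a teradataml DataFrame
        print(gs.best_params_)
        best_model = gs.best_model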
      • RandomSearch
        RandomSearch algorithm performs random sampling on hyperparameter
        space to identify optimal hyperparameters.
        • Methods of RandomSearch
          • __init__() - Instantiate an object of RandomSearch for given model function and parameters.
          • evaluate() - Function to perform evaluation on the given teradataml DataFrame using default model.
          • fit() - Function to perform hyperparameter-tuning for given hyperparameters and model on teradataml DataFrame.
          • get_error_log() - Useful to get the error log if model execution failed, using the model identifier.
          • get_input_data() - Useful to get the input data using the data identifier, when input data is also parameterized.
          • get_model() - Returns the trained model for the given model identifier.
          • get_parameter_grid() - Returns the hyperparameter space used for hyperparameter optimization.
          • is_running() - Returns the execution status of hyperparameter tuning.
          • predict() - Function to perform prediction on the given teradataml DataFrame using default model.
          • set_model() - Function to update the default model.
        • Properties of RandomSearch
          • best_data_id - Returns the best data identifier used for model training.
          • best_model - Returns the best trained model.
          • best_model_id - Returns the identifier for best model.
          • best_params_ - Returns the best set of hyperparameters.
          • best_sampled_data_ - Returns the best sampled data used to train the best model.
          • best_score_ - Returns the best trained model score.
          • model_stats - Returns the model evaluation reports.
          • models - Returns the metadata of all the models.
    • teradataml: DataFrame
      • New Functions
        • DataFrame.plot() - Generates the below types of plots on a teradataml DataFrame.
          • line - Generates line plot.
          • bar - Generates bar plot.
          • scatter - Generates scatter plot.
          • corr - Generates correlation plot.
          • wiggle - Generates a wiggle plot.
          • mesh - Generates a mesh plot.
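          A minimal plotting sketch (the table and column names, and the kind value, are placeholder
          assumptions; see the User Guide for the full set of plot arguments):

          from teradataml import DataFrame

          df = DataFrame("ocean_buoys")
          plot = df.plot(x=df.TD_TIMECODE, y=df.temperature, kind="line")
          # The returned plot object can then be displayed or saved; see the User Guide for details.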
        • DataFrame.itertuples() - Iterates over teradataml DataFrame rows as namedtuples or lists.
    • teradataml: GeoDataFrame
      • New Functions
        • GeoDataFrame.plot() - Generates the below types of plots on a teradataml GeoDataFrame.
          • line - Generates line plot.
          • bar - Generates bar plot.
          • scatter - Generates scatter plot.
          • corr - Generates correlation plot.
          • wiggle - Generates a wiggle plot.
          • mesh - Generates a mesh plot.
          • geometry - Generates plot on geospatial data.
    • Plot:

      • Axis - Generates the axis for the plot.
      • Figure - Generates the figure for the plot.
      • subplots - Helps in generating multiple plots on a single Figure.
    • Bring Your Own Model (BYOM) Function:

      • DataikuPredict - Score the data in Vantage using the model trained externally in Dataiku UI and stored in Vantage.
    • async_run_status() - Function to check the status of asynchronous run(s) using unique run id(s).

    • teradataml DataFrameColumn a.k.a. ColumnExpression
      • Regular Arithmetic Functions
        • DataFrameColumn.abs() - Computes the absolute value.
        • DataFrameColumn.ceil() - Returns the ceiling value of the column.
        • DataFrameColumn.ceiling() - It is an alias for DataFrameColumn.ceil() function.
        • DataFrameColumn.degrees() - Converts radians value from the column to degrees.
        • DataFrameColumn.exp() - Raises e (the base of natural logarithms) to the power of the value in the column, where e = 2.71828182845905.
        • DataFrameColumn.floor() - Returns the largest integer equal to or less than the value in the column.
        • DataFrameColumn.ln() - Computes the natural logarithm of values in column.
        • DataFrameColumn.log10() - Computes the base 10 logarithm.
        • DataFrameColumn.mod() - Returns the modulus of the column.
        • DataFrameColumn.pmod() - It is an alias for DataFrameColumn.mod() function.
        • DataFrameColumn.nullifzero() - Converts data from zero to null to avoid problems with division by zero.
        • DataFrameColumn.pow() - Computes the power of the column raised to expression or constant.
        • DataFrameColumn.power() - It is an alias for DataFrameColumn.pow() function.
        • DataFrameColumn.radians() - Converts degree value from the column to radians.
        • DataFrameColumn.round() - Returns the rounded off value.
        • DataFrameColumn.sign() - Returns the sign.
        • DataFrameColumn.signum() - It is an alias for DataFrameColumn.sign() function.
        • DataFrameColumn.sqrt() - Computes the square root of values in the column.
        • DataFrameColumn.trunc() - Provides the truncated value of columns.
        • DataFrameColumn.width_bucket() - Returns the number of the partition to which column is assigned.
        • DataFrameColumn.zeroifnull() - Converts data from null to zero to avoid problems with null.
      • Trigonometric Functions
        • DataFrameColumn.acos() - Returns the arc-cosine value.
        • DataFrameColumn.asin() - Returns the arc-sine value.
        • DataFrameColumn.atan() - Returns the arc-tangent value.
        • DataFrameColumn.atan2() - Returns the arc-tangent value based on x and y coordinates.
        • DataFrameColumn.cos() - Returns the cosine value.
        • DataFrameColumn.sin() - Returns the sine value.
        • DataFrameColumn.tan() - Returns the tangent value.
      • Hyperbolic Functions
        • DataFrameColumn.acosh() - Returns the inverse hyperbolic cosine value.
        • DataFrameColumn.asinh() - Returns the inverse hyperbolic sine value.
        • DataFrameColumn.atanh() - Returns the inverse hyperbolic tangent value.
        • DataFrameColumn.cosh() - Returns the hyperbolic cosine value.
        • DataFrameColumn.sinh() - Returns the hyperbolic sine value
        • DataFrameColumn.tanh() - Returns the hyperbolic tangent value.
      • String Functions
        • DataFrameColumn.ascii() - Returns the decimal representation of the first character in column.
        • DataFrameColumn.char2hexint() - Returns the hexadecimal representation for a character string in a column.
        • DataFrameColumn.chr() - Returns the Latin ASCII character of a given a numeric code value in column.
        • DataFrameColumn.char() - It is an alias for DataFrameColumn.chr() function.
        • DataFrameColumn.character_length() - Returns the number of characters in the column.
        • DataFrameColumn.char_length() - It is an alias for DataFrameColumn.character_length() function.
        • DataFrameColumn.edit_distance() - Returns the minimum number of edit operations required to
          transform string in a column into string specified in argument.
        • DataFrameColumn.index() - Returns the position of a string in a column where string specified in argument starts.
        • DataFrameColumn.initcap() - Modifies a string column and returns the string with the first character
          of each word in uppercase.
        • DataFrameColumn.instr() - Searches the string in a column for occurrences of search string passed as argument.
        • DataFrameColumn.lcase() - Returns a character string identical to string values ...

teradataml 17.20.00.04

24 Jul 08:18
dfed882
  • New Features/Functionality
    • teradataml is now compatible with SQLAlchemy 2.0.X

      • Important notes when the user has SQLAlchemy version >= 2.0:
        • Users will not be able to run the execute() method on the SQLAlchemy engine object returned by
          the get_context() and create_context() teradataml functions, because SQLAlchemy has
          removed support for the execute() method on the engine object.
          Thus, in user scripts where get_context().execute() or create_context().execute() is used,
          Teradata recommends replacing those calls with either the execute_sql() function exposed by
          teradataml or the exec_driver_sql() method on the Connection object returned by the
          get_connection() function in teradataml.
          from teradataml import execute_sql
          execute_sql("DROP TABLE test_select")

          get_connection().exec_driver_sql("select sessionno from DBC.SessionInfoV where UserName = 'alice';")

        • get_connection().execute() now accepts only an executable SQLAlchemy object. Refer to
          sqlalchemy.engine.base.execute() for more details.

    • New utility function execute_sql() is added to execute the SQL.

    • Extended compatibility for native Mac with ARM processors.

    • Added support for floor division (//) between two teradataml DataFrame Columns.

    • Analytics Database Analytic Functions:

      • GLMPerSegment()
      • GLMPredictPerSegment()
      • OneClassSVM()
      • OneClassSVMPredict()
      • SVM()
      • SVMPredict()
      • TargetEncodingFit()
      • TargetEncodingTransform()
      • TrainTestSplit()
      • WordEmbeddings()
      • XGBoost()
      • XGBoostPredict()
    • teradataml Options
      • Display Options
        • display.geometry_column_length
          Option to display the default length of geometry column in GeoDataFrame.
    • Updates
      • set_auth_token() function can generate the client id automatically based on org_id when the user does not specify it.
      • Analytics Database Analytic Functions:
        • ColumnTransformer()
          • Does not allow list values for arguments - onehotencoding_fit_data and ordinalencoding_fit_data.
        • OrdinalEncodingFit()
          • New arguments added - category_data, target_column_names, categories_column, ordinal_values_column.
          • Allows the list of values for arguments - target_column, start_value, default_value.
        • OneHotEncodingFit()
          • New arguments added - category_data, approach, target_columns, categories_column, category_counts.
          • Allows the list of values for arguments - target_column, other_column.
    • Bug Fixes
      • DataFrame.sample() method output is now deterministic.
      • copy_to_sql() now preserves the rows of the table even when the view content is copied to the same table name.
      • list_user_envs() does not raise a warning when no user environments are found.

teradataml 17.20.00.03

04 May 05:17

  • Updates
    • DataFrame.join
      • New arguments lprefix and rprefix added.
      • The behavior of arguments lsuffix and rsuffix will change in a future release; use the new arguments instead.
      • New and old affix arguments can now be used independently.
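        A brief sketch of the new prefix arguments (the DataFrames, column name, and prefix values are
        placeholder assumptions):

        # Disambiguate overlapping column names with prefixes instead of suffixes.
        joined = df1.join(other=df2, on="id", how="inner", lprefix="l", rprefix="r")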
    • Analytic functions can now be imported regardless of context creation.
      The constraint requiring context creation before import has been removed.
    • ReadNOS and WriteNOS now accept dictionary value for authorization and row_format arguments.
    • WriteNOS supports writing CSV files to external store.
    • The following model cataloging APIs will be deprecated in a future release:
      • describe_model
      • delete_model
      • list_models
      • publish_model
      • retrieve_model
      • save_model
  • Bug Fixes
    • copy_to_sql() bug related to NaT value has been fixed.
    • Tooltip on PyCharm IDE now points to SQLE.
    • The value argument of FillNa(), a Vantage Analytic Library function, now supports special characters.
    • The case function accepts a DataFrame column as the value in the whens argument.

Release Notes:

teradataml 17.20.00.02

  • New Features/Functionality
    • teradataml: Open Analytics
      • New Functions
        • set_auth_token() - Sets the JWT token automatically for using Open AF APIs.
    • teradataml Options
      • Display Options
        • display.suppress_vantage_runtime_warnings
          Suppresses the VantageRuntimeWarning raised by teradataml, when set to True.
    • Updates
      • SimpleImputeFit function arguments stats_columns and stats are now optional.
      • New argument table_format is added to ReadNOS().
      • Argument full_scan is changed to scan_pct in ReadNOS().
    • Bug Fixes
      • Minor bug fix related to read_csv.
      • APPLY and DataFrame.apply() support HASH BY and LOCAL ORDER BY.
      • Output column names are changed for DataFrame.dtypes and DataFrame.tdtypes.

Release Notes:

teradataml 17.20.00.01

  • New Features/Functionality
    • teradataml: DataFrame
      • New Functions
        • DataFrame.pivot() - Rotate data from rows into columns to create easy-to-read DataFrames.
        • DataFrame.unpivot() - Rotate data from columns into rows to create easy-to-read DataFrames.
        • DataFrame.drop_duplicate() - Drop duplicate rows from teradataml DataFrame.
      • New properties
        • DataFrame.is_art - Check whether the teradataml DataFrame is created on an Analytic Result Table (ART) or not.
    • teradataml: Unbounded Array Framework (UAF) Functions:
      • New Functions

        • New Functions Supported on Database Versions: 17.20.x.x
          • MODEL PREPARATION AND PARAMETER ESTIMATION functions:
            1. ACF()
            2. ArimaEstimate()
            3. ArimaValidate()
            4. DIFF()
            5. LinearRegr()
            6. MultivarRegr()
            7. PACF()
            8. PowerTransform()
            9. SeasonalNormalize()
            10. Smoothma()
            11. UNDIFF()
            12. Unnormalize()
          • SERIES FORECASTING functions:
            1. ArimaForecast()
            2. DTW()
            3. HoltWintersForecaster()
            4. MAMean()
            5. SimpleExp()
          • DATA PREPARATION functions:
            1. BinaryMatrixOp()
            2. BinarySeriesOp()
            3. GenseriesFormula()
            4. MatrixMultiply()
            5. Resample()
          • DIAGNOSTIC STATISTICAL TEST functions:
            1. BreuschGodfrey()
            2. BreuschPaganGodfrey()
            3. CumulPeriodogram()
            4. DickeyFuller()
            5. DurbinWatson()
            6. FitMetrics()
            7. GoldfeldQuandt()
            8. Portman()
            9. SelectionCriteria()
            10. SignifPeriodicities()
            11. SignifResidmean()
            12. WhitesGeneral()
          • TEMPORAL AND SPATIAL functions:
            1. Convolve()
            2. Convolve2()
            3. DFFT()
            4. DFFT2()
            5. DFFT2Conv()
            6. DFFTConv()
            7. GenseriesSinusoids()
            8. IDFFT()
            9. IDFFT2()
            10. LineSpec()
            11. PowerSpec()
          • GENERAL UTILITY functions:
            1. ExtractResults()
            2. InputValidator()
            3. MInfo()
            4. SInfo()
            5. TrackingOp()
      • New Features: Inputs to Unbounded Array Framework (UAF) functions

        • TDAnalyticResult() - Allows preparing the output generated by a UAF function to be passed as input to another UAF function.
        • TDGenSeries() - Allows generating a series that can be passed to a UAF function.
        • TDMatrix() - Represents a Matrix in time series, which can be created from a teradataml DataFrame.
        • TDSeries() - Represents a Series in time series, which can be created from a teradataml DataFrame.
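          A minimal sketch of wrapping a DataFrame as a series and passing it to a UAF function (the table
          and column names, and the max_lags value, are placeholder assumptions):

          from teradataml import DataFrame, TDSeries, ACF

          df = DataFrame("stock_prices")
          series = TDSeries(data=df,
                            id="symbol",                     # series identifier column
                            row_index="trade_date",
                            row_index_style="TIMECODE",
                            payload_field="close_price",
                            payload_content="REAL")

          acf_out = ACF(data=series, max_lags=10)
          acf_out.result.head()                              # UAF function output exposed as a DataFrame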
    • Updates
      • Native Object Store (NOS) functions support authorization by specifying an authorization object.
      • display_analytic_functions() categorizes the analytic functions based on function type.
      • ColumnTransformer accepts multiple values for arguments nonlinearcombine_fit_data,
        onehotencoding_fit_data, ordinalencoding_fit_data.
    • Bug Fixes
      • Redundant warnings thrown by teradataml are suppressed.
      • OpenAF is supported when the context is created with a JWT token.
      • New argument "match_column_order" added to copy_to_sql, which allows DataFrame loading with any column order.
      • copy_to_sql updated to map data type timezone(tzinfo) to TIMESTAMP(timezone=True), instead of VARCHAR.
      • Improved performance for DataFrame.sum and DataFrameColumn.sum functions.

Release Notes:

teradataml 17.20.00.00

  • New Features/Functionality
    • teradataml: Analytics Database Analytic Functions
      • New Functions
        • New Functions Supported on Database Versions: 17.20.x.x
          • ANOVA()
          • ClassificationEvaluator()
          • ColumnTransformer()
          • DecisionForest()
          • GLM()
          • GetFutileColumns()
          • KMeans()
          • KMeansPredict()
          • NaiveBayesTextClassifierTrainer()
          • NonLinearCombineFit()
          • NonLinearCombineTransform()
          • OrdinalEncodingFit()
          • OrdinalEncodingTransform()
          • RandomProjectionComponents()
          • RandomProjectionFit()
          • RandomProjectionTransform()
          • RegressionEvaluator()
          • ROC()
          • SentimentExtractor()
          • Silhouette()
          • TDGLMPredict()
          • TextParser()
          • VectorDistance()
      • Updates
        • display_analytic_functions() categorizes the analytic functions based on function type.
        • Users can provide a range value for the columns argument.
    • teradataml: Open Analytics
      • Manage all user environments.
        • list_base_envs() - List the available Python base versions.
        • create_env() - Create a new user environment.
        • get_env() - Get an existing user environment.
        • list_user_envs() - List the available user environments.
        • remove_env() - Delete a user environment.
        • remove_all_envs() - Delete all the user environments.
      • UserEnv Class – Manage individual user environment.
        • Properties
          • files - Get files in user environment.
          • libs - Get libraries in user environment.
        • Methods
          • install_file() - Install a file in user environment.
          • remove_file() - Remove a file in user environment.
          • install_lib() - Install a library in user environment.
          • update_lib() - Update a library in user environment.
          • uninstall_lib() - Uninstall a library in user environment.
          • status() - Check the status of
            • file installation
            • library installation
            • library update
            • library uninstallation
          • refresh() - Refresh the environment details in local client.
      • Apply Class – Execute a user script on VantageCloud Lake.
        • __init__() - Instantiate an object of apply for script execution.
        • install_file() - Install a file in user environment.
        • remove_file() - Remove a file in user environment.
        • set_data() – Reset data and related arguments.
        • execute_script() – Executes Python script.
    • teradataml: DataFrame
      • New Functions
        • DataFrame.apply() - Execute a user defined Python function on VantageCloud Lake.
    • teradataml: Bring Your Own Model
      • New Functions
        • ONNXPredict() - Score using model trained externally on ONNX and stored in Vantage.
    • teradataml: Options
      • New Functions
        • set_config_params() - New API to set all config params in one go.
      • New Configuration Options
          • For Open Analytics support.
          • ues_url – User Environment Service URL for ...