Data, modeling, and apps (V10.0)

April 29, 2024

The DataRobot v10.0.0 release includes many new data, modeling, apps, and admin features and enhancements, described below. See additional details of the 10.0 release in the MLOps and code-first release announcements.

In the spotlight

DataRobot introduces extensible, fully-customizable, cloud-agnostic GenAI capabilities

With DataRobot GenAI capabilities, you can generate text content using a variety of pre-trained large language models (LLMs). Additionally, you can tailor the content to your data by building vector databases and leveraging them in LLM blueprints. The DataRobot GenAI offering builds on DataRobot's predictive AI experience to provide confidence scores and enable you to bring your favorite libraries, choose your LLMs, and integrate third-party tools. Via a hosted notebook or DataRobot’s UI, embed or deploy AI wherever it will drive value for your business and leverage built-in governance for each asset in the pipeline. Through the DataRobot UI you can:

And more.

Read an example use case of building an end-to-end generative AI solution at scale.

More docs coming soon!

The generative AI documentation is delivered via continuous deployment; the most complete and current content can be found on the DataRobot public documentation site.

10.0 release

Release v10.0 provides updated UI string translations for the following languages:

  • Japanese
  • French
  • Spanish
  • Korean
  • Brazilian Portuguese
Features grouped by capability

Data
  • Native Databricks connector added (GA)
  • Native AWS S3 connector added (GA)
  • Improved recipe management in Workbench (GA)
  • Enable in-source materialization for wrangled BigQuery and Snowflake datasets (GA)
  • All wrangling operations generally available (GA)
  • Support for Parquet file ingestion (GA)
  • Publish wrangling recipes with smart downsampling (GA)
  • Perform Feature Discovery in Workbench (Preview)
  • Data improvements added to Workbench (Preview)
  • Improved recipe management in Workbench (Preview)
  • ADLS Gen2 connector added to DataRobot (Preview)

Modeling
  • Workbench adds two partitioning methods for predictive projects (GA)
  • Sliced insights for time-aware experiments now GA in DataRobot Classic (GA)
  • Date/time partitioning for time-aware experiments now GA (GA)
  • Save Workbench experiment progress prior to modeling (GA)
  • Uploading, modeling, and generating insights on 10GB for OTV experiments (GA)
  • GPU support now available in Workbench (Preview)
  • SHAP-based Individual Prediction Explanations in Workbench (Preview)
  • Word Cloud now available in Workbench (Preview)
  • Migrate projects and notebooks from DataRobot Classic to NextGen (Preview)
  • Compute Prediction Explanations for data in OTV and time series projects (Preview)
  • Create custom feature lists in existing experiments (Preview)
  • Scalability up to 100GB now available with incremental learning (Preview)

Apps
  • Chat generation Q&A application (MLOps) ✔*

Admin
  • Add a Microsoft Teams notification channel (GA)
  • Updated user settings interface (GA)
  • Single-tenant SaaS (GA)

See these important deprecation announcements for information about changes to DataRobot's support for older, expiring functionality. This document also describes DataRobot's fixed issues.

Data enhancements

GA

Native Databricks connector added

The native Databricks connector, which lets you access data in Databricks on Azure or AWS, is now generally available in DataRobot. In addition to performance enhancements, the new connector also allows you to:

  • Create and configure data connections.
  • Authenticate a connection via a service principal, and share service principal credentials through secure configurations.
  • Add Databricks datasets to a Use Case.
  • Wrangle Databricks datasets, and then publish recipes to Databricks to materialize the output in the Data Registry.
  • Use the public Python API client to access data via the Databricks connector, as in the sketch below.
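
The following minimal sketch shows one way to register a Databricks-backed dataset with the public Python API client. It assumes a Databricks data connection and data source have already been configured in the UI; the data source name `databricks_source`, endpoint, and token are placeholders:

```python
import datarobot as dr

# Connect to DataRobot (endpoint and token are placeholders).
dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")

# Look up an existing data source backed by the Databricks connection
# (assumes it was configured beforehand and named "databricks_source").
databricks_source = next(
    ds for ds in dr.DataSource.list() if ds.canonical_name == "databricks_source"
)

# Register the data as a dataset in the Data Registry.
dataset = dr.Dataset.create_from_data_source(databricks_source.id)
print(dataset.id, dataset.name)
```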

Native AWS S3 connector added

The new AWS S3 connector is now generally available in DataRobot. In addition to performance enhancements, this connector also enables support for AWS S3 in Workbench, allowing you to:

Improved recipe management in Workbench

This release introduces the following enhancements when wrangling data in Workbench:

  • When you wrangle a dataset in your Use Case, including re-wrangling the same dataset, DataRobot creates and saves a copy of the recipe in the Data tab regardless of whether or not you add operations to it. Then, each time you modify the recipe, your changes are automatically saved. Additionally, you can open saved recipes to continue making changes.
  • In a Use Case, the Datasets tab has been replaced by the Data tab, which now lists both datasets and recipes. New icons have also been added to the Data tab to quickly distinguish between datasets and recipes.
  • During a wrangling session, add a helpful name and description to your recipe for context when re-wrangling a recipe in the future.

Enable in-source materialization for wrangled BigQuery and Snowflake datasets

In-source materialization is now generally available when wrangling BigQuery and Snowflake datasets. In Publishing Settings, click either Publish to BigQuery or Publish to Snowflake, depending on your data source. Selecting this option materializes an output dynamic dataset in the Data Registry as well as in your data source, allowing you to leverage the security, compliance, and financial controls specified within that environment.

All wrangling operations generally available

The Join and Aggregate wrangling operations, previously available for preview, are now generally available in Workbench.

Support for Parquet file ingestion

Ingestion of Parquet files is now GA for the AI Catalog, training datasets, and predictions datasets. The following Parquet file types are supported:

  • Single Parquet files
  • Single zipped Parquet files
  • Multiple Parquet files (registered as separate datasets)
  • Zipped multi-Parquet file (merged to create a single dataset in DataRobot)
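
As a minimal sketch, a single Parquet file (or a zipped one) can be registered through the Python API client; the file names below are placeholders:

```python
import datarobot as dr

dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")

# Register a single Parquet file as a dataset.
dataset = dr.Dataset.create_from_file(file_path="transactions.parquet")

# A zipped Parquet file can be uploaded the same way:
# dataset = dr.Dataset.create_from_file(file_path="transactions.parquet.zip")
print(dataset.id, dataset.name)
```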

Publish wrangling recipes with smart downsampling

After building a wrangling recipe in Workbench, enable smart downsampling in the publishing settings to reduce the size of your output dataset and optimize model training. Smart downsampling is a data science technique that reduces the time it takes to fit a model without sacrificing accuracy, and accounts for class imbalance by stratifying the sample by class.
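
To illustrate the idea of class-stratified downsampling (a generic pandas sketch, not DataRobot's implementation; the column names are placeholders):

```python
import pandas as pd

def stratified_downsample(df: pd.DataFrame, target: str, frac: float, seed: int = 0) -> pd.DataFrame:
    """Keep `frac` of the rows while preserving the class mix of `target`."""
    return (
        df.groupby(target, group_keys=False)
          .apply(lambda g: g.sample(frac=frac, random_state=seed))
          .reset_index(drop=True)
    )

# Example: keep 20% of each class so a rare class is not sampled away.
# sampled = stratified_downsample(raw, target="is_fraud", frac=0.2)
```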

Preview

Perform Feature Discovery in Workbench

Now available for preview, perform Feature Discovery in Workbench to discover and generate new features from multiple datasets. You can initiate Feature Discovery in two places:

  • From the Data tab: to the right of the dataset that will serve as the primary dataset, click the more options icon > Feature Discovery.
  • From the data explore page of a specific dataset: click Data actions > Start feature discovery.

From there, you can add secondary datasets and configure relationships between the datasets.

Publishing a Feature Discovery recipe instructs DataRobot to perform the specific joins and aggregations, generating a new output dataset that is then registered in the Data Registry and added to your current Use Case.

Preview documentation.

Feature flag(s) ON by default: Enable Feature Discovery in Workbench

Data improvements added to Workbench

This release introduces the following enhancements, available for preview, when working with data in Workbench:

  • The data explore page now supports dataset versioning, as well as the ability to rename and download datasets.
  • The feature list dropdown is now a separate tab on the data explore page.
  • Autocomplete functionality has been improved for the Compute New Feature operation.
  • You can now use dynamic datasets to set up an experiment.

Feature flag(s) OFF by default: Enable Enhanced Data Explore View

Improved recipe management in Workbench

Now available for preview, when you wrangle a dataset in your Use Case, including re-wrangling the same dataset, DataRobot creates and saves a copy of the recipe in the Data tab regardless of whether or not you add operations to it. Then, each time you modify the recipe, your changes are automatically saved. Additionally, you can open saved recipes to continue making changes.

New icons have been added to the Data tab to quickly distinguish between datasets and recipes.

During a wrangling session, add a helpful name and description to your recipe for context when re-wrangling a recipe in the future.

Preview documentation.

Feature flag OFF by default: Enable Recipe Management in Workbench

ADLS Gen2 connector added to DataRobot

Support for the ADLS Gen2 native connector has been added to both DataRobot Classic and Workbench, allowing you to:

  • Create and configure data connections.
  • Add ADLS Gen2 datasets.

Preview documentation.

Feature flag ON by default: Enable ADLS Gen2 Connector

Automatically generate relationships for Feature Discovery

Now available for public preview, DataRobot can automatically detect and generate relationships between datasets in Feature Discovery projects, allowing you to quickly explore potential relationships when you’re unsure of how they connect. To automatically generate relationships, make sure all secondary datasets are added to your project, and then click Generate Relationships at the top of the Define Relationships page.

Preview documentation.

Feature flag OFF by default: Enable Feature Discovery Relationship Detection

Modeling features

GA

Workbench adds two partitioning methods for predictive projects

Now GA, Workbench supports user-defined grouping (“column-based” or “partition feature” in Classic) and automated grouping (“group partitioning” in Classic). While less common, user-defined and automated group partitioning provide a method for partitioning by a partition feature—a feature from the dataset that is the basis of grouping. To use grouping, select a method based on the cardinality of the partition feature. Once selected, choose a validation type; see the documentation for help selecting the appropriate validation type and for more details about using grouping for partitioning.
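
Workbench configures grouping in the UI; for reference, a roughly equivalent setup through the Python client might look like the following sketch (the file name, target, and grouping column are placeholders):

```python
import datarobot as dr

dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")

project = dr.Project.create(sourcedata="patients.csv", project_name="group-partitioning")

# Group partitioning: all rows sharing a `patient_id` value land in the same
# partition, so related rows never leak across folds.
partitioning = dr.GroupCV(holdout_pct=20, reps=5, partition_key_cols=["patient_id"])

project.analyze_and_model(target="readmitted", partitioning_method=partitioning)
```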

Sliced insights for time-aware experiments now GA in DataRobot Classic

With this deployment, sliced insights for OTV and time series projects are now generally available for Lift Chart, ROC Curve, Feature Effects, and Feature Impact in DataRobot Classic. Sliced insights provide the option to view a subpopulation of a model's derived data based on feature values. Use the segment-based accuracy information gleaned from sliced insights, or compare the segments to the "global" slice (all data), to improve training data, create individual models per segment, or augment predictions post-deployment.

Date/time partitioning for time-aware experiments now GA

The ability to create time-aware experiments—either predictive or forecasting with time series—is now generally available. With a simplified workflow that shares much of the setup for row-by-row predictions and forecasting, clearer backtest modification tools, and the ability to reset changes before building, you can now quickly and easily work with time-relevant data.
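
The Workbench setup is UI-driven; as a point of reference, a minimal out-of-time validation setup via the Python client might look like this sketch (the file name, date column, and target are placeholders):

```python
import datarobot as dr

dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")

project = dr.Project.create(sourcedata="sales.csv", project_name="time-aware-otv")

# Partition row-by-row predictions on a date column with three backtests.
spec = dr.DatetimePartitioningSpecification(
    datetime_partition_column="order_date",
    number_of_backtests=3,
)

project.analyze_and_model(target="revenue", partitioning_method=spec)
```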

Save Workbench experiment progress prior to modeling

When creating Workbench experiments, you can now save your progress as a draft, navigate away from the experiment setup page, and return to the setup later. Draft status is indicated on the Use Case’s home page.

Uploading, modeling, and generating insights on 10GB for OTV experiments

To improve the scalability and user experience for OTV (out-of-time validation) experiments, this deployment introduces scaling for larger datasets. When using out-of-time validation as the partitioning method, DataRobot no longer needs to downsample for datasets as large as 10GB. Instead, a multistage Autopilot process supports the greatly expanded input allowance.

Preview

GPU support now available in Workbench

In the additional automation settings, you can now enable GPU workers for Workbench experiments that include text and/or images and require deep learning models. Training on GPUs reduces training time. DataRobot detects blueprints that contain certain tasks and, when detected, includes GPU-supported blueprints both in Autopilot and in the blueprint repository.

Preview documentation.

Feature flag OFF by default: Enable GPU Workers

SHAP-based Individual Prediction Explanations in Workbench

SHAP-based explanations help you understand what drives predictions on a row-by-row basis by estimating how much each feature contributes to a given prediction's difference from the average prediction. With their introduction to Workbench, SHAP explanations are available for all model types; XEMP-based explanations are not available in Use Case experiments. Use the controls in the insight to set the data partition, apply slices, and set a prediction range.
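
To illustrate the underlying idea (using the open-source shap package, not the DataRobot API), each prediction decomposes into the average prediction plus that row's per-feature SHAP contributions:

```python
import shap
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor

# Train a small model on a public dataset.
X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(n_estimators=100, max_depth=3).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:5])  # per-row, per-feature contributions
expected = explainer.expected_value              # the average model prediction

# For each row: average prediction + summed contributions ≈ the model's prediction.
for contrib, pred in zip(shap_values, model.predict(X.iloc[:5])):
    print(expected + contrib.sum(), "≈", pred)
```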

Note

To better communicate this feature’s functionality as a local explanation method that calculates SHAP values for each individual row, we have updated the name from SHAP Prediction Explanations to Individual Prediction Explanations.

Preview documentation.

Feature flag ON by default: Universal SHAP in NextGen

Word Cloud now available in Workbench

Word Cloud, a text-based insight for classification and regression projects, is now available as a preview in Workbench. It displays up to 200 of the most impactful words and short phrases, helping you understand the correlation of a word with the target. When viewing the Word Cloud, you can view individual word details, filter the display, and export the insight.

Preview documentation.

Feature flag ON by default: Word Cloud in Workbench

Multiclass experiments now available in Workbench

Available as a preview feature, you can now create multiclass experiments in Workbench. DataRobot determines the experiment type based on the number of values for a given target feature: if there are more than two, the experiment is handled as multiclass or, for numeric targets, as regression. Special handling by DataRobot allows:

  • Changing a regression experiment to multiclass.
  • Using aggregation, with configurable settings, to support more than 1000 classes.

Once a model is built, a multiclass confusion matrix helps you visualize where the model may be mislabeling one class as another.

Preview documentation.

Feature flag ON by default: Unlimited multiclass

New configuration options added to Workbench

This deployment adds several Public Preview settings during experiment setup, covering many common cases for users. New settings include the ability to:

  • Change the modeling mode. Previously, Workbench ran Quick Autopilot only; now you can set the mode to Manual (for building via the blueprint repository) or Comprehensive (not available for time-aware experiments).
  • Change the optimization metric—the metric that defines how DataRobot scores your models—from the metric selected by DataRobot to any supported metric appropriate for your experiment.
  • Configure additional settings, such as offset/weight/exposure, monotonic feature constraints, and positive class selection.

Preview documentation.

Feature flag ON by default: UXR Advanced Modeling Options

Migrate projects and notebooks from DataRobot Classic to NextGen

Now available for preview, DataRobot allows you to transfer projects and notebooks created in DataRobot Classic to DataRobot NextGen. You can export projects in DataRobot Classic and add them to a Use Case in Workbench as an experiment. Notebooks can also be exported from Classic and added to a Use Case in Workbench.

Preview documentation.

Feature flag OFF by default: Enable Asset Migration

Compute Prediction Explanations for data in OTV and time series projects

Now available for preview, you can compute Prediction Explanations for time series and OTV projects. Specifically, you can get XEMP Prediction Explanations for the holdout partition and for sections of the training data; within the training data, DataRobot computes Prediction Explanations only for the validation partition of the first backtest.

Preview documentation.

Feature flag OFF by default: Enable Prediction Explanations on training data for time-aware Projects

Create custom feature lists in existing experiments

In Workbench, you can now add new, custom feature lists to an existing predictive or forecasting experiment through the UI. On data ingest, DataRobot automatically creates several feature lists, which control the subset of features that DataRobot uses to build models and make predictions. Now, you can create your own lists from the Feature Lists or Data tabs in the Experiment information window accessed from the Leaderboard. Use bulk selections to choose multiple features with a single click.
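
Custom feature lists can also be created with the Python client; a minimal sketch, where the project ID and the excluded feature names are placeholders:

```python
import datarobot as dr

dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")

project = dr.Project.get("PROJECT_ID")  # placeholder project ID

# Start from an automatically created list and drop identifier-like features.
informative = next(
    fl for fl in project.get_featurelists() if fl.name == "Informative Features"
)
features = [f for f in informative.features if f not in ("row_id", "customer_name")]

custom_list = project.create_featurelist(name="no-identifiers", features=features)
print(custom_list.id, custom_list.name)
```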

Preview documentation.

Feature flags ON by default: Enable Data and Feature Lists tabs in Workbench, Enable Feature Lists in Workbench Preview, Enable Workbench Feature List Creation

Scalability up to 100GB now available with incremental learning

Available as a preview feature, you can now train models on up to 100GB of data using incremental learning (IL) for binary and regression project types. IL is a model training method specifically tailored for large datasets that chunks data and creates training iterations. Configuration options allow you to control whether a top model trains on one—or all—iterations and whether training stops if accuracy improvements plateau. After model building begins, you can compare trained iterations and optionally assign a different active version or continue training. The active iteration is the basis for other insights and is used for making predictions.
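
As a generic illustration of the chunked, iterative idea behind incremental learning (not DataRobot's IL implementation), a model with a partial_fit interface can be trained one chunk at a time; the file name and target column are placeholders, and all features are assumed numeric:

```python
import pandas as pd
from sklearn.linear_model import SGDClassifier

TARGET = "is_churn"  # placeholder target column
model = SGDClassifier(loss="log_loss")

# Stream the file in fixed-size chunks; each chunk becomes one training iteration.
for i, chunk in enumerate(pd.read_csv("big_dataset.csv", chunksize=100_000)):
    X = chunk.drop(columns=[TARGET])  # assumes all-numeric features
    y = chunk[TARGET]
    model.partial_fit(X, y, classes=[0, 1])
    print(f"iteration {i}: trained on {len(chunk):,} rows")
```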

Preview documentation.

Feature flags OFF by default: Enable Incremental Learning, Enable Data Chunking

Apps

See Chat generation Q&A application in the MLOps enhancements.

Admin enhancements

GA

Add a Microsoft Teams notification channel

Admins can now configure a notification channel for Microsoft Teams. Notification channels, created by admins, are the mechanisms used to deliver notifications. You may want to set up several channels for different types of notification; for example, a webhook with a URL for deployment-related events and another webhook for all project-related events.

Updated user settings interface

As part of DataRobot NextGen, DataRobot has updated the interface for user settings, including the data connection and developer tools pages. Some settings have been renamed, and the configuration pages have been restyled. To view the updated settings, select your user icon and browse the settings in the menu.

Single-tenant SaaS

DataRobot has released a single-tenant SaaS solution, featuring a public internet access networking option. With this enhancement, you can leverage DataRobot's AI capabilities while connecting securely over the public internet, gaining greater flexibility and scalability.

Updated May 2, 2024