
Wrangle large Snowflake datasets

This page describes how to improve performance and interactivity when wrangling large Snowflake datasets in Workbench.

Increase Snowflake warehouse size

Snowflake warehouse size determines the compute resources available per cluster; increasing the warehouse size therefore reduces the time it takes to execute wrangling queries.

See the Snowflake documentation on increasing warehouse size.
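If you manage your own warehouse, resizing is a single SQL statement. The sketch below uses the snowflake-connector-python package and assumes a warehouse named wrangling_wh and placeholder credentials; substitute your own values.

```python
# Minimal sketch: resize a Snowflake warehouse with snowflake-connector-python.
# Assumptions: warehouse "wrangling_wh" and placeholder credentials.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
)
try:
    cur = conn.cursor()
    # Larger sizes give wrangling queries more compute per cluster.
    cur.execute("ALTER WAREHOUSE wrangling_wh SET WAREHOUSE_SIZE = 'LARGE'")
finally:
    conn.close()
```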

Change the sampling method

When generating the live wrangling preview, DataRobot retrieves a random sample from the source table by default. To reduce the time it takes to execute the query in Snowflake and display the preview, you can change the sampling method so that DataRobot retrieves the First-N rows instead.

For step-by-step instructions, see the documentation on choosing a sampling method.
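For context, the sketch below contrasts the two sampling styles as plain Snowflake SQL. These queries are illustrative only, not the SQL DataRobot actually generates, and the table name source_table is a placeholder. A LIMIT query can stop reading as soon as enough rows are produced, while fixed-size random sampling may read far more of the table.

```python
def preview_query(table: str, n: int, first_n: bool = False) -> str:
    """Build an illustrative preview query (not DataRobot's actual SQL)."""
    if first_n:
        # First-N rows: Snowflake can stop scanning once n rows are returned.
        return f"SELECT * FROM {table} LIMIT {n}"
    # Fixed-size random sampling: Snowflake may read much more of the table.
    return f"SELECT * FROM {table} SAMPLE ({n} ROWS)"

print(preview_query("source_table", 10000))                # random sample
print(preview_query("source_table", 10000, first_n=True))  # first N rows
```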

Reduce the sample size

To generate a live wrangling preview, DataRobot executes the query directly in Snowflake. By default, the preview uses 10,000 random rows from the source table to generate insights. You can reduce the number of rows sampled to decrease the time it takes to execute the query in Snowflake.

This method is particularly helpful for wide (hundreds of features) and heavy (many long text features) datasets, where 10,000 rows may require significant resources and time to process.

For step-by-step instructions, see the documentation on configuring the live sample.
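To gauge how much a smaller sample would help for your own table, you can time the sampling query at two sizes directly in Snowflake. The sketch below uses snowflake-connector-python and assumes placeholder credentials and a table named source_table; it illustrates the comparison rather than reproducing DataRobot's exact preview query.

```python
# Minimal sketch: time a random sample at two sizes to see the effect
# of a smaller live sample. Placeholder credentials and table name;
# substitute your own.
import time
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
)
try:
    cur = conn.cursor()
    for rows in (10000, 1000):
        start = time.perf_counter()
        cur.execute(f"SELECT * FROM source_table SAMPLE ({rows} ROWS)")
        cur.fetchall()
        print(f"{rows} rows sampled in {time.perf_counter() - start:.1f}s")
finally:
    conn.close()
```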


Updated December 8, 2023