
Wrangle large Snowflake datasets

This page describes how to improve performance and interactivity when wrangling large Snowflake datasets in Workbench.

Increase Snowflake warehouse size

Snowflake warehouse size determines the compute resources available per cluster; increasing the warehouse size therefore reduces the time it takes to execute wrangling queries.

See the Snowflake documentation on increasing warehouse size.
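If you manage your own warehouse, resizing is a single SQL statement. The sketch below uses the snowflake-connector-python package and assumes a warehouse named wrangling_wh and placeholder credentials; substitute your own values.

```python
# Minimal sketch: resize a Snowflake warehouse with snowflake-connector-python.
# Assumptions: warehouse "wrangling_wh" and placeholder credentials.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
)
try:
    cur = conn.cursor()
    # Larger sizes give wrangling queries more compute per cluster.
    cur.execute("ALTER WAREHOUSE wrangling_wh SET WAREHOUSE_SIZE = 'LARGE'")
finally:
    conn.close()
```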

Change the sampling method

When generating the live wrangling preview, DataRobot retrieves a random sample from the source table by default. To reduce the time it takes to execute the query in Snowflake and display the preview, you can change the sampling method so that DataRobot retrieves the First-N rows instead.

For step-by-step instructions, see the documentation on choosing a sampling method.
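For context, the sketch below contrasts the two sampling styles as plain Snowflake SQL. These queries are illustrative only, not the SQL DataRobot actually generates, and the table name source_table is a placeholder. A LIMIT query can stop reading as soon as enough rows are produced, while fixed-size random sampling may read far more of the table.

```python
def preview_query(table: str, n: int, first_n: bool = False) -> str:
    """Build an illustrative preview query (not DataRobot's actual SQL)."""
    if first_n:
        # First-N rows: Snowflake can stop scanning once n rows are returned.
        return f"SELECT * FROM {table} LIMIT {n}"
    # Fixed-size random sampling: Snowflake may read much more of the table.
    return f"SELECT * FROM {table} SAMPLE ({n} ROWS)"

print(preview_query("source_table", 10000))                # random sample
print(preview_query("source_table", 10000, first_n=True))  # first N rows
```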

Reduce the sample size

To generate a live wrangling preview, DataRobot executes the query directly in Snowflake. By default, the preview uses 10,000 random rows from the source table to generate insights. You can reduce the number of rows sampled to decrease the time it takes to execute the query in Snowflake.

This method is particularly helpful for wide (hundreds of features) and heavy (many long text features) datasets, where 10,000 rows may require significant resources and time to process.

For step-by-step instructions, see the documentation on configuring the live sample.
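To gauge how much a smaller sample would help for your own table, you can time the sampling query at two sizes directly in Snowflake. The sketch below uses snowflake-connector-python and assumes placeholder credentials and a table named source_table; it illustrates the comparison rather than reproducing DataRobot's exact preview query.

```python
# Minimal sketch: time a random sample at two sizes to see the effect
# of a smaller live sample. Placeholder credentials and table name;
# substitute your own.
import time
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
)
try:
    cur = conn.cursor()
    for rows in (10000, 1000):
        start = time.perf_counter()
        cur.execute(f"SELECT * FROM source_table SAMPLE ({rows} ROWS)")
        cur.fetchall()
        print(f"{rows} rows sampled in {time.perf_counter() - start:.1f}s")
finally:
    conn.close()
```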


Updated December 8, 2023