{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Schedule predictions with a JDBC database\n", "\n", "Making predictions on a daily or monthly basis is a manual, time-consuming, and cumbersome process. Batch predictions are commonly used when you have to score new records over a certain frame of time (weeks, months, etc.). For example, you can use batch predictions to score new leads on a monthly basis to predict who will churn, or to predict on a daily basis which products someone is likely to purchase.\n", "\n", "This notebook outlines how to use DataRobot's Python client to schedule batch prediction jobs and write them to a JDBC database. Specifically, you will:\n", "\n", "1. Retrieve existing data stores and credential information.\n", "2. Configure prediction job specifications.\n", "3. Set up a prediction job schedule.\n", "4. Run a test prediction job and enable an automated schedule for scoring.\n", "\n", "Before proceeding, note that this workflow requires a [deployed DataRobot model](https://docs.datarobot.com/en/docs/mlops/deployment/deploy-methods/index.html) object to use for scoring and an established [data connection](https://docs.datarobot.com/en/docs/data/connect-data/data-conn.html) to read data and host prediction writeback. For more information about the Python client, reference the [documentation](https://datarobot-public-api-client.readthedocs-hosted.com)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import libraries" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": {}, "inputWidgets": {}, "nuid": "42a73bc4-375b-4267-a7c3-d13dbfe3a732", "showTitle": false, "title": "" } }, "outputs": [], "source": [ "import datarobot as dr\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Connect to DataRobot\n", "\n", "Read more about different options for [connecting to DataRobot from the client](https://docs.datarobot.com/en/docs/api/api-quickstart/api-qs.html)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": {}, "inputWidgets": {}, "nuid": "94c61bf1-2711-4faa-a9be-df126384ac8b", "showTitle": false, "title": "" } }, "outputs": [], "source": [ "# If the config file is not in the default location described in the API Quickstart guide, '~/.config/datarobot/drconfig.yaml', then you will need to call\n", "# dr.Client(config_path='path-to-drconfig.yaml')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### List data stores\n", "\n", "To enable integration with a variety of enterprise databases, DataRobot provides a “self-service” JDBC product for database connectivity setup. Once configured, you can read data from production databases for model building and predictions. This allows you to quickly train and retrain models on that data while avoiding the unnecessary step of exporting data from your enterprise database to a CSV for ingest to DataRobot. It allows access to more diverse data, which results in more accurate models.\n", "\n", "Use the cell below to query all data sources tied to a DataRobot account. The second line lists each datastore with an alphanumeric string; that is the datastore ID." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": {}, "inputWidgets": {}, "nuid": "caebb795-0f52-48da-90f3-220bbbbe1cc4", "showTitle": false, "title": "" } }, "outputs": [ { "data": { "text/html": [ "\n", "
60d9e03183eea5b896391ace Snowflake Sandbox <datarobot.models.data_store.DataStoreParameters object at 0x7fc1534a2fd0>\n", "612039d524db15df18f4848c Snowflake <datarobot.models.data_store.DataStoreParameters object at 0x7fc1534a2df0>\n", "614ca745c7fab1f23da7a632 Snowflake Connection <datarobot.models.data_store.DataStoreParameters object at 0x7fc1534a2fa0>\n", "615fc79edb76fcafb2e62eb1 SQL Connection <datarobot.models.data_store.DataStoreParameters object at 0x7fc1534a2340>\n", "618acfa3b2ca93c91b0c072f DRDemo_DC_ContinuousAI_NYC311_00 <datarobot.models.data_store.DataStoreParameters object at 0x7fc1534a2f40>\n", "618c42960cd0c7523e9184ba DEMO ENGR <datarobot.models.data_store.DataStoreParameters object at 0x7fc1534a2370>\n", "6269970b0ac03f0dfb3413a0 Azure Databricks <datarobot.models.data_store.DataStoreParameters object at 0x7fc1534a24c0>\n", "62829ec4c1565bddb2643592 Big Query <datarobot.models.data_store.DataStoreParameters object at 0x7fc1534a2280>\n", "
" ] }, "metadata": { "application/vnd.databricks.v1+output": { "addedWidgets": {}, "arguments": {}, "data": "
60d9e03183eea5b896391ace Snowflake Sandbox <datarobot.models.data_store.DataStoreParameters object at 0x7fc1534a2fd0>\n612039d524db15df18f4848c Snowflake <datarobot.models.data_store.DataStoreParameters object at 0x7fc1534a2df0>\n614ca745c7fab1f23da7a632 Snowflake Connection <datarobot.models.data_store.DataStoreParameters object at 0x7fc1534a2fa0>\n615fc79edb76fcafb2e62eb1 SQL Connection <datarobot.models.data_store.DataStoreParameters object at 0x7fc1534a2340>\n618acfa3b2ca93c91b0c072f DRDemo_DC_ContinuousAI_NYC311_00 <datarobot.models.data_store.DataStoreParameters object at 0x7fc1534a2f40>\n618c42960cd0c7523e9184ba DEMO ENGR <datarobot.models.data_store.DataStoreParameters object at 0x7fc1534a2370>\n6269970b0ac03f0dfb3413a0 Azure Databricks <datarobot.models.data_store.DataStoreParameters object at 0x7fc1534a24c0>\n62829ec4c1565bddb2643592 Big Query <datarobot.models.data_store.DataStoreParameters object at 0x7fc1534a2280>\n
", "datasetInfos": [], "metadata": {}, "removedWidgets": [], "type": "html" } }, "output_type": "display_data" } ], "source": [ "for d in dr.DataStore.list():\n", " print(d.id, d.canonical_name, d.params)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Retrieve credentials list\n", "\n", "You can reference the [DataRobot documentation](https://docs.datarobot.com/en/docs/data/connect-data/stored-creds.html#credentials-management) for more information about managing credentials." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": {}, "inputWidgets": {}, "nuid": "9305bcfc-742a-4a7b-80b5-ca400693e72b", "showTitle": false, "title": "" } }, "outputs": [], "source": [ "dr.Credential.list()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The output above returns multiple sets of credentials. The alphanumeric string included in each item of the list is the credentials ID. You can use that ID to access credentials through the API." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Specify the deployment and data connection\n", "\n", "Use the snippet below to indicate the deployment you want to use (by binding the deployment ID, retrieved from the deployment's [Overview tab](https://docs.datarobot.com/en/docs/mlops/monitor/dep-overview.html)) and the data store to which you want to write predictions (by providing the data store ID and the corresponding credentials ID)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": {}, "inputWidgets": {}, "nuid": "b6ddd55f-bddb-4dfc-be1e-4124a7235115", "showTitle": false, "title": "" } }, "outputs": [], "source": [ "deployment_id = \"620219bb18f7f84dec6cec59\"\n", "\n", "datastore_id = \"614ca745c7fab1f23da7a632\"\n", "data_store = dr.DataStore.get(datastore_id)\n", "\n", "credential_id = \"63865454a351b56ce3cb78b3\"\n", "cred = dr.Credential.get(credential_id)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Configure intake settings\n", "\n", "Use the snippet below to configure the intake settings for JDBC scoring. For more information, reference the [batch predictions documentation in the Python client](https://datarobot-public-api-client.readthedocs-hosted.com/en/v2.25.0/entities/batch_predictions.html) and the [intake options documentation](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/intake-options.html)." ] }, { "cell_type": "markdown", "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": {}, "inputWidgets": {}, "nuid": "255b62ff-a6e8-491c-9156-a552aad2104c", "showTitle": false, "title": "" } }, "source": [ "intake_settings = {\n", " 'type': 'jdbc',\n", " 'table': 'LENDING_CLUB_10K',\n", " 'schema': 'TRAINING', # optional, if supported by database\n", " 'catalog': 'DEMO', # optional, if supported by database \n", " 'data_store_id': data_store.id,\n", " 'credential_id': cred.credential_id,\n", "}\n", "\n", "print(intake_settings)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Configure output settings\n", "\n", "Use the snippet below to configure the output settings for JDBC scoring. For more information, reference the [output options documentation](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/output-options.html#statement-types)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": {}, "inputWidgets": {}, "nuid": "298301ab-459f-4e48-bb92-0d294ed2d0e8", "showTitle": false, "title": "" } }, "outputs": [ { "data": { "text/html": [ "\n", "
{'type': 'jdbc', 'table': 'LENDING_CLUB_10K_AA_Temp', 'schema': 'SCORING', 'catalog': 'SANDBOX', 'statement_type': 'insert', 'create_table_if_not_exists': True, 'data_store_id': '614ca745c7fab1f23da7a632', 'credential_id': '63865454a351b56ce3cb78b3'}\n", "
" ] }, "metadata": { "application/vnd.databricks.v1+output": { "addedWidgets": {}, "arguments": {}, "data": "
{'type': 'jdbc', 'table': 'LENDING_CLUB_10K_AA_Temp', 'schema': 'SCORING', 'catalog': 'SANDBOX', 'statement_type': 'insert', 'create_table_if_not_exists': True, 'data_store_id': '614ca745c7fab1f23da7a632', 'credential_id': '63865454a351b56ce3cb78b3'}\n
", "datasetInfos": [], "metadata": {}, "removedWidgets": [], "type": "html" } }, "output_type": "display_data" } ], "source": [ "output_settings = {\n", " \"type\": \"jdbc\",\n", " \"table\": \"LENDING_CLUB_10K_AA_Temp\",\n", " \"schema\": \"SCORING\", # optional, if supported by database\n", " \"catalog\": \"SANDBOX\", # optional, if supported by database schema\n", " \"statement_type\": \"insert\",\n", " \"create_table_if_not_exists\": True,\n", " \"data_store_id\": data_store.id,\n", " \"credential_id\": cred.credential_id,\n", "}\n", "\n", "print(output_settings)\n", "\n", "# Uncomment and use the following lines for local file export:\n", "# output_settings={\n", "# 'type': 'localFile',\n", "# 'path': './predicted.csv',\n", "# }\n", "\n", "# print(output_settings)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use the code below to retrieve the name of the deployment." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": {}, "inputWidgets": {}, "nuid": "ac8ef134-5411-4de9-b026-a004fcff6dbb", "showTitle": false, "title": "" } }, "outputs": [ { "data": { "text/html": [ "\n", "
Out[57]: 'Predict Likelihood of Loan Default'
" ] }, "metadata": { "application/vnd.databricks.v1+output": { "addedWidgets": {}, "arguments": {}, "data": "
Out[57]: 'Predict Likelihood of Loan Default'
", "datasetInfos": [], "metadata": {}, "removedWidgets": [], "type": "html" } }, "output_type": "display_data" } ], "source": [ "deployment = dr.Deployment.get(deployment_id)\n", "deployment.label" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a schedule\n", "\n", "Next, set up a [schedule](https://datarobot-public-api-client.readthedocs-hosted.com/en/v2.25.0/entities/batch_prediction_job_definitions.html?highlight=schedule) for making predictions. The snippet below creates a schedule that makes predictions on the first day of every month at 7:59 AM." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": {}, "inputWidgets": {}, "nuid": "7a3af98c-183d-4ac4-8b03-c5df643add93", "showTitle": false, "title": "" } }, "outputs": [ { "data": { "text/html": [ "\n", "
Out[97]: {'minute': [59],\n", " 'hour': [7],\n", " 'month': ['*'],\n", " 'dayOfWeek': ['*'],\n", " 'dayOfMonth': [1]}
" ] }, "metadata": { "application/vnd.databricks.v1+output": { "addedWidgets": {}, "arguments": {}, "data": "
Out[97]: {'minute': [59],\n 'hour': [7],\n 'month': ['*'],\n 'dayOfWeek': ['*'],\n 'dayOfMonth': [1]}
", "datasetInfos": [], "metadata": {}, "removedWidgets": [], "type": "html" } }, "output_type": "display_data" } ], "source": [ "schedule = {\n", " \"minute\": [59],\n", " \"hour\": [7],\n", " \"month\": [\"*\"],\n", " \"dayOfWeek\": [\"*\"],\n", " \"dayOfMonth\": [1],\n", "}\n", "schedule" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Configure a prediction job" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": {}, "inputWidgets": {}, "nuid": "5e1f7d0a-dab4-4939-aea9-994a81bff81a", "showTitle": false, "title": "" } }, "outputs": [ { "data": { "text/html": [ "\n", "
" ] }, "metadata": { "application/vnd.databricks.v1+output": { "addedWidgets": {}, "arguments": {}, "data": "
", "datasetInfos": [], "metadata": {}, "removedWidgets": [], "type": "html" } }, "output_type": "display_data" } ], "source": [ "job = {\n", " \"deployment_id\": deployment_id,\n", " \"num_concurrent\": 4,\n", " \"intake_settings\": intake_settings,\n", " \"output_settings\": output_settings,\n", " \"passthroughColumnsSet\": \"all\",\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a prediction job\n", "\n", "After configuring a prediction job, use the `BatchPredictionJobDefinition.create` method to create a prediction job definition based on the job and scheduled you configured above." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": {}, "inputWidgets": {}, "nuid": "f04b2387-5fdd-4ad2-a867-1f4fa65fc9b7", "showTitle": false, "title": "" } }, "outputs": [ { "data": { "text/html": [ "\n", "
Out[100]: BatchPredictionJobDefinition(63869404dd19bd0222a2ffb5)
" ] }, "metadata": { "application/vnd.databricks.v1+output": { "addedWidgets": {}, "arguments": {}, "data": "
Out[100]: BatchPredictionJobDefinition(63869404dd19bd0222a2ffb5)
", "datasetInfos": [], "metadata": {}, "removedWidgets": [], "type": "html" } }, "output_type": "display_data" } ], "source": [ "definition = dr.BatchPredictionJobDefinition.create(\n", " enabled=True, batch_prediction_job=job, name=\"Monthly Prediction Job JDBC\", schedule=schedule\n", ")\n", "definition" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lastly, the snippet below initiates the prediction job on the schedule." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": {}, "inputWidgets": {}, "nuid": "c767185f-bf08-41e3-b657-d63d0b410615", "showTitle": false, "title": "" } }, "outputs": [ { "data": { "text/html": [ "\n", "
" ] }, "metadata": { "application/vnd.databricks.v1+output": { "addedWidgets": {}, "arguments": {}, "data": "
", "datasetInfos": [], "metadata": {}, "removedWidgets": [], "type": "html" } }, "output_type": "display_data" } ], "source": [ "job_run_automatically = definition.run_on_schedule(schedule)" ] } ], "metadata": { "application/vnd.databricks.v1+notebook": { "dashboards": [], "language": "python", "notebookMetadata": { "pythonIndentUnit": 2 }, "notebookName": "Python API Scoring Test", "notebookOrigID": 4172545007333052, "widgets": {} }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" } }, "nbformat": 4, "nbformat_minor": 4 }