{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Schedule predictions with a JDBC database\n", "\n", "Making predictions manually on a daily or monthly basis is a time-consuming and cumbersome process. Batch predictions are commonly used when you have to score new records at regular intervals (weekly, monthly, etc.). For example, you can use batch predictions to score new leads on a monthly basis to predict who will churn, or to predict on a daily basis which products someone is likely to purchase.\n", "\n", "This notebook outlines how to use DataRobot's Python client to schedule batch prediction jobs and write the results to a JDBC database. Specifically, you will:\n", "\n", "1. Retrieve existing data stores and credential information.\n", "2. Configure prediction job specifications.\n", "3. Set up a prediction job schedule.\n", "4. Run a test prediction job and enable an automated schedule for scoring.\n", "\n", "Before proceeding, note that this workflow requires a [deployed DataRobot model](https://docs.datarobot.com/en/docs/mlops/deployment/deploy-methods/index.html) object to use for scoring and an established [data connection](https://docs.datarobot.com/en/docs/data/connect-data/data-conn.html) to read data and host prediction writeback. For more information about the Python client, refer to the [documentation](https://datarobot-public-api-client.readthedocs-hosted.com)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import libraries" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": {}, "inputWidgets": {}, "nuid": "42a73bc4-375b-4267-a7c3-d13dbfe3a732", "showTitle": false, "title": "" } }, "outputs": [], "source": [ "import datarobot as dr\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Connect to DataRobot\n", "\n", "Read more about different options for [connecting to DataRobot from the client](https://docs.datarobot.com/en/docs/api/api-quickstart/api-qs.html)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": {}, "inputWidgets": {}, "nuid": "94c61bf1-2711-4faa-a9be-df126384ac8b", "showTitle": false, "title": "" } }, "outputs": [], "source": [ "# If the config file is not in the default location described in the API Quickstart guide, '~/.config/datarobot/drconfig.yaml', then you will need to call\n", "# dr.Client(config_path='path-to-drconfig.yaml')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### List data stores\n", "\n", "To enable integration with a variety of enterprise databases, DataRobot provides a “self-service” JDBC product for database connectivity setup. Once configured, you can read data from production databases for model building and predictions. This allows you to quickly train and retrain models on that data while avoiding the unnecessary step of exporting data from your enterprise database to a CSV for ingestion into DataRobot. It also gives you access to more diverse data, which can lead to more accurate models.\n", "\n", "Use the cell below to query all data stores tied to a DataRobot account. The second line lists each data store with an alphanumeric string; that string is the data store ID." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": {}, "inputWidgets": {}, "nuid": "caebb795-0f52-48da-90f3-220bbbbe1cc4", "showTitle": false, "title": "" } }, "outputs": [ { "data": { "text/html": [ "\n", "