{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Build a Model Factory\n", "\n", "A model factory is a system or set of procedures that automatically generates predictive models with little to no human intervention. Model factories can have multiple layers of complexity, called modules: one module may train models, while others deploy or retrain them. In this example of a model factory, you set up projects and start them in a _parallel_ loop, so that all projects start at the same time instead of one after another.\n", "\n", "Consider a scenario where you have 20,000 SKUs and need a sales forecast for each one of them. Or, you may have multiple types of customers and want to predict which types will churn.\n", "\n", "* Can one model handle the high dimensionality that comes with these problems?\n", "* Is a single model family able to address the scope of these problems?\n", "* Is one preprocessing method sufficient?\n", "\n", "In this example, you first use DataRobot to build a single project on the diabetes dataset that predicts the probability that a hospital patient is readmitted after discharge (the `readmitted` target). Then, you build multiple projects, one for each unique value of the `admission_type_id` feature, and find the best model for each value. Lastly, you prepare the selected models for deployment." 
] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Import libraries" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from time import sleep\n", "\n", "from dask import compute, delayed # For parallelization\n", "import datarobot as dr # Requires version >2.19\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import seaborn as sns\n", "\n", "sns.set(style=\"whitegrid\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Import data\n", "\n", "Download the sample dataset, [10k-diabetes.csv](10k-diabetes.csv), or read it directly from the URL in the cell below." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "data_path = \"https://docs.datarobot.com/en/docs/api/guide/python/10k-diabetes.csv\"\n", "\n", "df = pd.read_csv(data_path)" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | medical_specialty | ... | glipizide_metformin | glimepiride_pioglitazone | metformin_rosiglitazone | metformin_pioglitazone | change | diabetesMed | readmitted | diag_1_desc | diag_2_desc | diag_3_desc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Caucasian | Female | [50-60) | ? | Elective | Discharged to home | Physician Referral | 1 | CP | Surgery-Neuro | ... | No | No | No | No | No | No | False | Spinal stenosis in cervical region | Spinal stenosis in cervical region | Effusion of joint, site unspecified |
| 1 | Caucasian | Female | [20-30) | [50-75) | Urgent | Discharged to home | Physician Referral | 2 | UN | ? | ... | No | No | No | No | No | No | False | First-degree perineal laceration, unspecified ... | Diabetes mellitus of mother, complicating preg... | Sideroblastic anemia |
| 2 | Caucasian | Male | [80-90) | ? | Not Available | Discharged/transferred to home with home healt... | NaN | 7 | MC | Family/GeneralPractice | ... | No | No | No | No | No | Yes | True | Pneumococcal pneumonia [Streptococcus pneumoni... | Congestive heart failure, unspecified | Hyperosmolality and/or hypernatremia |
| 3 | AfricanAmerican | Female | [50-60) | ? | Emergency | Discharged to home | Transfer from another health care facility | 4 | UN | ? | ... | No | No | No | No | No | Yes | False | Cellulitis and abscess of face | Streptococcus infection in conditions classifi... | Diabetes mellitus without mention of complicat... |
| 4 | AfricanAmerican | Female | [50-60) | ? | Emergency | Discharged to home | Emergency Room | 5 | ? | Psychiatry | ... | No | No | No | No | Ch | Yes | False | Bipolar I disorder, single manic episode, unsp... | Diabetes mellitus without mention of complicat... | Depressive type psychosis |

5 rows × 51 columns\n", "