import pandas as pd
from pycaret.datasets import get_data
from autoviz.AutoViz_Class import AutoViz_Class
Imported AutoViz_Class version: 0.0.83. Call using:
    AV = AutoViz_Class()
    AV.AutoViz(filename, sep=',', depVar='', dfte=None, header=0, verbose=0, lowess=False,
               chart_format='svg',max_rows_analyzed=150000,max_cols_analyzed=30)
Note: verbose=0 or 1 generates charts and displays them in your local Jupyter notebook.
      verbose=2 does not show plot but creates them and saves them in AutoViz_Plots directory in your local machine.
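The banner lists the keyword arguments that `AV.AutoViz()` accepts. A minimal sketch of a helper that assembles them in one place, so the lowess and sampling settings follow the size of the data, is below. `build_autoviz_kwargs` and the 100k-row lowess cutoff are assumptions for illustration, not part of AutoViz; only parameter names printed in the banner are used.

```python
# Hypothetical helper (not part of AutoViz): build keyword arguments for
# AutoViz_Class().AutoViz() when the data is passed as a DataFrame via dfte.
# Parameter names come from the banner above; the lowess cutoff is assumed.

LOWESS_MAX_ROWS = 100_000  # assumed cutoff: lowess smoothing is slow on large data


def build_autoviz_kwargs(n_rows, target="", verbose=0,
                         max_rows_analyzed=150000, max_cols_analyzed=30):
    """Return a kwargs dict for AV.AutoViz(dfte=df, **kwargs)."""
    if verbose not in (0, 1, 2):
        raise ValueError("verbose must be 0, 1 or 2")
    return {
        "filename": "",                   # empty because the data comes from dfte
        "depVar": target,                 # target column name, '' for none
        "verbose": verbose,               # 2 saves charts to AutoViz_Plots instead of showing them
        "lowess": n_rows < LOWESS_MAX_ROWS,
        "max_rows_analyzed": max_rows_analyzed,
        "max_cols_analyzed": max_cols_analyzed,
    }
```

Once `df` is loaded, this could be used as `AV.AutoViz(dfte=df, **build_autoviz_kwargs(len(df), target='Class variable'))`.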
# Record version of key libraries
from importlib.metadata import version
print('autoviz==%s' % version('autoviz'))
autoviz==0.0.83
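The cell above records a single version. A small sketch that records several libraries at once and tolerates missing ones, using only `importlib.metadata` from the standard library, is shown below; `record_versions` is a hypothetical helper name, and the package list is just an example.

```python
from importlib.metadata import PackageNotFoundError, version


def record_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


for name, ver in record_versions(["autoviz", "pycaret", "pandas"]).items():
    print(f"{name}=={ver}" if ver else f"{name}: not installed")
```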
# Load a pre-packaged dataset for testing
df = get_data('diabetes', verbose=True)
| | Number of times pregnant | Plasma glucose concentration a 2 hours in an oral glucose tolerance test | Diastolic blood pressure (mm Hg) | Triceps skin fold thickness (mm) | 2-Hour serum insulin (mu U/ml) | Body mass index (weight in kg/(height in m)^2) | Diabetes pedigree function | Age (years) | Class variable |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |
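Several preview rows show 0 for skin fold thickness and serum insulin. Such values are likely unrecorded measurements rather than real zeros (an assumption worth checking against the dataset's documentation), and a missing-value check based only on NaNs would not flag them. A minimal pandas sketch that counts both NaNs and zeros per column, run here on the five preview rows copied from the table (a subset of columns), is below; `missing_report` is a hypothetical helper.

```python
import pandas as pd

# Five preview rows copied from the table above (subset of columns only).
preview = pd.DataFrame({
    "Plasma glucose concentration a 2 hours in an oral glucose tolerance test": [148, 85, 183, 89, 137],
    "Diastolic blood pressure (mm Hg)": [72, 66, 64, 66, 40],
    "Triceps skin fold thickness (mm)": [35, 29, 0, 23, 35],
    "2-Hour serum insulin (mu U/ml)": [0, 0, 0, 94, 168],
    "Body mass index (weight in kg/(height in m)^2)": [33.6, 26.6, 23.3, 28.1, 43.1],
})


def missing_report(frame):
    """Count NaNs and zeros per column; zeros are only a candidate missing marker."""
    return pd.DataFrame({
        "n_nan": frame.isna().sum(),
        "n_zero": (frame == 0).sum(),
    })


print(missing_report(preview))
```

On the full data the same call would be `missing_report(df)`; which zero counts actually indicate missing data is a domain decision, not something this check settles.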
# Generate a report on the data, including univariate and pairwise variable distributions, correlations and missing values
# NOTE - set max_rows_analyzed and max_cols_analyzed for large data sets
# NOTE - charts can be saved to a local directory by setting verbose=2
AV = AutoViz_Class()
report = AV.AutoViz(
    filename='',  # must be '' when passing a DataFrame via dfte
    dfte=df,
    depVar='Class variable',
    verbose=0,
    lowess=True,  # only set True for data with fewer than 100k rows
)
Shape of your Data Set loaded: (768, 9)
############## C L A S S I F Y I N G V A R I A B L E S ####################
Classifying variables in data set...
Number of Numeric Columns = 2
Number of Integer-Categorical Columns = 6
Number of String-Categorical Columns = 0
Number of Factor-Categorical Columns = 0
Number of String-Boolean Columns = 0
Number of Numeric-Boolean Columns = 0
Number of Discrete String Columns = 0
Number of NLP String Columns = 0
Number of Date Time Columns = 0
Number of ID Columns = 0
Number of Columns to Delete = 0
8 Predictors classified...
This does not include the Target column(s)
No variables removed since no ID or low-information variables found in data set
################ Binary_Classification VISUALIZATION Started #####################
Using Lowess Smoothing. This might take a few minutes for large data sets...
Using Lowess Smoothing. This might take a few minutes for large data sets...
Total Number of Scatter Plots = 3
Time to run AutoViz (in seconds) = 12.914
###################### VISUALIZATION Completed ########################
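With `verbose=2`, the banner says AutoViz saves the charts to an `AutoViz_Plots` directory instead of displaying them. A minimal sketch for locating those files afterwards is below. It searches recursively because the subfolder layout inside `AutoViz_Plots` is an assumption here, and `list_saved_charts` is a hypothetical helper; the default extension follows the banner's `chart_format='svg'`.

```python
from pathlib import Path


def list_saved_charts(root="AutoViz_Plots", chart_format="svg"):
    """Return sorted paths of chart files saved under root, searched recursively."""
    root_path = Path(root)
    if not root_path.is_dir():
        return []  # nothing saved yet, e.g. AutoViz was run with verbose=0 or 1
    return sorted(root_path.rglob(f"*.{chart_format}"))


for path in list_saved_charts():
    print(path)
```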