Comparing Exploratory Data Analysis Libraries

I tested and reviewed six Python packages for data processing and exploratory data analysis (EDA). Most of them either automate parts of the data processing and EDA workflow or provide a suite of functions for manipulating and visualizing data.

The main objective is to find Python packages that shorten the time needed for data processing and EDA.

In summary, sweetviz may be the best option in a business setting (where the focus is on business understanding), while autoviz and pandas-profiling are solid choices in an R&D setting (where the focus is on deep-dive analysis).

Comparison

Package	sweetviz	pandas-profiling	autoviz	lux-api	dtale	dataprep
Version	2.1.3	3.0.0	0.0.83	0.3.2	1.56.0	0.3.0
Recommended for exploration	Yes	No	Yes	No	No	Yes
Recommended for production	No	No	No	No	No	No
Ease of use	Yes	Yes	Yes	Yes	Yes	Yes
Computation speed	Fast	Medium	Fast	Fast	Fast	Fast
Installation complexity	Low	Low	Low	Low	Medium	Low
Target variable-centric	Yes	No	Yes	Yes	No	Yes
Missing data check	No	Yes	No	No	Yes	Yes
Per variable summary statistics	Yes	Yes	No	No	Yes	Yes
AutoEDA focus	No	Yes	Yes	No	No	Yes
Score	4	3	2	2	1	1

Table comparing autoEDA libraries for exploratory data analysis.
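
To shortlist libraries by capability, the four feature rows of the table can be encoded as a small lookup. This is only a sketch: the key names and the `libraries_with` helper are my own, and the Yes/No values are copied from the table above.

```python
# Feature flags copied from the comparison table (True = "Yes").
FEATURES = {
    "sweetviz": {
        "target_centric": True, "missing_data_check": False,
        "per_variable_stats": True, "autoeda_focus": False,
    },
    "pandas-profiling": {
        "target_centric": False, "missing_data_check": True,
        "per_variable_stats": True, "autoeda_focus": True,
    },
    "autoviz": {
        "target_centric": True, "missing_data_check": False,
        "per_variable_stats": False, "autoeda_focus": True,
    },
    "lux-api": {
        "target_centric": True, "missing_data_check": False,
        "per_variable_stats": False, "autoeda_focus": False,
    },
    "dtale": {
        "target_centric": False, "missing_data_check": True,
        "per_variable_stats": True, "autoeda_focus": False,
    },
    "dataprep": {
        "target_centric": True, "missing_data_check": True,
        "per_variable_stats": True, "autoeda_focus": True,
    },
}


def libraries_with(*required):
    """Return the libraries (in table order) that have every required feature."""
    return [
        name
        for name, flags in FEATURES.items()
        if all(flags[feature] for feature in required)
    ]


print(libraries_with("target_centric", "per_variable_stats"))
```

For example, asking for both target-centric analysis and per-variable summary statistics leaves sweetviz and dataprep.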

Unique Points of Each Library

sweetviz

Beautiful, simple visualisations that work well for explaining data to business stakeholders.
Limited features beyond simple visualisation.
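
A minimal sketch of a target-centric sweetviz report, assuming a pandas DataFrame with a numeric or boolean target column. The function name, the default output path and the column name in the usage note are placeholders of mine; the import sits inside the function so the snippet loads even when sweetviz is not installed.

```python
def sweetviz_report(df, target, out_path="sweetviz_report.html"):
    """Write a sweetviz HTML report centred on `target` and return its path."""
    import sweetviz as sv  # optional dependency, imported lazily

    # Analyse every column in relation to the chosen target variable.
    report = sv.analyze(df, target_feat=target)
    # Save the report without opening a browser tab automatically.
    report.show_html(out_path, open_browser=False)
    return out_path
```

A call would look like `sweetviz_report(df, target="churned")`, where `churned` is a hypothetical boolean column.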

View notebook

pandas-profiling

Good for generic data quality checks.
No option to specify target variable for tailored EDA.
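
A minimal sketch of a pandas-profiling data quality report, assuming a pandas DataFrame is already loaded. The function name, report title and output path are placeholders of mine. Note that there is no target argument to pass, in line with the limitation above.

```python
def profiling_report(df, out_path="profile_report.html", minimal=False):
    """Write a pandas-profiling HTML report and return its path."""
    from pandas_profiling import ProfileReport  # optional dependency, imported lazily

    # minimal=True skips the more expensive computations, which helps on wide data.
    profile = ProfileReport(df, title="Data quality report", minimal=minimal)
    profile.to_file(out_path)
    return out_path
```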

View notebook

autoviz

Holistic visualisations that are good for deep dive analysis.
Claims to select plots/analyses smartly, but I have yet to see the impact.
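
A minimal sketch of running autoviz on an in-memory DataFrame with an optional target column. The function name is a placeholder of mine, and the defaults of the `AutoViz` call may differ between versions, so treat the arguments as an assumption to check against the installed release.

```python
def autoviz_charts(df, target=""):
    """Run AutoViz on an in-memory DataFrame, optionally centred on `target`."""
    from autoviz.AutoViz_Class import AutoViz_Class  # optional dependency, imported lazily

    av = AutoViz_Class()
    # filename stays empty because the data is passed in memory through `dfte`;
    # depVar names the target (dependent) variable, or "" for none.
    return av.AutoViz(filename="", dfte=df, depVar=target, verbose=0)
```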

View notebook

lux-api

Generates very simple distribution plots for single variables or pairs of variables.
Plots of interest have to be selected manually.
Charts are not integrated into the notebook output (they are available as widgets only).

View notebook

dtale

Very good as a Python-based data manipulation tool with GUI.
Contains a full suite of tools for data manipulation/analysis/visualisation, almost matching similar commercial data analytics tools.
Too much manual effort is required to set up all the analyses/plots needed.
No dashboard deployment support.
No option to specify target variable for tailored EDA.
Complex dependencies on many libraries.
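
A minimal sketch of opening the dtale GUI on a DataFrame. The function name is a placeholder of mine; everything after the call (filters, charts, summaries) is then set up by hand in the browser, which is where the manual effort mentioned above comes in.

```python
def open_dtale(df):
    """Start a dtale session for `df` and open it in the default browser."""
    import dtale  # optional dependency, imported lazily

    session = dtale.show(df)  # starts a local web server serving the data grid
    session.open_browser()
    return session
```

Keeping the returned session around makes it possible to shut the server down later with `session.kill()`.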

View notebook

dataprep

The EDA module is like an extension of pandas-profiling and is very comprehensive.
Contains three modules: one each to collect, explore and clean data.
Potentially faster on large data due to its use of Dask.
Possible to deep-dive into selected columns/variables.
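
A minimal sketch of the dataprep EDA module, assuming a pandas DataFrame is already loaded. The function names are placeholders of mine; `plot`, `plot_missing` and `create_report` come from `dataprep.eda`.

```python
def dataprep_plot(df, column=None):
    """Overview of the whole DataFrame, or a deep dive into one column."""
    from dataprep.eda import plot  # optional dependency, imported lazily

    # plot(df) summarises every column; plot(df, column) focuses on a single one.
    return plot(df) if column is None else plot(df, column)


def dataprep_missing(df):
    """Visualise where values are missing across the DataFrame."""
    from dataprep.eda import plot_missing

    return plot_missing(df)


def dataprep_report(df):
    """Build the full report, comparable in scope to a pandas-profiling report."""
    from dataprep.eda import create_report

    return create_report(df)
```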

View notebook
