Comparing Exploratory Data Analysis Libraries

I have tested and reviewed a few Python packages for data processing and/or exploratory data analysis (EDA). Most of these packages attempt to automate parts of the data processing and/or EDA process, or provide a suite of functions to manipulate and visualize data.

The main objective here is to review and explore Python packages that will shorten the time needed for data processing and/or exploratory data analysis.

In summary, sweetviz may be the best option in a business setting (focused on business understanding), while autoviz and pandas-profiling are solid choices in an R&D setting (focused on deep-dive analysis).


Recommended for exploration | Yes | No | Yes | No | No | Yes
Recommended for production | No | No | No | No | No | No
Ease of use | Yes | Yes | Yes | Yes | Yes | Yes
Computation speed | Fast | Medium | Fast | Fast | Fast | Fast
Installation complexity | Low | Low | Low | Low | Medium | Low
Target variable-centric | Yes | No | Yes | Yes | No | Yes
Missing data check | No | Yes | No | No | Yes | Yes
Per variable summary statistics | Yes | Yes | No | No | Yes | Yes
AutoEDA focus | No | Yes | Yes | No | No | Yes
Table comparing autoEDA libraries for exploratory data analysis.
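Two of the table rows, the missing data check and per-variable summary statistics, describe output these libraries generate automatically. As a rough sketch of what that automates, the following stdlib-only Python computes both by hand on a small hypothetical dataset. The data and the helper names (missing_report, summary_stats) are illustrative assumptions, not the API of any library reviewed here.

```python
import statistics
from typing import Any

# Hypothetical toy dataset: one dict per row, None marks a missing value.
rows: list[dict[str, Any]] = [
    {"age": 34, "income": 52000, "churned": 0},
    {"age": None, "income": 61000, "churned": 1},
    {"age": 45, "income": None, "churned": 0},
    {"age": 29, "income": 48000, "churned": 1},
]


def missing_report(rows: list[dict[str, Any]]) -> dict[str, int]:
    """Count missing (None) values in each column."""
    return {col: sum(1 for r in rows if r[col] is None) for col in rows[0]}


def summary_stats(rows: list[dict[str, Any]]) -> dict[str, dict[str, float]]:
    """Per-variable count, mean, sample stdev, min and max, skipping missing values."""
    stats = {}
    for col in rows[0]:
        values = [r[col] for r in rows if r[col] is not None]
        stats[col] = {
            "count": len(values),
            "mean": statistics.mean(values),
            "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
            "min": min(values),
            "max": max(values),
        }
    return stats


print(missing_report(rows))
for col, s in summary_stats(rows).items():
    print(col, s)
```

An automated report would also handle categorical columns, mixed types and fully empty columns, which this sketch does not.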

Unique Points of Each Library


  • Beautiful, simple visualisations that are good for explaining results to a business audience.
  • Limited features beyond simple visualisation.


  • Good for generic data quality check.
  • No option to specify target variable for tailored EDA.
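Tailored, target-centric EDA boils down to relating each feature to a chosen target variable. The simplest version, the mean of a binary target per category of one feature, can be sketched with the standard library only. The records, column names and the target_rate_by_category helper are hypothetical, and this is not how any of the reviewed libraries computes its target analysis.

```python
from collections import defaultdict

# Hypothetical records: "plan" is a categorical feature, "churned" a binary target.
records = [
    {"plan": "basic", "churned": 1},
    {"plan": "basic", "churned": 0},
    {"plan": "basic", "churned": 1},
    {"plan": "premium", "churned": 0},
    {"plan": "premium", "churned": 0},
    {"plan": "premium", "churned": 1},
    {"plan": "trial", "churned": 1},
]


def target_rate_by_category(records, feature, target):
    """Return {category: (row count, mean of target)} for one feature."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for r in records:
        counts[r[feature]] += 1
        totals[r[feature]] += r[target]
    return {cat: (counts[cat], totals[cat] / counts[cat]) for cat in counts}


for cat, (n, rate) in sorted(target_rate_by_category(records, "plan", "churned").items()):
    print(f"{cat:8s} n={n}  churn rate={rate:.2f}")
```

Printing the row count next to each rate matters: a category with a single row (here "trial") can show an extreme rate that means little.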


  • Holistic visualisations that are good for deep dive analysis. 
  • Claims to select plots/analyses smartly, but I have yet to see the impact.


  • Generates very simple single-variable or paired-variable distribution plots.
  • Plots of interest must be selected manually.
  • Charts are not integrated into the notebook (available as widgets only).


  • Very good as a Python-based data manipulation tool with a GUI.
  • Contains a full suite of tools for data manipulation/analysis/visualisation, almost matching similar commercial data analytics tools.
  • Setting up all the required analyses/plots takes too much manual effort.
  • No dashboard deployment support.
  • No option to specify target variable for tailored EDA.
  • Complex dependencies on many libraries.


  • The EDA module is like an extension of pandas-profiling and is very comprehensive.
  • Contains 3 modules: collect data, explore data and clean data.
  • Potentially faster on large data due to use of Dask.
  • Allows deep-dive investigation of selected columns/variables.
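Dask handles large data by splitting it into partitions, reducing each partition independently (potentially in parallel) and combining the small partial results. That partition-then-combine pattern can be sketched with the standard library only. This illustrates the idea and is neither Dask's API nor the internals of the library above; partial_stats and partitioned_mean_var are hypothetical helpers.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(chunk):
    """Reduce one partition to (count, sum, sum of squares)."""
    return len(chunk), sum(chunk), sum(x * x for x in chunk)


def partitioned_mean_var(values, n_partitions=4):
    """Split values into partitions, reduce each partition independently,
    then combine the partial results into a global mean and population variance."""
    if not values:
        raise ValueError("values must not be empty")
    size = -(-len(values) // n_partitions)  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    # Each partition is reduced in a worker thread; only small tuples are combined.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_stats, chunks))
    n = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    total_sq = sum(p[2] for p in partials)
    mean = total / n
    # E[x^2] - E[x]^2: simple, but can lose precision for large, tightly clustered values.
    var = total_sq / n - mean * mean
    return mean, var


print(partitioned_mean_var(list(range(1, 101))))
```

Threads keep the sketch self-contained; because of Python's GIL they give little real speedup for pure-Python arithmetic like this, so the point is the partition-then-combine structure rather than performance.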
