A closer look: Exploratory Data Analysis with Spark and IntelliJ IDEA

Data

A typical data scientist's workflow involves some exploratory data analysis. If you work with your data in Python, you are probably familiar with packages such as pandas, matplotlib and seaborn, which help you get an initial feel for the data and decide on the best approach for your next step. But when you switch from pandas to Spark, how do you explore your data? How do you visualise it? How do you understand it better before crafting your Spark jobs? In this talk I'll take a dataset and guide you through the many ways you can explore your data with Spark and a new plugin for IntelliJ IDEA.