PySpark: Combining Machine Learning & Big Data
Data Engineering
The amount of data generated by IoT, smart devices, cloud applications, and social media is growing exponentially. You need ways to analyze all of this data easily and cost-effectively, with minimal time-to-insight, regardless of its format or where it is stored. In this session, I introduce the Amazon Redshift lake house architecture, which enables you to query data across your data warehouse, data lake, and operational databases to gain faster and deeper insights. With a lake house architecture, you can store data in open file formats in your Amazon S3 data lake. This makes the data readily available to other analytics and machine learning tools rather than locking it in a new silo.