How to tune Spark performance for ML needs

Day 2 / Track 4 / RU

When it comes to machine learning on large data volumes, Apache Spark is a popular choice. While writing Spark code is fairly easy, improving your application's performance requires understanding not only Spark internals but also what data you are dealing with and in what volumes. Artem will present a set of methods tried on a "live" project that reduced the execution time of some jobs by a factor of 5-20.
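The specific methods are covered in the talk itself; purely as an illustrative sketch of the kind of knobs such tuning usually touches (the values below are hypothetical placeholders, not recommendations from the talk), a performance-conscious `spark-submit` might look like:

```shell
# Illustrative spark-submit with common performance-related settings.
# All values are placeholder assumptions for a hypothetical cluster/job.
spark-submit \
  --master yarn \
  --executor-memory 8g \
  --executor-cores 4 \
  --conf spark.sql.shuffle.partitions=400 \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.sql.autoBroadcastJoinThreshold=33554432 \
  my_ml_job.py
```

Settings like the shuffle partition count and the broadcast-join threshold are exactly the sort of parameters whose right values depend on the data volumes involved, which is the point the abstract makes.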

Artem Shutak
Grid Dynamics

Senior software engineer at Grid Dynamics, specializing in big data processing and analysis. A former full-time Apache Ignite contributor, he knows how distributed systems work under the hood. He takes an active interest in machine learning.