Talk

Spark magic: How high-level pipelines become distributed hardcore

In Russian
Complexity: Hardcore. A really hard and demanding talk; you'll follow it only if you're an experienced engineer.
Presentation https

Spark is the most popular tool for building data pipelines. Every data engineer knows Spark, blah-blah-blah… OK, but Spark is just distributed Java Streams, right? So how does it actually work? Oh, it turns out you can't just call "flatMap" or "groupBy" on a remote machine. Codegen! Interested? Come and find out more!

  • #big data
  • #codegen
  • #kotlin
