Spark is the most popular tool for building data pipelines. Every data engineer knows Spark, blah-blah-blah… OK, but Spark is just a distributed Java Streams, right? But how does it work then? Oh, it turns out you can't just call "flatMap" or "groupBy" to a remote machine. Codegen! Interested? Come and find more!