Practical Spark

Updated 4 months ago

In the early days of data analysis, statisticians, mathematicians, and physicists were the most sought-after people in the field. Then came the age of big data, and engineers became increasingly valuable in machine learning and data analysis. The reason is this:

**In data analysis, it is hard to predict the result of an action. Often we can't even decide which action or method to take. Engineers can implement an algorithm quickly and efficiently, try it out without fully understanding it, and then improve the method based on the results.**

The cycle of

- explore the data and form a hypothesis
- write code to test the hypothesis
- run the test
- analyze the results and improve

has proven to be more effective than doing a complete and thorough analysis up front. Mathematicians' lack of programming skill made them useful only in stages 1 and 4.

That said, the world is changing with the rise of the functional programming paradigm and the gradual maturing of industrial-strength functional programming platforms.

**With functional programming, mathematicians can now write code that more intuitively reflects their thought process. This coding style also happens to be easier to parallelize in practice, and ideas from algebra and computation theory become more and more practical when we write code this way.**
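To make the link between algebra and parallelism a little more concrete, here is a minimal sketch in plain Python (no Spark required). It assumes a toy task, summing the squares of a list, and the names `square`, `sum_of_squares`, and `chunked` are made up for illustration. Because addition is associative, the data can be split into chunks, each chunk reduced independently (as separate workers might do), and the partial results combined to give the same answer as a single sequential pass.

```python
from functools import reduce
from operator import add


def square(x):
    """A pure function: the result depends only on the input."""
    return x * x


def sum_of_squares(values):
    """Map each value through `square`, then fold the results with `add`."""
    return reduce(add, map(square, values), 0)


def chunked(values, size):
    """Split a list into consecutive chunks of at most `size` items (hypothetical helper)."""
    return [values[i:i + size] for i in range(0, len(values), size)]


data = list(range(1, 11))

# One sequential pass over all of the data.
sequential = sum_of_squares(data)

# The same computation done chunk by chunk. Each chunk is independent,
# so in a distributed setting each one could be handled by a different worker.
partials = [sum_of_squares(chunk) for chunk in chunked(data, 3)]

# Combining the partial sums is valid because `add` is associative.
combined = reduce(add, partials, 0)

print(sequential, combined)
```

The chunks here are processed one after another, so nothing actually runs in parallel. The point is only that the structure of the code (pure functions plus an associative combining operation) leaves that choice open.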

I've always considered myself lucky to be born in this age. As a mathematics enthusiast, had I been born 15 years earlier, I would probably have had to choose between "holding on to my mathematical principles and writing beautiful code, but staying in academia and making little money" and "writing ugly but fast code and making a lot of money." Now I can write code with a (mathematically) better design and actually get better performance. How amazing!

This is precisely why I decided to write this book: to introduce the practical side of the Spark system and to gradually show how we benefit from the functional programming paradigm in the growing world of data analysis.