Today I would like to write a bit about last week’s Vienna Spark Meetup, which was the first of hopefully many more to come. Apache Spark is a computing framework for Big Data processing that has been gaining a lot of momentum lately. We at openForce recently used it for our Twitter Sentiment Analysis project, which you can check out on GitHub.
Bogdan Pirvu did a great job organising this event at a really nice art deco venue, the Novomatic Forum. It is owned by Novomatic, the sponsor of the meetup, and if you are into architecture (and not only in the context of software) you should check it out.
Bogdan managed to get Denny Lee from Databricks to do a remote talk from San Francisco. He is not only a Technology Evangelist at Databricks, the company behind Spark, but also has a ton of experience in the field of Big Data and BI. Before joining Databricks he was, for instance, involved in building Yahoo!’s 24TB Analysis Services cube, the largest in the world. You can find more information about this, along with a lot of other interesting articles, on Denny’s website.
In his talk he showed us a live demo using Databricks Notebooks for web log analysis. You can watch a video of this example here. Notebooks are an interactive workspace for exploration and visualization. The nice thing about them is that you can use multiple languages like R, Python, Scala or SQL within a notebook and play around with the data and the Spark API. You can think of a notebook as something like a REPL on steroids, where you can easily go back, change your commands, hit enter and get the updated visualization in the form of all kinds of fancy graphs.
It has to be said that, although Denny made a great effort to explain basic concepts like RDDs, DataFrames and Actions vs. Transformations, his talk was hard to follow for some of the less experienced audience members. Nevertheless, I think it is fair to say that Denny’s enthusiasm was infectious and will lead many in the audience to check out Spark in the future, be it work related or just for fun.
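For readers who, like some of us in the audience, are new to the Actions vs. Transformations distinction, here is a minimal sketch of the idea in plain Python. It is not code from Denny's demo and it does not use the Spark API: the `LazyDataset` class, the `parse_status` helper and the sample log lines are all hypothetical names made up for illustration. The point it tries to show is that transformations such as `map` and `filter` only describe work, while an action such as `collect` or `count` actually triggers it.

```python
class LazyDataset:
    """Toy stand-in for an RDD (an assumption, not Spark's implementation):
    it records transformation steps and only runs them when an action is called."""

    def __init__(self, source, steps=()):
        self._source = source        # any re-iterable collection of records
        self._steps = tuple(steps)   # pending transformations, not yet executed

    # --- Transformations: return a new dataset, do no work yet ---
    def map(self, fn):
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, predicate):
        return LazyDataset(self._source, self._steps + (("filter", predicate),))

    def _run(self):
        # Chain the recorded steps into one lazy pipeline over the source.
        items = iter(self._source)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return items

    # --- Actions: execute the pipeline and return a concrete result ---
    def collect(self):
        return list(self._run())

    def count(self):
        return sum(1 for _ in self._run())


calls = []  # records every line that actually gets parsed


def parse_status(line):
    """Hypothetical parser: takes the HTTP status code from the end of a log line."""
    calls.append(line)
    return int(line.split()[-1])


log_lines = ["GET /index.html 200", "GET /missing 404", "POST /login 200"]

# Two transformations: nothing has been parsed at this point.
errors = LazyDataset(log_lines).map(parse_status).filter(lambda status: status >= 400)
calls_before_action = len(calls)

# The action triggers the whole pipeline.
result = errors.collect()
```

In this toy version `calls_before_action` stays at zero because building `errors` only records the steps; the log lines are parsed once `collect()` runs. Real Spark does something similar in spirit, but distributes the work across a cluster and plans it far more cleverly than this sketch.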
After Denny’s talk, Bogdan took over and told us about his background and his intentions with the Spark meetup he initiated. He also showed us a live demo of Apache Zeppelin, which is similar to the commercial Databricks Notebook but open source and currently an Apache Incubator project. It is also a web-based notebook for interactive data analysis which, although in its early stages, looks very promising. Especially for organisations that don’t allow cloud services to be used, this could be a viable alternative, as it can be installed and run on premises or locally. Bogdan couldn’t go into details about his work at Novomatic for obvious confidentiality reasons, but it was nevertheless interesting to see that a big Austrian company is investing in this topic.
To sum up, I very much enjoyed the talks and the inspired discussions afterwards, and I am looking forward to the next Spark Meetup, hopefully very soon.