Talks & Publications

Spark+AI Summit 2018

Date: June 05, 2018
Speaker Page: link

The Kubernetes and Spark communities have put their heads together over the past year to come up with a new native scheduler for Kubernetes within Apache Spark. In this talk, we explore the exciting new things that this native Kubernetes integration makes possible with Apache Spark. We also go over the roadmap and features that the Kubernetes community has planned for the scheduler over the next several releases of Spark.

Couchbase Silicon Valley Meetup

Date: April 19, 2018
Session Page: link

This session provides an overview of GKE, followed by a walk-through of how to deploy a Couchbase cluster on GKE with the Couchbase Operator. We’ll also demonstrate scaling and simulate failure scenarios to show how the operator reacts under stress, simplifying administration and operations for the user.

Google Cloud Platform Blog

Date: March 06, 2018
Blog Post: Learn to run Apache Spark natively on Google Kubernetes Engine with this tutorial

Co-author of a blog post introducing a Spark/Google Kubernetes Engine solution leveraging containers.

Apache Spark 2.3 Blog Posts

Date: March 06, 2018
kubernetes.io blog post: Apache Spark 2.3 with Native Kubernetes Support
databricks blog post: Apache Spark 2.3 with Native Kubernetes Support

Co-authored blog post introducing the technical details of Apache Spark 2.3 with Kubernetes as a top-billed feature.

SF Kubernetes Meetup: Machine Learning and AI with Kubernetes

Date: December 13, 2017
Session Page: link

Kubernetes as an application deployment platform can help set up and deploy machine learning applications, all the way from training to production. This talk describes the newly evolving ML stack built entirely on Kubernetes. It also goes into details of how one can use a combination of different tools to create a portable and powerful ML stack.

Global Big Data Conference

Date: June 11, 2017
Speaker Page: link

From stateful applications like databases, file systems, and message queues to data processing frameworks like Spark, a growing range of workloads is being run on Kubernetes. This talk will focus on some Kubernetes features and constructs being worked on by the open source community to support all these classes of workloads.

Spark Summit 2017

Date: June 06, 2017
Speaker Page: link
Session Page: link

Back when Spark on Kubernetes was a nascent effort in a separate project fork, this talk introduced the underlying concepts and explained the technical details of the community’s work.

Advanced Spark & Tensorflow Meetup

Date: Jan 19, 2017
Session Page: link

Introduction to Spark & HDFS in the context of Big Data applications in containerized environments.

SIG Big Data

Date: Jan 01, 2017 - Now
Group Page: link
YouTube: link
Meeting Notes: link

I (re)founded the SIG in Jan 2017 and incubated several projects such as Apache Spark, Apache Airflow, HDFS, etc. The SIG Big Data community deals with best practices around Big Data and ML applications in containers & Kubernetes.