Databricks is a leading big data analytics platform. Based on Apache Spark, Databricks allows data engineers, data scientists and analysts to collaborate on a single SaaS platform using their preferred languages and toolkits.

Introduction To Apache Spark
Apache Spark is an open source platform for analysing and processing large datasets. Originally developed in 2009, it is now one of the most widely deployed engines, with use cases as diverse as bioinformatics, fraud detection, web log analysis, and customer churn profiling. Parallel Processin…
From Spark To Databricks
Databricks is a commercial packaging of Spark which can be consumed as Software as a Service. It aims to improve the user experience and reduce the operational overhead associated with using Spark directly, providing a cloud-based environment subject to usage-based billing. …
Why Databricks Is Winning In The Data & Analytics Market
Over the last 6 months, we’ve been working with Databricks for a client project. For those who aren’t aware, Databricks is a SaaS/managed version of Spark, the popular open source big data processing framework. Though we were initially sceptical about Databricks and leaned more towards DIY Spark,
What Is The Data Lakehouse Pattern?
For more than 30 years, Data Warehouses have been a central part of the business intelligence landscape. This pattern typically involved bringing structured data together from across the business into a centralised location for business intelligence reporting and analysis. For instance, banks often …
Databricks Structured Streaming Example
Spark 2 introduced the concept of structured streaming, giving users the ability to process streams of unbounded data using higher level abstractions. This is an extremely powerful capability which allows data engineers to do streaming transformations and analytics over data as it is ingested, an…