Posts

Is Apache Hudi (Uber Hoodie) a Game Changer for BigData Workloads?

Motivation

Companies like Uber scan petabytes of data every few hours, and getting this done in a timely manner has been a challenge for Uber and the BigData community for years. The Lambda architecture promised a way to process massive quantities of data (i.e. “Big Data”) by providing access to both batch-processing and stream-processing methods in a hybrid approach.

Figure 1: Lambda architecture requires double compute and double serving.

However, the fundamental tradeoff among data ingest latency, scan performance, compute resources, and operational complexity remained unavoidable. (Note: many other architectures have tried to address this fundamental problem of BigData workloads as well.)

But for workloads that can tolerate latencies of about 10 minutes, there is no need for a separate “speed” serving layer if there is a faster way to ingest and prepare data in HDFS. This unifies the serving layer and reduces the overall complexity and resour...