Hi Shiyan Xu, great post, can't wait to read the next one!
Do you plan to explore the cost savings/overhead associated with Hudi?
I am wondering whether CoW and MoR, in a cloud environment, raise storage and processing costs. A big question I get when I propose Hudi as a solution is how it differs from plain Parquet in terms of cost on HDFS-like file systems in a cloud environment, given that you can have multiple snapshots of a table, you need to merge or copy files, and so on. I see a lot of comparisons between Delta, Hudi, and Iceberg in terms of performance, but costs aren't explored at all.
IMHO it is important to explore costs too, because often the goal is to replace a DWH with a cheaper solution, if there is one.
Thanks
Hi Nicola, thanks! Yes, cost is an important topic. For storage cost, Hudi has a clean table service (which I'll cover in a future post) to bound the number of versions kept on storage. I wouldn't count versioning as storage overhead, since it's needed to unlock time travel and disaster recovery capabilities. For processing cost, efficient incremental processing is a key cost-saving strategy. I don't have numbers on hand, but I would refer you to this blog for some insights: https://www.uber.com/blog/ubers-lakehouse-architecture/
Are you going to upload the posts from the 5th to the 10th?
Yes :)