
[Feature] Paimon Spark 2025 Roadmap #4816

Open · Zouxxyy opened this issue Jan 2, 2025 · 9 comments
Labels: enhancement (New feature or request)

Zouxxyy (Contributor) commented Jan 2, 2025

Motivation

2025 has arrived, and we would like to thank everyone for their contributions over the past year! Here we present the 2025 Paimon Spark roadmap; you are welcome to take ownership of these items or to expand upon them!

| Name | Introduction | Link |
|------|--------------|------|
| Variant Type | [feat] Support variant type, unlocking support for semi-structured data. | #4471 |
| Optimized Write | [perf] Optimize table writing, including automatic repartitioning, data rebalancing, and so on. | |
| Distributed Planning | [perf] Support distributed planning in the scan phase. | #4864 |
| DataFrame Writer V2 | [feat] Integrate Spark's DataFrame Writer V2. | |
| Liquid Clustering | [perf] Support liquid clustering. | #4815 |
| Isolation Level | [feat] Support more transaction isolation levels, such as serializable. | #4616 |
| Support for Spark Connect | [feat] Support Spark Connect ("Paimon Connect"). | |
| Default Value | [feat] Support default values for specified fields. | |
| Constraints | [feat] Support adding constraints to fields, such as NOT NULL or other custom constraints. | |
| Partition Stats | [feat] Support partition statistics. | |
| Row Lineage | [feat] Support tracking row lineage. | |
| Identity Column | [feat] Generate unique values for an identity column when no explicit values are provided during writes. | |
| Generated Columns | [feat] Support generated columns whose values are derived from a user-specified function over other columns. | |
| CDC for Non-PK Table | [feat] Support CDC for non-primary-key tables. | |
YannByron (Contributor) commented:
If there are any other features or requirements you would like to see, please comment here so we can discuss and revise this roadmap together. Thanks.

YannByron (Contributor) commented:
And if you would like to take on one or more of these items, go ahead and let us know.

Aiden-Dong (Contributor) commented:
Can I take on this task?

> Distributed Planning: [perf] Support distributed planning in the scan phase.

Zouxxyy (Contributor, Author) commented Jan 8, 2025

@Aiden-Dong Yes, feel free to take it; you can create an issue for it. Note that this feature actually requires changes in the core, after which each compute engine will need to support it.

Aiden-Dong (Contributor) commented:
> Yes, feel free to take it; you can create an issue for it. Note that this feature actually requires changes in the core, after which each compute engine will need to support it.

Yes, I understand that we need to extend the functionality of AbstractFileStoreScan.readAndMergeFileEntries.
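
To make the idea concrete, here is a rough sketch of what fanning the manifest reads out to executors might look like. `FileEntry` and `readManifest` are placeholders for Paimon internals rather than real APIs, and the actual change would live in paimon-core:

```scala
// Rough sketch only: read each manifest on an executor instead of reading
// them all on the driver, then collect the entries back for planning.
import org.apache.spark.sql.SparkSession

object DistributedPlanningSketch {
  final case class FileEntry(filePath: String, partition: String)

  // Placeholder for the manifest-decoding logic that today runs inside
  // AbstractFileStoreScan.readAndMergeFileEntries on the driver.
  def readManifest(manifestPath: String): Seq[FileEntry] = Seq.empty

  def planDistributed(spark: SparkSession, manifests: Seq[String]): Seq[FileEntry] =
    spark.sparkContext
      .parallelize(manifests, math.max(1, math.min(manifests.size, 64)))
      .flatMap(readManifest)   // manifest reads happen on executors
      .collect()               // only the decoded entries return to the driver
      .toSeq
}
```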

Aiden-Dong (Contributor) commented:
#4864

zhongyujiang (Contributor) commented:
@Zouxxyy Thank you for raising this; these optimizations are all highly anticipated!

> DataFrame Writer V2: [feat] Integrate Spark's DataFrame Writer V2.

If no one has worked on this yet, I would like to volunteer to take it on. We are currently working to improve write performance by using the V2 write interface RequiresDistributionAndOrdering. In fact, I am close to completing an MVP version locally.
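
For illustration, a minimal sketch of what such a write could look like, assuming the Spark 3.2+ DSv2 interfaces; the class name and constructor arguments are hypothetical, not Paimon's actual code:

```scala
// Minimal sketch of a DSv2 Write using RequiresDistributionAndOrdering.
// Only the Spark connector interfaces are real; everything else is assumed.
import org.apache.spark.sql.connector.distributions.{Distribution, Distributions}
import org.apache.spark.sql.connector.expressions.{Expression, Expressions, SortOrder}
import org.apache.spark.sql.connector.write.{RequiresDistributionAndOrdering, Write}

class PaimonV2WriteSketch(bucketColumns: Seq[String], numBuckets: Int)
    extends Write with RequiresDistributionAndOrdering {

  // Ask Spark to cluster incoming rows so that all rows belonging to one
  // bucket land in the same write task, avoiding cross-task bucket writes.
  override def requiredDistribution(): Distribution =
    Distributions.clustered(
      bucketColumns.map(c => Expressions.column(c): Expression).toArray)

  // For a fixed-bucket table, matching write partitions to buckets
  // one-to-one is natural; returning 0 lets Spark pick the parallelism.
  override def requiredNumPartitions(): Int = numBuckets

  // No intra-partition ordering required in this sketch.
  override def requiredOrdering(): Array[SortOrder] = Array.empty
}
```

With this in place, Spark shuffles the rows to satisfy the clustered distribution before the write tasks run, which is what the V1 path has to arrange by hand today.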

YannByron (Contributor) commented:
> If no one has worked on this yet, I would like to volunteer to take it on.

So glad you can take it on. Just a reminder: be aware of the support for scenarios with different bucket modes in your implementation, especially dynamic bucket mode. This is why we compromised on V1 write at first.

zhongyujiang (Contributor) commented:
> especially dynamic bucket mode in your implementation

Yeah, I haven't found an easy way to support this yet. In fact, I've only implemented V2 write for the fixed bucket mode. I think we can first let the unsupported bucket modes fall back to V1 write.
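
For example, the fallback routing could be as simple as the following sketch; `BucketMode` and the mode names here are illustrative, not Paimon's actual enums:

```scala
// Sketch of the fallback idea: route only the bucket modes the V2 path
// handles to V2 write, and keep everything else on the existing V1 write.
object WritePathSelection {
  sealed trait BucketMode
  case object FixedBucket   extends BucketMode
  case object DynamicBucket extends BucketMode
  case object BucketUnaware extends BucketMode

  def useV2Write(mode: BucketMode): Boolean = mode match {
    case FixedBucket => true   // V2 write with a clustered distribution
    case _           => false  // dynamic/unaware modes fall back to V1 for now
  }
}
```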
