Add iceberg_arrow library #6
Seems reasonable to me, but if we're going to require CMake 3.25, is there anything that would simplify dependency management? Arrow has been discussing requiring this version since there are apparently new features that greatly simplify things like this.
e.g. it seems we could integrate find_package and FetchContent more naturally and not rely on the pile of code Arrow uses: https://stackoverflow.com/questions/73621403/cmake-integrating-fetchcontent-with-find-package
That would certainly be preferable. The "Arrow way" of handling this has been a huge time sink.
I think we should have some minor CI before starting to push more code. I've created #7 to have at least a build validation.
I agree with David and Antoine that the idea of using a newer CMake is to try to simplify all the third-party dependency management.
That makes sense! Let me spend some time to simplify it.
Force-pushed from 3311817 to 9f7e725.
Well, it seems that I cannot make the Windows CI happy. It complains about unresolved Arrow symbols during linking. I think I need a Windows PC to debug it.
I think we can continue refining the CMake setup over time. If possible it would be good to use modern features since Arrow's setup (while well tested by now) can be rather brittle
@zhjwpku Yes, I am already working on simplifying it by integrating
Sure, thanks. Take your time :)
- add libiceberg_arrow
- use single config file with three separate targets to support components
- add fetchcontent support
api/iceberg/visibility.h
Outdated
#if defined(_WIN32) || defined(__CYGWIN__)
#  ifdef ICEBERG_STATIC
#    define ICEBERG_EXPORT
#  else
#    ifdef ICEBERG_EXPORTING
#      define ICEBERG_EXPORT __declspec(dllexport)
#    else
#      define ICEBERG_EXPORT __declspec(dllimport)
#    endif
#  endif
#else
#  define ICEBERG_EXPORT
#endif
We need to prepare this for each shared library. (We need ICEBERG_CORE_EXPORT/ICEBERG_PUFFIN_EXPORT/ICEBERG_ARROW_EXPORT.)
This is because libiceberg_arrow.dll will use exported symbols in libiceberg_core.dll. In this case, libiceberg_arrow.dll uses __declspec(dllexport) for libiceberg_arrow.dll symbols and __declspec(dllimport) for libiceberg_core.dll symbols.
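A minimal sketch of what per-library macros could look like. The `*_EXPORTING` and `*_STATIC` macro names, the `ICEBERG_DLLEXPORT`/`ICEBERG_DLLIMPORT` helpers, and the two demo functions are assumptions for illustration, not code from this PR. The idea is that the build defines `ICEBERG_CORE_EXPORTING` only while compiling libiceberg_core and `ICEBERG_ARROW_EXPORTING` only while compiling libiceberg_arrow, so the arrow library exports its own symbols while importing the core ones.

```cpp
// Sketch only: one export macro per shared library (all names hypothetical).
#include <string>

// This single-file demo is not a DLL, so treat both libraries as static.
// A real build would instead pass -DICEBERG_CORE_EXPORTING or
// -DICEBERG_ARROW_EXPORTING while compiling the matching library.
#define ICEBERG_CORE_STATIC
#define ICEBERG_ARROW_STATIC

#if defined(_WIN32) || defined(__CYGWIN__)
#  define ICEBERG_DLLEXPORT __declspec(dllexport)
#  define ICEBERG_DLLIMPORT __declspec(dllimport)
#else
#  define ICEBERG_DLLEXPORT
#  define ICEBERG_DLLIMPORT
#endif

// Core library: exported only while building libiceberg_core.
#if defined(ICEBERG_CORE_STATIC)
#  define ICEBERG_CORE_EXPORT
#elif defined(ICEBERG_CORE_EXPORTING)
#  define ICEBERG_CORE_EXPORT ICEBERG_DLLEXPORT
#else
#  define ICEBERG_CORE_EXPORT ICEBERG_DLLIMPORT
#endif

// Arrow library: exported only while building libiceberg_arrow.
#if defined(ICEBERG_ARROW_STATIC)
#  define ICEBERG_ARROW_EXPORT
#elif defined(ICEBERG_ARROW_EXPORTING)
#  define ICEBERG_ARROW_EXPORT ICEBERG_DLLEXPORT
#else
#  define ICEBERG_ARROW_EXPORT ICEBERG_DLLIMPORT
#endif

namespace iceberg {
// Would live in libiceberg_core: exported there, imported everywhere else.
ICEBERG_CORE_EXPORT std::string CoreName();

namespace arrow {
// Would live in libiceberg_arrow and call into the core library.
ICEBERG_ARROW_EXPORT std::string ArrowName();
}  // namespace arrow
}  // namespace iceberg

std::string iceberg::CoreName() { return "core"; }

std::string iceberg::arrow::ArrowName() {
  return "arrow on " + iceberg::CoreName();
}
```

With this layout, a translation unit of libiceberg_arrow never sees a core declaration as `dllexport`, because `ICEBERG_CORE_EXPORTING` is not defined there.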
Got it. What if we define ICEBERG_EXPORTING for all of libiceberg_core.dll, libiceberg_arrow.dll and libiceberg_puffin.dll? I think we can simply keep the same visibility for all libraries for now because the core library provides the API and core logic while puffin and arrow provide concrete implementations of the core library.
I think that it causes link errors. If libiceberg_arrow.dll uses a libiceberg_core.dll symbol, libiceberg_arrow.dll uses __declspec(dllexport)-ed libiceberg_core.dll symbols...
Can we add a simple function to libiceberg_core.dll and use it from libiceberg_arrow.dll?
I'll try it on my Windows PC tonight. Thanks!
EDIT: I have added a new function DemoTable::name() in demo_table.h and called it in demo_arrow.cc. It does not produce a linking error. @kou
I believe this is relevant to how we manage the public headers from different iceberg libraries. As per the previous discussion, we use a separate api folder that holds all public header files from all libraries (i.e. core, puffin, arrow, etc.). IIUC, the current approach makes it difficult to manage the visibility of the header files across libraries. I'd propose to remove the current api folder and let each library manage its own header files. WDYT? @kou @raulcd @zhjwpku @gaborkaszab @lidavidm @pitrou
I think having separate directories per library is reasonable.
> I have added a new function DemoTable::name() in demo_table.h and called it in demo_arrow.cc. It does not have linking error.
Hmm. It may have a real world problem because libiceberg_arrow.lib is always used with libiceberg_core.lib...
FYI: If we use the same ICEBERG_EXPORTING for all libraries, libiceberg_arrow.lib also exports symbols in libiceberg_core.dll but libiceberg_arrow.dll doesn't have them:
$ nm iceberg_arrow.lib | grep Demo
0000000000000000 T ??0DemoTable@iceberg@@QEAA@AEBV01@@Z
0000000000000000 I __imp_??0DemoTable@iceberg@@QEAA@AEBV01@@Z
0000000000000000 T ??0DemoTable@iceberg@@QEAA@XZ
0000000000000000 I __imp_??0DemoTable@iceberg@@QEAA@XZ
0000000000000000 T ??1DemoTable@iceberg@@UEAA@XZ
0000000000000000 I __imp_??1DemoTable@iceberg@@UEAA@XZ
0000000000000000 T ??4DemoArrow@arrow@iceberg@@QEAAAEAV012@$$QEAV012@@Z
0000000000000000 I __imp_??4DemoArrow@arrow@iceberg@@QEAAAEAV012@$$QEAV012@@Z
0000000000000000 T ??4DemoArrow@arrow@iceberg@@QEAAAEAV012@AEBV012@@Z
0000000000000000 I __imp_??4DemoArrow@arrow@iceberg@@QEAAAEAV012@AEBV012@@Z
0000000000000000 T ??4DemoTable@iceberg@@QEAAAEAV01@AEBV01@@Z
0000000000000000 I __imp_??4DemoTable@iceberg@@QEAAAEAV01@AEBV01@@Z
0000000000000000 I __imp_??_7DemoTable@iceberg@@6B@
0000000000000000 T ?name@DemoArrow@arrow@iceberg@@QEBA?AV?$basic_string_view@DU?$char_traits@D@std@@@std@@XZ
0000000000000000 I __imp_?name@DemoArrow@arrow@iceberg@@QEBA?AV?$basic_string_view@DU?$char_traits@D@std@@@std@@XZ
0000000000000000 T ?print@DemoArrow@arrow@iceberg@@QEBA?AV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@XZ
0000000000000000 I __imp_?print@DemoArrow@arrow@iceberg@@QEBA?AV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@XZ
If we use a separate _EXPORTING macro for each library, DemoTable symbols aren't exported by libiceberg_arrow.dll:
$ nm iceberg_arrow.lib | grep Demo
0000000000000000 T ??4DemoArrow@arrow@iceberg@@QEAAAEAV012@$$QEAV012@@Z
0000000000000000 I __imp_??4DemoArrow@arrow@iceberg@@QEAAAEAV012@$$QEAV012@@Z
0000000000000000 T ??4DemoArrow@arrow@iceberg@@QEAAAEAV012@AEBV012@@Z
0000000000000000 I __imp_??4DemoArrow@arrow@iceberg@@QEAAAEAV012@AEBV012@@Z
0000000000000000 T ?name@DemoArrow@arrow@iceberg@@QEBA?AV?$basic_string_view@DU?$char_traits@D@std@@@std@@XZ
0000000000000000 I __imp_?name@DemoArrow@arrow@iceberg@@QEBA?AV?$basic_string_view@DU?$char_traits@D@std@@@std@@XZ
0000000000000000 T ?print@DemoArrow@arrow@iceberg@@QEBA?AV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@XZ
0000000000000000 I __imp_?print@DemoArrow@arrow@iceberg@@QEBA?AV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@XZ
> I'd propose to remove the current api folder and let each library manage its own header files.
+1
I have removed the api folder and moved the header files to each library instead.
+1
cmake_modules/BuildUtils.cmake
Outdated
string(TOUPPER ${LIB_NAME} LIB_NAME_UPPER)
if(BUILD_SHARED)
  generate_export_header(${LIB_NAME}_shared BASE_NAME ${LIB_NAME_UPPER})
  target_compile_definitions(${LIB_NAME}_shared PUBLIC ${LIB_NAME}_shared_EXPORTS)
Can we remove this?
It seems that this is defined automatically: https://cmake.org/cmake/help/latest/prop_tgt/DEFINE_SYMBOL.html
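For context, here is a simplified sketch of the kind of header that generate_export_header(iceberg_shared BASE_NAME ICEBERG) writes. It is not the verbatim generated file (the real one also defines NO_EXPORT and deprecation macros), and the target name iceberg_shared is assumed from the snippet under review. The switch is the `<target>_EXPORTS` symbol, which CMake defines by default while compiling the sources of a shared library target. `ExampleTable` is a hypothetical class used only to show the macro in use.

```cpp
// Simplified sketch of a generated export header plus one user of it.
#include <string_view>

// Pretend this file is a source of the iceberg_shared target. In a real
// build CMake passes this definition itself; it is not written by hand.
#define iceberg_shared_EXPORTS

#ifdef ICEBERG_STATIC_DEFINE
   // Static build: no import/export decoration at all.
#  define ICEBERG_EXPORT
#elif defined(_WIN32)
#  ifdef iceberg_shared_EXPORTS
     // We are building the DLL itself.
#    define ICEBERG_EXPORT __declspec(dllexport)
#  else
     // We are consuming the DLL from another target.
#    define ICEBERG_EXPORT __declspec(dllimport)
#  endif
#else
   // GCC/Clang: make the symbol visible from the shared object.
#  define ICEBERG_EXPORT __attribute__((visibility("default")))
#endif

namespace iceberg {

// Hypothetical class, only here to show where the macro goes.
class ICEBERG_EXPORT ExampleTable {
 public:
  std::string_view name() const { return "ExampleTable"; }
};

}  // namespace iceberg
```

If that symbol is additionally propagated as a PUBLIC compile definition, dependent targets also see it, so they would take the dllexport branch for this library's classes instead of the dllimport one.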
No, I added this to fix a linking error. I can try to remove this again to confirm.
Update: It failed at linking again: https://github.com/apache/iceberg-cpp/actions/runs/12591221855/job/35094013478?pr=6
https://github.com/apache/iceberg-cpp/actions/runs/12591221855/job/35094013478#step:3:422
demo_puffin.obj : error LNK2019: unresolved external symbol "__declspec(dllimport) public: virtual __cdecl iceberg::Puffin::~Puffin(void)" (__imp_??1Puffin@iceberg@@UEAA@XZ) referenced in function "public: virtual __cdecl iceberg::puffin::DemoPuffin::~DemoPuffin(void)" (??1DemoPuffin@puffin@iceberg@@UEAA@XZ) [D:\a\iceberg-cpp\iceberg-cpp\build\src\iceberg\puffin\iceberg_puffin_shared.vcxproj]
demo_puffin.obj : error LNK2019: unresolved external symbol "__declspec(dllimport) public: __cdecl iceberg::Puffin::Puffin(void)" (__imp_??0Puffin@iceberg@@QEAA@XZ) referenced in function "public: __cdecl iceberg::puffin::DemoPuffin::DemoPuffin(void)" (??0DemoPuffin@puffin@iceberg@@QEAA@XZ) [D:\a\iceberg-cpp\iceberg-cpp\build\src\iceberg\puffin\iceberg_puffin_shared.vcxproj]
demo_puffin.obj : error LNK2019: unresolved external symbol "__declspec(dllimport) public: __cdecl iceberg::Puffin::Puffin(class iceberg::Puffin const &)" (__imp_??0Puffin@iceberg@@QEAA@AEBV01@@Z) referenced in function "public: __cdecl iceberg::puffin::DemoPuffin::DemoPuffin(class iceberg::puffin::DemoPuffin const &)" (??0DemoPuffin@puffin@iceberg@@QEAA@AEBV012@@Z) [D:\a\iceberg-cpp\iceberg-cpp\build\src\iceberg\puffin\iceberg_puffin_shared.vcxproj]
demo_puffin.obj : error LNK2019: unresolved external symbol "__declspec(dllimport) public: class iceberg::Puffin & __cdecl iceberg::Puffin::operator=(class iceberg::Puffin const &)" (__imp_??4Puffin@iceberg@@QEAAAEAV01@AEBV01@@Z) referenced in function "public: class iceberg::puffin::DemoPuffin & __cdecl iceberg::puffin::DemoPuffin::operator=(class iceberg::puffin::DemoPuffin const &)" (??4DemoPuffin@puffin@iceberg@@QEAAAEAV012@AEBV012@@Z) [D:\a\iceberg-cpp\iceberg-cpp\build\src\iceberg\puffin\iceberg_puffin_shared.vcxproj]
OK. It's an implementation problem. We should remove this line. (Or we should use PRIVATE not PUBLIC.)
iceberg::Puffin should exist in libiceberg_core.dll not libiceberg_puffin.dll. Can we define iceberg::Puffin in libiceberg_core.dll something like the following?
diff --git a/src/iceberg/demo_table.cc b/src/iceberg/demo_table.cc
index 9e46d7a..d960f1b 100644
--- a/src/iceberg/demo_table.cc
+++ b/src/iceberg/demo_table.cc
@@ -18,6 +18,7 @@
*/
#include "iceberg/demo_table.h"
+#include "iceberg/puffin.h"
namespace iceberg {
BTW, is it OK that iceberg::Puffin exists in libiceberg_core.dll not libiceberg_puffin.dll?
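The four LNK2019 errors above all name implicitly generated members of iceberg::Puffin (constructors, destructor, copy assignment). A rough single-file sketch of the situation follows, assuming Puffin is a header-only base class carrying the export macro and that no source file of the core library includes its header. The class bodies and the `print()` member are illustrative assumptions, and `ICEBERG_EXPORT` is stubbed out so the sketch compiles anywhere.

```cpp
// Sketch only: which file would contain what, collapsed into one file.
#include <string>

// Stand-in for the real export macro; empty so this builds on any platform.
#define ICEBERG_EXPORT

// --- would be iceberg/puffin.h (header owned by the core library) ---
namespace iceberg {

class ICEBERG_EXPORT Puffin {
 public:
  virtual ~Puffin() = default;
  virtual std::string print() const = 0;
  // Constructors and copy assignment are implicit. With dllimport, users
  // expect the core DLL to provide them; the core DLL only emits them if
  // one of its own translation units sees this class as dllexport.
};

}  // namespace iceberg

// --- would be iceberg/demo_table.cc (source of the core library) ---
// Adding `#include "iceberg/puffin.h"` here, as in the diff above, gives
// the core DLL a translation unit that sees the exported class.

// --- would be iceberg/puffin/demo_puffin.cc (source of the puffin library) ---
namespace iceberg::puffin {

class DemoPuffin : public Puffin {
 public:
  std::string print() const override { return "DemoPuffin"; }
};

}  // namespace iceberg::puffin
```

Constructing, copying and destroying a DemoPuffin is what pulls in the base-class members listed in the linker output.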
I have tried to use PRIVATE and it failed: https://github.com/apache/iceberg-cpp/actions/runs/12584082126/job/35072964038
https://github.com/apache/iceberg-cpp/actions/runs/12584082126/job/35072964038#step:3:410
LINK : fatal error LNK1104: cannot open file '..\Debug\iceberg.lib' [D:\a\iceberg-cpp\iceberg-cpp\build\src\iceberg\arrow\iceberg_arrow_shared.vcxproj]
Building Custom Rule D:/a/iceberg-cpp/iceberg-cpp/src/iceberg/CMakeLists.txt
demo_table.cc
iceberg_static.vcxproj -> D:\a\iceberg-cpp\iceberg-cpp\build\src\iceberg\Debug\iceberg_static.lib
Building Custom Rule D:/a/iceberg-cpp/iceberg-cpp/src/iceberg/arrow/CMakeLists.txt
demo_arrow.cc
iceberg_arrow_static.vcxproj -> D:\a\iceberg-cpp\iceberg-cpp\build\src\iceberg\arrow\Debug\iceberg_arrow_static.lib
Building Custom Rule D:/a/iceberg-cpp/iceberg-cpp/src/iceberg/puffin/CMakeLists.txt
demo_puffin.cc
LINK : fatal error LNK1104: cannot open file '..\Debug\iceberg.lib' [D:\a\iceberg-cpp\iceberg-cpp\build\src\iceberg\puffin\iceberg_puffin_shared.vcxproj]
I think that it's a different problem. Could you share the URL of the commit?
Thanks for explaining the context.
How about discussing it as a follow-up task if it's needed?
I think that we can merge this now.
Thanks for your help!
@wgtmac I think let's go with the approach that is the most convenient at this point just to get going. We can re-evaluate later on once we actually do the implementation of Puffin. WDYT?
Sounds good! Let's keep the puffin library as is and revisit its design when you start the implementation.
+1
Co-authored-by: Sutou Kouhei <[email protected]>
Thanks @Xuanwo!
- Add libiceberg_arrow to leverage arrow-backed implementations (e.g. filesystem and parquet).
- Rename libiceberg_core to libiceberg.
- libiceberg_arrow and libiceberg_puffin link to libiceberg.
- Remove the api directory and let each library manage its own header files.