With set prefer_olap = 'on', we observe process crashes when running TPC-H benchmark queries (for instance Q2) in parallel with more than 10 clients on a single coordinator, already at scale factor 10. The time until a crash occurs drops sharply as the number of clients grows: with more than 200 clients, crashes appear after a few seconds. (If useful, we can provide scripts to reproduce this issue.)
Memory appears to get corrupted. At every crash, the first element of the memory freelist points to an inaccessible address (here 0x10):
This results in a SIGSEGV in the memory allocation.
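To illustrate why a bad freelist head shows up as a crash inside the allocator rather than at the place where the memory was damaged, here is a minimal C sketch. It is not the code from aset.c; `FreeChunk`, `FreeList`, `freelist_push`, `freelist_pop`, and `head_after_stale_write` are hypothetical names, and the write through a stale pointer is only one assumed way such a state could arise, not a diagnosis of this issue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified free chunk: while a chunk is free, its first
 * word is reused as the link to the next free chunk. */
typedef struct FreeChunk {
    struct FreeChunk *next;
} FreeChunk;

typedef struct FreeList {
    FreeChunk *head;
} FreeList;

/* "Free": push the chunk onto the front of the list. */
void freelist_push(FreeList *fl, FreeChunk *chunk)
{
    chunk->next = fl->head;
    fl->head = chunk;
}

/* "Allocate": pop the front chunk. The read of chunk->next is the access
 * that faults if head holds an unmapped address such as 0x10. */
FreeChunk *freelist_pop(FreeList *fl)
{
    FreeChunk *chunk = fl->head;
    if (chunk == NULL)
        return NULL;
    fl->head = chunk->next;
    return chunk;
}

/* Simulate a write through a stale pointer after the chunk was freed,
 * then return the freelist head left behind by the next allocation. */
uintptr_t head_after_stale_write(void)
{
    FreeChunk storage;
    FreeList fl = { NULL };

    freelist_push(&fl, &storage);                   /* chunk is now free */
    storage.next = (FreeChunk *) (uintptr_t) 0x10;  /* assumed stale write clobbers the link */

    FreeChunk *reused = freelist_pop(&fl);          /* succeeds, but installs 0x10 as head */
    assert(reused == &storage);

    /* A further freelist_pop(&fl) would dereference 0x10 and crash;
     * here the head is only inspected, never followed. */
    return (uintptr_t) fl.head;
}
```

In this sketch the damaging write and the first allocation after it both succeed. Only a later allocation from the same list would fault, which is consistent with a backtrace that ends in the allocator and says nothing about the code that did the damage.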
Stack trace:
#0 AllocSetAlloc (context=0x238ef18, size=16) at aset.c:707
#1 0x0000000000990f78 in palloc (size=size@entry=16) at mcxt.c:935
#2 0x0000000000724bb4 in new_list (type=type@entry=T_IntList) at list.c:68
#3 0x0000000000724d45 in lappend_int (list=list@entry=0x0, datum=4) at list.c:151
#4 0x0000000000677d56 in ExecInitQual (qual=<optimized out>, parent=parent@entry=0x24d0378) at execExpr.c:206
#5 0x000000000069d432 in ExecInitIndexScan (node=node@entry=0x24151a0, estate=estate@entry=0x23f65a0, eflags=eflags@entry=1) at nodeIndexscan.c:931
#6 0x0000000000684f76 in ExecInitNode (node=0x24151a0, estate=estate@entry=0x23f65a0, eflags=1) at execProcnode.c:225
#7 0x00000000006a6418 in ExecInitNestLoop (node=node@entry=0x2414620, estate=estate@entry=0x23f65a0, eflags=<optimized out>, eflags@entry=1) at nodeNestloop.c:338
#8 0x00000000006850aa in ExecInitNode (node=0x2414620, estate=estate@entry=0x23f65a0, eflags=eflags@entry=1) at execProcnode.c:298
#9 0x00000000006a63f6 in ExecInitNestLoop (node=node@entry=0x2414190, estate=estate@entry=0x23f65a0, eflags=eflags@entry=1) at nodeNestloop.c:333
#10 0x00000000006850aa in ExecInitNode (node=0x2414190, estate=estate@entry=0x23f65a0, eflags=eflags@entry=1) at execProcnode.c:298
#11 0x00000000006a63f6 in ExecInitNestLoop (node=node@entry=0x24132d8, estate=estate@entry=0x23f65a0, eflags=eflags@entry=1) at nodeNestloop.c:333
#12 0x00000000006850aa in ExecInitNode (node=0x24132d8, estate=estate@entry=0x23f65a0, eflags=eflags@entry=1) at execProcnode.c:298
#13 0x000000000069116b in ExecInitAgg (node=node@entry=0x24131c0, estate=estate@entry=0x23f65a0, eflags=eflags@entry=1) at nodeAgg.c:3911
#14 0x000000000068512e in ExecInitNode (node=0x24131c0, estate=estate@entry=0x23f65a0, eflags=eflags@entry=1) at execProcnode.c:331
#15 0x00000000006df01a in ExecShutdownRemoteSubplan (node=node@entry=0x23f71d0) at execRemote.c:11373
#16 0x0000000000684e11 in ExecShutdownNode (node=0x23f71d0) at execProcnode.c:873
#17 0x00000000007247cf in planstate_tree_walker (planstate=planstate@entry=0x23f6bc8, walker=walker@entry=0x684d9d <ExecShutdownNode>, context=context@entry=0x0) at nodeFuncs.c:3784
#18 0x0000000000684dc5 in ExecShutdownNode (node=0x23f6bc8) at execProcnode.c:856
#19 0x00000000007205b6 in planstate_walk_subplans (plans=<optimized out>, walker=walker@entry=0x684d9d <ExecShutdownNode>, context=context@entry=0x0) at nodeFuncs.c:3864
#20 0x0000000000724837 in planstate_tree_walker (planstate=planstate@entry=0x245f370, walker=walker@entry=0x684d9d <ExecShutdownNode>, context=context@entry=0x0) at nodeFuncs.c:3844
#21 0x0000000000684dc5 in ExecShutdownNode (node=0x245f370) at execProcnode.c:856
#22 0x00000000007247cf in planstate_tree_walker (planstate=planstate@entry=0x245eed8, walker=walker@entry=0x684d9d <ExecShutdownNode>, context=context@entry=0x0) at nodeFuncs.c:3784
#23 0x0000000000684dc5 in ExecShutdownNode (node=node@entry=0x245eed8) at execProcnode.c:856
#24 0x000000000067ee42 in ExecutePlan (estate=estate@entry=0x23f65a0, planstate=0x245eed8, use_parallel_mode=<optimized out>, operation=operation@entry=CMD_SELECT, sendTuples=sendTuples@entry=1 '\001', numberTuples=numberTuples@entry=0, direction=ForwardScanDirection, dest=0x22c3948, execute_once=1 '\001') at execMain.c:2063
#25 0x000000000067f0a9 in standard_ExecutorRun (queryDesc=0x2313a50, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at execMain.c:466
#26 0x000000000067f163 in ExecutorRun (queryDesc=queryDesc@entry=0x2313a50, direction=direction@entry=ForwardScanDirection, count=count@entry=0, execute_once=<optimized out>) at execMain.c:409
#27 0x0000000000861ed9 in PortalRunSelect (portal=portal@entry=0x2260510, forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, dest=dest@entry=0x22c3948) at pquery.c:1722
#28 0x000000000086438a in PortalRun (portal=portal@entry=0x2260510, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', run_once=<optimized out>, dest=dest@entry=0x22c3948, altdest=altdest@entry=0x22c3948, completionTag=0x7ffe6a2b6b50 "") at pquery.c:1362
#29 0x000000000085fb15 in exec_execute_message (portal_name=portal_name@entry=0x22c3530 "p_1_1dfd6c_2_79f38aea", max_rows=9223372036854775807, max_rows@entry=0) at postgres.c:3065
#30 0x0000000000860c65 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x20853d0, dbname=<optimized out>, username=<optimized out>) at postgres.c:5645
#31 0x00000000007d3a48 in BackendRun (port=port@entry=0x20fb6b0) at postmaster.c:5034
#32 0x00000000007d5b3f in BackendStartup (port=port@entry=0x20fb6b0) at postmaster.c:4706
#33 0x00000000007d5d41 in ServerLoop () at postmaster.c:1963
#34 0x00000000007d7058 in PostmasterMain (argc=argc@entry=5, argv=argv@entry=0x20835a0) at postmaster.c:1571
#35 0x000000000072052f in main (argc=5, argv=0x20835a0) at main.c:233
The database itself throws error messages like this: