Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Oracle AQ consumers fail to read new message after having no messages for a long time #450

Open
mkmoisen opened this issue Feb 4, 2025 · 10 comments
Labels
bug Something isn't working

Comments

@mkmoisen
Copy link

mkmoisen commented Feb 4, 2025

I have a consumer thread listening to an Oracle AQ queue in dequeue wait forever mode.

Everything works great, except for when there has been no message in the queue for an hour or so.

After adding a message to the queue, the dequeue never happens.

I know the thread is still alive and apparently attempting to dequeue, because I can see it in gv$session with the event Streams AQ: waiting for messages in the queue, and the previous sql id is pointing to a query that hits the AQ$_queuename_F table. (If I kill the app, this session is terminated).

I know there is a message in the queue, and that it has never been dequeued, which I can tell from querying the queue table.

Killing the application and restarting it results in the message being immediately dequeued, so I know there is no filter dequeue condition for example stopping it.

I was under the impression that if there was some kind of network disconnect, then Oracle would detect this, and the session would be terminated, and not be visible in gv$session.


Would anyone happen to have any ideas on what could possibly be causing this behavior or how I could debug it?

The reason I think it may be a bug is because gv$session is still populated with this thread with the event Streams AQ: waiting for messages in the queue

I have been reliably seeing this behavior.


I am doing simple, single consumer queues with everything pretty standard:

CREATE OR REPLACE NONEDITIONABLE TYPE MYQUEUE_TYPE AS OBJECT (
    foo VARCHAR2(128),
    bar VARCHAR2(128)
);

BEGIN
    DBMS_AQADM.CREATE_QUEUE_TABLE(
        queue_table => 'MYQUEUE',
        queue_payload_type => 'MYQUEUE_TYPE'
    );

    DBMS_AQADM.CREATE_QUEUE(
        queue_name => 'MYQUEUE',
        queue_table => 'MYQUEUE'
    );

    DBMS_AQADM.START_QUEUE(
        queue_name => 'MYQUEUE'
    );
END;
/
while True:
    try:
        conn = oracledb.connect(...)
     
        myqueue_type = conn.gettype('MYSCHEMA.MYQUEUE_TYPE')
     
        queue = conn.queue('MYSCHEMA.MYQUEUE', myqueue_type)

        cur = conn.cursor()
        result = cur.execute(
            '''
                select inst_id, sid, serial#
                from gv$session 
                where 1=1
                    and audsid = userenv('SESSIONID') 
                    and inst_id = userenv('INSTANCE') 
                    and sid = userenv('SID')
            '''
        )
        print(result)
        
        print('attempting dequeue')
        
        entry = queue.dequeue()
        
        print('dequeued')

        ...
     
        conn.commit()
    except Exception as ex:
        conn.rollback()
  1. What versions are you using?

Oracle 19

>>> import sys
>>> import platform
>>>
>>> print("platform.platform:", platform.platform())
platform.platform: Windows-11-10.0.22631-SP0
>>> print("sys.maxsize > 2**32:", sys.maxsize > 2**32)
sys.maxsize > 2**32: True
>>> print("platform.python_version:", platform.python_version())
platform.python_version: 3.13.0

>>> import oracledb
>>> print("oracledb.__version__:", oracledb.__version__)
oracledb.__version__: 2.5.1
  1. Is it an error or a hang or a crash?

Hang

  1. What error(s) or behavior you are seeing?

Thread that is dequeing from the queue hangs forever and does not read new messages in the queue.

  1. Does your application call init_oracle_client()?

Yes, thick mode.

  1. Include a runnable Python script that shows the problem.

Cannot be reproduced without waiting for several hours after the last message.

import oracledb
@mkmoisen mkmoisen added the bug Something isn't working label Feb 4, 2025
@cjbj
Copy link
Member

cjbj commented Feb 4, 2025

What Oracle Client and DB versions are you using?

@mkmoisen
Copy link
Author

mkmoisen commented Feb 4, 2025

@cjbj Server is Oracle 19, and the client is 19.3.

@cjbj
Copy link
Member

cjbj commented Feb 4, 2025

19.3 client? Ouch - that is nearly 6 years old. We have Instant Client 19.26 available. Can you upgrade and try again?

Regarding the DB version what does this print:

import oracledb

with oracledb.connect(user="xxxx", password="xxxx", dsn="xxxx") as connection:
    print(connection.version)

Our AQ team said they'd take a look, but having a current client version is probably the first thing they would ask for.

@mkmoisen
Copy link
Author

mkmoisen commented Feb 5, 2025

@cjbj I have downloaded Instance Client 19.25. You mentioned 19.26, but I only saw 19.25 from the download page.

I'll give it a shot and report back, thanks.

with oracledb.connect(user="xxxx", password="xxxx", dsn="xxxx") as connection:
    print(connection.version)

This is returning 19.25.0.0.0.

@cjbj
Copy link
Member

cjbj commented Feb 5, 2025

@mkmoisen let us know if you still see the issue.

@shivani-ka
Copy link

@mkmoisen I tried to reproduce the issue on 19c DB and 19.25 instant client, I got connection lost error. Do you see any error or it just hangs?

@shivani-ka
Copy link

@mkmoisen please check in alert log if the dequeue session is killed by resource manager.

@mkmoisen
Copy link
Author

mkmoisen commented Feb 6, 2025

Hi @shivani-ka Thanks for reproducing. For me, it was hanging; I did not receive connection lost error.

I am using the new client (19.25) and haven't been able to reproduce it as of yet. I will report back tomorrow, thanks for your help.

@mkmoisen
Copy link
Author

mkmoisen commented Feb 7, 2025

Hi @shivani-ka, @cjbj

I reproduced it again using OCI 19.25.

It is still hanging, and did not throw a lost connection error, unlike what @shivani-ka saw.

Interestingly, this time there is no entry in gv$session corresponding to the session.

So it seems the session disconnected from the database, but that either OCI or oracledb didn't notice it and raise an exception.

I'll keep trying to reproduce it and see if the behavior is consistent or not.

Regarding the alert log, I don't see anything in v$diag_alert_ext relating to killed sessions within the timeframe. Is this sufficient or would I need to look into the actual alert log? I would need to raise a ticket to the DBA.

Thanks very much.

@mkmoisen
Copy link
Author

mkmoisen commented Feb 7, 2025

I did see a connection lost error this morning:

  File "C:\app\Lib\site-packages\oracledb\aq.py", line 86, in deqone
    message_impl = self._impl.deq_one()
  File "src\\oracledb\\impl/thick/queue.pyx", line 83, in oracledb.thick_impl.ThickQueueImpl.deq_one
  File "src\\oracledb\\impl/thick/utils.pyx", line 456, in oracledb.thick_impl._raise_from_odpi
  File "src\\oracledb\\impl/thick/utils.pyx", line 446, in oracledb.thick_impl._raise_from_info
oracledb.exceptions.DatabaseError: DPY-4011: the database or network closed the connection
DPI-1080: connection was closed by ORA-03113
ORA-03113: end-of-file on communication channel

However last night I did not receive this, it just hung.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants