Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CRITICAL message is issued when UNDO tablespace usage % is negative #44

Open
philippe-ebersohl opened this issue May 17, 2024 · 2 comments

Comments

@philippe-ebersohl
Copy link

The messages occurs regularly for the UNDO tablespace, always on the same server.
Should negative values be returned ?

CRITICAL - tbs UNDO01 usage is -0.00% | 'tbs_undo01_usage_pct'=-0.00%;80;90 'tbs_undo01_usage'=0MB;52428;58982;0;65535 'tbs_undo01_alloc'=51199MB;;;0;65535

CRITICAL - tbs UNDO01 usage is -0.02% | 'tbs_undo01_usage_pct'=-0.02%;80;90 'tbs_undo01_usage'=-9MB;52428;58982;0;65535 'tbs_undo01_alloc'=51199MB;;;0;65535

OK - no problems | 'tbs_undo01_usage_pct'=0.00%;80;90 'tbs_undo01_usage'=1MB;52428;58982;0;65535

@philippe-ebersohl philippe-ebersohl changed the title CRITICAL message is issued when tablespace usage % is negative CRITICAL message is issued when UNDO tablespace usage % is negative Sep 5, 2024
Copy link

codeautopilot bot commented Sep 5, 2024

Potential solution

The plan to solve the bug involves addressing the calculation logic for tablespace usage in the Tablespace.pm file and adding validation checks to ensure that negative values are not returned. The issue arises due to the lack of validation for bytes_free and the calculation of percent_used, which can result in negative values if bytes_free exceeds bytes or bytes_max.

What is causing this bug?

The bug is caused by the calculation logic in the Tablespace.pm file, specifically in the init method. The formula used to calculate percent_used does not account for cases where bytes_free might be greater than bytes or bytes_max. This can happen due to anomalies in the database or incorrect reporting of bytes_free. As a result, the calculated percent_used can be negative, leading to the reported issue.

Code

To fix the bug, we need to modify the init method in the Tablespace.pm file to include validation checks and adjust the calculation logic for percent_used. Here are the specific changes:

if ($params{mode} =~ /server::[c]*database::tablespace::(usage|free)/) {
    if (! defined $self->{bytes_max} || $self->{bytes_max} eq '') { 
        $self->{bytes} = 0;
        $self->{bytes_max} = 0;
        $self->{bytes_free} = 0;
        $self->{percent_used} = 0;
        $self->{real_bytes_max} = $self->{bytes};
        $self->{real_bytes_free} = $self->{bytes_free};
    } else {
        if ($params{calcmeth} eq "classic") {
            if ($self->{bytes_max} == 0) { 
                $self->{percent_used} =
                    ($self->{bytes} - $self->{bytes_free}) / $self->{bytes} * 100;
                $self->{real_bytes_max} = $self->{bytes};
                $self->{real_bytes_free} = $self->{bytes_free};
            } elsif ($self->{bytes_max} > $self->{bytes}) {
                $self->{percent_used} =
                    ($self->{bytes} - $self->{bytes_free}) / $self->{bytes_max} * 100;
                $self->{real_bytes_max} = $self->{bytes_max};    
                $self->{real_bytes_free} = $self->{bytes_free} + ($self->{bytes_max} - $self->{bytes});
            } else {
                $self->{percent_used} =
                    ($self->{bytes} - $self->{bytes_free}) / $self->{bytes} * 100;
                $self->{real_bytes_max} = $self->{bytes};
                $self->{real_bytes_free} = $self->{bytes_free};
            }
        } elseif ($params{calcmeth} eq "vigna") {
            // Vigna calculation logic here
        }
    }
    // Ensure percent_used is not negative
    if ($self->{percent_used} < 0) {
        $self->{percent_used} = 0;
    }
    $self->{percent_free} = 100 - $self->{percent_used};
    my $tlen = 20;
    my $len = int((($params{mode} =~ /server::[c]*database::tablespace::usage/) ?
        $self->{percent_used} : $self->{percent_free} / 100 * $tlen) + 0.5);
}

How to replicate the bug

To replicate the bug, follow these steps:

  1. Set up a Nagios monitoring environment with the check_oracle_health.pl script.
  2. Configure the script to monitor an Oracle database with an UNDO tablespace.
  3. Ensure that the UNDO tablespace has conditions where bytes_free might exceed bytes or bytes_max.
  4. Run the script and observe the output. The bug will manifest as negative values for the UNDO tablespace usage percentage.

By implementing the recommended changes, the script will correctly handle cases where bytes_free exceeds bytes or bytes_max, preventing negative values from being returned. This will ensure the reliability of the monitoring system and address the reported issue.

Click here to create a Pull Request with the proposed solution

Files used for this task:

Changes on plugins-scripts/check_oracle_health.pl

Analysis Report for check_oracle_health.pl

Overview

The check_oracle_health.pl script is a Nagios plugin designed to monitor various parameters of Oracle databases. It supports multiple modes, including checking tablespace usage, which is relevant to the reported bug.

Issue Summary

The user reported that the script sometimes returns negative values for the UNDO tablespace usage percentage, which should not happen. The issue is critical as it affects the reliability of the monitoring system.

Key Points in the Script

  1. Modes and Parameters: The script supports various modes, including tablespace-usage, which is likely the mode in question.
  2. Command-line Options: The script parses numerous command-line options to configure its behavior.
  3. Server Initialization: The script initializes a DBD::Oracle::Server object with the parsed parameters and calls its nagios and calculate_result methods to perform the checks and generate the output.

Potential Areas of Concern

  1. Parameter Handling: The script handles many parameters, and any incorrect handling could lead to unexpected results.
  2. Mode Implementation: The specific logic for each mode, including tablespace-usage, is implemented in the DBD::Oracle::Server module, which is not part of this file but is crucial for understanding the issue.
  3. Output Formatting: The script formats the output based on the results from the DBD::Oracle::Server object, which includes the usage percentage.

Recommendations

  1. Review DBD::Oracle::Server Logic: The logic for calculating tablespace usage is likely implemented in the DBD::Oracle::Server module, specifically in the Tablespace.pm file. This logic should be reviewed and corrected to ensure that negative values are not returned.
  2. Validation Checks: Implement additional validation checks in this script to ensure that the usage percentage is non-negative before printing the output.

Next Steps

  1. Review and Correct Tablespace.pm: Ensure that the logic for calculating tablespace usage in Tablespace.pm is correct and handles edge cases appropriately.
  2. Implement Validation in check_oracle_health.pl: Add validation checks in this script to handle any negative values returned by the DBD::Oracle::Server object.

Conclusion

The primary cause of the bug is likely in the logic for calculating tablespace usage in the Tablespace.pm file. This script should also include validation checks to ensure that negative values are not reported. Further investigation and corrections should be made in the Tablespace.pm file, followed by implementing necessary validation in this script.

Changes on plugins-scripts/Nagios/DBD/Oracle/Server/Database/Tablespace.pm

Bug Analysis Report

File: plugins-scripts/Nagios/DBD/Oracle/Server/Database/Tablespace.pm

Overview

This file is responsible for managing Oracle tablespace information, including calculating and reporting tablespace usage. The issue reported involves negative values being returned for the UNDO tablespace usage, which should not occur.

Potential Causes of the Bug

  1. Calculation of percent_used:

    • The calculation of percent_used in the init method could lead to negative values if bytes_free exceeds bytes or bytes_max.
    • Specifically, the formula ($self->{bytes} - $self->{bytes_free}) / $self->{bytes_max} * 100 can result in negative percentages if bytes_free is greater than bytes.
  2. Handling of bytes_free and bytes_max:

    • There is no explicit check to ensure that bytes_free does not exceed bytes or bytes_max.
    • The logic does not account for edge cases where bytes_free might be incorrectly reported or calculated.
  3. SQL Queries:

    • The SQL queries used to fetch tablespace data might return unexpected values, especially for bytes_free. This could be due to expired extents or other anomalies in the database.

Recommendations for Fixes

  1. Add Validation Checks:

    • Before calculating percent_used, ensure that bytes_free does not exceed bytes or bytes_max.
    • If such a condition is detected, set percent_used to 0 or another appropriate value.
  2. Adjust Calculation Logic:

    • Modify the calculation logic to handle cases where bytes_free might be greater than bytes or bytes_max.
    • Example:
      if ($self->{bytes_free} > $self->{bytes}) {
          $self->{bytes_free} = $self->{bytes};
      }
      if ($self->{bytes_free} > $self->{bytes_max}) {
          $self->{bytes_free} = $self->{bytes_max};
      }
  3. Review SQL Queries:

    • Ensure that the SQL queries used to fetch tablespace data are accurate and handle edge cases properly.
    • Specifically, review the queries related to dba_undo_extents and dba_free_space.

Code Changes

Here are the specific changes to be made in the init method:

if ($params{mode} =~ /server::[c]*database::tablespace::(usage|free)/) {
    if (! defined $self->{bytes_max} || $self->{bytes_max} eq '') { 
        # eq '' kommt z.b. vor, wenn ein datafile online_status recover hat
        # in dba_data_files sind dann bytes und maxbytes nicht belegt (Null)
        $self->{bytes} = 0;
        $self->{bytes_max} = 0;
        $self->{bytes_free} = 0;
        $self->{percent_used} = 0;
        $self->{real_bytes_max} = $self->{bytes};
        $self->{real_bytes_free} = $self->{bytes_free};
    } else {
        if ($params{calcmeth} eq "classic") {
            # (total - free) / total * 100 = % used
            # (used + free - free) / ( used + free)
            if ($self->{bytes_max} == 0) { 
                $self->{percent_used} =
                    ($self->{bytes} - $self->{bytes_free}) / $self->{bytes} * 100;
                $self->{real_bytes_max} = $self->{bytes};
                $self->{real_bytes_free} = $self->{bytes_free};
            } elsif ($self->{bytes_max} > $self->{bytes}) {
                $self->{percent_used} =
                    ($self->{bytes} - $self->{bytes_free}) / $self->{bytes_max} * 100;
                $self->{real_bytes_max} = $self->{bytes_max};    
                $self->{real_bytes_free} = $self->{bytes_free} + ($self->{bytes_max} - $self->{bytes});
            } else {
                # alter tablespace USERS add datafile 'users02.dbf'
                #     size 5M autoextend on next 200K maxsize 6M;
                # bytes = 5M, maxbytes = 6M
                # ..... data arriving...until ORA-01652: unable to extend temp segment
                # bytes = 6M, maxbytes = 6M
                # alter database datafile 5 resize 8M;
                # bytes = 8M, maxbytes = 6M
                $self->{percent_used} =
                    ($self->{bytes} - $self->{bytes_free}) / $self->{bytes} * 100;
                $self->{real_bytes_max} = $self->{bytes};
                $self->{real_bytes_free} = $self->{bytes_free};
            }
        } elsif ($params{calcmeth} eq "vigna") {
        }
    }
    # Ensure percent_used is not negative
    if ($self->{percent_used} < 0) {
        $self->{percent_used} = 0;
    }
    $self->{percent_free} = 100 - $self->{percent_used};
    my $tlen = 20;
    my $len = int((($params{mode} =~ /server::[c]*database::tablespace::usage/) ?
        $self->{percent_used} : $self->{percent_free} / 100 * $tlen) + 0.5);
}

Conclusion

The primary cause of the bug appears to be the lack of validation checks for bytes_free and the calculation of percent_used. Implementing the recommended changes should prevent negative values from being returned for tablespace usage. Further review of the SQL queries might also be necessary to ensure accurate data retrieval.

Disclaimer: This comment was entirely generated using AI. Be aware that the information provided may be incorrect.

Current plan usage: 15.54%

Have feedback or need help?
Discord
Documentation
[email protected]

@philippe-ebersohl
Copy link
Author

Duplicate issue, same as #30
It seems linked to an Oracle bug I read about. I will update this with the Oracle bug issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant