Fix inconsistency that could cause the optimization algorithm to oscillate #228

Open: wants to merge 1 commit into base branch master

fumoboy007

Fixes #225.

Background

The optimization algorithm has three main calculations:

  1. Select the working set {i, j} that maximizes the decrease in the objective function.
  2. Change alpha[i] and alpha[j] to maximize that decrease (that is, to minimize the objective function over these two variables) while respecting constraints.
  3. Update the gradient of the objective function according to the changes to alpha[i] and alpha[j].

All three calculations make use of the matrix Q, which is represented by the QMatrix class. The QMatrix class has two main methods:

  • get_Q, which returns an array of values for a single column of the matrix; and
  • get_QD, which returns an array of diagonal values.
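
To make the two accessors concrete, here is a minimal sketch of such an interface. The signatures are simplified assumptions rather than the exact libsvm declarations, and DenseQ is a hypothetical implementation that exists only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef float Qfloat;  // per the description, Qfloat is currently float

// Simplified stand-in for the interface described above (signatures are assumptions).
class QMatrix {
public:
    virtual const Qfloat *get_Q(int column) const = 0;  // one column of Q, stored as Qfloat
    virtual const double *get_QD() const = 0;           // the diagonal of Q, stored as double
    virtual ~QMatrix() {}
};

// Hypothetical dense implementation, for illustration only.
class DenseQ : public QMatrix {
public:
    explicit DenseQ(const std::vector<std::vector<double> > &q) {
        for (std::size_t j = 0; j < q.size(); ++j) {
            std::vector<Qfloat> column;
            for (std::size_t i = 0; i < q.size(); ++i)
                column.push_back(static_cast<Qfloat>(q[i][j]));  // narrowed to Qfloat
            columns_.push_back(column);
            diagonal_.push_back(q[j][j]);  // kept in double
        }
    }
    const Qfloat *get_Q(int column) const override {
        assert(column >= 0 && static_cast<std::size_t>(column) < columns_.size());
        return columns_[column].data();
    }
    const double *get_QD() const override { return diagonal_.data(); }

private:
    std::vector<std::vector<Qfloat> > columns_;
    std::vector<double> diagonal_;
};
```

Note that the same diagonal entry is reachable two ways in this sketch: as get_Q(i)[i] in Qfloat, and as get_QD()[i] in double.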

Problem

Q values are of type Qfloat while QD values are of type double. Qfloat is currently defined as float, so the diagonal values returned by get_Q and get_QD can be inconsistent. For example, in #225, one of the diagonal values is 181.05748749793070829 as double but 180.99411909539512067 as float.
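
As a small illustration of why the two types can disagree, the sketch below measures how much a double value changes when it is stored as Qfloat and read back. The helper name narrowing_gap is hypothetical. It isolates only the narrowing step and does not attempt to reproduce the specific values reported in #225.

```cpp
#include <cassert>
#include <cmath>

typedef float Qfloat;  // per the description, Qfloat is currently float

// Absolute difference between a double and the same value after a round trip through Qfloat.
double narrowing_gap(double value) {
    assert(std::isfinite(value));
    Qfloat narrowed = static_cast<Qfloat>(value);
    return std::fabs(value - static_cast<double>(narrowed));
}
```

Values that float can represent exactly (such as 0.5) survive the round trip unchanged; most others (such as 0.1) do not.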

The first two calculations of the optimization algorithm access the diagonal values via get_QD. However, the third calculation accesses the diagonal values via get_Q. This inconsistency between the minimization calculations and the gradient update can cause the optimization algorithm to oscillate, as demonstrated by #225.
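
The effect of the mismatch can be sketched with a one-variable simplification. This is an assumption made for illustration, not the libsvm solver: the step is sized with one diagonal value and the gradient is then updated with another. The function and parameter names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// One-variable simplification: take a Newton step sized with diag_for_step,
// then update the gradient using diag_for_update.
double gradient_after_step(double gradient, double diag_for_step, double diag_for_update) {
    assert(diag_for_step > 0.0);
    double delta_alpha = -gradient / diag_for_step;   // step sized with the QD-style value
    return gradient + diag_for_update * delta_alpha;  // gradient updated with the get_Q-style value
}
```

When both arguments are the same value, the updated gradient is zero up to rounding. With the two diagonal values quoted above, a residual remains after the step, so the bookkeeping says the variable is not yet optimal even though the step was computed as if it would be. The sketch shows only this residual and does not reproduce the oscillation itself.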

Solution

We change get_Q to return a new class called QColumn instead of a plain array of values. The QColumn class overloads the subscript operator, so call sites access individual elements the same way as before. Internally, however, QColumn returns the QD value when the diagonal element is accessed. This guarantees that all three calculations use the same values for the diagonal elements, eliminating the inconsistency.
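
A minimal sketch of the idea follows. It is hypothetical and the class in the actual patch may differ; in particular, the constructor arguments and the double return type are assumptions.

```cpp
#include <cassert>

typedef float Qfloat;  // per the description, Qfloat is currently float

// Hypothetical sketch: wraps a column of Qfloat values and substitutes the
// double-precision QD value whenever the diagonal element is requested.
class QColumn {
public:
    QColumn(const Qfloat *values, int column_index, double diagonal_value)
        : values_(values), column_index_(column_index), diagonal_value_(diagonal_value) {}

    // Off-diagonal elements come from the Qfloat array; the diagonal comes from QD.
    double operator[](int row) const {
        assert(row >= 0);
        return row == column_index_ ? diagonal_value_
                                    : static_cast<double>(values_[row]);
    }

private:
    const Qfloat *values_;   // not owned; points at the column's Qfloat storage
    int column_index_;       // index of this column, i.e. the row of its diagonal element
    double diagonal_value_;  // the corresponding QD entry
};
```

With a wrapper of this shape, an expression like Q_i[i] yields exactly the value that get_QD()[i] would, while Q_i[k] for k != i is unchanged.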

Alternatives Considered

Alternatively, we could change Qfloat to be defined as double. This would also eliminate the inconsistency; however, each cached value would take twice as much memory, so a cache of a given size would hold half as many values.
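
The capacity trade-off is simple arithmetic, sketched below. It assumes the common case of a 4-byte float and an 8-byte double and ignores any bookkeeping overhead in the cache; the helper name cache_capacity is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Number of matrix entries of type Element that fit in cache_bytes,
// ignoring any per-column bookkeeping overhead.
template <typename Element>
std::size_t cache_capacity(std::size_t cache_bytes) {
    assert(cache_bytes > 0);
    return cache_bytes / sizeof(Element);
}
```

For an illustrative 100 MB cache, cache_capacity<float> gives twice the number of entries that cache_capacity<double> does on such platforms.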

Future Changes

The Java code will be updated similarly in a separate commit.

fumoboy007 added a commit to fumoboy007/scikit-learn that referenced this pull request Dec 21, 2024
…to oscillate.

See more details in the upstream pull request: cjlin1/libsvm#228.
fumoboy007 added a commit to fumoboy007/scikit-learn that referenced this pull request Dec 21, 2024
…to oscillate.

See more details in the upstream pull request: cjlin1/libsvm#228.

Fixes scikit-learn#30353.
Successfully merging this pull request may close these issues.

Training gets stuck on a specific dataset