Removes APIs that have been deprecated and have exceeded the grace period, improves *_warning_wrapper performance (#4176)

* Removes APIs that were deprecated more than one release ago.
* Updates tests that assumed the deprecated APIs were in place.
* Changes the `*_warning_wrapper` utilities to accept a namespace name to use in the warning messages instead of making an expensive `inspect.stack()` call. `inspect.stack()` is still used if a namespace name is not provided. This reduces overall `import` time, especially when used with `cudf.pandas` (attn: @shwina).
* Removes the redundant `cugraph.utilities.api_tools` module in favor of just using the equivalent `pylibcugraph` module directly.
* Fixes the jaccard notebook by removing calls to deprecated APIs that are no longer present, and updates it for clarity and to match the current implementation.
* Cleans up jaccard docstrings.

Authors:
  - Rick Ratzel (https://github.com/rlratzel)

Approvers:
  - Naim (https://github.com/naimnv)
  - Joseph Nke (https://github.com/jnke2016)
  - Don Acosta (https://github.com/acostadon)
  - Brad Rees (https://github.com/BradReesWork)

URL: #4176
rlratzel authored Feb 23, 2024
1 parent 3499f28 commit 2c478fb
Showing 20 changed files with 161 additions and 1,096 deletions.
368 changes: 38 additions & 330 deletions notebooks/algorithms/link_prediction/Jaccard-Similarity.ipynb

Large diffs are not rendered by default.

Binary file modified notebooks/img/karate_similarity.png
8 changes: 1 addition & 7 deletions python/cugraph/cugraph/__init__.py
@@ -1,4 +1,4 @@
# Copyright (c) 2019-2023, NVIDIA CORPORATION.
# Copyright (c) 2019-2024, NVIDIA CORPORATION.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
@@ -80,9 +80,6 @@
overlap_coefficient,
sorensen,
sorensen_coefficient,
jaccard_w,
overlap_w,
sorensen_w,
)

from cugraph.traversal import (
@@ -100,9 +97,6 @@

from cugraph.utilities import utils

from cugraph.experimental import strong_connected_component
from cugraph.experimental import find_bicliques

from cugraph.linear_assignment import hungarian, dense_hungarian
from cugraph.layout import force_atlas2

31 changes: 3 additions & 28 deletions python/cugraph/cugraph/dask/link_prediction/jaccard.py
@@ -1,4 +1,4 @@
# Copyright (c) 2022-2023, NVIDIA CORPORATION.
# Copyright (c) 2022-2024, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -66,38 +66,13 @@ def jaccard(input_graph, vertex_pair=None, use_weight=False):
of their intersection divided by the volume of their union. In the context
of graphs, the neighborhood of a vertex is seen as a set. The Jaccard
similarity weight of each edge represents the strength of connection
between vertices based on the relative similarity of their neighbors. If
first is specified but second is not, or vice versa, an exception will be
thrown.
NOTE: If the vertex_pair parameter is not specified then the behavior
of cugraph.jaccard is different from the behavior of
networkx.jaccard_coefficient.
between vertices based on the relative similarity of their neighbors.
cugraph.dask.jaccard, in the absence of a specified vertex pair list, will
compute the two_hop_neighbors of the entire graph to construct a vertex pair
list and will return the jaccard coefficient for those vertex pairs. This is
not advisable as the vertex_pairs can grow exponentially with respect to the
size of the datasets
networkx.jaccard_coefficient, in the absence of a specified vertex
pair list, will return an upper triangular dense matrix, excluding
the diagonal as well as vertex pairs that are directly connected
by an edge in the graph, of jaccard coefficients. Technically, networkx
returns a lazy iterator across this upper triangular matrix where
the actual jaccard coefficient is computed when the iterator is
dereferenced. Computing a dense matrix of results is not feasible
if the number of vertices in the graph is large (100,000 vertices
would result in 4.9 billion values in that iterator).
If your graph is small enough (or you have enough memory and patience)
you can get the interesting (non-zero) values that are part of the networkx
solution by doing the following:
But please remember that cugraph will fill the dataframe with the entire
solution you request, so you'll need enough memory to store the 2-hop
neighborhood dataframe.
size of the datasets.
Parameters
----------
54 changes: 18 additions & 36 deletions python/cugraph/cugraph/experimental/__init__.py
@@ -11,59 +11,41 @@
# See the License for the specific language governing permissions and
# limitations under the License.

from cugraph.utilities.api_tools import experimental_warning_wrapper
from cugraph.utilities.api_tools import deprecated_warning_wrapper
from cugraph.utilities.api_tools import promoted_experimental_warning_wrapper
from pylibcugraph.utilities.api_tools import (
experimental_warning_wrapper,
promoted_experimental_warning_wrapper,
)

# Passing in the namespace name of this module to the *_wrapper functions
# allows them to bypass the expensive inspect.stack() lookup.
_ns_name = __name__

from cugraph.structure.property_graph import EXPERIMENTAL__PropertyGraph

PropertyGraph = experimental_warning_wrapper(EXPERIMENTAL__PropertyGraph)
PropertyGraph = experimental_warning_wrapper(EXPERIMENTAL__PropertyGraph, _ns_name)

from cugraph.structure.property_graph import EXPERIMENTAL__PropertySelection

PropertySelection = experimental_warning_wrapper(EXPERIMENTAL__PropertySelection)
PropertySelection = experimental_warning_wrapper(
EXPERIMENTAL__PropertySelection, _ns_name
)

from cugraph.dask.structure.mg_property_graph import EXPERIMENTAL__MGPropertyGraph

MGPropertyGraph = experimental_warning_wrapper(EXPERIMENTAL__MGPropertyGraph)
MGPropertyGraph = experimental_warning_wrapper(EXPERIMENTAL__MGPropertyGraph, _ns_name)

from cugraph.dask.structure.mg_property_graph import EXPERIMENTAL__MGPropertySelection

MGPropertySelection = experimental_warning_wrapper(EXPERIMENTAL__MGPropertySelection)

# FIXME: Remove experimental.triangle_count next release
from cugraph.community.triangle_count import triangle_count

triangle_count = promoted_experimental_warning_wrapper(triangle_count)
MGPropertySelection = experimental_warning_wrapper(
EXPERIMENTAL__MGPropertySelection, _ns_name
)

from cugraph.experimental.components.scc import EXPERIMENTAL__strong_connected_component

strong_connected_component = experimental_warning_wrapper(
EXPERIMENTAL__strong_connected_component
)

from cugraph.experimental.structure.bicliques import EXPERIMENTAL__find_bicliques

find_bicliques = deprecated_warning_wrapper(
experimental_warning_wrapper(EXPERIMENTAL__find_bicliques)
EXPERIMENTAL__strong_connected_component, _ns_name
)

from cugraph.gnn.data_loading import BulkSampler

BulkSampler = promoted_experimental_warning_wrapper(BulkSampler)


from cugraph.link_prediction.jaccard import jaccard, jaccard_coefficient

jaccard = promoted_experimental_warning_wrapper(jaccard)
jaccard_coefficient = promoted_experimental_warning_wrapper(jaccard_coefficient)

from cugraph.link_prediction.sorensen import sorensen, sorensen_coefficient

sorensen = promoted_experimental_warning_wrapper(sorensen)
sorensen_coefficient = promoted_experimental_warning_wrapper(sorensen_coefficient)

from cugraph.link_prediction.overlap import overlap, overlap_coefficient

overlap = promoted_experimental_warning_wrapper(overlap)
overlap_coefficient = promoted_experimental_warning_wrapper(overlap_coefficient)
BulkSampler = promoted_experimental_warning_wrapper(BulkSampler, _ns_name)
10 changes: 7 additions & 3 deletions python/cugraph/cugraph/experimental/gnn/__init__.py
@@ -1,4 +1,4 @@
# Copyright (c) 2023, NVIDIA CORPORATION.
# Copyright (c) 2023-2024, NVIDIA CORPORATION.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
@@ -12,6 +12,10 @@
# limitations under the License.

from cugraph.gnn.data_loading import BulkSampler
from cugraph.utilities.api_tools import promoted_experimental_warning_wrapper
from pylibcugraph.utilities.api_tools import promoted_experimental_warning_wrapper

BulkSampler = promoted_experimental_warning_wrapper(BulkSampler)
# Passing in the namespace name of this module to the *_wrapper functions
# allows them to bypass the expensive inspect.stack() lookup.
_ns_name = __name__

BulkSampler = promoted_experimental_warning_wrapper(BulkSampler, _ns_name)
19 changes: 1 addition & 18 deletions python/cugraph/cugraph/link_prediction/__init__.py
@@ -1,4 +1,4 @@
# Copyright (c) 2019-2023, NVIDIA CORPORATION.
# Copyright (c) 2019-2024, NVIDIA CORPORATION.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
@@ -11,26 +11,9 @@
# See the License for the specific language governing permissions and
# limitations under the License.


from cugraph.utilities.api_tools import deprecated_warning_wrapper
from cugraph.link_prediction.jaccard import jaccard
from cugraph.link_prediction.jaccard import jaccard_coefficient

from cugraph.link_prediction.sorensen import sorensen
from cugraph.link_prediction.sorensen import sorensen_coefficient

from cugraph.link_prediction.overlap import overlap
from cugraph.link_prediction.overlap import overlap_coefficient

# To be deprecated
from cugraph.link_prediction.wjaccard import jaccard_w

jaccard_w = deprecated_warning_wrapper(jaccard_w)

from cugraph.link_prediction.woverlap import overlap_w

overlap_w = deprecated_warning_wrapper(overlap_w)

from cugraph.link_prediction.wsorensen import sorensen_w

sorensen_w = deprecated_warning_wrapper(sorensen_w)
70 changes: 3 additions & 67 deletions python/cugraph/cugraph/link_prediction/jaccard.py
@@ -1,4 +1,4 @@
# Copyright (c) 2019-2023, NVIDIA CORPORATION.
# Copyright (c) 2019-2024, NVIDIA CORPORATION.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
@@ -56,7 +56,6 @@ def ensure_valid_dtype(input_graph, vertex_pair):
def jaccard(
input_graph: Graph,
vertex_pair: cudf.DataFrame = None,
do_expensive_check: bool = False, # deprecated
use_weight: bool = False,
):
"""
@@ -66,43 +65,13 @@ def jaccard(
of their intersection divided by the volume of their union. In the context
of graphs, the neighborhood of a vertex is seen as a set. The Jaccard
similarity weight of each edge represents the strength of connection
between vertices based on the relative similarity of their neighbors. If
first is specified but second is not, or vice versa, an exception will be
thrown.
NOTE: If the vertex_pair parameter is not specified then the behavior
of cugraph.jaccard is different from the behavior of
networkx.jaccard_coefficient.
between vertices based on the relative similarity of their neighbors.
cugraph.jaccard, in the absence of a specified vertex pair list, will
compute the two_hop_neighbors of the entire graph to construct a vertex pair
list and will return the jaccard coefficient for those vertex pairs. This is
not advisable as the vertex_pairs can grow exponentially with respect to the
size of the datasets
networkx.jaccard_coefficient, in the absence of a specified vertex
pair list, will return an upper triangular dense matrix, excluding
the diagonal as well as vertex pairs that are directly connected
by an edge in the graph, of jaccard coefficients. Technically, networkx
returns a lazy iterator across this upper triangular matrix where
the actual jaccard coefficient is computed when the iterator is
dereferenced. Computing a dense matrix of results is not feasible
if the number of vertices in the graph is large (100,000 vertices
would result in 4.9 billion values in that iterator).
If your graph is small enough (or you have enough memory and patience)
you can get the interesting (non-zero) values that are part of the networkx
solution by doing the following:
>>> from cugraph.datasets import karate
>>> input_graph = karate.get_graph(download=True, ignore_weights=True)
>>> pairs = input_graph.get_two_hop_neighbors()
>>> df = cugraph.jaccard(input_graph, pairs)
But please remember that cugraph will fill the dataframe with the entire
solution you request, so you'll need enough memory to store the 2-hop
neighborhood dataframe.
size of the datasets.
Parameters
----------
@@ -121,21 +90,11 @@ def jaccard(
current implementation computes the jaccard coefficient for all
adjacent vertices in the graph.
do_expensive_check : bool, optional (default=False)
Deprecated.
This option added a check to ensure integer vertex IDs are sequential
values from 0 to V-1. That check is now redundant because cugraph
unconditionally renumbers and un-renumbers integer vertex IDs for
optimal performance, therefore this option is deprecated and will be
removed in a future version.
use_weight : bool, optional (default=False)
Flag to indicate whether to compute weighted jaccard (if use_weight==True)
or un-weighted jaccard (if use_weight==False).
'input_graph' must be weighted if 'use_weight=True'.
Returns
-------
df : cudf.DataFrame
@@ -161,13 +120,6 @@ def jaccard(
>>> df = jaccard(input_graph)
"""
if do_expensive_check:
warnings.warn(
"do_expensive_check is deprecated since vertex IDs are no longer "
"required to be consecutively numbered",
FutureWarning,
)

if input_graph.is_directed():
raise ValueError("Input must be an undirected Graph.")

@@ -220,7 +172,6 @@ def jaccard(
def jaccard_coefficient(
G: Union[Graph, "networkx.Graph"],
ebunch: Union[cudf.DataFrame, Iterable[Union[int, str, float]]] = None,
do_expensive_check: bool = False, # deprecated
):
"""
For NetworkX Compatability. See `jaccard`
@@ -244,14 +195,6 @@ def jaccard_coefficient(
pairs. Otherwise, the current implementation computes the overlap
coefficient for all adjacent vertices in the graph.
do_expensive_check : bool, optional (default=False)
Deprecated.
This option added a check to ensure integer vertex IDs are sequential
values from 0 to V-1. That check is now redundant because cugraph
unconditionally renumbers and un-renumbers integer vertex IDs for
optimal performance, therefore this option is deprecated and will be
removed in a future version.
Returns
-------
df : cudf.DataFrame
@@ -277,13 +220,6 @@ def jaccard_coefficient(
>>> df = jaccard_coefficient(G)
"""
if do_expensive_check:
warnings.warn(
"do_expensive_check is deprecated since vertex IDs are no longer "
"required to be consecutively numbered",
FutureWarning,
)

vertex_pair = None

G, isNx = ensure_cugraph_obj_for_nx(G)
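
The jaccard docstring in the diff above defines the coefficient as the size of the intersection of two neighborhoods divided by the size of their union, and notes that a vertex pair list is built from two-hop neighbors when none is given. The following is a small pure-Python sketch of that definition, not the cugraph implementation. The helper names are hypothetical, and "two-hop pair" is assumed here to mean any two vertices that share at least one neighbor.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set of neighbors}."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def two_hop_pairs(adjacency):
    """Return sorted vertex pairs that share at least one neighbor (assumed meaning)."""
    pairs = set()
    for neighbors in adjacency.values():
        # Any two neighbors of the same vertex are two hops apart through it.
        for u, v in combinations(sorted(neighbors), 2):
            pairs.add((u, v))
    return sorted(pairs)


def jaccard_sketch(adjacency, u, v):
    """Unweighted Jaccard: |N(u) & N(v)| / |N(u) | N(v)|."""
    union = adjacency[u] | adjacency[v]
    if not union:
        return 0.0
    return len(adjacency[u] & adjacency[v]) / len(union)


# Toy undirected graph: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
adjacency = build_adjacency([(0, 1), (0, 2), (1, 2), (2, 3)])
pairs = two_hop_pairs(adjacency)
scores = {pair: jaccard_sketch(adjacency, *pair) for pair in pairs}
```

Even on this four-vertex graph, the four edges produce five two-hop pairs, which hints at why the docstring advises against omitting the vertex pair list on large datasets.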