This repository has been archived by the owner on Nov 5, 2022. It is now read-only.

TFX Evaluator takes too long in "TFX_Pipeline_for_Bert_Preprocessing" #71

Open
deep-diver opened this issue Feb 17, 2021 · 4 comments

@deep-diver

deep-diver commented Feb 17, 2021

@hanneshapke

It seems like the 'Evaluator' component takes too long (more than 2 hours, and it still hadn't finished) in the Kubeflow environment on GCP AI Platform Pipelines. This is very unexpected compared to the notebook version, which took less than 5 minutes with a GPU.

  • I have tried a number of different VM options with different CPU/memory configurations (but no GPU, because the GCP team didn't grant me more GPU quota)

I am assuming that environments with and without a GPU behave differently (since the Evaluator evaluates two models [the blessed baseline and the current one] by running inference on the inputs). If that is the case, the problem is that I want to allocate one GPU k8s node for one specific TFX component. Otherwise I would have to equip every single node with a GPU, which is not desirable.

Any possible thoughts?

@hanneshapke
Contributor

Hi @deep-diver,

"If that is the case, the problem is that I want to allocate one GPU k8s node for one specific TFX component."

Last time I checked, TFX didn't support component-specific node assignments when KFP is used as the orchestrator. Maybe @rcrowe-google knows if/when TFX will support component-specific node assignments.

Regarding GCP AI Platform: Can you share a bit more info around your configuration?
I suspect that the eval is running on a non-GPU node, and depending on the number of samples in your eval step, it can take forever. Try to run it on a GPU, or switch to ALBERT, which executes faster on CPUs (an example is in the same GitHub folder).

I hope this helps.

@deep-diver
Author

@hanneshapke

Here is my personal repo.

I am not sure how to set a smaller step number. Here is my eval config:

  eval_config = tfma.EvalConfig(
      model_specs=[tfma.ModelSpec(label_key='label')],
      slicing_specs=[tfma.SlicingSpec()],
      metrics_specs=[
          tfma.MetricsSpec(metrics=[
              tfma.MetricConfig(
                  class_name='CategoricalAccuracy',
                  threshold=tfma.MetricThreshold(
                      value_threshold=tfma.GenericValueThreshold(
                          lower_bound={'value': 0.5}),
                      change_threshold=tfma.GenericChangeThreshold(
                          direction=tfma.MetricDirection.HIGHER_IS_BETTER,
                          absolute={'value': -1e-2})))
          ])
      ]
  )

  evaluator = Evaluator(
      examples=example_gen.outputs['examples'],
      model=trainer.outputs['model'],
      baseline_model=model_resolver.outputs['model'],
      eval_config=eval_config
  )
  components.append(evaluator)
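One knob that does exist: TFMA evaluates every example in the eval split, so shrinking that split in ExampleGen bounds the Evaluator's work. A minimal sketch, assuming CsvExampleGen and an illustrative 19:1 hash-bucket ratio (neither is from this pipeline):

```python
# Hedged sketch: give the eval split fewer hash buckets so the Evaluator
# scores fewer examples. TFMA walks the whole split, so the split ratio in
# ExampleGen is effectively the "step number" knob.
from tfx.components import CsvExampleGen
from tfx.proto import example_gen_pb2

output_config = example_gen_pb2.Output(
    split_config=example_gen_pb2.SplitConfig(splits=[
        example_gen_pb2.SplitConfig.Split(name='train', hash_buckets=19),
        example_gen_pb2.SplitConfig.Split(name='eval', hash_buckets=1),
    ]))

example_gen = CsvExampleGen(
    input_base='gs://my-bucket/data',  # illustrative path
    output_config=output_config)
```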

I will try the ALBERT version! Thanks! 👍 👍

But I want to know if there is a way to allocate a GPU machine only for the TFX Evaluator component. Since this is a simple personal project, I don't want to have multiple k8s GPU nodes. (Can I set up one node with a GPU and two nodes without GPUs?)

@rcrowe-google
Contributor

Configuring nodes for specific components is in development now and should be available soon.
@zhitaoli
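
In the meantime, a minimal sketch of the filtering idea behind component-specific assignments, using a stand-in `FakeOp` instead of `kfp.dsl.ContainerOp` (in a real pipeline, a function like this would be passed via `KubeflowDagRunnerConfig`'s `pipeline_operator_funcs`, which applies it to every op; the accelerator label value is illustrative):

```python
# Sketch: mutate only the op whose name matches 'Evaluator', so a single
# GPU node pool serves one component. FakeOp stands in for kfp.dsl.ContainerOp.

class FakeOp:
    def __init__(self, name):
        self.name = name
        self.gpu_limit = None       # stand-in for ContainerOp.set_gpu_limit()
        self.node_selector = {}     # stand-in for add_node_selector_constraint()

def gpu_only_for_evaluator(op):
    """Give the Evaluator op a GPU and pin it to a GPU node pool."""
    if 'Evaluator' in op.name:
        op.gpu_limit = 1
        op.node_selector['cloud.google.com/gke-accelerator'] = 'nvidia-tesla-t4'
    return op

ops = [gpu_only_for_evaluator(FakeOp(n))
       for n in ('Trainer', 'Evaluator', 'Pusher')]
```

Only the op named 'Evaluator' ends up with a GPU limit and node selector; the others are untouched.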

@deep-diver
Author

@rcrowe-google

Or is there a way to wrap a TFX component in @func_to_container_op?
