This repository has been archived by the owner on Nov 5, 2022. It is now read-only.

TFX Evaluator takes too long in "TFX_Pipeline_for_Bert_Preprocessing" #71

Open
deep-diver opened this issue Feb 17, 2021 · 4 comments

@deep-diver

deep-diver commented Feb 17, 2021

@hanneshapke

It seems like the 'Evaluator' component takes too long (more than 2 hours, and it still hadn't finished) in the Kubeflow environment on GCP AI Platform Pipelines. This is very unexpected compared to the notebook version, which took less than 5 minutes with a GPU.

  • I have tried a number of different VM options with different CPU/memory configurations (but no GPU, because the GCP team didn't grant me more GPU quota)

I am assuming that environments with and without a GPU behave differently (since the Evaluator evaluates two models [the blessed baseline and the current one] by running inference on the inputs). If that is the case, the problem is that I want to allocate one GPU k8s node for one specific TFX component. Otherwise I would have to equip every single node with a GPU, which is not desirable.

Any possible thoughts?

@hanneshapke
Contributor

Hi @deep-diver,

"If that is the case, the problem is that I want to allocate one GPU k8s node for one specific TFX component."

Last time I checked, TFX didn't support component-specific node assignments when KFP is used as the orchestrator. Maybe @rcrowe-google knows if/when TFX will support component-specific node assignments.

Regarding GCP AI Platform: Can you share a bit more info around your configuration?
I suspect that the eval is running on a non-GPU node, and depending on the number of samples in your eval step, it can take forever. Try to run it on a GPU, or switch to ALBERT, which executes faster on CPUs (an example is in the same GitHub folder).

I hope this helps.

@deep-diver
Author

@hanneshapke

Here is my personal repo.

I am not sure how to set a smaller step number. Here is my eval config:

  eval_config = tfma.EvalConfig(
      model_specs=[tfma.ModelSpec(label_key='label')],
      slicing_specs=[tfma.SlicingSpec()],
      metrics_specs=[
          tfma.MetricsSpec(metrics=[
              tfma.MetricConfig(
                  class_name='CategoricalAccuracy',
                  threshold=tfma.MetricThreshold(
                      value_threshold=tfma.GenericValueThreshold(
                          lower_bound={'value': 0.5}),
                      change_threshold=tfma.GenericChangeThreshold(
                          direction=tfma.MetricDirection.HIGHER_IS_BETTER,
                          absolute={'value': -1e-2})))
          ])
      ]
  )

  evaluator = Evaluator(
      examples=example_gen.outputs['examples'],
      model=trainer.outputs['model'],
      baseline_model=model_resolver.outputs['model'],
      eval_config=eval_config
  )
  components.append(evaluator)
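One knob that does exist: TFMA evaluates every example in the eval split, so shrinking that split in ExampleGen bounds the Evaluator's work. A minimal sketch, assuming CsvExampleGen and an illustrative 19:1 hash-bucket ratio (neither is from this pipeline):

```python
# Hedged sketch: give the eval split fewer hash buckets so the Evaluator
# scores fewer examples. TFMA walks the whole split, so the split ratio in
# ExampleGen is effectively the "step number" knob.
from tfx.components import CsvExampleGen
from tfx.proto import example_gen_pb2

output_config = example_gen_pb2.Output(
    split_config=example_gen_pb2.SplitConfig(splits=[
        example_gen_pb2.SplitConfig.Split(name='train', hash_buckets=19),
        example_gen_pb2.SplitConfig.Split(name='eval', hash_buckets=1),
    ]))

example_gen = CsvExampleGen(
    input_base='gs://my-bucket/data',  # illustrative path
    output_config=output_config)
```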

I will try the ALBERT version! Thanks! 👍 👍

But I want to know if there is a way to allocate a GPU machine only for the TFX Evaluator component. Since this is a simple personal project, I don't want to have multiple k8s GPU nodes. (Can I set up one node with a GPU and two nodes without GPUs?)

@rcrowe-google
Contributor

Configuring nodes for specific components is in development now and should be available soon.
@zhitaoli
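
In the meantime, a minimal sketch of the filtering idea behind component-specific assignments, using a stand-in `FakeOp` instead of `kfp.dsl.ContainerOp` (in a real pipeline, a function like this would be passed via `KubeflowDagRunnerConfig`'s `pipeline_operator_funcs`, which applies it to every op; the accelerator label value is illustrative):

```python
# Sketch: mutate only the op whose name matches 'Evaluator', so a single
# GPU node pool serves one component. FakeOp stands in for kfp.dsl.ContainerOp.

class FakeOp:
    def __init__(self, name):
        self.name = name
        self.gpu_limit = None       # stand-in for ContainerOp.set_gpu_limit()
        self.node_selector = {}     # stand-in for add_node_selector_constraint()

def gpu_only_for_evaluator(op):
    """Give the Evaluator op a GPU and pin it to a GPU node pool."""
    if 'Evaluator' in op.name:
        op.gpu_limit = 1
        op.node_selector['cloud.google.com/gke-accelerator'] = 'nvidia-tesla-t4'
    return op

ops = [gpu_only_for_evaluator(FakeOp(n))
       for n in ('Trainer', 'Evaluator', 'Pusher')]
```

Only the op named 'Evaluator' ends up with a GPU limit and node selector; the others are untouched.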

@deep-diver
Author

@rcrowe-google

Or is there a way to wrap a TFX component in @func_to_container_op?
