
Is it possible to import a current cluster to temporal operator? #562

Open
mfractal opened this issue Nov 23, 2023 · 4 comments

@mfractal

We have a production cluster deployed via the helm chart. I'd like to migrate to the operator without any downtime if possible. What would be the best path to do it?

@alexandrevilain
Owner

Hi!

I've never done that, but it could be doable. Let's try!

Here's what I'd suggest:

Create a new TemporalCluster on a dev cluster (with brand-new storage) and fill in the spec fields so that the operator generates a configmap that looks as close as possible to the configmap of the currently running cluster (the config generated by the helm chart). The only diffs should be the database users, passwords and endpoints. If you find other diffs in the configmap, please raise them in this issue; maybe we're missing a feature in the operator spec.
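
For reference, a minimal sketch of such a TemporalCluster manifest (field names follow the operator's v1beta1 examples; the cluster name, version, shard count, database endpoints and secret below are placeholders you would replace with the values from your helm-generated config):

    apiVersion: temporal.io/v1beta1
    kind: TemporalCluster
    metadata:
      name: prod
    spec:
      version: 1.21.2          # match the Temporal version your helm release runs
      numHistoryShards: 512    # must match the shard count of the existing cluster
      persistence:
        defaultStore:
          sql:
            user: temporal
            pluginName: postgres
            databaseName: temporal
            connectAddr: postgres.example.svc.cluster.local:5432
            connectProtocol: tcp
          passwordSecretRef:
            name: postgres-password
            key: PASSWORD
        visibilityStore:
          sql:
            user: temporal
            pluginName: postgres
            databaseName: temporal_visibility
            connectAddr: postgres.example.svc.cluster.local:5432
            connectProtocol: tcp
          passwordSecretRef:
            name: postgres-password
            key: PASSWORD

Diffing the configmap generated on the dev cluster against the helm-generated one will show what's missing.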

Then try to make the services managed by the operator join your existing cluster. To do that, create a new TemporalCluster with the spec you filled in on the dev cluster. The operator will try to configure the database for you. To make it skip the persistence reconciliation, update the TemporalCluster's status with the following fields:

    persistence:
      defaultStore:
        created: true
        schemaVersion: 1.21.2
        setup: true
        type: postgres
      visibilityStore:
        created: true
        schemaVersion: 1.21.2
        setup: true
        type: postgres

(update these with the right values for your setup).
This will make the operator only deploy the components.
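
For what it's worth, one way to apply such a status patch is through the status subresource (a sketch, assuming kubectl >= 1.24 and a TemporalCluster named "prod"; the patch file name is arbitrary and its content wraps the fields above under "status:"):

    # status-patch.yaml:
    #   status:
    #     persistence:
    #       defaultStore:
    #         ...
    #       visibilityStore:
    #         ...
    kubectl patch temporalcluster prod \
      --subresource=status \
      --type=merge \
      --patch-file=status-patch.yaml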

If the services deployed by the operator successfully join the existing cluster, you'll be able to uninstall the helm chart.
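
One quick way to check that the operator-managed services joined the same cluster (a sketch, assuming tctl is available in one of the pods and configured against the frontend): the describe output includes cluster membership information, so both the helm-managed and operator-managed pods should show up there.

    # Run from any pod that has tctl pointing at the frontend.
    tctl admin cluster describe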

I have no clue if it could work, let's try this :)

@yujunz
Contributor

yujunz commented Nov 25, 2023

Then try to make services managed by the operator to join your existing cluster. To do that, create a new TemporalCluster with the spec you filled on the dev cluster. The operator will try to configure the database for you. To make it skip the persistence reconciliation, update the TemporalCluster's status

Interesting. We have a similar requirement. Could you elaborate a bit on the terms and steps?

One known issue during our previous attempt to take ownership of an existing database was a conflict on the cluster metadata, which appears to be a checksum and is hard to reverse-engineer back to its source. The workaround was deleting it and letting Temporal regenerate it from the configmap created by the operator.

@Aoao54

Aoao54 commented Nov 30, 2023

Hi @mfractal @alexandrevilain
We have a similar requirement and gave this a try.

generate a configmap that looks like (as much as possible) with the current running cluster configmap (the config generated by the helm chart)

But in our production cluster's clusterMetadata config, the cluster name is "active", the Helm chart default :)

    clusterMetadata:
      enableGlobalDomain: false
      failoverVersionIncrement: 10
      masterClusterName: "active"
      currentClusterName: "active"
      clusterInformation:
        active:
          enabled: true
          initialFailoverVersion: 1
          rpcName: "temporal-frontend"
          rpcAddress: "127.0.0.1:7933"

Looks like it's impossible to make those two configs match, since the clusterMetadata config is auto-generated by the operator and isn't configurable right now, unless we set our cluster name to "active".

Our solution involves some downtime, but it works:

  1. Scale down the production services to zero replicas
  2. Deploy a TemporalCluster with the operator, using the same prod DB

The new deployment will take over the running and closed workflow executions.
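
For step 1, something along these lines (a sketch; the deployment names below are typical helm-release names and may differ in your release, so check kubectl get deploy first):

    kubectl scale deployment \
      temporal-frontend temporal-history temporal-matching temporal-worker \
      --replicas=0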

If you get "panic: Cluster info initial versions have duplicates" with the new deployment, the cause is duplicated cluster metadata left over from the old deployment.

You can bypass it by deleting the old clusterMetadata entry, which is stored in the cluster_metadata_info table of the default DB.

[screenshot of the duplicated cluster metadata omitted]
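
As a rough sketch of that cleanup for PostgreSQL (assuming the default store's cluster_metadata_info table and a stale cluster named "active"; column names may differ across schema versions, so inspect the rows and back up the database before deleting anything):

    -- List the cluster entries the cluster knows about.
    SELECT metadata_partition, cluster_name FROM cluster_metadata_info;

    -- Remove only the stale entry left behind by the helm-managed deployment.
    DELETE FROM cluster_metadata_info WHERE cluster_name = 'active';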

@alexandrevilain
Owner

Hi!
Good news, I think that #494 would help you :)
