[Bug]: milvus-backup failed for Azure Blob Storage Account #446

Open
gifi-siby opened this issue Oct 26, 2024 · 6 comments

@gifi-siby
Contributor

Current Behavior

When I try to back up using Azure, I get the error below.
The same Azure bucket is used for both Milvus storage and backup storage.
["CopyFromURL err"] [error="PUT https://<container-name>.blob.core.windows.net:443/<bucket-name>/devfilesBackup/Backk/binlogs/insert_log/453399115307976339/453399115307976340/453399115308176349/453399115308176349/0\n--------------------------------------------------------------------------------\nRESPONSE 401: 401 Server failed to authenticate the request. Please refer to the information in the www-authenticate header.\nERROR CODE: CannotVerifyCopySource\n--------------------------------------------------------------------------------\n<?xml version="1.0" encoding="utf-8"?><Error><Code>CannotVerifyCopySource</Code><Message>Server failed to authenticate the request. Please refer to the information in the www-authenticate header.\nRequestId:89115e49-f01e-0058-1310-259a52000000\nTime:2024-10-23T05:54:21.2939119Z</Message></Error>\n--------------------------------------------------------------------------------\n"]

Expected Behavior

No response

Steps To Reproduce

# Configures the system log output.
log:
  level: debug # Only supports debug, info, warn, error, panic, or fatal. Default 'info'.
  console: true # whether print log to console
  file:
    rootPath: "logs/backup.log"

http:
  simpleResponse: true

# milvus proxy address, compatible to milvus.yaml
milvus:
  address: localhost
  port: 19530 #443
  authorizationEnabled: true
  tlsMode: 1
  user: "ibmlhadmin"
  password: "password"

# Related configuration of minio, which is responsible for data persistence for Milvus.
minio:
  # Milvus storage configs, make them the same with milvus config
  storageType: "azure"
  # storageType: azure # support storage type: local, minio, s3, aws, gcp, ali(aliyun), azure, tc(tencent), gcpnative
  address: core.windows.net # Address of MinIO/S3
  port: 443 # Port of MinIO/S3
  accessKeyID: my_access_key # accessKeyID of MinIO/S3
  secretAccessKey: my_secret_key # MinIO/S3 encryption string
  useSSL: false # Access to MinIO/S3 with SSL
  useIAM: false
  iamEndpoint: ""
  bucketName: bucket-name # Milvus Bucket name in MinIO/S3, make it the same as your milvus instance
  rootPath: "files" # Milvus storage root path in MinIO/S3, make it the same as your milvus instance

  # Backup storage configs, the storage you want to put the backup data
  backupStorageType: azure # support storage type: local, minio, s3, aws, gcp, ali(aliyun), azure, tc(tencent)
  backupAddress: core.windows.net # Address of MinIO/S3
  backupPort: 443 # Port of MinIO/S3
  backupAccessKeyID: <access-key>  # accessKeyID of MinIO/S3
  backupSecretAccessKey: <secret-key> # MinIO/S3 encryption string
  backupBucketName: <bucket-name> # Bucket name to store backup data. Backup data will store to backupBucketName/backupRootPath
  backupRootPath: "filesBackup" # Rootpath to store backup data. Backup data will store to backupBucketName/backupRootPath

  # If you need to back up or restore data between two different storage systems, direct client-side copying is not supported. 
  # Set this option to true to enable data transfer through Milvus Backup.
  # Note: This option will be automatically set to true if `minio.storageType` and `minio.backupStorageType` differ.
  # However, if they are the same but belong to different services, you must manually set this option to `true`.
  crossStorage: "false"
  
backup:
  maxSegmentGroupSize: 2G

  parallelism: 
    # collection level parallelism to backup
    backupCollection: 4
    # thread pool to copy data. reduce it if blocks your storage's network bandwidth
    copydata: 128
    # Collection level parallelism to restore
    restoreCollection: 2
  
  # keep temporary files during restore, only use to debug 
  keepTempFiles: true
  
  # Pause GC during backup through Milvus Http API. 
  gcPause:
    enable: true
    seconds: 7200
    address: http://localhost:9091
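
The comments on `crossStorage` in the config above describe a small decision rule. Here is a minimal sketch of that rule as the comments state it, not the actual milvus-backup code. The function names `as_bool` and `effective_cross_storage` are hypothetical, and the handling of the quoted `"false"` value is an assumption about how such a string would be normalized.

```python
def as_bool(value) -> bool:
    """Normalize a YAML value such as "false" (quoted string) or False to a bool.

    Assumption: only the literal text "true" (case-insensitive) counts as true.
    """
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() == "true"


def effective_cross_storage(storage_type: str, backup_storage_type: str, cross_storage) -> bool:
    """Return True when data has to be transferred through Milvus Backup."""
    # Different storage types: per the config comments, the option is forced to true.
    if storage_type != backup_storage_type:
        return True
    # Same storage type: only the explicit setting counts. Two separate services
    # of the same type therefore need crossStorage set to true by hand.
    return as_bool(cross_storage)


if __name__ == "__main__":
    # The config in this issue: azure to azure, crossStorage "false".
    print(effective_cross_storage("azure", "azure", "false"))
```

With the config posted here (both types `azure`, `crossStorage: "false"`), this rule leaves cross-storage transfer off, so the copy is done server-side by the storage service.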

Environment

No response

Anything else?

No response

@nisharyan

Hey @gifi-siby, did you find a resolution for this?

@nisharyan

It works if you enable anonymous access to the storage container

@gifi-siby
Contributor Author

gifi-siby commented Nov 20, 2024

No, I didn't find a resolution.
So we need to allow public access for this to work? I am not sure that is the correct way.

@nisharyan

Yes, enabling anonymous blob access may not be the best approach. Need a more concrete solution.

@gifi-siby
Contributor Author

Yeah, that is the problem. Need to figure out a robust solution.

@gifi-siby
Contributor Author

  • The backup operation with Azure storage was failing with the error below:
    ["CopyFromURL err"] [error="PUT https://<container-name>.blob.core.windows.net:443/<bucket-name>/devfilesBackup/Backk/binlogs/insert_log/453399115307976339/453399115307976340/453399115308176349/453399115308176349/0\n--------------------------------------------------------------------------------\nRESPONSE 401: 401 Server failed to authenticate the request. Please refer to the information in the www-authenticate header.\nERROR CODE: CannotVerifyCopySource\n--------------------------------------------------------------------------------\n<?xml version="1.0" encoding="utf-8"?><Error><Code>CannotVerifyCopySource</Code><Message>Server failed to authenticate the request. Please refer to the information in the www-authenticate header.\nRequestId:89115e49-f01e-0058-1310-259a52000000\nTime:2024-10-23T05:54:21.2939119Z</Message></Error>\n--------------------------------------------------------------------------------\n"]
    We found that CopyFromURL must be used with a SAS token. The current implementation adds the SAS token to the request only when the Milvus storage and the backup storage are different containers.
    We also had to modify the existing function that obtains the SAS token, because it was failing as well.

  • CopyFromURL also works only for files, not for directories. We added a check for whether the path to be copied is a file or a directory, and CopyFromURL is called only when it is a file.

  • Even though the backup and restore operations now succeed, any query against the restored collection fails.

  • When the Milvus instance bucket and the backup bucket are different, temporary files are generated during restore as part of binlog copying.
    These temporary files are deleted after the restore, and goroutines are used to speed up the deletion. The error below occurs while the temporary files are being deleted:
    [2024/12/04 22:38:32.435 -08:00] [WARN] [storage/azure_chunk_manager.go:301] ["failed to remove object"] [bucket=milvus-b] [path=restore-temp-restore_2024_12_05_06_38_11_21842790-default-flute_fromGCS1/flutyyBack/binlogs/insert_log/454195780983311142] [error="DELETE https://sparkadlsiae.blob.core.windows.net:443/milvus-b/restore-temp-restore_2024_12_05_06_38_11_21842790-default-flute_fromGCS1/flutyyBack/binlogs/insert_log/454195780983311142\n--------------------------------------------------------------------------------\nRESPONSE 409: 409 This operation is not permitted on a non-empty directory.\nERROR CODE: DirectoryIsNotEmpty\n--------------------------------------------------------------------------------\n<?xml version=\"1.0\" encoding=\"utf-8\"?><Error><Code>DirectoryIsNotEmpty</Code><Message>This operation is not permitted on a non-empty directory.\nRequestId:47bb902f-f01e-002a-5ee0-469d1d000000\nTime:2024-12-05T06:38:32.4195101Z</Message></Error>\n--------------------------------------------------------------------------------\n"]
    We suspect this happens when a goroutine tries to delete a parent directory before its child directories have been deleted.
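
To illustrate the first two points (a SAS token on the copy source, and CopyFromURL only for files), here is a rough sketch of how the copy sources could be planned. It is written in Python for brevity and is not the milvus-backup implementation. The helper names `copy_source_url`, `is_directory`, and `plan_copies` are hypothetical, the SAS token is assumed to be an already-generated query string, and the prefix-based directory check is only one possible way to tell files from directories.

```python
from typing import Iterable, List
from urllib.parse import quote


def copy_source_url(account: str, container: str, blob_path: str, sas_token: str) -> str:
    """Build a copy-source URL that carries the SAS token as its query string."""
    # quote() keeps "/" by default, so the path hierarchy is preserved.
    base = f"https://{account}.blob.core.windows.net/{container}/{quote(blob_path)}"
    return f"{base}?{sas_token.lstrip('?')}"


def is_directory(path: str, all_paths: Iterable[str]) -> bool:
    """Treat a path as a directory if it ends with "/" or other paths live under it."""
    prefix = path.rstrip("/") + "/"
    return path.endswith("/") or any(p.startswith(prefix) for p in all_paths)


def plan_copies(account: str, container: str, paths: Iterable[str], sas_token: str) -> List[str]:
    """Return SAS-authorized source URLs for files only, skipping directories."""
    all_paths = set(paths)
    return [
        copy_source_url(account, container, p, sas_token)
        for p in sorted(all_paths)
        if not is_directory(p, all_paths)
    ]
```

The point of the sketch is that every source URL handed to the copy call is authorized on its own, even when source and destination sit in the same container, and that directory entries never reach the copy call.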
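
If the suspicion about deletion order is right, one way to keep the parallel deletion while avoiding DirectoryIsNotEmpty is to delete level by level, deepest paths first, and to wait for each level to finish before moving up. Below is a minimal sketch of that idea in Python (threads standing in for goroutines). It is not the project's code: `delete_tree` and the `delete_one` callback are hypothetical names, and the list of paths is assumed to contain every file and directory under the temporary prefix.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def depth(path: str) -> int:
    """Number of "/" separators in the path, ignoring leading and trailing ones."""
    return path.strip("/").count("/")


def delete_tree(paths: Iterable[str], delete_one: Callable[[str], None], workers: int = 8) -> None:
    """Delete paths in parallel, children strictly before their parents."""
    levels = defaultdict(list)
    for p in set(paths):
        levels[depth(p)].append(p)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Deepest level first. Paths at the same depth cannot contain each
        # other, so they are safe to delete concurrently.
        for d in sorted(levels, reverse=True):
            # list() blocks until the whole level is done and re-raises any
            # error before the next (shallower) level starts.
            list(pool.map(delete_one, levels[d]))
```

The trade-off is one synchronization point per directory level. That is usually cheap compared with the deletes themselves, and it removes the race between a parent and its children.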
