I am writing here because the Discord invite in the README.md is invalid.
I am not sure I am doing this right. Using the dataset provided on Google Drive and the prompt "violins playing Tchaikovsky", it takes 10 minutes on an RTX 4070 Ti to generate tokens and produce a 4-second clip of chaotic humming. When I make a 30-second clip, token generation takes over an hour and produces a ~3 MB file that sounds like car horns underwater :/
Is there a preferred prompt to use with the test data? What sounds were sampled to make the test data?
When I tried to sample my own sounds, the semantic encoding was less than 10% finished after 24 hours. Is it normal that encoding a clip should take 10 days?
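In case it is relevant, here is a generic PyTorch check (nothing specific to this repo) that I ran to confirm whether the encoding step is actually seeing the GPU rather than silently falling back to CPU:

```python
import torch

# If CUDA is not visible to PyTorch, the semantic encoding step will run on
# the CPU and can easily be orders of magnitude slower than expected.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```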
Also, using the Google Drive data and `--model_config ./model/musiclm_large_small_context.json`, I get the following warnings:
Some weights of RobertaModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You are using a model of type mert_model to instantiate a model of type hubert. This is not supported for all configurations of models and can yield errors.
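For reference, the first warning about the pooler weights does not seem specific to this project; loading roberta-base with plain transformers reproduces it (a minimal sketch, assuming only that the transformers package is installed):

```python
from transformers import RobertaModel

# The roberta-base checkpoint ships without pooler weights, so transformers
# re-initializes them and prints the "newly initialized" / "You should
# probably TRAIN this model" message. This reproduces that warning on its own.
model = RobertaModel.from_pretrained("roberta-base")
```

So my question is mainly about the second warning (mert_model vs. hubert) and whether that one indicates a wrong config.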
What are the correct settings for using the Google Drive data?
I had to use the Google Drive data because the code, while not reporting any errors, generated a 0-byte preprocessed.db file in the semantic step, which caused errors in the generation step.
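For reference, a quick way to confirm the empty output (plain Python, nothing repo-specific; the path below is a placeholder for wherever the preprocessing step writes the file):

```python
import os

# Placeholder path: point this at wherever preprocessing writes preprocessed.db.
path = "./preprocessed.db"

# A 0-byte file here means the semantic preprocessing wrote nothing, which is
# what the generation step later trips over.
if os.path.exists(path):
    print(f"{path}: {os.path.getsize(path)} bytes")
else:
    print(f"{path} does not exist")
```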
Is there a working example of this code somewhere with proper checkpoints?
Thanks