This is my YAML file:

```yaml
name: "pretraining_data"
parquet_path:
  s3: "wiki_data"
source_column: "text_sentences_sonar_emb"
source_text_column: "text_sentences"
```
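To check that the path in the card resolves to real files, I can list the parquet fragments. A minimal sketch, assuming wiki_data is a local folder relative to the working directory (the card labels the key s3, but the data was downloaded locally):

```python
from pathlib import Path

# Sketch: list the parquet fragments under the path the card points to.
# Assumes "wiki_data" is a local directory.
fragments = sorted(Path("wiki_data").glob("**/*.parquet"))
print(fragments)  # should include 0_b0ddbee86cdf7d47_0_0.parquet
```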
My data was saved after the download in the wiki_data folder, which contains the file 0_b0ddbee86cdf7d47_0_0.parquet.

After that, I used the command you gave:

```bash
python scripts/fit_embedding_normalizer.py --ds dataset1:4 dataset2:1 dataset3:10 --save_path "path/to/new/normalizer.pt" --max_nb_samples 1000000
```
This is the dataset schema of 0_b0ddbee86cdf7d47_0_0.parquet:

```python
import pyarrow as pa

schema = pa.schema([
    ("id", pa.int64()),
    ("url", pa.string()),
    ("text_sentences_sonar_emb", pa.list_(pa.list_(pa.float32()))),
])
```
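To double-check that the file on disk really has these columns (in particular whether text_sentences, which source_text_column points to, is present), the fragment can be read back with pyarrow. A minimal sketch, assuming the file sits directly under wiki_data:

```python
import pyarrow.parquet as pq

# Sketch: read the downloaded fragment and compare its columns
# against the ones the YAML card refers to.
table = pq.read_table("wiki_data/0_b0ddbee86cdf7d47_0_0.parquet")
print(table.schema)

for col in ("text_sentences_sonar_emb", "text_sentences"):
    status = "present" if col in table.schema.names else "MISSING"
    print(f"{col}: {status}")
```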
I am making the normalizer.pt file according to my YAML file:

```bash
uv run python scripts/fit_embedding_normalizer.py --ds pretraining_data:1 --save_path "/home/cpatwadityasharma/lcm/large_concept_model/output/normalizer.pt" --max_nb_samples 1000000
```
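For reference, once this command succeeds, the saved file can be loaded back to confirm it was written. A minimal sketch, assuming the output is an ordinary torch-serialized object (an assumption; the exact contents depend on the script):

```python
import torch

# Sketch: load the fitted normalizer back and inspect what was saved.
state = torch.load(
    "/home/cpatwadityasharma/lcm/large_concept_model/output/normalizer.pt",
    map_location="cpu",
)
print(type(state))
```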
In my case, however, this fails with the error I mentioned above.

I also followed these steps:

```yaml
training_data:
  name: "pretraining_data"
  source_suffix_text: "End of text."
validation_data:
  name: "some_other_separate_validation_data"
  source_suffix_text: "End of text."
```
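Since fit_embedding_normalizer.py was called with --ds pretraining_data:1, the name under training_data presumably has to match the name in the dataset card above. A minimal sketch of that check (the YAML filenames are assumptions):

```python
import yaml

# Hypothetical filenames for the two YAML snippets shown above.
with open("pretraining_data.yaml") as f:
    card = yaml.safe_load(f)
with open("training.yaml") as f:
    training = yaml.safe_load(f)

# The training config should reference the dataset card by its exact name.
assert training["training_data"]["name"] == card["name"], "dataset names differ"
```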
Please provide the appropriate solution for base LCM training and explain how to do it. Thank you.