
No match for FieldRef.Name(split) in id: string #15

Open
nanxue2023 opened this issue Jan 13, 2025 · 4 comments

Comments

@nanxue2023

Hello everyone, I get the following error when executing:

No match for FieldRef.Name(split) in id: string 
url: string 
title: string
text: string
text_sentences: list<element: string>
text_sentences_sonar_emb: list<element: fixed_size_list<element: float>[1024]>

I used pdb to print the stack trace and found that the program fails at line 1087 of parquet_utils.py: fragments = list(dataset._dataset.get_fragments(filter=dataset._filter_expression))
I can't fix this. Could you help me figure it out?

@hiskuDN

hiskuDN commented Jan 13, 2025

@elbayadm gave a good answer here, maybe it'll help. #9 (comment)

@nanxue2023
Author

@hiskuDN Thanks!!!! #9 (comment) helped me solve the problem, but now I hit a new error: 'pyarrow.lib.ListScalar' object has no attribute 'to'. The error appears at line 191 of dataloader.py: embs = [x.to(self.gang.device).to(dtype) for x in batch[col_name]]
I replaced that line with the following:

batch_py = torch.tensor(batch[col_name].to_pylist())
embs = [x.to(self.gang.device).to(dtype) for x in batch_py]

Another error occurs: expected sequence of length 25 at dim 1 (got 26)
😭

@hiskuDN

hiskuDN commented Jan 14, 2025

What dataset are you using? Are you using PyTorch for data handling at any point in the pipeline?

@nanxue2023
Author

The Wikipedia dataset. I use this code to process the data:

import pandas as pd

dataset_path = "/content/large_concept_model/sample_data/0_a25e918a7789ecfa_0_0.parquet"  # dataset path

df = pd.read_parquet(dataset_path)  # load dataset into a pandas DataFrame

df['split'] = 'train'  # add the missing 'split' column

df.to_parquet(dataset_path)  # write the dataset back to parquet
