Splitting and Merging Multiple Columns #194
-
Running into a problem that came up specifically with the South Carolina scrapers. When scraping the PDFs, a dataframe sometimes comes out with part of some rows' data combined in the first column. This only seems to happen for the first three columns.
I need a way to recognize when this happens, split only the affected rows, and move the split data into the respective columns. I know I can use the
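Before fixing anything, it may help to check whether the "only the first three columns" pattern holds across every scraped table. The sketch below assumes the merged values are separated by newlines (`\n`). `merged_value_counts` is a hypothetical helper name. It counts how many values each row has packed into column 0, so a result of only `1` (normal rows) and `3` (merged rows) would support the observation.

```python
import pandas as pd


def merged_value_counts(df: pd.DataFrame, sep: str = "\n") -> pd.Series:
    """Tally how many values appear packed into column 0 per row (1 = a normal row)."""
    # N values joined by a separator contain N - 1 separators
    n_values = df.iloc[:, 0].astype(str).str.count(sep) + 1
    # Index = values per cell, value = number of rows with that many values
    return n_values.value_counts().sort_index()
```

Any count other than 1 or 3 would suggest the merging is not limited to the first three columns.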
-
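I think you are on the right track with the … Would it work to first create a boolean index of the rows affected by this? Perhaps check that columns 1 and 2 are empty and that column 0 contains two newline characters. Something like this:

col_names = list(df)
# Affected rows: columns 1 and 2 are empty, and column 0 holds three values joined by two newlines
bad_rows = df.loc[:, col_names[1:3]].isna().all(axis=1) & (df.loc[:, col_names[0]].str.count("\n") == 2)
# Split column 0 into its three parts and spread them over columns 0-2 (.to_numpy() avoids label alignment)
df.loc[bad_rows, col_names[:3]] = df.loc[bad_rows, col_names[0]].str.split("\n", expand=True).to_numpy()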
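Here is a self-contained sketch of the boolean-index approach that can be tested on a toy frame. The function name `split_merged_rows` and its `n_cols`/`sep` parameters are hypothetical. It treats blank strings as well as NaN as "empty", in case the scraper fills empty cells with `""`, which is an assumption worth checking against the real output. The split result is assigned with `.to_numpy()`, because assigning a DataFrame through `.loc` would otherwise align on column labels. The split result's columns are labelled 0, 1, 2, which may not match the scraped table's column names.

```python
import pandas as pd


def split_merged_rows(df: pd.DataFrame, n_cols: int = 3, sep: str = "\n") -> pd.DataFrame:
    """Return a copy with values merged into column 0 spread back over the first n_cols columns."""
    out = df.copy()
    cols = out.columns[:n_cols]
    first = out.iloc[:, 0].astype(str)
    rest = out.iloc[:, 1:n_cols]

    # Columns 1..n_cols-1 count as empty when they are NaN/None or a blank string
    rest_empty = rest.apply(lambda col: col.isna() | col.astype(str).str.strip().eq("")).all(axis=1)
    # n_cols merged values are joined by exactly n_cols - 1 separators
    bad_rows = rest_empty & first.str.count(sep).eq(n_cols - 1)

    if bad_rows.any():
        # Cast target columns to object so string assignment never lands in a float-only column
        out[cols] = out[cols].astype(object)
        parts = first[bad_rows].str.split(sep, expand=True)
        # Assign by position, not by label
        out.loc[bad_rows, cols] = parts.to_numpy()
    return out
```

Returning a copy keeps the raw scrape available for comparison. Rows that are not affected, including those with data already in columns 1 and 2, are left unchanged.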
-
How difficult (and how stable) would it be to tweak the camelot settings so it identifies the header columns better? Or to pull only a subset of those header columns?