What's the fastest way to add a new column that (probably) already has the same partitions? #7391

Liquidmasl · 2024-09-06T16:26:28Z

There are a bunch of ways to add a column to a DataFrame.

Which one is the fastest with Modin?

Say I get a new column by applying a function to another one:

new_c = df['column'].apply(lambda x: abs(x))

The resulting Series should have the same partitions as the DataFrame, right?
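Partition layout is an internal Modin detail that can't be confirmed from the public API alone. What the API does guarantee is that a Series derived from a column keeps the frame's index, and column assignment aligns on that index. A minimal sketch with a hypothetical example frame (it falls back to plain pandas if Modin isn't installed), which also shows the vectorized `Series.abs()` as an alternative to the per-element lambda:

```python
try:
    import modin.pandas as pd  # use Modin when it is installed
except ImportError:
    import pandas as pd  # fallback so the sketch still runs

import numpy as np

# Hypothetical example frame; the column name matches the question.
df = pd.DataFrame({"column": np.arange(-5, 5)})

# Element-wise Python lambda, as in the question.
via_apply = df["column"].apply(lambda x: abs(x))

# Vectorized equivalent: no Python call per element.
via_abs = df["column"].abs()

# Both results keep the frame's index, which is what assignment aligns on.
same_index = via_apply.index.equals(df.index) and via_abs.index.equals(df.index)
same_values = via_apply.equals(via_abs)
```

Because the index already matches, `df['new_col'] = via_abs` needs no reordering of rows.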

We could use merge or concat, or just do:

df['new_col'] = new_c

which is the most readable, IMO.

There are probably a few other ways as well.

But which one is the fastest?
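One way to answer this empirically is to time the candidates on your own data. The sketch below compares plain assignment, `DataFrame.assign`, `concat(axis=1)` and an index-to-index `merge`; the frame size, the helper names (`via_setitem`, `time_approaches`, etc.) and the best-of-N timing are hypothetical choices for illustration. Modin may run work asynchronously, so treat the wall-clock numbers as rough.

```python
import time

try:
    import modin.pandas as pd  # use Modin when it is installed
except ImportError:
    import pandas as pd  # fallback so the sketch still runs

import numpy as np


def via_setitem(df, s):
    df["new_col"] = s  # mutates df in place
    return df


def via_assign(df, s):
    return df.assign(new_col=s)  # returns a new frame


def via_concat(df, s):
    return pd.concat([df, s.rename("new_col")], axis=1)


def via_merge(df, s):
    # Join on the index; both sides share it, so every row matches.
    return df.merge(s.rename("new_col").to_frame(), left_index=True, right_index=True)


APPROACHES = {
    "setitem": via_setitem,
    "assign": via_assign,
    "concat": via_concat,
    "merge": via_merge,
}


def time_approaches(n_rows=100_000, repeats=3):
    """Return (results, best_seconds) per approach on a hypothetical frame."""
    rng = np.random.default_rng(0)
    df = pd.DataFrame({"column": rng.normal(size=n_rows)})
    s = df["column"].abs()
    results, best_seconds = {}, {}
    for name, fn in APPROACHES.items():
        best = float("inf")
        for _ in range(repeats):
            work = df.copy()  # fresh copy made outside the timed region
            start = time.perf_counter()
            out = fn(work, s)
            best = min(best, time.perf_counter() - start)
        results[name], best_seconds[name] = out, best
    return results, best_seconds
```

For example, `sorted(time_approaches()[1].items(), key=lambda kv: kv[1])` lists the approaches from fastest to slowest for that run. The copy is kept out of the timer so the in-place assignment isn't penalized for it.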

Thank you!

Liquidmasl · 2024-09-06T16:57:13Z

And also:

How do I add multiple columns at once?

With concat? Will it play nicely with the partitions?

I ask because

df[['col1','col2']] = <some np array with 2 columns and the correct number of rows>

just defaults to pandas, apparently because inserting with an unhashable key is not supported?

I don't want to make a new Modin DataFrame out of the NumPy array for concatenation, because I don't want to cause trouble with partitions that don't line up.
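A minimal sketch of two ways around the list-key assignment, assuming the array has exactly one row per frame row (the frame, array and column names are hypothetical). Option 1 adds each column as a 1-D slice in a single `assign` call, so no separate frame is built from the array; option 2 does build one, but gives it `df`'s own index so `concat` aligns by label. Which of these paths Modin runs natively versus defaulting to pandas has not been checked here.

```python
try:
    import modin.pandas as pd  # use Modin when it is installed
except ImportError:
    import pandas as pd  # fallback so the sketch still runs

import numpy as np

# Hypothetical frame and 2-column array with one row per frame row.
df = pd.DataFrame({"column": np.arange(5)})
arr = np.column_stack([np.arange(5) * 10, np.arange(5) * 100])
names = ["col1", "col2"]

if arr.shape != (len(df), len(names)):
    raise ValueError("array must have one row per frame row and one column per name")

# Option 1: one assign call; each 1-D slice is matched to rows by position.
out_assign = df.assign(**{name: arr[:, i] for i, name in enumerate(names)})

# Option 2: wrap the array with df's own index so concat aligns by label.
extra = pd.DataFrame(arr, columns=names, index=df.index)
out_concat = pd.concat([df, extra], axis=1)
```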

Liquidmasl added the labels "question ❓ Questions about Modin" and "Triage 🩹 Issues that need triage" on Sep 6, 2024.
