The issue is that Modin's merge operation does not support the indicator=True option, which is available in Pandas. This option typically creates an additional column, _merge, indicating the source of each row after the merge (i.e., whether the row came from the left DataFrame, the right DataFrame, or both).
Expected Behavior
When performing a merge with indicator=True, Modin should add a column _merge that shows whether a row is from the left DataFrame (left_only), the right DataFrame (right_only), or from both (both).
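For reference, here is a minimal sketch of the expected behavior using plain pandas. The frames `left` and `right` and their column names are hypothetical, chosen only so that every indicator value appears; they are not taken from the original report.

```python
import pandas as pd

# Hypothetical sample frames: key 1 exists only on the left,
# key 4 only on the right, keys 2 and 3 on both sides.
left = pd.DataFrame({"key": [1, 2, 3], "left_val": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "right_val": ["x", "y", "z"]})

# indicator=True asks pandas to add a categorical `_merge` column.
merged = left.merge(right, on="key", how="outer", indicator=True)

# Map each key to its indicator value for easy inspection.
indicator_by_key = dict(
    zip(merged["key"].tolist(), merged["_merge"].astype(str).tolist())
)
print(indicator_by_key)
# {1: 'left_only', 2: 'both', 3: 'both', 4: 'right_only'}
```

With pandas, `_merge` is always present after such a merge, so filtering on it works. The report is that the same column is missing when the frames come from Modin.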
Error Logs
2024-09-05 16:44:04,679 INFO worker.py:1783 -- Started a local Ray instance.
Traceback (most recent call last):
  File "C:\Users\Pichau\WorkSpaceIdeos\partech\aux.py", line 27, in <module>
    df_merged[df_merged['_merge'] =='both']
  File "C:\Users\Pichau\miniconda3\envs\partech_gloe_update\lib\site-packages\modin\logging\logger_decorator.py", line 144, in run_and_log
    return obj(*args, **kwargs)
  File "C:\Users\Pichau\miniconda3\envs\partech_gloe_update\lib\site-packages\modin\pandas\base.py", line 3948, in __getitem__
    return self._getitem(key)
  File "C:\Users\Pichau\miniconda3\envs\partech_gloe_update\lib\site-packages\modin\logging\logger_decorator.py", line 144, in run_and_log
    return obj(*args, **kwargs)
  File "C:\Users\Pichau\miniconda3\envs\partech_gloe_update\lib\site-packages\modin\pandas\dataframe.py", line 3247, in _getitem
    return self._getitem_column(key)
  File "C:\Users\Pichau\miniconda3\envs\partech_gloe_update\lib\site-packages\modin\logging\logger_decorator.py", line 144, in run_and_log
    return obj(*args, **kwargs)
  File "C:\Users\Pichau\miniconda3\envs\partech_gloe_update\lib\site-packages\modin\pandas\dataframe.py", line 2581, in _getitem_column
    raise KeyError("{}".format(key))
KeyError: '_merge'
Modin version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest released version of Modin.
I have confirmed this bug exists on the main branch of Modin. (In order to do this you can follow this guide.)
Reproducible Example
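The original example was not captured on this page. What follows is only a sketch of what a reproduction might look like, built around the expression shown in the traceback (`df_merged[df_merged['_merge'] == 'both']`). The helper name `rows_in_both` and the sample frames are hypothetical. The helper takes the pandas-like module as a parameter so the same code can be run against pandas and against Modin.

```python
import pandas


def rows_in_both(pd_module):
    """Merge two small frames with indicator=True and keep rows found in both.

    `pd_module` is either `pandas` or `modin.pandas`; the frames below are
    hypothetical sample data, not the data from the original report.
    """
    left = pd_module.DataFrame({"key": [1, 2, 3], "left_val": ["a", "b", "c"]})
    right = pd_module.DataFrame({"key": [2, 3, 4], "right_val": ["x", "y", "z"]})
    df_merged = left.merge(right, on="key", how="outer", indicator=True)
    # Same filter as line 27 of the script in the traceback.
    return df_merged[df_merged["_merge"] == "both"]


# Baseline: with plain pandas the filter succeeds.
print(rows_in_both(pandas))

# To exercise the reported path, pass Modin instead
# (requires modin and an engine such as Ray to be installed):
#   import modin.pandas as mpd
#   rows_in_both(mpd)  # per this report: KeyError: '_merge'
```

Running the helper with both modules side by side should make it clear whether the missing `_merge` column is specific to Modin in a given environment.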
Installed Versions
INSTALLED VERSIONS
commit : c8bbca8
python : 3.10.14.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22621
machine : AMD64
processor : AMD64 Family 23 Model 113 Stepping 0, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : None
LOCALE : pt_BR.cp1252
Modin dependencies
modin : 0.31.0
ray : 2.35.0
dask : 2023.3.2
distributed : 2023.3.2.1
pandas dependencies
pandas : 2.2.0
numpy : 1.24.3
pytz : 2024.1
dateutil : 2.8.2
setuptools : 69.5.1
pip : 24.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.2
html5lib : None
pymysql : None
psycopg2 : 2.9.6
jinja2 : 3.1.4
IPython : 8.27.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat : None
fastparquet : 2024.2.0
fsspec : 2024.3.1
gcsfs : None
matplotlib : 3.9.2
numba : 0.57.0
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 17.0.0
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : 2.0.34
tables : None
tabulate : 0.9.0
xarray : 2024.7.0
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None
None