What’s the deal with the SettingWithCopyWarning?
You may have noticed this popping up on occasion, usually with a pink background:
/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:6: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
This warning can be a strange one: it crops up unexpectedly, and its timing can seem nearly random. It is especially confounding when it appears even though we are already using the .loc accessor, just as the message suggests.
Now, the good thing about it is that it is only a warning. The operation you wanted to perform most likely worked just fine. But you may be tired of the pink-hued warning cropping up all the time. And in certain circumstances, the assignment can land on a temporary copy and never reach the dataframe you meant to change.
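The case where things really do go wrong is usually chained indexing, where a filter and an assignment are written back to back. Here is a minimal sketch with made-up sample data (only the column names follow this post). The warnings are silenced just to keep the output clean; the point is what happens to df.

```python
import warnings

import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "title": ["A", "B", "C"],
    "budget": [0, 100, 200],
    "revenue": [10, 150, 500],
})

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    # Chained indexing: the filter produces a temporary dataframe, and the
    # assignment lands on that temporary object rather than on df.
    df[df["budget"] > 0]["revenue"] = 0

after_chained = df["revenue"].tolist()
print(after_chained)  # → [10, 150, 500], df was not changed

# Doing the filter and the assignment in a single .loc call updates df itself.
df.loc[df["budget"] > 0, "revenue"] = 0

after_loc = df["revenue"].tolist()
print(after_loc)  # → [10, 0, 0]
```

This is the situation the warning text is pointing at when it recommends .loc[row_indexer,col_indexer] = value.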
Here’s a way to grasp the problem and fix it.
What’s Happening
The culprit is typically in a prior step. In my recent experience, steps like these have sometimes caused the warning to appear a few steps later:
df = df[['title','date','budget','revenue']]
Or:
df = df[df['budget'] > 0]
It seems rather simple: I want to update the dataframe itself so that it has fewer columns or only a filtered set of records. And I’m overwriting the original dataframe with the result, assigning it back to df.
And then, at a later step, I sometimes start getting the dreaded SettingWithCopyWarning.
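To try the sequence yourself, here is a minimal sketch. The sample data and the profit column are invented for illustration. It assumes an older pandas where this warning still exists; newer versions that use Copy-on-Write may not emit it at all, in which case the captured list simply comes back empty.

```python
import warnings

import pandas as pd

# Hypothetical sample data; the column names follow the examples above.
df = pd.DataFrame({
    "title": ["A", "B", "C"],
    "date": ["2001-01-01", "2002-02-02", "2003-03-03"],
    "budget": [0, 100, 200],
    "revenue": [10, 150, 500],
})

# Earlier step: filter the rows and overwrite the variable.
df = df[df["budget"] > 0]

# Later step: add a column. On pandas versions that still have the
# warning, this is typically the line that triggers it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df["profit"] = df["revenue"] - df["budget"]

# Names of any warnings raised by the assignment (may be empty).
warning_names = [w.category.__name__ for w in caught]
print(warning_names)

# The assignment itself still worked.
print(df["profit"].tolist())  # → [50, 300]
```

Note that the new column is there either way, which matches the point that the operation usually works despite the warning.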
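Why Is This Happening?
When we subset a dataframe and save the result over the original variable, pandas cannot always tell whether the result is a *view* of the original data or a separate copy. So it marks the new dataframe as derived from the old one, and behind the scenes it retains a reference to the dataframe as it was before. When we later assign into the new dataframe, pandas sees that mark and warns that we may be setting a value on, in the words of the warning, “a copy of a slice” of the original dataframe.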
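If you are curious, you can peek at that leftover connection. The sketch below relies on _is_copy, a private pandas attribute that I am assuming is present on older versions; it is an implementation detail, not a public API, and it may be missing or always empty on newer pandas. The sample data is made up.

```python
import pandas as pd

# Hypothetical sample data for illustration.
original = pd.DataFrame({"budget": [0, 100, 200], "revenue": [10, 150, 500]})

# A filtered dataframe, and the same filter followed by .copy().
subset = original[original["budget"] > 0]
independent = original[original["budget"] > 0].copy()

# _is_copy is private, so getattr with a default keeps this runnable on
# versions where the attribute does not exist.
subset_ref = getattr(subset, "_is_copy", None)
independent_ref = getattr(independent, "_is_copy", None)

# On older pandas the filtered dataframe typically holds a weak reference
# back to its parent; the explicit copy does not.
print("subset remembers its parent:", subset_ref is not None)
print("copy remembers its parent:", independent_ref is not None)
```

This is only meant for poking around in a notebook. I would not rely on a private attribute in real code.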
What to Do About It
Here’s a quick and effective way to deal with it. When you store a new version of the dataframe to a variable, chain the .copy() method on the end of the operation. This severs the connection to the original dataframe and makes it an entirely new object.
For example:
df = df[['title','date','budget','revenue']].copy()
Or:
df = df[df['budget'] > 0].copy()
When we use .copy(), pandas builds an entirely new dataframe with its own data and no connection to the prior version, and df is re-assigned to it. The old dataframe is not wiped by .copy() itself; Python frees it once nothing else refers to it.
Operations you perform after that point should no longer provoke the dreaded SettingWithCopyWarning.
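Here is a minimal sketch of the fix in action, again with invented sample data and a made-up profit column. It captures any warnings raised by the later assignment and counts the ones named SettingWithCopyWarning.

```python
import warnings

import pandas as pd

# Hypothetical sample data; the column names follow the examples above.
df = pd.DataFrame({
    "title": ["A", "B", "C"],
    "date": ["2001-01-01", "2002-02-02", "2003-03-03"],
    "budget": [0, 100, 200],
    "revenue": [10, 150, 500],
})

# Filter, then chain .copy() so df becomes an independent dataframe.
df = df[df["budget"] > 0].copy()

# Later step: the same kind of assignment that used to trigger the warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df["profit"] = df["revenue"] - df["budget"]

copy_warnings = [
    w for w in caught if w.category.__name__ == "SettingWithCopyWarning"
]
print(len(copy_warnings))  # → 0
print(df["profit"].tolist())  # → [50, 300]
```

The trade-off is a bit of extra memory for the copy, which is rarely a concern unless the dataframe is very large.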
Try it for yourself. It should help!
References
- This article discusses the oddness of the behavior and hopes it will be changed: Views and Copies in pandas — Practical Data Science
- pandas.DataFrame.copy() — pandas Documentation