SettingWithCopyWarning? Try using .copy()

What’s the deal with the SettingWithCopyWarning?

You may have noticed this popping up on occasion, usually with a pink background:

/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:6: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

This warning can be a strange one, since it can crop up unexpectedly and sometimes seems (or is) nearly random when it does. It is especially confounding when it appears even though we are using the .loc accessor.

Now, the good thing about it is that it is only a warning. The operation you wanted to perform most likely worked just fine. But you may be tired of the pink-hued warning cropping up all the time. And in certain circumstances, such as assigning through two indexing steps in a row, the change may never reach the dataframe you meant to update.
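To make the "may not work" case concrete, here is a minimal sketch. The movies dataframe and its values are made up for illustration; only the column names follow this post. It contrasts a chained assignment, which writes to a temporary object, with a single .loc call:

```python
import warnings
import pandas as pd

# Hypothetical sample data for illustration only.
movies = pd.DataFrame({
    "title": ["A", "B", "C"],
    "budget": [100, 0, 250],
    "revenue": [300, 50, 200],
})

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the warning for this demo
    # Chained indexing: the filter builds a temporary dataframe, and the
    # assignment lands on that temporary rather than on `movies`.
    movies[movies["budget"] > 0]["revenue"] = 0

after_chained = movies["revenue"].tolist()  # still the original values

# One .loc call selects rows and column together and writes to `movies`.
movies.loc[movies["budget"] > 0, "revenue"] = 0
after_loc = movies["revenue"].tolist()

print(after_chained)
print(after_loc)
```

The first print shows the revenue column untouched, and the second shows the update applied. This is the situation the warning is trying to protect you from.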

Here’s a way to grasp the problem and fix it.

What’s Happening

The culprit is typically in a prior step. In my recent experience, these steps have sometimes resulted in the behavior occurring a few steps later:

df = df[['title','date','budget','revenue']]

Or:

df = df[df['budget'] > 0]

It seems rather simple: I want to update the dataframe itself so that it has fewer columns or only a filtered set of records. So I overwrite the original dataframe with the new one, assigning it to become the new df.

And then, at a later step, I sometimes start getting the dreaded SettingWithCopyWarning.

Why Is This Happening?

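Under certain circumstances, when we filter or subset a dataframe and save the result over the original variable, pandas keeps track of the fact that the result was derived from another dataframe. Internally, the new object retains a reference to the dataframe as it was before, and pandas cannot always tell whether it is a *view* of that data or a separate copy. When we later assign to it, pandas warns that the value may be landing on, in the words of the warning, “a copy of a slice” of the original dataframe.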
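The sequence described above can be sketched as follows. The data and the movies and profit names are hypothetical, and whether the warning actually fires depends on your pandas version (newer releases that use Copy-on-Write may not emit it at all), so the sketch records the result rather than assuming it:

```python
import warnings
import pandas as pd

# Hypothetical sample data; column names follow the examples in this post.
movies = pd.DataFrame({
    "title": ["A", "B", "C"],
    "budget": [100, 0, 250],
    "revenue": [300, 50, 200],
})

# The "prior step": filter and keep the result, without .copy().
df = movies[movies["budget"] > 0]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # A later step: adding a column is where the warning tends to show up.
    df["profit"] = df["revenue"] - df["budget"]

# Match by class name so the check does not depend on where pandas
# exposes the warning class in a given version.
warned = any(w.category.__name__ == "SettingWithCopyWarning" for w in caught)
print("SettingWithCopyWarning raised:", warned)
print(df)
```

Either way, the new column ends up on df and not on movies, which is why the operation usually "works" despite the warning.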

What to Do About It

Here’s a quick and effective way to deal with it. When you store a new version of the dataframe to a variable, chain the .copy() method on the end of the operation. This severs the connection to the original dataframe and makes it an entirely new object.

For example:

df = df[['title','date','budget','revenue']].copy()

Or:

df = df[df['budget'] > 0].copy()

When we use .copy(), pandas builds an entirely new dataframe with its own data and re-assigns df to it, with no connection to a prior version. The old dataframe can then be freed from memory once nothing else refers to it.
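Here is a minimal sketch of the fix in action, again with made-up data and a hypothetical profit column. It applies .copy() at the filtering step and then checks that the later assignment raises no SettingWithCopyWarning:

```python
import warnings
import pandas as pd

# Hypothetical sample data; column names follow the examples in this post.
movies = pd.DataFrame({
    "title": ["A", "B", "C"],
    "budget": [100, 0, 250],
    "revenue": [300, 50, 200],
})

# Same filter as before, but with .copy() chained on the end.
df = movies[movies["budget"] > 0].copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df["profit"] = df["revenue"] - df["budget"]

copy_warnings = [
    w for w in caught if w.category.__name__ == "SettingWithCopyWarning"
]
print("SettingWithCopyWarning count:", len(copy_warnings))
print(df)
```

The trade-off is a little extra memory for the duplicated data, which is rarely a concern for dataframes of modest size.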

Operations you perform after that point should no longer provoke the dreaded SettingWithCopyWarning.

Try it for yourself. It should help!

5 Reasons to choose Python for your first programming language

Aspiring programmers often ask the question, “What language should I begin with?”

The first answer is: the first programming language you learn doesn’t matter much. Learn the principles. If you keep programming, you’ll learn more languages anyway.

But the question returns: “I still need to pick a language to begin learning the principles.”

This is where it gets messy. The recommendations come flooding in:

  • C
  • C++
  • C#
  • JavaScript
  • Java
  • Python

Each has its advantages.

Among these, Python is a standout, for a few key reasons.

Why Python?

1. Python is fun.

Python is a relatively efficient language, requiring fewer lines of code to produce results.

2. Python is easy to set up.

Some languages, such as Java or C#, have much higher setup and maintenance overhead.

3. Python is not going away anytime soon.

In fact, Python has become the most frequently used teaching language at major universities.

4. Python is a tool of choice for doing some very big things.

For instance: YouTube, Google, Instagram, Pinterest, Quora, Reddit, Dropbox, Civilization IV, and more. It’s also widely used for penetration testing, data analysis, and scientific computing.

5. Python jobs pay well, and Python programmers are in high demand.

Yi-Jirr Chen has gathered a number of relevant statistics in an excellent article comparing benefits of several languages here.

Ready to Get Started?

For all of the above reasons, I’ve elected to add Python to my own repertoire, and it’s what I’ll be using to teach Intro to Computer Programming in our Information Systems program at Oklahoma Wesleyan University.

If you’d like to learn more and get started, here are some resources.

For Further Reading

Python is Now the Most Popular Introductory Teaching Language at Top U.S. Universities – ACM.org

If I had to choose between learning Java and Python, what should I choose to learn first? – Quora

What Programming Language Should a Beginner Learn in 2016? – CodeMentor

Which Programming Language Should I Learn First? – Lifehacker

Courses

Complete Python Bootcamp – Udemy

Learning Python for Data Analysis – Udemy

Python for Programmers – Udemy

Python at Cybrary

Apps for Learning

Learn Python by Sololearn – iOS App