How to properly copy in Python (Deep Copy vs Shallow Copy)

This topic is not talked about enough at the beginner level, but not knowing the differences between assignment, shallow copy and deep copy can cause a huge headache.

Let's pin down the differences by walking through 3 concrete examples, to get this right once and for all. But first we need to establish the distinction between mutable and immutable objects.

Mutable and Immutable Objects

In Python there are 2 kinds of objects:

  • Immutable objects like integers, strings, tuples and Booleans.
  • Mutable objects like lists, dictionaries and sets.

As the names suggest, this distinction refers to the mutability of the object, that is, whether you can change the object after its creation.

There is no practical need to copy an immutable object in Python: since it can never change, it is always safe for several variables to share the same one.

That's why in this post we will be talking about copying mutable objects exclusively, using as an example a list object.
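To see why copying immutable objects is unnecessary, here is a small sketch (the variable names are just for illustration). In CPython, the functions of the copy module hand back the very same object when given an integer, a string or a tuple of immutable values. Note that a tuple holding mutable items, such as a tuple of lists, is a different story, because deepcopy() has to copy those items.

```python
import copy

number = 42
text = "hello"
point = (1, 2, 3)

# For these immutable objects, copy() and deepcopy() return the original object.
same_number = copy.copy(number) is number
same_text = copy.deepcopy(text) is text
same_point = copy.copy(point) is point

print(same_number, same_text, same_point)  # True True True
```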

Assignment vs Copying

Let's say you have some list called original_list, which is a mutable object.

original_list = [1, 2, 3]

And you want to copy it, that is to create a new list called copy_list with the exact same elements as the original one. You might be tempted to use the assignment operator = to do so.

copy_list = original_list

But suppose you modify something in this new list

copy_list[0] = "apple"

and then print both lists out.

>>> print(original_list)
['apple', 2, 3]
>>> print(copy_list)
['apple', 2, 3]

You can see that the modification we made to copy_list affected original_list as well (and vice versa: modifying original_list affects copy_list).

This is because both variables, original_list and copy_list, refer to the exact same object, a list object in this case. We can confirm this by calling the built-in id() function on each and comparing the results.

>>> id(original_list) == id(copy_list)
True
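Comparing ids by hand works, but the is operator performs the same identity check more directly. The sketch below uses made-up names (numbers, alias, lookalike) to contrast identity (is) with equality (==).

```python
numbers = [1, 2, 3]
alias = numbers        # assignment: a second name for the same list object
lookalike = [1, 2, 3]  # a separate list that happens to have equal contents

print(alias is numbers)      # True: same object
print(lookalike is numbers)  # False: different objects
print(lookalike == numbers)  # True: equal contents
```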

This diagram might help you understand this example visually.

Diagram 1

Deep Copy

Now, let's try copying original_list again, this time using the deepcopy() function from the copy module, which creates a deep copy of the object.

from copy import deepcopy
deep_list = deepcopy(original_list)

Let's change the last element of this new list called deep_list and see what deepcopy() did.

deep_list[2] = "orange"

>>> print(original_list)
['apple', 2, 3]
>>> print(deep_list)
['apple', 2, 'orange']

Let's check if their ids are the same.

>>> id(original_list) == id(deep_list)
False

deepcopy() created a brand new list object different to the original, so deep_list doesn't reference the same object that original_list does.

Visual representation of the example. Diagram 2
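The list in this example only holds integers and strings, so it doesn't show how far deepcopy() actually goes. Here is a small sketch with a nested list (grid is a made-up name): the inner lists are copied too, so changing one through the copy leaves the original untouched.

```python
from copy import deepcopy

grid = [[1, 2], [3, 4]]
deep_grid = deepcopy(grid)

# Modify an inner list through the deep copy.
deep_grid[0][0] = "changed"

print(grid)                     # [[1, 2], [3, 4]]
print(deep_grid)                # [['changed', 2], [3, 4]]
print(grid[0] is deep_grid[0])  # False: the inner lists are separate objects
```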

Shallow Copy vs Deep Copy

However, there are some situations where using deepcopy() might be simply too much and not what we want.

Let's work through one last example using a dummy class called Foo that holds an attribute called name. Its value can be anything; it's irrelevant here. Note: If you haven't yet learned what classes are or what they are useful for, check out our article on the basics of OOP.

class Foo:
    def __init__(self, name: str):
        self.name = name

Now let's create a list that contains 2 instances of our Foo class and give them some cool names just for fun.

original_foos = [Foo("Billy"), Foo("Veronica")]

Now let's use the deepcopy() function we used before, together with the copy() function, which creates a shallow copy of the object. Both come from the same copy module.

from copy import deepcopy, copy
shallow_foos = copy(original_foos)
deep_foos = deepcopy(original_foos)

Comparing them with the id() function gives us the following.

>>> id(original_foos) == id(deep_foos)
False
>>> id(original_foos) == id(shallow_foos)
False

Both the shallow and the deep copy created different list objects. The key difference is that the shallow copy contains references to the original Foo objects, and the deep copy contains brand new objects.

Let's confirm this by comparing the first original Foo object (Billy) with the ones contained in the deep and shallow copies respectively.

>>> id(original_foos[0]) == id(deep_foos[0])
False
>>> id(original_foos[0]) == id(shallow_foos[0])
True
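What this means in practice is easiest to see by changing something. The following sketch repeats the setup and then renames Billy through the shallow copy; the new names ("Bob", "Zed") are just made up for the demonstration.

```python
from copy import copy, deepcopy

class Foo:
    def __init__(self, name: str):
        self.name = name

original_foos = [Foo("Billy"), Foo("Veronica")]
shallow_foos = copy(original_foos)
deep_foos = deepcopy(original_foos)

# Mutating an element reached through the shallow copy changes the shared Foo object.
shallow_foos[0].name = "Bob"
print(original_foos[0].name)  # Bob
print(deep_foos[0].name)      # Billy

# Replacing an element only changes a slot in that one list.
shallow_foos[1] = Foo("Zed")
print(original_foos[1].name)  # Veronica
```

So a shallow copy protects the list itself (adding, removing or replacing items), but not the objects inside it.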

Keep in mind when designing your code that deepcopy() is much slower than copy(): it has to recursively create a new copy of every nested object, which costs extra memory and results in a longer execution time.

Diagram of the example. Diagram 3
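If you want to get a feel for the speed difference on your own machine, a rough sketch with the standard timeit module could look like this. The test data (a list of 1,000 small lists) and the number of runs are arbitrary choices, and the absolute timings will vary from system to system.

```python
import timeit
from copy import copy, deepcopy

# Made-up test data: 1,000 small inner lists.
nested = [[i, i + 1, i + 2] for i in range(1000)]

# Time 100 shallow copies and 100 deep copies of the same list.
shallow_seconds = timeit.timeit(lambda: copy(nested), number=100)
deep_seconds = timeit.timeit(lambda: deepcopy(nested), number=100)

print(f"copy():     {shallow_seconds:.4f} s for 100 runs")
print(f"deepcopy(): {deep_seconds:.4f} s for 100 runs")
```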

Summary

There is no practical reason for copying an immutable object. The assignment operator doesn't copy anything at all; it only gives the same object another name. A shallow copy creates a new object containing references to the original's elements. A deep copy creates a new object containing copies of those elements.

More information

In this post we have been working with lists, which are built-in objects, but if you want some custom behaviour for the shallow and deep copy of a class you wrote yourself, you can define the __copy__() and __deepcopy__() methods respectively. For example, you could write a deep copy method that copies just certain attributes of the class and leaves the rest as references to the original. More info about it here: https://docs.python.org/3/library/copy.html
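As a minimal sketch of that last idea, here is a made-up Playlist class (the class and its tracks and library attributes are hypothetical, not something from the copy module). Its __deepcopy__() copies tracks but keeps library as a shared reference. The memo argument is the dictionary deepcopy() passes along to keep track of objects it has already copied.

```python
import copy

class Playlist:
    """Hypothetical class: `tracks` belongs to the playlist, `library` is shared."""

    def __init__(self, tracks, library):
        self.tracks = tracks
        self.library = library

    def __copy__(self):
        # Shallow copy: a new Playlist whose attributes are the same objects.
        new = type(self).__new__(type(self))
        new.__dict__.update(self.__dict__)
        return new

    def __deepcopy__(self, memo):
        new = type(self).__new__(type(self))
        memo[id(self)] = new  # register early so reference cycles resolve to `new`
        new.tracks = copy.deepcopy(self.tracks, memo)  # copied recursively
        new.library = self.library                     # deliberately left shared
        return new


library = {"song-1": "Intro", "song-2": "Outro"}
original = Playlist([["song-1"], ["song-2"]], library)
clone = copy.deepcopy(original)

print(clone.tracks is original.tracks)        # False
print(clone.tracks[0] is original.tracks[0])  # False
print(clone.library is original.library)      # True
```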

Thanks for reading. Hope you find this helpful!
