Dataclasses are great!

Data is everywhere these days. From statistics to healthcare, media, marketing and even sports, humans have been collecting and analyzing data ever since technology has provided the storage space for it.

In this article we will talk about dataclasses, a module from Python's standard library. We will also discuss what problems dataclasses solve and what you can and cannot do with them.

What are dataclasses: The basics

Dataclasses were introduced in Python 3.7 and, as the name implies, a dataclass is a class that mainly holds data. Let's see an example of how you can write one. First, import the dataclass decorator from the dataclasses module.

from dataclasses import dataclass

And now we can declare a dataclass like so.

@dataclass
class Player:
    '''Class to represent a football player.'''
    name: str
    age: int
    team: str = 'No team yet'

    def assign_team(self, new_team: str) -> None:
        self.team = new_team

The first thing you might notice is the @dataclass line at the top, which is called a decorator. This decorator is simply the function dataclass(), which takes our Player class and adds some functionality to it, as we will see later. Also notice how the attributes of the class are written following PEP 526 "Syntax for Variable Annotations", each with its own type annotation.

The syntax is as follows:

    variable: type_annotation = default_value

Dataclasses also support default values, as with the team attribute: when no team is passed, its value is 'No team yet', as we can see when printing a Player object.

player_1 = Player('Victor O. Sullivan', 23)
print(player_1)
>>> Player(name='Victor O. Sullivan', age=23, team='No team yet')

player_1.assign_team('Black Eagles FC')
print(player_1)
>>> Player(name='Victor O. Sullivan', age=23, team='Black Eagles FC')

But keep in mind: type annotations don't mean type validation. Python is still a dynamically typed language, and a dataclass does not check the types of the values you pass in. The annotations make the code easier to understand, but they are not enforced at runtime. *Note: Static type checking of these annotations can be done with a tool such as mypy, which analyzes your code before it runs.
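To see this for yourself, here is a minimal sketch that passes a string where an int is annotated. Nothing complains at runtime. Only a static checker such as mypy would flag this call.

```python
from dataclasses import dataclass


@dataclass
class Player:
    '''Class to represent a football player.'''
    name: str
    age: int
    team: str = 'No team yet'


# The annotation says int, but no check happens when the object is created.
player = Player('Victor O. Sullivan', 'twenty-three')
print(player.age)        # twenty-three
print(type(player.age))  # <class 'str'>
```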

Now let's check all the functions @dataclass created for us under the hood. For this we can use the inspect module, like so.

import inspect

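player_functions = inspect.getmembers(Player, inspect.isfunction)

# Keep only the dunder methods, skipping our own methods such as assign_team()
for name, value in player_functions:
    if name.startswith('__'):
        print(name, value)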
>>> 
__eq__ <function Player.__eq__ at 0x000001B512576200>
__init__ <function Player.__init__ at 0x000001B512576050>
__repr__ <function Player.__repr__ at 0x000001B512575FC0>

As you can see, some dunder methods were added to our Player class, most notably __init__() and __repr__(), all thanks to the @dataclass decorator (the exact list can vary slightly between Python versions). The __repr__() method is what produced the nice representation of the Player object each time we printed it, which is excellent for debugging.
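To get a feel for how much boilerplate this saves, here is a rough, hand-written approximation of those three methods for a simplified Player (without assign_team). It is only a sketch for comparison. The code the decorator actually generates differs in its details.

```python
class PlayerByHand:
    '''Roughly what @dataclass writes for us (simplified sketch).'''

    def __init__(self, name: str, age: int, team: str = 'No team yet') -> None:
        self.name = name
        self.age = age
        self.team = team

    def __repr__(self) -> str:
        return (f'{type(self).__name__}(name={self.name!r}, '
                f'age={self.age!r}, team={self.team!r})')

    def __eq__(self, other):
        # Only compare with objects of exactly the same class.
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.name, self.age, self.team) == (other.name, other.age, other.team)


print(PlayerByHand('Victor O. Sullivan', 23))
# PlayerByHand(name='Victor O. Sullivan', age=23, team='No team yet')
```

That is about fifteen lines of code that the decorator derives from three annotated attributes.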

More specific parameters: field() and __post_init__()

Let's try a different example, this time with users' data on a blog where they can comment.

For this new class, we would like a way to keep track of the comments made by each user, and that is where the field() function comes into play. First, import field from the dataclasses module as usual.

from dataclasses import dataclass, field

Then the User class.

@dataclass
class User:
    '''Class to represent users on a blog where they can comment.'''
    username: str
    comments: list[str] = field(default_factory=list, repr=False)

Using field() allows us to specify each field in more detail. For example, if the default value is a mutable object like a list, we must use default_factory=list, because using a plain [] as the default raises an error, as you can see here.

@dataclass
class User:
    '''Class to represent users on a blog where they can comment.'''
    username: str
    comments: list[str] = []

>>> ValueError: mutable default <class 'list'> for field comments is not allowed: use default_factory
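The reason for this rule is that a plain default like [] would be created once, when the class is defined, and then shared by every instance. default_factory=list instead calls list() each time a new object is built, so every user gets a separate list. A quick check:

```python
from dataclasses import dataclass, field


@dataclass
class User:
    '''Class to represent users on a blog where they can comment.'''
    username: str
    comments: list[str] = field(default_factory=list, repr=False)


alice = User('alice')
bob = User('bob')
alice.comments.append('First!')

print(alice.comments)                  # ['First!']
print(bob.comments)                    # []
print(alice.comments is bob.comments)  # False: each user has its own list
```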

Also, if you don't want a field to appear in the __repr__() output, use repr=False.

user_1 = User('turner_rox', ['Thanks for sharing!', 'Why did you use that there?'])
print(user_1)
>>> User(username='turner_rox')

*Note: The built-in list[str] syntax only works on Python 3.9 and later. On Python 3.7 and 3.8, use List from the built-in typing module instead, which gives your IDE and type checkers the same information. The last example can be changed like so.

...
from typing import List

@dataclass
class User:
    ...
    comments: List[str] = field(default_factory=list, repr=False)  # Notice the change

Another common practice is computing a field from other fields by combining field() with the __post_init__() method, a hook that is specific to dataclasses. Let's say you would like to add a comments_counter field that counts the comments of each user. You can do it like so.

from dataclasses import dataclass, field

@dataclass
class User:
    '''Class to represent users on a blog where they can comment.'''
    username: str
    comments: list[str] = field(default_factory=list, repr=False)
    comments_counter: int = field(default=0, init=False)

    def __post_init__(self):
        self.comments_counter = len(self.comments)

__post_init__() is called by the generated __init__() after all the fields have been assigned, so their values are available to us there. Now let's check what we did.

user_2 = User('davidjam87', ['Hey, I got a question.', 'Cool post', "I did something really similar in 'here'"])
print(user_2)
>>> User(username='davidjam87', comments_counter=3)
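Because comments_counter is declared with init=False, it is left out of the generated __init__(). It cannot be passed in and is only set by __post_init__(). The sketch below checks both points. It also shows a caveat worth knowing: the counter is computed once, at creation time, so it does not follow later changes to the comments list.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    '''Class to represent users on a blog where they can comment.'''
    username: str
    comments: list[str] = field(default_factory=list, repr=False)
    comments_counter: int = field(default=0, init=False)

    def __post_init__(self):
        # Runs right after the generated __init__() has set the fields.
        self.comments_counter = len(self.comments)


user = User('davidjam87', ['Hey, I got a question.', 'Cool post'])
print(user.comments_counter)  # 2

try:
    User('davidjam87', [], 5)  # comments_counter is not an __init__ parameter
except TypeError as err:
    print('TypeError:', err)

# The counter is a snapshot taken in __post_init__(), not a live count.
user.comments.append('One more comment')
print(user.comments_counter)  # still 2
```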

Dataclasses as immutable objects: The frozen instance

If what you need is a class that represents an immutable object, you can fake it with the frozen=True parameter. I say "fake it" because Python is dynamic: frozen=True makes assigning to a field raise an error, but that check can still be bypassed if someone really wants to.

@dataclass(frozen=True)
class Color:
    '''Class to represent a Color in RGB format'''
    r: int # representing the Red channel
    g: int # representing the Green channel
    b: int # representing the Blue channel

With the Color class frozen, any attempt to change one of its fields after creation raises a FrozenInstanceError.

red = Color(255, 0, 0)
red.g = 100
>>> dataclasses.FrozenInstanceError: cannot assign to field 'g'
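Here is what "fake it" means in practice. frozen=True makes the generated __setattr__() raise FrozenInstanceError, but object.__setattr__() skips that check entirely. For legitimate changes, dataclasses.replace() builds a new instance with some fields changed and leaves the original untouched.

```python
from dataclasses import dataclass, replace, FrozenInstanceError


@dataclass(frozen=True)
class Color:
    '''Class to represent a Color in RGB format'''
    r: int
    g: int
    b: int


red = Color(255, 0, 0)

try:
    red.g = 100
except FrozenInstanceError as err:
    print(err)  # cannot assign to field 'g'

# The idiomatic way to "change" a frozen object: build a modified copy.
orange = replace(red, g=165)
print(orange)  # Color(r=255, g=165, b=0)
print(red)     # Color(r=255, g=0, b=0)

# The "fake" part: object.__setattr__() bypasses the frozen check.
object.__setattr__(red, 'g', 100)
print(red)     # Color(r=255, g=100, b=0)
```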

Sub-dataclasses: Inheritance

If it's not obvious by now, dataclasses are still classes, so inheritance works just as it does with regular classes. Let's create a sub-class TransparentColor that inherits from our Color class. We do this by adding a new field a to TransparentColor, which stands for "alpha" and represents how transparent the color is, 0 being fully transparent (invisible) and 255 being fully opaque.

@dataclass(frozen=True)
class TransparentColor(Color): # sub-class of Color
    '''Class to represent a Color in RGBA format'''
    a: int # representing the Alpha channel (transparency)

blue = TransparentColor(0, 0, 255, 100)
print(blue)
>>> TransparentColor(r=0, g=0, b=255, a=100)

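You must keep two things in mind when using inheritance:

  • If the super-class has a default value for at least one of its fields, every field the sub-class adds must also have a default value. The sub-class fields come after the super-class fields in the generated __init__(), and a parameter without a default cannot follow one with a default.
  • If the super-class is frozen, the sub-class must also be frozen (and vice versa).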
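Both rules are enforced with a TypeError as soon as the offending class is defined, as this small sketch shows. The classes Base, Child, FrozenBase and MutableChild are made up for the demonstration.

```python
from dataclasses import dataclass

errors = []  # collects the TypeErrors raised at class-definition time


@dataclass
class Base:
    x: int
    y: int = 0  # Base has a default value


try:
    @dataclass
    class Child(Base):
        z: int  # no default, so it would follow y=0 in __init__(x, y=0, z)
except TypeError as err:
    errors.append(err)
    print('Rule 1:', err)


@dataclass(frozen=True)
class FrozenBase:
    x: int


try:
    @dataclass  # not frozen, but inherits from a frozen dataclass
    class MutableChild(FrozenBase):
        y: int
except TypeError as err:
    errors.append(err)
    print('Rule 2:', err)
```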

Ordering dataclasses: Enable comparison

When it comes to data, we often need to compare and sort it.

In dataclasses, the eq parameter is True by default, so a __eq__() method is written for you. It considers two objects equal if they are of the same class and all their fields are equal, which is really useful and saves you from writing the method by hand.

redish_green = Color(100, 100, 0)
greenish_red = Color(100, 100, 0)

print(redish_green == greenish_red)
>>> True
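A few more details about that generated __eq__(), shown in a short sketch. It compares field values, not identity. An object of a different type, even a tuple with the same values, is never equal. Because Color is both frozen and has __eq__(), dataclasses also generate __hash__(), so equal colors collapse into one entry in a set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Color:
    '''Class to represent a Color in RGB format'''
    r: int
    g: int
    b: int


print(Color(100, 100, 0) == Color(100, 100, 0))  # True: same class, same field values
print(Color(100, 100, 0) == (100, 100, 0))       # False: a tuple is not a Color
print(Color(100, 100, 0) is Color(100, 100, 0))  # False: still two separate objects
print(len({Color(1, 2, 3), Color(1, 2, 3)}))     # 1: equal frozen objects hash the same
```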

The other comparison methods __lt__(), __le__(), __gt__() and __ge__() are also generated for you if you pass the order=True parameter to the @dataclass decorator. In that case, two objects are compared as if they were tuples of their fields, in the order the fields are defined. The first field is compared first (by magnitude for int, alphabetically for str), and later fields only break ties. If you need a different ordering, you should write those methods yourself for full control over how they work.
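Here is a short sketch with order=True, using a hypothetical Version class. Fields are compared one by one in definition order, like tuples, so sorted() works without any extra code. The note field is excluded from all comparisons with field(compare=False).

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Version:
    '''Hypothetical example: a version number compared field by field.'''
    major: int
    minor: int
    note: str = field(default='', compare=False)  # ignored by ==, <, > and friends


versions = [Version(1, 10), Version(1, 2, 'patched'), Version(0, 9)]
print(sorted(versions))
# [Version(major=0, minor=9, note=''), Version(major=1, minor=2, note='patched'), Version(major=1, minor=10, note='')]

print(Version(1, 2) < Version(1, 10))            # True: 2 < 10 as ints
print(Version(1, 2, 'a') == Version(1, 2, 'b'))  # True: note is not compared
```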

Dataclasses as dictionaries: Compatibility with .json

One last thing to know about dataclasses is that they can be converted into a dictionary with the asdict() function, which returns a dictionary with the field values of the object passed to it.

from dataclasses import asdict, astuple

yellow = Color(255, 255, 0)

dict_color_yellow = asdict(yellow)
print(dict_color_yellow)
>>> {'r': 255, 'g': 255, 'b': 0}

This is very useful, because JSON objects map naturally onto Python dictionaries. We can store the data in a .json file and later read it back from the same file to recreate the same objects. Here is one way to do it.

import json

data_out = [asdict(yellow), asdict(red)]

# Write to the json file
with open('data.json', mode='w') as json_out:
    json.dump(data_out, json_out)

# Read the json file we created
with open('data.json', mode='r') as json_in:
    data_in = json.load(json_in)
    colors_db = [Color(**item) for item in data_in]

print(colors_db)
>>> [Color(r=255, g=255, b=0), Color(r=255, g=0, b=0)]

Another function at your disposal is astuple(), which returns a tuple of the field values, if that is what you need.
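For completeness, here is a short sketch of astuple(), plus one detail worth knowing: both asdict() and astuple() recurse into nested dataclasses, including ones stored inside lists. The Palette class is a hypothetical example.

```python
from dataclasses import dataclass, asdict, astuple


@dataclass(frozen=True)
class Color:
    '''Class to represent a Color in RGB format'''
    r: int
    g: int
    b: int


@dataclass
class Palette:
    '''Hypothetical example: a named group of colors.'''
    name: str
    colors: list[Color]


yellow = Color(255, 255, 0)
print(astuple(yellow))  # (255, 255, 0)

palette = Palette('warm', [yellow, Color(255, 0, 0)])
print(asdict(palette))
# {'name': 'warm', 'colors': [{'r': 255, 'g': 255, 'b': 0}, {'r': 255, 'g': 0, 'b': 0}]}
```

Thanks to this recursion, the nested dictionary can go straight into json.dump() just like the flat one.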

Summary and alternative libraries

In this article we have mainly covered two things.

  • The basics of dataclasses and the functionality they add to our regular classes.
  • How dataclasses save us a lot of time writing boilerplate code and make our code much easier to understand.

Other alternatives to dataclasses

Even though they are very useful, dataclasses are not the definitive answer to every programming challenge involving data. Here are other options that are also used for storing data in Python classes, so it's up to you to decide which one suits you best.

                           dataclass    namedtuple       pydantic
Built-in                   Yes ✅        Yes ✅            No ❌
Default values             Yes ✅        Yes ✅ (3.7+)     Yes ✅
Runtime type validation    No ❌         No ❌             Yes ✅
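For comparison, here is a sketch of Color modeled as a named tuple. collections.namedtuple accepts default values through its defaults parameter (Python 3.7+), and typing.NamedTuple lets you write annotated fields much like a dataclass. Like a dataclass, neither one checks types. One difference is that named tuples are real tuples, so they compare equal to plain tuples with the same values.

```python
from collections import namedtuple
from typing import NamedTuple

# defaults apply to the rightmost fields: here only b gets a default
ColorNT = namedtuple('ColorNT', ['r', 'g', 'b'], defaults=[0])


class TypedColor(NamedTuple):
    '''Same idea with type annotations, similar to a dataclass.'''
    r: int
    g: int
    b: int = 0


print(ColorNT(255, 255))                  # ColorNT(r=255, g=255, b=0)
print(TypedColor(255, 0))                 # TypedColor(r=255, g=0, b=0)
print(TypedColor(255, 0) == (255, 0, 0))  # True: named tuples are tuples
```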

So, if you really need type validation at runtime and don't want to implement it yourself, pydantic might be what you need. (mypy, mentioned earlier, checks the annotations statically, before the code runs, rather than validating values at runtime.)

Still, dataclasses are a great built-in choice when it comes to storing data in Python.

Thanks for reading. Hope you find this helpful!
