Coding guidelines help engineering teams to write consistent code which is easy to read and understand for all team members.

Python has an excellent style guide called PEP 8. It covers most of the situations you will run into while writing Python. I like PEP 8, and I believe a lot of effort and thought went into it. On the other hand, PEP 8 is better seen as a generic Python guideline than as a set of strict rules, because it allows different approaches to achieve similar goals. That can be a problem for teams whose members have different skill levels and that use methodologies where all team members are equal and there is always room to argue. It is much easier to write code or do a code review against a strictly defined, practical style guide. Establishing such guidelines can be difficult, but it is very beneficial for the whole team if it is done the right way.

Over the course of my career, I have defined one for myself and want to share it.

The final goal of this guide is to have code that is clean, consistent, and efficient. Remember: code is read more often than it is written, and it is written for people to read and only incidentally for machines to execute. Some parts of the guide are opinionated and are meant to be strictly followed to preserve consistency when writing new code.

Code Layout

Use 4 spaces instead of tabs

Maximum line length is 120 characters

2 blank lines between classes and functions

1 blank line between methods within a class

No blank line following a def line.

No whitespace inside parentheses, brackets, or braces.

    # good
    spam(ham[1], {eggs: 2}, [])

    # bad
    spam( ham[ 1 ], { eggs: 2 }, [ ] )

Surround operators with a single space on either side.

    # good
    x == 1

    # bad
    x<1

Never use whitespace around = when passing keyword arguments or defining a default parameter value.
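A minimal sketch of the spacing rule for defaults and keyword arguments. The connect function and its parameters are hypothetical and exist only for illustration.

```python
def connect(host, port=5432, timeout=10.0):
    """Hypothetical helper that only illustrates spacing around '='."""
    return "{}:{} (timeout={})".format(host, port, timeout)


# Good: no spaces around '=' when passing a keyword argument.
address = connect('localhost', timeout=2.5)

# Bad (valid Python, but against this guide):
# address = connect('localhost', timeout = 2.5)
```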

Use blank lines to separate logical blocks within functions and methods wherever it is justified.

Move function arguments to a new line with an indentation, if they do not fit into the specified line length

    # Good
    def long_function_name(var_one, var_two, var_three, var_four):
        print(var_one)

    # Good
    def long_function_name(
        var_one,
        var_two,
        var_three,
        var_four
    ):
        print(var_one)

Put each logical condition on its own line if the expression does not fit within the maximum line length. This lets you read the condition from top to bottom. Poor formatting makes it difficult to read and understand complex predicates.

    # Good
    if (this_is_one_thing
        and that_is_another_thing
        or that_is_third_thing
        or that_is_yet_another_thing
        and one_more_thing
    ):
        do_something()

Split long strings into adjacent string literals inside parentheses instead of using a backslash (\) line continuation, since it is much more readable.

    raise AttributeError(
        'Here is a multiline error message '
        'shortened for clarity.'
    )

Place a class's __init__ method at the beginning of the class

Use named arguments to improve readability and avoid silly mistakes in the future

    # Bad
    urlget('http://google.com', 20)

    # Good
    urlget('http://google.com', timeout=20)
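One possible way to enforce named arguments, rather than relying on review alone, is to declare parameters as keyword-only with a bare * in the signature. The sketch below assumes a hypothetical stand-in for urlget that performs no network I/O; the real function's signature is not given in this guide.

```python
def urlget(url, *, timeout=10):
    """Hypothetical stand-in for urlget; it only echoes its arguments."""
    return {'url': url, 'timeout': timeout}


# The named form works as before.
request = urlget('http://google.com', timeout=20)

# The positional form is now rejected by Python itself.
try:
    urlget('http://google.com', 20)
except TypeError:
    positional_rejected = True
else:
    positional_rejected = False
```

With this signature the "bad" call fails immediately with a TypeError instead of silently passing 20 as some unnamed parameter.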

Never use the positional-only parameter syntax / introduced in Python 3.8 (it forbids passing those arguments by keyword); this guide considers it a code smell.

Never end your lines with a semicolon, and do not use a semicolon to put two statements on the same line.

Chained method calls should be broken up across multiple lines for better readability

    (df.write
        .format('jdbc')
        .option('url', 'jdbc:postgresql:dbserver')
        .option('dbtable', 'schema.tablename')
        .option('user', 'username')
        .option('password', 'password')
        .save()
    )

Naming

Use snake_case for modules, variables, attributes, functions, and method names, not CamelCase

Use CamelCase for class names and factories

Names should make clear what a variable, class, or function contains or does. If a developer cannot come up with one clear name, then something is wrong with the implementation (see the Single Responsibility Principle, SRP)

Don't include the type of a variable in its name. E.g. use senders instead of senderlist
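The naming rules above can be sketched in a few lines. MessageSender, display_name, and format_greeting are hypothetical names chosen only to show the conventions.

```python
class MessageSender:
    """CamelCase for the class name."""

    def __init__(self, display_name):
        self.display_name = display_name

    def format_greeting(self):
        """snake_case for methods and attributes."""
        return "Hello from {}".format(self.display_name)


# 'senders', not 'sender_list': the name describes the content, not the type.
senders = [MessageSender('alice'), MessageSender('bob')]
greetings = [sender.format_greeting() for sender in senders]
```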

Formatting

Use double quotes ( " ) around strings that are used for interpolation or that are intended for the end user to see; otherwise use single quotes ( ' ).

    CONFIG = {
        'db_name': "db",
        'port': 4321,
    }

    MESSAGES = {
        'en': "Hello %s",
    }


    def welcome(language, name):
        """Return a language-appropriate greeting"""
        return MESSAGES[language] % name

Add trailing commas in sequences of items only when the closing container token ] , ) , or } does not appear on the same line as the final element

    # good
    x = [1, 2, 3]

    # good
    y = [
        0,
        1,
        4,
        6,
    ]
    z = {
        'a': 1,
        'b': 2,
    }

    # bad
    y = [
        0,
        1,
        4,
        6
    ]
    z = {
        'a': 1,
        'b': 2
    }

To format strings, use the format method or, if you are using Python 3.6 or later, f-strings:

    # Bad
    print('var: %s' % var)

    # Good
    print('var: {}'.format(var))

    # Good for Python >= 3.6
    print(f'var: {var}')

Always start a new block on a new line

    # Bad
    if flag: return None

    # Good
    if flag:
        return None

Set your IDE to normalize inconsistent line endings

Commenting

First of all, if the code needs comments to explain how it works, you should think about refactoring or rewriting it. The best comment is the code itself.

Describe complex, possibly incomprehensible points and side effects in the comments

Separate the # and the comment text with a single space

    #bad comment

    # good comment

If a piece of code is poorly understood, or is likely to be refactored or changed in the future, mark it with a TODO note containing your last name and the Jira ticket number (if you have one): @TODO(lastnameN)

    def get_ancestors_ids(self):
        # @TODO(mysurnameN): do a cache reset while saving and changing the category tree
        cache_name = '{0}_ancestors_{1}'.format(self._meta.model_name, self.pk)
        cached_ids = cache.get(cache_name)
        if cached_ids:
            return cached_ids

        ids = [c.pk for c in self.get_ancestors(include_self=True)]
        cache.set(cache_name, ids, timeout=3600)
        return ids

Type annotations

Type annotations in function signatures and on module-scope variables are required. They are good documentation and can also be used with mypy for static type checking. Use them whenever possible. Use .pyi stub files for third-party or extension modules.
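A short sketch of what annotated module-scope variables and function signatures can look like. The names DEFAULT_TIMEOUT, RETRY_CODES, and find_user are hypothetical, and the typing module aliases are used so the example also runs on older Python 3 versions.

```python
from typing import Dict, List, Optional

# Module-scope variables carry annotations too.
DEFAULT_TIMEOUT: float = 30.0
RETRY_CODES: List[int] = [502, 503, 504]


def find_user(users: Dict[int, str], user_id: int) -> Optional[str]:
    """Return the user name for user_id, or None if it is unknown."""
    return users.get(user_id)


known_user = find_user({1: 'alice'}, 1)
missing_user = find_user({1: 'alice'}, 2)
```

A checker such as mypy can then flag a call like find_user({1: 'alice'}, '1') before the code runs.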

Docstrings

All docstrings should be written in reStructuredText (RST) format, as used by Sphinx.

Write docstrings for each method that is more complicated than hello world. In docstring summarize function/method behavior and document its arguments, return value(s), side effects, exceptions raised, and restrictions

Wrap docstrings with triple double quotes ( """ )

The docstrings and the description of the arguments must be indented

    def some_method(name, state=None):
        """This function does something

        :param name: The name to use
        :type name: str
        :param state: Current state to be in (optional, default: None)
        :type state: bool
        :returns: int -- the return code
        :raises: AttributeError, KeyError
        """
        ...
        return 0

Similarly to branching, do not write methods on one line in any case:

    # Bad
    def do_something(self): print('Something')

    # Good
    def do_something(self):
        print('Something')

Exceptions

Use more specific exceptions, not the base Exception. Make errors obvious.

Exceptions should be used only where they are really needed. There is no need for them in cases where a simple if statement will do.

Minimize the amount of code in a try/except block. The larger the body of the try, the more likely that an exception will be raised by a line of code that you didn't expect to raise an exception.
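The exception rules above might look like this in practice. ConfigError and parse_config are hypothetical names; the sketch keeps the try body to the single call that can fail, catches a specific exception, and uses a plain if for the check that does not need exception handling.

```python
import json
from typing import Any, Dict


class ConfigError(ValueError):
    """Hypothetical, specific error raised for invalid configuration."""


def parse_config(raw: str) -> Dict[str, Any]:
    # Only the call that can raise lives inside the try block.
    try:
        config = json.loads(raw)
    except json.JSONDecodeError as error:
        raise ConfigError("Config is not valid JSON: {}".format(error)) from error

    # A simple if statement is enough here; no try/except needed.
    if not isinstance(config, dict):
        raise ConfigError("Config must be a JSON object")

    return config
```

Catching json.JSONDecodeError instead of the base Exception means an unrelated bug, such as a typo in a variable name, still surfaces as its own error.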

Imports

Avoid circular imports: do not import modules that are more specialized than the one you are editing

Relative imports are forbidden (PEP 8 only “highly discourages” them). Where absolutely needed, the from __future__ import absolute_import syntax should be used (see PEP 328)

Never use * in imports. Always be explicit about what you're importing. Namespaces make code easier to read, so please use them

Break long imports using parentheses and indent by 4 spaces. Include the trailing comma after the last import and place the closing bracket on a separate line

    from my_pkg.utils import (
        some_utility_method_1,
        some_utility_method_2,
        some_utility_method_3,
        some_utility_method_4,
        some_utility_method_5,
    )

Imports should be written in the following order, with the groups separated by blank lines:

built-in modules

third-party modules

modules of the current project



    import os
    import logging
    import typing as T

    import pandas as pd
    import pyspark
    import pyspark.sql

    from my_pkg.config import DBConfig

Even a file intended for use as an executable should be importable, and a simple import should not have any side effects. The main functionality must be in the main() function. Thus, the code can be imported as a module for testing or reused in the future.

    def main():
        ...


    if __name__ == '__main__':
        main()

Unit-tests

All unit tests should be written using the pytest framework.

There is no need to write a huge unit test with a bunch of assertions. Each unit test should check only one specific thing; that is what unit tests are all about

    # Bad
    def test_smth():
        result = f()
        assert isinstance(result, list)
        assert result[0] == 1
        assert result[1] == 2
        assert result[2] == 3
        assert result[3] == 4


    # Good
    def test_smth_type():
        result = f()
        assert isinstance(result, list), 'Result should be list'


    def test_smth_values():
        result = f()
        assert set(result) == set(expected), f'Result should be {set(expected)}'

The name of the test must clearly express what is being tested.

assert should be followed by a message explaining what it is checking.
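As a sketch of the last two rules, here are two tests whose names state what is being checked and whose assertions carry a message. normalize_email is a hypothetical function under test; the tests are plain functions, so pytest would collect them without any extra imports.

```python
def normalize_email(address):
    """Hypothetical function under test."""
    return address.strip().lower()


def test_normalize_email_strips_surrounding_whitespace():
    result = normalize_email('  user@example.com ')
    assert result == 'user@example.com', "Surrounding whitespace should be removed"


def test_normalize_email_lowercases_address():
    result = normalize_email('User@Example.COM')
    assert result == 'user@example.com', "Address should be lowercased"
```

When one of these fails, the test name and the assertion message together say what broke without opening the test body.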

It is a bad idea to...

use constructions like:

global variables,

iterators where they can be replaced by vectorized operations,

lambda where it is not required,

map and lambda where it can be replaced by a simple list comprehension,

multiple nested maps and lambdas,

nested functions, they are hard to test and debug.
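To illustrate the point about map and lambda, here is the same computation written both ways. The numbers list is made-up sample data.

```python
numbers = [1, 2, 3, 4, 5, 6]

# Harder to read: nested map and filter with lambdas.
squares_of_even_lambda = list(map(lambda n: n * n, filter(lambda n: n % 2 == 0, numbers)))

# Easier to read: a single list comprehension.
squares_of_even = [n * n for n in numbers if n % 2 == 0]
```

Both expressions produce the same list; the comprehension simply reads left to right without unpacking nested calls.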

And one final thing

Conclusion

Sure, you may not like my guidelines (save your breath here); the guide is not ideal, and I am constantly working on it. In the next post, we will talk about how to create a style guide for your team, as well as the tooling that keeps the process consistent across the team.

By the way, what style guide are you following?