Python Intermediate – Course

Yoan Mollard

Chapter 1

DEVELOP PYTHON PROJECTS

1.1. Python types and protocols
1.2. Reminder about iteration with for and while
1.3. Reminder about function definition
1.4. Type annotations (aka type hints)
1.5. Exceptions
1.6. Virtual environments (venv)

Chapter 2 (optional, if applicable)

OBJECT-ORIENTED PROGRAMMING (O.O.P.)

2.1. The self object
2.2. The constructor
2.3. Inheritance
2.4. Magic methods (aka dunder methods)
2.5. Summary of OOP terminology

Chapter 3

STRUCTURE AND DISTRIBUTE PYTHON CODE

3.1. Python Enhancement Proposals (PEPs)
3.2. Decorators
3.3. Context manager: the with statement
3.4. Structure code as Python packages
3.5. The Python Package Index (PyPI.org)
3.6. Test Python packages
3.7. Distribute Python packages

Chapter 4

MANAGE IOs OF PYTHON CODE

4.1. Logging
4.2. Manipulate file/directory paths with pathlib
4.3. Use json and csv formats
4.4. Emit HTTP requests from Python with requests
4.5. Call external commands with subprocess
4.6. Multithreading, multiprocessing
4.7. Charset and encoding

Optional topics for day 3 (if applicable)

CHAPTER 1

DEVELOP PYTHON PROJECTS

With reminders

Python types and protocols

Primitive types

i = 9999999999999999999999999                   # int (unbound)
f = 1.0                                         # float
b = True                                        # bool
n = None                                        # NoneType (NULL)

🚨 Beware with floats

Python floats are IEEE 754 doubles, whose binary rounding can produce mathematically surprising results:

0.1 + 0.1 + 0.1 - 0.3 == 0    # This is False 😿
print(0.1 + 0.1 + 0.1 - 0.3)  # Prints 5.551115123125783e-17, not 0

They also cannot handle large differences in magnitude:

1e-10 + 1e10 == 1e10          # This is True 😿

When you deal with floating-point numbers and precision counts, use the decimal module!

from decimal import Decimal
Decimal("1e-10") + Decimal("1e10") == Decimal("1e10")   # This is False 🎉

Beware not to initialize Decimal with a float, since the precision is already lost: Decimal(0.1) will show Decimal('0.1000000000000000055511151231257827021181583404541015625')

Reminder about basic collections

Collections store data in structured containers.

General purpose built-in containers are tuple, str, list, and dict.

Other containers exist in module 🐍 collections.

The tuple

The tuple is the Python type for an ordered sequence of elements (an array).

t = (42, -15, None, 5.0)
t2 = True, True, 42.5
t3 = (1, (2, 3), 4, (4, 5))
t4 = 1,

Selection of an element uses the [ ] operator with an integer index starting at 0:

element = t[0]  # Returns the 0th element from t 

Tuple can be unpacked:

a, b = b, a   # Value swapping that unpacks tuple (b, a) into a and b

The string

The str type is an ordered sequence of characters.

s = "A string"
s2 = 'A string'           # Simple or double quotes make no difference
s3 = s + ' ' + s2         # Concatenation builds and returns a new string
letter = s2[0]            # Element access with an integer index

Tuples and strings are immutable. Definition: An object is said to be immutable when its value cannot be updated after the initial assignment. The opposite is mutable.

Demonstration: put the first letter of these sequences in lower case:

s = "This does not work"
s[0] = "t"
# TypeError: 'str' object does not support item assignment

The list

A list is a mutable sequence of objects using integer indexes:

l = ["List example", 42, ["another", "list"], True, ("a", "tuple")]

element = l[0]             # Access item at index 0
l[0] = "Another example"   # Item assignment works because the list is mutable

some_slice = l[1:3]  # Return a sliced copy of l between indexes 1 (inc.) & 3 (ex.)

42 in l    # Evaluates to True if integer 42 is present in l

l.append("element") # Append at the end (right side)
element = l.pop()             # Remove from the end.

If needed, pop(i) and insert(i, value) operate at index i, but...

... ⚠️ a list is fast only when operating at the right side!

Need a left-and-right efficient collection? Use 🐍 deque or 🐍 compare efficiency

The double-ended queue (deque)

A deque is great to append or remove elements at both extremities:

from collections import deque
queue = deque(["Kylie", "Albert", "Josh"])
queue.appendleft("Anna")   # list.insert(0, "Anna") would be slow here: O(n)
queue.popleft()    # list.pop(0) would be slow here: O(n)

Deques perform great for appendleft() and popleft(), while lists perform poorly for the equivalent operations insert(0, value) and pop(0).
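The gap can be demonstrated with a small benchmark sketch using timeit (drain_list and drain_deque are illustrative helpers; absolute times depend on the machine, only the ordering matters):

```python
from collections import deque
from timeit import timeit

N = 20_000

def drain_list():
    l = list(range(N))
    while l:
        l.pop(0)        # O(n): every remaining element is shifted left

def drain_deque():
    d = deque(range(N))
    while d:
        d.popleft()     # O(1): no shifting at all

print(f"list.pop(0):     {timeit(drain_list, number=1):.4f}s")
print(f"deque.popleft(): {timeit(drain_deque, number=1):.4f}s")
```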

The dictionary

The dictionary is a key-value pair container, mutable and ordered. Keys are unique.

d = {"key1": "value1", "key2": 42, 1: True} 
# Many types are accepted as keys or values, even mixed together

"key1" in d   # Evaluates to True if "key1" is a key in d
# Operator "in" always and only operates on keys

d["key2"]    # Access the value associated to a key

d.keys()     # dict_keys(['key1', 'key2', 1])

d.values()   # dict_values(['value1', 42, True])

d["key3"] = "some other value"   # Insertion of a new pair

d.update({"key4": "foo", "key1": "bar"})

With Python 3.6 and below, dictionaries are unordered; insertion order is guaranteed since 3.7 (see OrderedDict if extra ordering features are needed)

Python typing and protocols

Python typing is dynamic. Type is inferred from the value ➡️ Runtime type

Runtime type of v can be introspected with type(v)

But pythonistas rely on 🦆 duck typing: to judge whether an object obj is suitable, the methods it declares matter more than its runtime type type(obj)

Example: As soon as the method __iter__ exists in class C, instances of C are considered iterable, no matter what type() returns.

Python has built-in protocols:

  • Iterable: __iter__
  • Iterator: __iter__ and __next__
  • Sequence: __getitem__ and __len__ (used by [ ] for instance)
  • Container: __contains__ (used by in for instance)
  • Callable: __call__
  • Hashable: __hash__
  • Context Manager: __enter__ and __exit__
  • ...

You can use them in type annotations.
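A minimal sketch of these protocols in action — Countdown is a hypothetical class that becomes an iterable simply by defining __iter__, whatever its runtime type:

```python
from collections.abc import Iterable

class Countdown:
    """Neither a list nor a tuple, but iterable: it implements __iter__."""
    def __init__(self, start: int):
        self.start = start

    def __iter__(self):
        n = self.start
        while n > 0:
            yield n          # a generator satisfies the Iterator protocol
            n -= 1

c = Countdown(3)
print(isinstance(c, Iterable))  # True, thanks to the __iter__ method alone
print(list(c))                  # [3, 2, 1]
```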

Reminder about iteration with for and while

Iteration on list l (or tuple t)

for i in range(len(l)):
    print(f"{l[i]} is the value at index {i}")

for v in l:
    print(f"{v} is a value from the list")
    # Warning: rebinding v inside the loop does not modify the list

for i, v in enumerate(l):
    print(f"{v} is the value at index {i}")

Iteration on string s

for c in "Hello":
    print(f"The next letter in that string is {c}")

Iteration with a while loop

i = 0  # All variables in the condition must preexist (here, i and l)
while i < len(l) and l[i] == 0:
    i += 1 

Iteration on dict d

for k in d:    # by default, "in" operates on KEYS
    print(f"Value at key {k} is {d[k]}")

for i, k in enumerate(d):
    print(f"Value at key {k} is {d[k]} at position {i}")

for k, v in d.items():
    print(f"Value at key {k} is {v}")
    # Useful for nested dictionaries

List-comprehensions and dict-comprehensions

A comprehension is an inline notation to build a new sequence (list, dict, set).
Here is a list-comprehension:

l = [i*i for i in range(10)]  # Select i*i for each i in the original "range" sequence
# Returns [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

You may optionally filter the selected values with an if clause:

l = [i*i for i in range(100) if i*i % 10 == 0]  # Select values that are multiple of 10
# Returns [0, 100, 400, 900, 1600, 2500, 3600, 4900, 6400, 8100]

l = [(t, 2*t, 3*t) for t in range(5)] # Here we select tuples of integers:
# Returns [(0, 0, 0), (1, 2, 3), (2, 4, 6), (3, 6, 9), (4, 8, 12)]

Dict-comprehensions also work:

d = {x: x*x for x in range(10)}
# Returns {0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}

Reminder about function definition

def compute(a, b, c, d):
    number = a + b + c + d
    return number

the_sum = compute(1, 1, 1, 1)

No return statement is equivalent to return None

The star * is the flag that means 0 or n values. They are received in a tuple:

def compute(*args):
    sum, difference, product, quotient = 0, 0, 1, 1
    for value in args:   # args is a tuple
        sum += value
        difference -= value
        product *= value
        quotient /= value
    return sum, difference, product, quotient

sum, *other_results = compute(42, 50, 26, 10, 15)

A named parameter is passed to a function via its name instead of its position:

def sentence(apples=1, oranges=10):
   return f"He robbed {apples} apples and {oranges} oranges"

p = sentence(2, 5)
p = sentence()
p = sentence(oranges=2) 

The double star ** is the flag that means 0 or n named parameters. They are received as a dictionary:

def sentence(**kwargs):
    for item, quantity in kwargs.items():  # kwargs is a dict
        print(f"He robbed {quantity} {item}")

sentence(apples=2, oranges=5)
# He robbed 2 apples
# He robbed 5 oranges

One can return 2 values or more:

def compute(a, b):
   return a + b, a - b, a * b, a / b

Call to compute() returns a tuple:

results = compute(4, 6)
the_quotient = results[3]

This tuple can also be unpacked:

the_sum, the_difference, the_product, the_quotient = compute(4, 6)
the_sum, *other = compute(4, 6)      # Unpacking N elements into a list

The star usually means 0 or N element(s)

Type annotations (aka type hints)

Any Python variable can optionally be associated with a type hint:

def compute(a: int, b: int) -> int:
    return a + b

Inconsistent types and values are NOT noticed by the interpreter.

Annotations are ONLY intended for an (optional) type checker, such as the one in PyCharm.

my_value: int = compute(5, 5)   # OK: Type checking passes
s: bool = compute(5.0, 5)
# Linter warning: Expected "int", got "float" instead
# Linter warning: Expected "bool", got "int" instead

compute(5, 5).capitalize()
# Linter warning:  Unresolved attribute reference "capitalize" for "int"

To specify more complex annotations, import them from typing:

  • Any: every type
  • Union[X, Y, Z]: one among several types (e.g. int, float or str)
  • Callable[[X], Y]: function that takes X in input and returns Y
  • Optional[X]: either X or NoneType
  • ForwardRef(X): forward reference to X, used to circumvent circular imports

from typing import Union

def sum(a: Union[int, float], b: Union[int, float]) -> Union[int, float]:
    return a+b

sum(5.0, 5) # Now, this call is valid for the type checker

Data containers can also be fully typed, e.g. list[list[int]], dict[str, float]
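A short sketch of fully typed containers (mean_grade and flatten are illustrative helpers; the bare list[...]/dict[...] syntax requires Python 3.9+):

```python
def mean_grade(grades: dict[str, float]) -> float:
    """Average of a dict mapping student names to grades."""
    return sum(grades.values()) / len(grades)

def flatten(matrix: list[list[int]]) -> list[int]:
    """Flatten a matrix of integers into a single list."""
    return [x for row in matrix for x in row]

print(mean_grade({"Anna": 15.0, "Josh": 13.0}))  # 14.0
print(flatten([[1, 2], [3, 4]]))                 # [1, 2, 3, 4]
```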

Exceptions

An exception is an error.

The exception mechanism allows errors to be triggered, propagated and fixed in a controlled way.

An exception can be:

  • raised with the raise keyword: it is triggered
  • caught with the except keyword: it is intercepted and a fix is provided

When it is caught, a fix (workaround) is executed:

For instance, when value = a / b fails because b = 0, you might want to pursue the execution with value = 0.
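That workaround can be sketched as follows (safe_divide is a hypothetical helper; Python raises ZeroDivisionError in this situation):

```python
def safe_divide(a: float, b: float) -> float:
    try:
        return a / b
    except ZeroDivisionError:   # raised when b == 0
        return 0.0              # the chosen workaround: pursue with 0

print(safe_divide(10, 2))  # 5.0
print(safe_divide(10, 0))  # 0.0
```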

Propagation of exceptions

At runtime, if an exception is raised but not caught, it is automatically propagated to the calling function.

If it is not caught there either, it goes up again ... and again ...

If it reaches the top of the interpreter without being caught, the interpreter exits.
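A sketch of propagation: the ValueError raised deep inside parse() is not caught there, bubbles through load(), and is finally caught at the top level (function names are illustrative):

```python
def parse(text: str) -> int:
    return int(text)              # may raise ValueError; not caught here

def load() -> int:
    return parse("not a number")  # the exception propagates through load() too

try:
    load()
except ValueError as error:       # caught two frames above the raise point
    print(f"Recovered at the top level: {error}")
```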

Propagation is one of the main benefits of exceptions: it allows finding the right balance between:

  • no error management
  • all functions calls individually tested for errors

Compared to a custom handling of errors, exceptions have the following benefits:

  • they propagate automatically: allowing to provide general workarounds for large parts of code
  • they are typed and all exception types are hierarchized

Common exception types

  • ValueError: value error (e.g. square root of a negative number)
  • TypeError: type error (e.g. adding an int with a str)
  • IndexError: access to an index exceeding the list size
  • KeyError: access to a dictionary key that does not exist
  • NameError: access to an undeclared variable name or function name
  • IOError: I/O error (e.g. corrupted data, unexpected end of file...), an alias of OSError
  • FileNotFoundError: file not found (a child of OSError)
  • RuntimeError: generic error detected at runtime that fits no other category
  • SyntaxError: bad syntax (indentation, unexpected keywords, ...)
  • KeyboardInterrupt: received SIGINT signal (Ctrl +C)

The try/except block

The basic syntax to catch an exception:

try:
    protected_code()  # Raises IOError
except IOError:
    substitution_code()

What happens at runtime:

try:
    protected_code()  # Raises IOError
    skipped_code()
    skipped_code2()
except IOError:
    substituted_code()
    substituted_code2()
resumed_code()
resumed_code2()

Other uses of exceptions

try:    # Different exception types associated with the same substitution block
    protected_code()
except (IOError, FileNotFoundError):
    substitution_code()

try:    # Different substitution blocks for different exception types
    protected_code()
except FileNotFoundError:   # The more specific type must come first...
    substitution_code1()
except IOError:             # ...since FileNotFoundError is a subclass of IOError
    substitution_code2()

if some_positive_value < 0:     # Trigger an exception by yourself
    raise ValueError("Negative values are not authorized")

Virtual environments (venv)

Context: All installed packages go into the site-packages directory of the interpreter.

The venv module provides support for creating lightweight “virtual environments” with their own site directories, optionally isolated from system site directories.

🐍 Learn more

For each new project you create/clone, create a dedicated virtual environment for it:

/usr/bin/python3.9 -m venv dev/PythonTraining/venv

Then, every time you work on this project, activate its environment first:

source PythonTraining/venv/bin/activate

Your terminal then prefixes the prompt with the name of the env:

(venv) yoan@humancoders ~/dev/PythonTraining $

And quit the venv every time you stop working on the project:

(venv) yoan@humancoders ~/dev/PythonTraining $ deactivate
yoan@humancoders ~/dev/PythonTraining $ 

In an activated venv, every call to the interpreter and every package installation will target the isolated virtual environment:

(venv) yoan@humancoders ~/dev/PythonTraining $ python

will run the Python version targeted by the venv

(venv) yoan@humancoders ~/dev/PythonTraining $ pip install numpy

will install the latest numpy version into the venv

In practice, your IDE can handle venv creation, activation and deactivation automatically for you when you create or open/close a project.

🆕 PEP 668

On interpreters marked as externally managed (e.g. the system Python of recent Linux distributions), pip refuses to install packages outside a venv.

You can override this behaviour by passing --break-system-packages.

CHAPTER 2

OBJECT-ORIENTED PROGRAMMING (O.O.P.)

Python is multi-paradigm:

  • Imperative: instructions create state changes
  • Object-oriented: instructions are grouped with their data in objects/classes
  • Functional: instructions are math function evaluations

All 3 paradigms are popular in the Python community, and often mixed all together.

Here is a program to handle the sales of an apartment:

apartment_available = True
apartment_price = 90000

def sell():
   apartment_available = False

def reduce_price(percentage=5):
   apartment_price = apartment_price * (1-percentage/100)

Note: because of the scope of variables, global variables would be required here

In classic programming, these are variables...

apartment_available = True
apartment_price = 90000

... and these are functions:

def sell():
   apartment_available = False

def reduce_price(percentage=5):
   apartment_price = apartment_price * (1-percentage/100)

However, functions usually manipulate data stored in variables, so functions are linked to variables.

In Object-Oriented Programming, variables and functions are grouped into a single entity named a class that behaves as a data type:

class Apartment:
    def initialize_variables():
        apartment_available = True
        apartment_price = 90000

    def sell():
        apartment_available = False

    def reduce_price(percentage=5):
        apartment_price = apartment_price * (1-percentage/100)

Note: this intermediary explanation is not yet a valid Python code snippet

Object-Oriented Programming introduced specific vocabulary:

Types are called classes:

class Apartment:   

Functions are called methods:

    def sell():    

Variables are called attributes:

        apartment_available = False

Since the declaration of a class defines a new type (here, Apartment), the program can declare several independent apartments:

apartment_dupont = Apartment()
apartment_muller = Apartment()

apartment_dupont.reduce_price(15)
apartment_muller.reduce_price(7)
apartment_dupont.sell()
apartment_muller.reduce_price(3)
apartment_muller.sell()
apartment_dupont = Apartment()

In this statement:

  • Apartment is a class
  • apartment_dupont is an object (an instance of a class)
  • Apartment() is the constructor (the method creating an object out of a class)

apartment_dupont.reduce_price(15)

This statement is a method call on object apartment_dupont.

Method calls can create side effects to the object (modifications of its attributes).

Like regular functions, methods can take parameters in input. Here, an integer, 15.

The self object

  • self is the name designating the instantiated object
  • self is implicitly passed as the first argument for each method call
  • self can be read as "this object"

In other languages like Java or C++, self is named this.

The constructor

The constructor is the specific method that instantiates an object out of a class. It is always named __init__.

class Test:
    def __init__(self):
        self.attribute = 42

Here is now a valid Python syntax for our class.

This is the class declaration:

class Apartment:
    def __init__(self):       # Implicit first parameter is self
        self.available = True       # We are creating an attribute in self
        self.price = 90000

    def sell(self):
        self.available = False

    def reduce_price(self, percentage=5):
        self.price = self.price * (1-percentage/100)

This is the class instantiation:

apart_haddock = Apartment()

The constructor, like any other method, can accept input parameters:

class Apartment:
    def __init__(self, price):
        self.available = True	
        self.price = price

apart_dupont = Apartment(120000)    # Now the price is compulsory
apart_haddock = Apartment(90000)

While attributes are accessed using the prefix self. from inside the class...

...they can be accessed from outside the class, using object name as the prefix:

print(f"This flat costs {apart_haddock.price}")
apart_haddock.available = False

However some attributes may have a protected or private scope:

class Foo:
    def __init__(self):
        self.public = 0
        self._protected = 0
        self.__private = 0        # ⚠ Name mangling applies here

Protected attributes are not enforced but private ones rely on name mangling:

class BankAccount:
    def __init__(self):
        self.__balance = 3000
         
class Client:
    def make_transaction(self, bank_account: "BankAccount"):
        bank_account.__balance += 1000
         
Client().make_transaction(BankAccount())
# AttributeError: 'BankAccount' object has no attribute '_Client__balance'

Inheritance

A furnished apartment is the same as an Apartment. But with additional furniture.

class FurnishedApartment(Apartment):   # The same as an Apartment...
    def __init__(self, price):
        self.furnitures = ["bed", "sofa"]  # ...but with furniture
        super().__init__(price)


furnished_apart = FurnishedApartment(90000)
furnished_apart.available = False
furnished_apart.reduce_price(5)
furnished_apart.furnitures.append("table")

The super() function allows to call the same method in the parent class.

Note: Older Python versions required a longer syntax: super(CurrentClassName, self)

Magic methods (aka dunder methods)

  • apart1 + apart2 → Apartment.__add__(self, other) → Addition
  • apart1 * apart2 → Apartment.__mul__(self, other) → Multiplication
  • apart1 == apart2 → Apartment.__eq__(self, other) → Equality test
  • str(apart) → Apartment.__str__(self) → Readable string
  • repr(apart) → Apartment.__repr__(self) → Unambiguous string
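These hooks can be sketched on a simplified Apartment, redefined here with only a price so the snippet is self-contained:

```python
class Apartment:
    def __init__(self, price: int):
        self.price = price

    def __eq__(self, other):    # called by apart1 == apart2
        return self.price == other.price

    def __add__(self, other):   # called by apart1 + apart2
        return Apartment(self.price + other.price)

    def __str__(self):          # called by str(apart) and print(apart)
        return f"Apartment at {self.price}€"

a, b = Apartment(90000), Apartment(90000)
print(a == b)   # True: __eq__ compares prices, not object identities
print(a + b)    # Apartment at 180000€
```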

Magic methods reading or altering attributes:

  • getattr(apart, "price") → Apartment.__getattr__(self, name)
  • setattr(apart, "price", 10) → Apartment.__setattr__(self, name, val)
  • delattr(apart, "price") → Apartment.__delattr__(self, name)

This wealth of magic methods is what Python's duck typing inspects, rather than runtime types:

In [1]: dir(int)

Out[1]:
['__abs__', '__add__', '__and__', '__bool__', '__ceil__', '__class__',
 '__delattr__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__',
 '__floor__', '__floordiv__', '__format__', '__ge__', '__getattribute__',
 '__getnewargs__', '__gt__', '__hash__', '__index__', '__init__',
 '__init_subclass__', '__int__', '__invert__', '__le__', '__lshift__',
 '__lt__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__or__',
 '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__',
 '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__',
 '__rmul__', '__ror__', '__round__', '__rpow__', '__rrshift__', '__rshift__',
 '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__sizeof__',
 '__str__', '__sub__', '__subclasshook__', '__truediv__', '__trunc__',
 '__xor__', 'as_integer_ratio', 'bit_length', 'conjugate', 'denominator',
 'from_bytes', 'imag', 'numerator', 'real', 'to_bytes']

Summary of OOP terminology

  • A class is a type owning attributes and methods
  • An object is an instance of a class
  • Instantiating a class consists in building an object from this class
  • The constructor is the method initializing the object: __init__()
  • An attribute is a variable from a class (or from an object)
  • A method is a function from a class (or from an object)
  • A (child) class may inherit from another (parent)
  • A method from a child class may override the same name in the parent

CHAPTER 3

STRUCTURE AND DISTRIBUTE PYTHON CODE

Python Enhancement Proposals (PEPs)

The PEPs govern the Python language. During their lifetime, PEPs are proposed, debated, rejected/accepted and implemented.

They are usually not very user-friendly, but they help to understand some design choices and the implementation of the interpreter.

PEP 0 (Python PEPs)

PEP 0 is the index of all PEPs, on peps.python.org

PEP 8 (style guide for the Python language)

PEP 8 is the style guide for Python code: Indentation, line length, blank lines...

PEP 257 (docstrings)

PEP 257 tells how to document your code with docstrings (triple quotes).

Docstrings can document a function, class, variable, whole file, package...

Example docstring on a function definition:

def compute(a: int, b: int) -> int:
    """
    Computes the sum of two integers
    :param a: the first element to sum
    :param b: the second element to sum
    :return: the sum of a and b
    """
    return a + b

The docstring format here is reStructuredText (RST), but other formats coexist.

After typing """, your IDE may autocomplete the docstring with a skeleton.

Decorators

The role of a decorator is to alter the behaviour of the function that follows, with no need to modify the implementation of the function itself.

It can be seen as adding "options" to a function, in the form of a wrapper code.

@decorator
def function():
    pass

In that case, calling function() is equivalent to calling decorator(function)().

Decorators may take parameters in input.

🐍 Learn more

Example 1: @classmethod is a decorator that passes the class type cls as the first parameter of the decorated method.

class Animal:
    @classmethod
    def define(cls):
        return "An " + str(cls) + " is an organism in the biological kingdom Animalia."

Example 2: Web frameworks usually use decorators to associate a function e.g. get_bookings_list() to:

  • an endpoint e.g. /bookings/list
  • a HTTP method e.g. GET

Here is how Flask works:

app = Flask(__name__)   # We create a web app

@app.route("/bookings/list", methods=["GET"])
def get_bookings_list():
    return "<ul><li>Booking A</li><li>Booking B</li></ul>"

To define your own decorator, you need to write a function returning a function:

from functools import wraps

def log_this(f):
    @wraps(f)
    def __wrapper_function(*args, **kwargs):
        print("Call with params", args, kwargs)
        return f(*args, **kwargs)
    return __wrapper_function

@log_this
def mean(a, b, round=False):
    m = (a + b)/2
    return int(m) if round else m

mean(5, 15, round=True) # shows: Call with params (5, 15) {'round': True}

The functools module provides decorators and other higher-order functions intended to alter or build on functions.
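For instance, functools.lru_cache memoizes the results of a function — a quick sketch on a naive Fibonacci:

```python
from functools import lru_cache

@lru_cache(maxsize=None)          # cache every result: each n is computed once
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(100))  # 354224848179261915075, instantaneous thanks to the cache
```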

Context manager: the with statement

The with statement relies on a context manager that protects a resource, making sure it is actually torn down after allocation in any case.

f = open("file.json", "w")
f.write(data)        # assuming data holds the text to write
# PROCESSING WRITING [...]
f.write(more_data)
f.close()

What if an exception occurs during the processing of f? It wouldn't be closed.

The context manager ensures that the resource is closed in any case:

with open("file.json", "w") as f:
    f.write(data)

The standard library is compatible with context managers for files, locks, and synchronisation primitives. But you may also create your own:

class ProtectedResource:
    def __enter__(self):
        print("The resource is being opened")

    def __exit__(self, exc_type, exc_value, traceback):
        print("The resource is being closed, with or without exception")

resource = ProtectedResource()
with resource:
    raise ValueError("Let's see if it works")

# The resource is being opened
# The resource is being closed, with or without exception
# Traceback (most recent call last):
#  File "<input>", line 3, in <module>
# ValueError: Let's see if it works

The 🐍 contextlib module provides other tools to manage contexts.

import os
from contextlib import suppress

with suppress(FileNotFoundError):
    os.remove('somefile.tmp')

Managers can also be decorators:

@track_entry_and_exit('widget loader')
def activity():
    print('Some time consuming activity goes here')
    load_widget()

See full example with __enter__ and __exit__ in the official doc

Structure code as Python packages

Difference between modules and packages

A module is a Python file, e.g. some/folder/mymodule.py. The module name uses the dotted notation to mirror the file hierarchy: some.folder.mymodule

A module is made to be either:

  • executed from a shell: it is then a script: python some/folder/mymodule.py
  • imported from another module: import mymodule (must be importable from sys.path; see "The Python path" below)

A Python package is a folder containing modules and optional sub-packages: some is a package, folder is a sub-package.

Scripts: the shebang

On UNIX OSes, a shebang is a header line of a Python script that tells the system shell which interpreter must be called to execute this Python module.

Invoke the env command to fetch the suitable interpreter for python3 with:

#!/usr/bin/env python3

A direct call to the interpreter is possible but NOT recommended, since it forces one interpreter and ignores any virtual environment you could be in:

#!/usr/local/bin/python3

ℹ️ The Windows shell ignores shebangs, but you should provide them anyway.

Regular structure of packages

  • Packages and sub-packages give your code a hierarchy
  • The package's hierarchy is inherited from the files-and-folders hierarchy
  • Modules hold resources that can be imported later on, e.g.:
    • Constants
    • Classes
    • Functions...
  • All packages and sub-packages must contain an __init__.py file each
  • In general __init__.py is empty but may contain code to be executed at import time
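As an illustration, a layout consistent with the my_math imports used in this section could look like this (module and function names reuse the examples of the following slides):

```text
my_math/
    __init__.py
    trigo/
        __init__.py
        sin.py              # defines sinus()
    matrix/
        __init__.py
        complex/
            __init__.py
            arithmetic.py   # defines product()
```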

Then the package or subpackages can be imported:

import my_math.trigo.sin
my_math.trigo.sin.sinus(0)

import my_math.trigo.sin as my_sin
my_sin.sinus(0)

Specific resources can also be imported:

from my_math.matrix.complex.arithmetic import product
sixteen = product(4, 4)

Relative imports (Imports internal to a package)

Relative import from the same folder:

from .my_math import my_sqrt
value = my_sqrt(25)

Relative import from a parent folder:

from ..my_math import my_sqrt
value = my_sqrt(25)

  • Do not put any slash, such as import ../my_math
  • Relative imports can fetch . (same dir), .. (parent), ... (parent of parent)
  • Relative imports are forbidden when run from a module outside a package
  • Using absolute imports instead of relative ones could result in name collisions

Importing modules from a package (only) stores compiled Python bytecode in a __pycache__ folder

As a developer, you can ignore these files: the interpreter handles compilation by itself.

This is the compilation behaviour of CPython, the most popular interpreter, written in C.

Other interpreters, e.g. PyPy, implement more efficient compilation: just-in-time (JIT) compilation.

The Python path

The interpreter resolves absolute import statements by searching the Python path, sys.path.

This is a regular Python list; it can be modified at runtime (e.g. with append) to add the paths of your libs.
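A quick sketch (the appended directory is purely hypothetical):

```python
import sys
from pathlib import Path

print(sys.path[:3])    # a peek at where absolute imports are searched

# Make modules stored in ~/dev/my_libs importable (hypothetical folder)
sys.path.append(str(Path.home() / "dev" / "my_libs"))
```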

The Python Package Index (PyPI.org)

pypi.org is a global server to find, install and share Python projects.

pypi.org is operated by the Python Packaging Authority (PyPA): a working group from the Python Software Foundation (PSF).

The command-line tool pip (Package Installer for Python) can be used to install packages by their name, e.g. bottle. It can install from various sources (links to code repos, ZIP files, local folders...) and searches PyPI if no source is given:

pip install git+https://gitlab.com/bottlepy/bottle
pip install https://gitlab.com/bottlepy/bottle/archive/refs/heads/master.zip
pip install path/to/my/python/package/folder/
pip install path/to/my/python/package/zip/file.zip
pip install numpy    # Will seek on PyPI
pip install numpy==1.21.5   # Force a specific version
pip uninstall numpy

Non-installable Python projects usually have a file requirements.txt at their root

# requirements.txt
redis==3.2.0
Flask
celery>=4.2.1
pytest

pip has the following options:

  • pip install -r requirements.txt to install all dependencies from the file
  • pip freeze > requirements.txt to create a file of frozen versions

💡 installable packages have no such file but specify dependencies elsewhere (e.g. in pyproject.toml for installable packages using setuptools).

PyPI Security warning 🚨

PyPI packages caught stealing credit card numbers & Discord tokens

Perform sanity checks before installing a package

  • Is the package still maintained and documented?
    (last update: November, 2017)
  • Does the developer consider bugs and improvements?
    (# of solved GitLab issues)
  • Is the package developer reliable?
    (moral entity or individual, which company, experience...)
  • If not opensource, is the development of this package likely to continue?
    (# of opensource users, # of clients, company financial health, ...)

PyPI Typosquatting warning 🚨

pip install -r requirements.txt
# 🚨 pip install requirements.txt

pip install rabbitmq
# 🚨 pip install rabitmq

pip install matplotlib
# 🚨 pip install matploltib

Where to find/host Python documentation?

Doc of built-in packages

All builtin packages are documented in 8 languages on:
📖 docs.python.org

ℹ️ builtin = anything that comes with the interpreter itself

Doc of non-built-in packages

Non-builtin packages (installed via pip) have their doc on their own website.

ℹ️ non-builtin = what is installed on top of the interpreter (eg. with pip)

💡 Search that package on pypi.org to easily find its doc

Test Python packages

  • Packages pytest and unittest are frequently used to test Python apps
  • unittest follows the classic xUnit test structure:
    • Setup: Prepare every prerequisite for the test
    • Call: call the tested function with the input parameters set up before
    • Assertion: an assert is a ground truth that must be true
    • Tear down: Cleanup everything that has been created for this test
  • pytest is a light test framework
  • On top of these, tox allows to run tests in multiple environments (e.g. Python versions)

Test files are typically placed in a tests/ directory; their names follow the test_*.py pattern and test function names are also prefixed with test_

pyproject.toml
mypkg/
    __init__.py
    app.py
    view.py
tests/
    test_app.py
    test_view.py
    ...

Naming tests according to these conventions will allow auto-discovery of tests by the test tool: it will go through all directories and subdirectories looking for tests to execute.

# water_tank.py
class WaterTank:
    def __init__(self):
        self.level = 10
    def pour_into(self, recipient_tank: "WaterTank", quantity: int):
        self.level -= quantity
        recipient_tank.level += quantity
# test_water_tank.py
from water_tank import WaterTank
def test_water_transfer():
    a = WaterTank()
    b = WaterTank()
    a.pour_into(b, 6)
    assert a.level == 4 and b.level == 16 

Then just type pytest and the test report will be printed in the terminal!

============ 1 test passed in 0.01s ============

Distribute Python packages

setuptools simplifies package distribution. 🐍 Learn more

You need a pyproject.toml file that specifies:

  • The package name and version number
  • The list of dependencies on other packages from PyPI, git repos, ...
  • The entry points (executables, commands, ...)
  • How to build the package (using hatchling, setuptools...)

ℹ️ setuptools replaces distutils, deprecated in Python 3.10 and removed in 3.12.

⚠️ setup.py is now discouraged in favor of pyproject.toml

# pyproject.toml example file

[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "my_package"
description = "My package description"
readme = "README.rst"
requires-python = ">=3.7"
license = {text = "BSD 3-Clause License"}
classifiers = [
    "Framework :: Django",
    "Programming Language :: Python :: 3",
]
dependencies = [
    "requests",
]

[project.scripts]
my-script = "my_package.module:function"

Once the pyproject.toml file exists, the distribution tools can:

  • Install the package in the current environment:
pip install .
  • Build distributions:
    • sdist : source distribution
    • bdist_wheel : binary distribution (wheel)
pip install build  # The build tool currently recommended by PyPA
python3 -m build   # Will build both an sdist and a bdist

-rw-rw-r-- 1 16699  nov.  12 00:00 hampy-1.4.2-py3-none-any.whl
-rw-rw-r-- 1 326913 nov.  12 00:00 hampy-1.4.2.tar.gz

🐍 Learn more about package distribution: PyPA docs

Remarks about binary distribution bdist_*

  • Binary formats are platform-dependent (OS, arch, Python implementation, ABI)
  • .egg and .whl files are just zip archives containing the package files, you can unzip them
  • Several binary formats exist: wheel, egg... Nowadays, wheel is preferred
  • wheel files are named this way: my_math-3.0.4-py3-none-any.whl where:
    • my_math is your package name
    • 3.0.4 is your package version
    • py3 is the Python implementation tag
    • none is the ABI tag (the C API for Python)
    • any is the platform tag (x86_64, arm, macOS...)

🐍 Learn more about package distribution: Python docs

Uploading your package distribution on PyPI

Once sdist and/or bdist are available, several pipelines exist to share your project.

Nowadays, uploading to PyPI with twine is the preferred option:

  1. Create an account on PyPI or in the sandbox TestPyPI if you're just testing
  2. pip install twine
  3. twine upload dist/* --repository testpypi

Drop the --repository argument to upload to the regular PyPI.
Parameter --repository can also target your own mirror server. Learn more.

CHAPTER 4

MANAGE IOs OF PYTHON CODE

Logging

Python has a built-in module dedicated to logging. It classifies each log entry by severity level: debug, info, warning, error, critical, and allows filtering them. 🐍 Learn more

logging.debug('Debug message')  # Lowest priority
logging.info('Info message')    # Higher priority
# Prefer lazy %-formatting: the message is only built if the entry is emitted
%timeit logging.info("%s", 42)
# 645 ns ± 43.1 ns per loop
%timeit logging.info(f"{42}")
# 787 ns ± 51.1 ns per loop
%timeit logging.info("{}".format(42))
# 876 ns ± 54.9 ns per loop

Step 1: Produce log entries

The logging library uses a modular approach to organize logs in a big app. Usually every module has its own logger, named after the module:

logger = logging.getLogger(__name__)
# foo/bar.py will be named "foo.bar"

When a message is posted to logger L:

  1. L decides whether to handle the event based on the level/filters
  2. Handlers of L get notified and react if their own level/filter match
  3. L's parent is notified, if appropriate

Step 2: Consume log entries

h = logging.StreamHandler()
h.setLevel("INFO")   # Accept all entries at least as critical as INFO

l = logging.getLogger("mymodule.submodule.subsubmodule")
l.setLevel("DEBUG")    # Accept all entries at least as critical as DEBUG

l.addHandler(h)

Both the logger and the handler must accept the entry's level for it to be printed

The simple config is a quick way to activate a stream handler for all loggers. But the output will be noisy, since logs from all modules, including imported libraries, will be printed:

logging.basicConfig(stream=sys.stderr, level=logging.DEBUG)  # Log to stderr...
logging.basicConfig(filename='app.log', level=logging.INFO)  # ...or to a file (only the first call takes effect)
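For instance, a minimal configuration adding a timestamp, the logger name and the level to each entry (the logger name "myapp" is just an example):

```python
import logging
import sys

logging.basicConfig(
    stream=sys.stderr,
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s: %(message)s",
)

logger = logging.getLogger("myapp")
logger.info("Application started")
# e.g. 2024-11-12 00:00:00,000 myapp INFO: Application started
```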

Manipulate file/directory paths with pathlib

Found in legacy code running on Windows:

csv_path = "/home/alice/data/example.csv"               # DON'T do that
csv_path = "C:\\Users\\Alice\\Documents\\example.csv"   # DON'T do that
csv_path = "data_folder\\" + today + "\\example.csv"    # DON'T do that

⚠️ Problems:

  • Error-prone manipulation of the double-backslash
  • Not cross-platform: \ is only Windows-compatible
  • Does not offer path-related features: get filename, get extension, get parent...

The solution : The built-in pathlib library

from pathlib import Path
p = Path('example.csv')

Then you can manipulate that path:

new_path = Path("data") / date / p  # Join paths with the / operator (OUTSIDE the str)
parent = p.parent              # Get parent folder
other_parent = p.parent.parent # Get parent of parent
file_name = p.name             # Filename (example.csv)
file_name_no_ext = p.stem      # Filename without extension (example)
file_suffix = p.suffix         # File extension (.csv)
p.is_file()                    # returns True/False. Or p.is_dir() or p.exists()

The user's home folder must be retrieved via pathlib to stay cross-platform:

homedir = Path.home()         # C:\Users\Alice or /home/alice

⚠️ os.path is getting old, prefer the equivalent pathlib API instead
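Path objects can also list and read/write files directly — a minimal sketch using a temporary directory:

```python
import tempfile
from pathlib import Path

tmp = Path(tempfile.mkdtemp())           # A scratch directory for the example
p = tmp / "example.txt"

p.write_text("hello", encoding="utf-8")  # Create the file and write to it
content = p.read_text(encoding="utf-8")  # Read it back as a str
print(content)                           # hello

print(list(tmp.glob("*.txt")))           # Find files matching a pattern
```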

Use json and csv formats

Manipulate JSON data as strings (e.g. network payload data) or files:
🐍 json documentation

Manipulate CSV files (mostly for datascience):
🐍 csv documentation

import json

with open(path) as f:
    data = json.load(f)

data = {"Alice": 12, "Bob": 15}
with open(path, "w") as f:
    json.dump(data, f)
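A similar sketch for csv, here writing then reading rows through an in-memory buffer (io.StringIO stands in for a real file):

```python
import csv
import io

buf = io.StringIO()                 # In-memory stand-in for a file
writer = csv.writer(buf)
writer.writerow(["name", "grade"])  # Header row
writer.writerow(["Alice", 12])
writer.writerow(["Bob", 15])

buf.seek(0)                         # Rewind before reading back
rows = list(csv.reader(buf))
print(rows)  # [['name', 'grade'], ['Alice', '12'], ['Bob', '15']]
```

Note that csv reads every field back as a str; conversions are up to you.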

Emit HTTP requests from Python with requests

requests lets you send synchronous HTTP requests to servers.

This is NOT a built-in module, install it from PyPI.org with pip install requests and read the 🐍 requests documentation

import requests

response = requests.get('https://api.github.com')
print(response.status_code)  # HTTP status code
print(response.text)         # Response body (text)
print(response.json())       # Parse JSON response (if applicable)
data = {'key': 'value'}
response = requests.post('https://httpbin.org/post', data=data)
print(response.json())

Call external commands with subprocess

This is a built-in module, read the 🐍 subprocess documentation

Run a command

import subprocess

result = subprocess.run(['echo', 'Hello World'], capture_output=True, text=True)

# run() is blocking until the end of the process

print(result.stdout)  # Will print "Hello World"
  • capture_output=True captures the standard output and error to Python
  • text=True decodes the output bytes into a str
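run() also exposes the exit code, and check=True turns non-zero exit codes into exceptions — a sketch using sys.executable so it runs on any platform:

```python
import subprocess
import sys

# A successful command: returncode is 0
result = subprocess.run([sys.executable, "-c", "print('hi')"],
                        capture_output=True, text=True)
print(result.returncode)      # 0
print(result.stdout.strip())  # hi

# check=True raises CalledProcessError on a non-zero exit code
try:
    subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"],
                   check=True)
except subprocess.CalledProcessError as e:
    print("Command failed with code", e.returncode)  # 3
```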

Advanced use cases (e.g. long-running commands) use Popen instead of run():

import subprocess
import signal
import time

process = subprocess.Popen(['ping', 'example.com'],
                           stdout=subprocess.PIPE,  # capture stdout
                           stderr=subprocess.PIPE,  # capture stderr
                           text=True)

time.sleep(5)                           # Send SIGINT after 5 seconds
process.send_signal(signal.SIGINT)

stdout, stderr = process.communicate()  # Get the captured output
print(f"Returned stdout {stdout}")

ℹ️ if the external process is your own Python code, use multiprocessing.
⚠️ os.system is discouraged, use subprocess instead

Multithreading, multiprocessing

Definitions:

  • Multithreading: Split work into several threads within the same process (in CPython, effectively on one CPU because of the GIL).
  • Multiprocessing: Split work into several processes that the OS can dispatch to several CPUs.

The bottleneck of Python Multithreading: the reference counter

The interpreter holds a counter tracking how many references point to each object.

In [1]: s = "Hello world!" 

In [2]: sys.getrefcount(s)
Out[2]: 2

In [3]: s2 = s             

In [4]: sys.getrefcount(s)
Out[4]: 3

In [5]: del s2             

In [6]: sys.getrefcount(s)
Out[6]: 2

If the counter reaches 0, the object is destroyed. This is how Python frees memory.

The Python Global Interpreter Lock (GIL)

Several implementations of the Python interpreter exist:

  • CPython (By far the most popular)
  • Jython, IronPython, PyPy...

The GIL is a mutex that protects the reference counters of CPython objects.

However, it prevents multiple threads from executing Python bytecode at once, which yields poor performance for multi-threaded programs if they are CPU-bound.

Removing the GIL has long been debated: for years it was judged too hard for too little benefit.

But hardware multiplied CPU cores, so Python 3.13 introduced an experimental free-threaded build (--disable-gil)

Because of the GIL, multiprocessing is way more efficient than multithreading for CPU-bound work.

Should I use multiprocessing or multithreading?

It depends on the nature of your application:

  • CPU-bound code: Because of the GIL, only multiprocessing will be able to use all available CPUs
  • IO-bound code: Both will work, but multithreading may be more handy. Asynchronous programming can also be a solution, since it is specially tailored to optimize IOs.

Multithreading example

import time, threading
def second_thread():
    for i in range(10):
        print("Hello from the second thread!")
        time.sleep(1)

new_thread = threading.Thread(target=second_thread)
new_thread.start()

for i in range(10):
    print("Hello from the main thread!")
    time.sleep(1)

new_thread.join()
Hello from the second thread! # We can't tell in which order the prints will happen
Hello from the main thread!   # The OS schedules threads as fairly as possible
Hello from the main thread!   # ...according to the system load.
Hello from the second thread! # The GIL limits Python to 1 CPU 

Multiprocessing example

import time, multiprocessing
def second_process():
    for i in range(10):
        print("Hello from the second process!")
        time.sleep(1)

new_process = multiprocessing.Process(target=second_process)
new_process.start()

for i in range(10):
    print("Hello from the first process!")
    time.sleep(1)

new_process.join()
Hello from the first process!  # Pretty much the same as threads
Hello from the second process! # But multiprocessing may use all CPUs
Hello from the second process! # The OS schedules the processes on the CPUs
Hello from the first process!

Inter-Process Communication (IPC)

Python offers the following IPC primitives:

  • multiprocessing.Lock and threading.Lock (mutex, authorizes 1)
  • multiprocessing.Semaphore and threading.Semaphore (authorizes up to n)
  • multiprocessing.Pipe (1-to-1 FIFO)
  • multiprocessing.Queue (n-to-n FIFO)
  • multiprocessing.Event and threading.Event
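As a minimal sketch of FIFO-based communication, here is the thread-side equivalent queue.Queue (the multiprocessing.Queue API is very similar):

```python
import queue
import threading

q = queue.Queue()                # Thread-safe n-to-n FIFO

def worker():
    q.put("result from worker")  # Producer side: push a message

t = threading.Thread(target=worker)
t.start()
msg = q.get(timeout=5)           # Consumer side: blocks until a message arrives
t.join()
print(msg)                       # result from worker
```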

Even thread-safe collections (like Java's synchronized lists) do not remove all risks: a check-then-act sequence is still a race condition:

if (!synchronizedList.isEmpty()) {
    synchronizedList.remove(0); // NOT thread-safe despite the synchronizedList 
}               // This is Java code. Synchronized lists do not exist in Python

High-level multiprocessing API

multiprocessing has higher-level tools compatible with the with statement:

  • Pool(n) : represents a pool of n processes
  • Manager : shares state between processes, e.g. via shared variables:
  • manager.dict() for a shared dict, manager.list() for a shared list, manager.Value()...
def f(shared_dict: dict):
    shared_dict[randint(0, 99)] = "running"

with multiprocessing.Manager() as manager:
    shared_dict = manager.dict()
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(f, (shared_dict, shared_dict, shared_dict, shared_dict))
    print(shared_dict) # {26: 'running', 88: 'running', 60: 'running', 76: 'running'}

ℹ️ Unpack all args: pool.starmap(f, ((a, b), (c, d))) ➡️ f(a, b), f(c, d)
ℹ️ Single blocking call: pool.apply(f, (a, b)) ➡️ f(a, b)
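A minimal starmap sketch (the add function is just an example):

```python
import multiprocessing

def add(a, b):
    return a + b

if __name__ == "__main__":
    with multiprocessing.Pool(processes=2) as pool:
        # Each tuple is unpacked into add's arguments
        results = pool.starmap(add, [(1, 2), (3, 4)])
    print(results)  # [3, 7]
```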

Charset and encoding

All text assets (Python strings, .py files, .json files, .txt files…) are encoded using a charset, a correspondence table between bytes ↔ actual characters

  • e.g. in utf-8: 0xC3A9 ↔ é
  • e.g. in latin-1: 0xE9 ↔ é

You MUST know the encoding of a text asset in order to read it.

If you do not, it can be guessed but there is a chance to make mistakes.

This is what happens if you do not specify explicit encodings and rely on default parameters of file reading libraries.

If the guess is wrong it may result in e.g. hÃ©tÃ©rogÃ¨ne instead of hétérogène.

Rule of thumb for encoding and decoding

  • I RECEIVE data coming IN the interpreter (from stdin, the network, a file...):

    • If it is a str: it has already been decoded by the reading functions
      (Pray that they used the right charset 🙏)
    • If it is a bytes: decode it with the charset declared by the source, e.g.
      data.decode("utf-8") if the source sends UTF-8 strings
  • My Python code must operate only on Unicode (Python type str)

  • I SEND data OUT of the interpreter (to stdout, to the network, to a file...):

    • If it is a bytes: it has already been encoded by the writing functions
      (Pray that they used the right charset 🙏)
    • If it is a str: encode it with the charset declared by the recipient, e.g.
      data.encode("utf-8") if the recipient expects UTF-8 strings
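The rule of thumb in action — note how decoding the same bytes with the wrong charset produces mojibake:

```python
data = "héllo"                # str: decoded Unicode text, safe to work on
raw = data.encode("utf-8")    # bytes, ready to be sent out
print(raw)                    # b'h\xc3\xa9llo'

back = raw.decode("utf-8")    # Incoming bytes, decoded with the right charset
print(back == data)           # True

print(raw.decode("latin-1"))  # hÃ©llo : wrong charset, mojibake!
```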

What is Unicode?

Unicode is NOT a charset: it is the global table of all the world's code points.
e.g. U+1F601: 😁

Encodings (ASCII, latin-1, UTF-8...) may be able to code Unicode in whole or in part.

ASCII and latin-1 can only code a subset of Unicode (resp 128 & 256 code points).

UTF-8, UTF-16 and UTF-32 can code all Unicode code points.

The difference between UTF-8, 16 and 32 is about how they code characters:

  • UTF-8 uses a variable number of bytes (1 to 4)
  • UTF-16 uses a variable number of bytes (2 or 4)
  • UTF-32 uses a fixed number of 4 bytes
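These size differences can be checked directly in Python (the -le variants are used to avoid counting the byte-order mark):

```python
print(len("a".encode("utf-8")))        # 1 byte  (ASCII range)
print(len("é".encode("utf-8")))        # 2 bytes
print(len("😁".encode("utf-8")))       # 4 bytes (UTF-8 is variable-length)
print(len("😁".encode("utf-16-le")))   # 4 bytes (a surrogate pair)
print(len("😁".encode("utf-32-le")))   # 4 bytes (always exactly 4)

# ASCII cannot code this code point at all:
print("é".encode("ascii", errors="replace"))  # b'?'
```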

On average, UTF-16 is more compact than UTF-8 for Asian texts, but UTF-8 is more widely recommended as a global standard.

  • The Python interpreter holds:
    • decoded unicode strings in type str
    • encoded strings in type bytes
  • encode() and decode() methods on these objects can convert them between bytes and str
  • stdout and stderr in your terminal are outputs, they have a charset
  • stdin in your terminal is an input, it has a charset
  • Your .py file itself is an input, it has a charset