What Python can learn from Haskell Bob Ippolito ( @etrepum )

EuroPython Berlin

21 July 2014 bob.ippoli.to/python-haskell-ep2014

Who am I? Python user since 2001 (simplejson, Mac stuff)

Founded Mochi Media (2005-2012, sold in 2010)

Haskell user since 2012 (ported exercism.io curriculum)

Doing a bit of advising/investing in startups

Currently focusing on code education non-profit Mission Bit

Python is not all bad! I love Python's community

Many great ideas and libraries come from here

Works wonderfully for many users

Workarounds exist for most issues (PyPy, Numba, Cython, …)

Haskell is not all good! Smaller community

Non-strict evaluation

Sometimes documentation is an academic paper

Different terminology (Monoid, Functor, Monad, …)

Ignorance is bliss If you are perfectly happy with Python, and don't want it to change even a little bit, you might want to leave now :)

I was a lot happier with Python until I started learning Erlang and then Haskell

… but I'm a better programmer because of it (Blub Paradox?)

Nasty little paradox: the better something works, the less likely it'll be improved.

— Richard O'Keefe

Python makes […] hard writing code that works

maintaining code

creating good abstractions

running code efficiently

Correctness You will make mistakes in any language

Python defers all type checking to runtime

Static analysis tools for Python are primitive

Writing code to test for common mistakes is a lot of work

Python nonsense The Python language guarantees that this should raise TypeError """This module is nonsense!""" def main(): """What is this, JavaScript?""" print ( 1 + "1" ) if __name__ == '__main__' : main()

Let's see what Python thinks $ python -mcompileall nonsense.py Compiling nonsense.py ... $ echo $? 0

pyflakes to the rescue? $ pyflakes nonsense.py $ echo $? 0

What about Pylint? $ pylint --enable=all nonsense.py […] Global evaluation ----------------- Your code has been rated at 10.00/10

Erlang nonsense -module ( nonsense ). -export ([ start / 0 ]). start() -> io:format( "~p" , [ 1 + "1" ]).

erlc nonsense $ erlc -Werror nonsense.erl compile : warnings being treated as errors nonsense.erl :5: this expression will fail with a 'badarith' exception $ echo $? 1 Note that Erlang is a dynamic language!

Erlang dialyzer nonsense $ dialyzer nonsense.erl nonsense.erl :4: Function start/0 has no local return nonsense.erl :5: The call erlang: '+' (1,[49,...]) will never return since it differs in the 2nd argument from the success typing arguments: ( number() , number() ) done in 0m0.52s done (warnings were emitted )

Haskell nonsense main = print ( 1 + "1" )

Haskell nonsense main :: IO () main = print ( 1 + "1" )

ghc nonsense $ ghc nonsense.hs [ 1 of 1] Compiling Main ( nonsense.hs, nonsense.o ) nonsense.hs :1:17: No instance for (Num [Char]) arising from a use of ‘+’ In the first argument of ‘print’, namely ‘(1 + "1" )’ In the expression: print (1 + "1" ) In an equation for ‘main’: main = print (1 + "1" )

No instance for… WTF? No instance for ( Num [ Char ]) arising from a use of '+' By default, strings in Haskell are lists of Char.

Operations such as + , - , * , negate belong to the Num typeclass (handwavingly similar to a Python ABC)

, , , belong to the Num typeclass (handwavingly similar to a Python ABC) Only truly numeric types such as Int, Double, Integer, Rational have built-in instances for the Num typeclass

You could (but absolutely should not!) implement such an instance to make this code compile. Don't do that. Seriously.

Why is refactoring hard? Very hard to refactor without good tests

Types are the obvious tests and documentation

Refactoring in Haskell is easier due to types and referential transparency

Can we do something like this in Python? Yes.

"""Breadth-first traversal""" def breadth_first(starting_node, edges): """Yield all nodes reachable from starting_node""" visited = [] queue = [starting_node] while queue: node = queue[ 0 ]; del queue[: 1 ] # queue.pop(0) if node not in visited: yield node visited.append(node) queue.extend(edges[node])

"""Breadth-first traversal""" from typing import List, Dict, Iterator def breadth_first(starting_node: int , edges: Dict[ int , List[ int ]]) -> Iterator[ int ]: """Yield all nodes reachable from starting_node""" visited = List[ int ]() queue = [starting_node] while queue: node = queue[ 0 ]; del queue[: 1 ] # queue.pop(0) if node not in visited: yield node visited.append(node) queue.extend(edges[node])

"""Breadth-first traversal""" from typing import List, Dict, Set, Iterator from collections import deque def breadth_first(starting_node: int , edges: Dict[ int , List[ int ]]) -> Iterator[ int ]: """Yield all nodes reachable from starting_node""" visited = Set[ int ]() queue = deque([starting_node]) while queue: node = queue[ 0 ]; del queue[: 1 ] # queue.pop(0) if node not in visited: yield node visited.append(node) queue.extend(edges[node])

breadth_first2.py : In function "breadth_first" : breadth_first2.py , line 11: deque [int] has no attribute "__delitem__" breadth_first2.py , line 14: Set [int] has no attribute "append"

"""Breadth-first traversal""" from typing import Iterator, Dict, List, Set from collections import deque def breadth_first(starting_node: int , edges: Dict[ int , List[ int ]]) -> Iterator[ int ]: """Yield all nodes reachable from starting_node""" visited = Set[ int ]() queue = deque([starting_node]) while queue: node = queue.popleft() if node not in visited: yield node visited.add(node) queue.extend(edges[node])

Solution: mypy? mypy is the most sensible approach I've seen so far

The author needs some help! There are many tasks to do that do not require deep knowledge of mypy or type systems

mypy does not yet catch the 1 + "1" error yet, but can catch many other mistakes such as the refactoring example

error yet, but can catch many other mistakes such as the refactoring example Why not X (PyPy, Nuitka, …): these projects do nothing to assist with correctness and code quality!

Modest mypy proposal Start using typing module even in vanilla Python 3

module even in vanilla Python 3 Support typing annotations in documentation tools

annotations in documentation tools Stop using PEP 3107 function annotations for any other purpose

Start using mypy to typecheck code when running tests / continuous integration

Start contributing to mypy! :)

So what? Code quality tools for Python are far behind other languages

mypy is a huge step forward and can be used with existing Python 3 syntax and interpreter

Would have to give up using function annotations for other purposes

Mutability Mutability is the wrong default

Very common source of bugs

Prevents many optimizations (sharing, lock-free, etc.)

Should be opt-in

Why is mutability wrong? Hard to understand code when underlying data might change

Defensive programming is difficult

Sharing is prevented by copying

Concurrent access requires synchronization

Can we even fix this? Requires large changes to the language and libraries :(

See Rust or Swift for good examples

Haskell goes all the way (perhaps too far)

Maybe opt-in immutability (Swift's let ) is more practical for Python?

Expensive Abstractions Nothing is free. Function calls, classes, etc. do not typically get optimized away

Classes aren't easy to analyze for correctness, as they are always open. Subclasses ruin everything.

Algebraic Data Types Python has grown many hacks such as Enum, namedtuple, etc. that solve small parts of this problem

Haskell (and other ML family languages) have an elegant solution to this

class AST( object ): def eval ( self ): raise NotImplementedError class Constant(AST): def __init__ ( self , value): self .value = value def eval ( self ): return self .value class Minus(AST): def __init__ ( self , node): self .node = node def eval ( self ): return self .node. eval ()

class Add(AST): def __init__ ( self , left, right): self .left = left self .right = right def eval ( self ): return self .left. eval () + self .right. eval () class Multiply(AST): def __init__ ( self , left, right): self .left = left self .right = right def eval ( self ): return self .left. eval () * self .right. eval ()

ADT in Haskell module NumAST ( AST ( .. ), eval ) where data AST = Constant Int | Add AST AST | Minus AST | Multiply AST AST deriving ( Show , Eq ) eval :: AST -> Int eval node = case node of Constant n -> n Add a b -> eval a + eval b Minus n -> negate (eval n) Multiply a b -> eval a * eval b

class Add(AST): def __init__ ( self , left, right): self .left = left self .right = right def eval ( self ): return self .left. eval () + self .right. eval () def __eq__ ( self , other): return ( isinstance (other, Add) and self .left == other.left and self .right == other.right) def __repr__ ( self ): return 'Add( %r , %r )' % ( self .left, self .right)

Wrapping can be free Haskell can define a wrapper for an existing data type at no runtime space cost ( newtype )

) Strict fields can be inlined without boxing (no object overhead)

Functions can be inlined (even cross-module)

No reason for 1 + True == 2

Performance? Everything I've talked about provide more ahead-of-time information

This information could be used by PyPy, Numba, Cython, etc.

These features save developer time by making it easier to write correct code

More information means more opportunities for optimization, so can lead to better performance

Writing less code in C/C++ will allow the language to improve much more quickly (thank you PyPy!)

Concurrency? The Python C API and current semantics make removing the GIL a non-starter

Fixing other deficiencies in the language will make the necessary refactorings easier, and in Python

Mixed approach with workers written in a Python subset might be a nice transitional step (PyParallel)

Summary Incorporate all of the good ideas from mypy into Python

Add some more of the conveniences from modern languages

Enjoy a safer, faster, more capable Python