Title: On Rust
Date: 14-09-2020
Category: Programming

Rust impressions from Python

My first attempt to Rewrite It In Rust - a model of the Tennessee Eastman chemical plant - ended two weeks later, with me giving up and modernising the original Fortran to something from this century, with a sour taste in my mouth.

I suppose what I really want to see, following this and the enormous hype train Rust has had, is a critique of a language like Python from someone who actually understands why it took off and now runs half the world.

First, what’s good about Rust?

Function notation

fn foo(x: bar, y: baz) -> egg {} is pretty nice notation for a function signature. I know

it’s a function
its input, and types
its output type

in that order. Brilliant. I find C-style “return type first” notation a real papercut, especially once you start stuffing volatiles and finals in there… by the time I’ve found the function name I’ve forgotten what it was I was looking for. Shame Rust doesn’t go all-in and make async functions return a Future<> as well. Plus, it’s a little annoying that I have a mut x variable but an x: &mut as a mutable reference.

Explicit mutability

Very good. Mutable reference arguments are one of very few times in Python I end up shooting myself in the foot. Passing-by-reference and the various functional tools available work so well I usually forget that’s what I’m (kind of) doing - until the few corner cases where a child function does mutate its argument. Oh dear. I’d love Python to have a similar mut keyword.

It does take a bit to remember the syntax for values vs references, however. A mutable reference is foo: &mut T, a mutable value is a mut foo: T. It… does make sense, but it’s still annoying.
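A minimal sketch of the two spellings side by side. The function names are made up for illustration:

```rust
// `mut total: i32` is a mutable *binding*: the function owns its own copy
// and may reassign it without the caller noticing.
fn add_one_by_value(mut total: i32) -> i32 {
    total += 1;
    total
}

// `total: &mut i32` is a mutable *reference*: the caller's value changes.
fn add_one_in_place(total: &mut i32) {
    *total += 1;
}

fn main() {
    let original = 41;
    let bumped = add_one_by_value(original);
    println!("original = {}, bumped = {}", original, bumped);

    let mut counter = 41;
    add_one_in_place(&mut counter); // the call site has to say `&mut` too
    println!("counter = {}", counter);
}
```

The call site is the nice part: `&mut counter` tells the reader, without opening the function, that the argument may change.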

Type notation

I have genuinely seen the following, as an in-person interview question for a C programming role:

what size is an int? What size is a double, what size is a long?

(The answer is, of course, it depends!)

Really? In Rust, you know. It’s a u8 or i32 or f64. Similar to the function definition syntax, Rust immediately makes other C-family languages feel very old indeed with this simple change.

(Except, you usually don’t I guess. More on that later)
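A quick way to check, as a sketch using only the standard library (the helper name `fixed_widths` is hypothetical). The fixed-width types report the same size everywhere; usize is the one that follows the platform's pointer width:

```rust
use std::mem::size_of;

// Sizes in bytes of a few fixed-width types; these do not vary by platform.
fn fixed_widths() -> (usize, usize, usize) {
    (size_of::<u8>(), size_of::<i32>(), size_of::<f64>())
}

fn main() {
    let (u8_size, i32_size, f64_size) = fixed_widths();
    println!("u8 = {}, i32 = {}, f64 = {}", u8_size, i32_size, f64_size);
    // usize is pointer-sized, so this one really does "depend".
    println!("usize = {}", size_of::<usize>());
}
```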

Traits

I like them. I shouldn’t care if the object I have is a duck. I should care whether or not it implements Quack. Nice. Though, I have to say I dislike having two ways to declare a trait bound:

fn foo<T: Bar>(x: T) {}
fn foo<T>(x: T) where T: Bar {}

One, and only one, obvious way to do it, please. Given Rust already has a steep learning curve, the absolute last thing I ever want to do is work out whether two different things are the same thing.
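To make the point concrete, here is a small sketch with a hypothetical Quack trait. Both functions accept exactly the same types and behave identically; only the spelling of the bound differs:

```rust
trait Quack {
    fn quack(&self) -> String;
}

struct Duck;

impl Quack for Duck {
    fn quack(&self) -> String {
        String::from("Quack!")
    }
}

// Bound written inline in the generic parameter list.
fn speak_inline<T: Quack>(animal: &T) -> String {
    animal.quack()
}

// The same bound written in a `where` clause.
fn speak_where<T>(animal: &T) -> String
where
    T: Quack,
{
    animal.quack()
}

fn main() {
    let duck = Duck;
    println!("{}", speak_inline(&duck));
    println!("{}", speak_where(&duck));
}
```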

That said, they can really lead to write-only code. Here’s a challenge: tell me, without using a debugger, what code is executed when you run rand::random() in Rust. The crate is 3 MB. src/ is upwards of 300 kB. 15385 sloc last time I checked, where I got stuck in a circular definition.

Expression-oriented programming

As a concept, it’s lovely, and Rust uses it. Mathematics teaches us to think in a functional way, i.e.:

result = calculation(input)

So paradigms that mirror that will be understandable, and idiomatic Rust usually does. Pattern matching in particular lets me write fundamentally messy problems (i.e. lots of corner cases) in as neat a way as possible, and error handling is a lot more intuitive. Exceptions have always bugged me, and Rust shows why. You don’t need or benefit from those obfuscated gotos, you need a better type system.
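A small sketch of both ideas together, with made-up function names: `match` as an expression covering every case, and a Result carrying the failure instead of an exception.

```rust
use std::num::ParseIntError;

// Every corner case gets one arm, and the whole `match` is the return value.
fn describe(reading: i32) -> &'static str {
    match reading {
        i32::MIN..=-1 => "negative",
        0 => "zero",
        1..=9 => "small",
        _ => "large",
    }
}

// The error is part of the signature; `?` passes it up to the caller.
fn parse_and_describe(text: &str) -> Result<&'static str, ParseIntError> {
    let reading: i32 = text.trim().parse()?;
    Ok(describe(reading))
}

fn main() {
    for text in ["7", "-3", "oops"] {
        match parse_and_describe(text) {
            Ok(label) => println!("{} is {}", text, label),
            Err(err) => println!("{:?} is not a number: {}", text, err),
        }
    }
}
```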

Also, as a result, semicolons are actually semantically useful. #SorryNotSorry, but with most C-style languages there is simply nothing they give the compiler that proper whitespace - that is necessary for humans to read - does not. If you do have code that compiles with semicolons that would not pass a whitespace-based compiler - it’s horrible code. You’re not a special case.

Variable Shadowing & Type Inference

Why do dev communities profess such love for static typing when the most popular languages are not statically typed? Python, Ruby, JavaScript, Excel.

I find it utterly hilarious that these two features, combined, essentially allow the exact same behaviour I’d have in Python. Let’s look at these enormous semantic differences between a statically- and dynamically- typed language:

Python:

def make_an_x():
    x = step_1()    #idk, parsing or something
    x = step_2(x)   #frobnicate the parsed input
    return x

Rust:

fn make_an_x() -> X {
    let x = step_1();
    let x = step_2(x); //huh.
    x
}

Python still wins at speed-of-development, but it’s a lot closer than it used to be, and both free me from the utter inanity of writing interminable intermediate variables for what is obviously the same thing, at different processing stages.
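Shadowing even lets the type change between stages, which is where it gets closest to the Python version. A sketch, with a hypothetical signature that takes the raw input as a string:

```rust
// Each `let x` is a new binding that happens to reuse the name,
// so the type can differ from one stage to the next.
fn make_an_x(raw: &str) -> Option<u32> {
    let x = raw.trim();           // x: &str
    let x: u32 = x.parse().ok()?; // x: u32, or bail out with None
    let x = x.checked_mul(2)?;    // still u32, or None on overflow
    Some(x)
}

fn main() {
    println!("{:?}", make_an_x(" 21 "));
    println!("{:?}", make_an_x("not a number"));
}
```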

N.B. A lot of people seem to misunderstand this aspect of Python. Everything’s a pointer, and objects themselves are actually pretty static. So:

def foo(bar: dict):
    bar = str(bar)

Might “coerce” the local reference of bar to a string, but not the original! That’s in addition to implicit type coercion plain not happening, unlike JavaScript.

Useful crates

Finally, whilst I didn’t want to include third-party stuff, StructOpt really is wonderful as a way of parsing invocation options. It just solves that problem. Use it, you won’t regret it.

Anyhow is similar, though it’s mostly solving a problem Rust sort of creates in the first place. No one wants error handling to be complicated.

Other good stuff: Rayon?

The once bad, now good (or at least ok)

Reading from stdin

Reading a string from stdin is, maybe, the second or third thing a programmer does when kicking the tyres on a new language? Well, here’s how it used to be in Rust.

use std::io::{self, BufRead};

fn main() {
    let mut line = String::new();
    let stdin = io::stdin();
    let result = stdin.lock().read_line(&mut line).unwrap(); 
}

The result is the number of bytes read, because I just don’t fucking know, especially when line.len() already tells you that! If the read goes wrong, you might get an Error, or (at end of input) you might get a perfectly successful zero. So two of the best Rust idioms - functional style and returning explicit Error types instead of sentinel values - are broken hard.

No, it’s not awful. But Java’s public static void main(String[] args) isn’t awful, and at least that is consistent with everything else Java. It’s still bad, and it occurs, again, right at the start.
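One way to paper over the sentinel, as a sketch: wrap read_line in a helper (the name `read_trimmed_line` is made up) that turns the zero into a None. It is generic over BufRead, so the example feeds it a byte slice; `io::stdin().lock()` should slot in the same way.

```rust
use std::io::{self, BufRead};

// Read one line without its trailing newline.
// Ok(None) means end of input, i.e. the case `read_line` reports as Ok(0).
fn read_trimmed_line<R: BufRead>(reader: &mut R) -> io::Result<Option<String>> {
    let mut line = String::new();
    let bytes_read = reader.read_line(&mut line)?;
    if bytes_read == 0 {
        return Ok(None);
    }
    let trimmed = line.trim_end_matches(|c: char| c == '\n' || c == '\r');
    Ok(Some(trimmed.to_string()))
}

fn main() -> io::Result<()> {
    // A byte slice implements BufRead, so it stands in for stdin here.
    let mut input = "first line\nsecond line\n".as_bytes();
    while let Some(line) = read_trimmed_line(&mut input)? {
        println!("got: {}", line);
    }
    Ok(())
}
```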

Stuff that requires nightly

Nightly has a builtin functional utility now, std::io::read_to_string.

Integer log, especially log2.
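As far as I know, both of these have since reached stable (std::io::read_to_string around Rust 1.65, the integer log methods around 1.67), so treat the following as a sketch that assumes a reasonably recent toolchain. The helper name is hypothetical:

```rust
use std::io::{self, Read};

// Slurp a whole reader into a String, parse it, and take floor(log2(n)).
fn floor_log2_of_input<R: Read>(reader: R) -> io::Result<u32> {
    let text = io::read_to_string(reader)?;
    let n = text
        .trim()
        .parse::<u32>()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
    // `checked_ilog2` returns None for zero instead of panicking.
    n.checked_ilog2()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "log2 of zero is undefined"))
}

fn main() -> io::Result<()> {
    // A byte slice stands in for stdin; `io::stdin()` is also a reader.
    let answer = floor_log2_of_input("1024\n".as_bytes())?;
    println!("floor(log2(1024)) = {}", answer);
    Ok(())
}
```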

Copy semantics

On the note of “breaking idioms right at the start”, Copy semantics, which mean you can do this:

    let x = 1;
    let y = x;
    x

With, and only with, primitive types. You know, the things you play around with when learning a new language. It’d be nice to have an explicit copy/clone operator instead.
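For contrast, a sketch of the same pattern with an integer and with a String (helper names are made up). The integer is copied silently; the String needs an explicit clone, otherwise the assignment moves it:

```rust
// i32 is Copy: `let y = x;` leaves `x` usable.
fn duplicate_int(x: i32) -> (i32, i32) {
    let y = x;
    (x, y)
}

// String is not Copy: without `.clone()`, `let t = s;` would move `s`
// and returning `(s, t)` would fail to compile.
fn duplicate_string(s: String) -> (String, String) {
    let t = s.clone();
    (s, t)
}

fn main() {
    println!("{:?}", duplicate_int(1));
    println!("{:?}", duplicate_string(String::from("hello")));
}
```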

No sane way to extend builtin types

How fucking hard can it be to print a Vector?
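For reference, a sketch of what is available with the standard library alone: the debug formatter `{:?}` works on a Vec out of the box, but `{}` needs a Display implementation, and you cannot add one to Vec directly because neither the trait nor the type is yours. The usual workaround is a wrapper type; `Row` here is a hypothetical name.

```rust
use std::fmt;

// Newtype wrapper: a local type, so we are allowed to implement Display for it.
struct Row(Vec<i32>);

impl fmt::Display for Row {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        let items: Vec<String> = self.0.iter().map(|n| n.to_string()).collect();
        write!(f, "[{}]", items.join(", "))
    }
}

fn main() {
    let values = vec![1, 2, 3];
    println!("{:?}", values);    // Debug works without any extra code
    println!("{}", Row(values)); // Display goes through the wrapper
}
```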

Modules

Modules also used to be horrible. With a Python folder layout like this:

foo.py <- contains foo_func
bar.py

Module import looks like this:

from foo import foo_func

One file, one module. Nice.

Pre-1.4 Rust, by contrast… I don’t even know, because modules are sometimes files and sometimes not. I do know you’ll need lashings of super::, because a file on the same level as yours is addressed with super::. super:: is actively encouraged, because namespace collisions are fun. This is so much better as of 1.47 or so.

Not sure

null return

Speaking of Python footguns, just about the only other one I fire semi-regularly is mistaking a functional method for an imperative one, particularly with, say, pandas. For example:

    df = some_df() #yay I have a df
    df.do_a_thing() #strip out the bad things
    return df #oh no

df still has the bad things because it’s actually functional, but Python will quite happily let you throw the result - the only useful thing - away. So does Rust! On one hand, I’m glad I don’t need to write, say, let _ = match ... every time I want to mutate things with a match (quite often, match is lovely). Maybe a warning would be nice.
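Rust does have an opt-in version of that warning: the `#[must_use]` attribute, which a library author can put on a function or type. It only helps if the author remembered to add it. A sketch, with a hypothetical function:

```rust
// Discarding the return value of this function triggers a compiler warning.
#[must_use = "this returns a new vector instead of modifying the input"]
fn without_negatives(values: &[i32]) -> Vec<i32> {
    values.iter().copied().filter(|v| *v >= 0).collect()
}

fn main() {
    let readings = vec![3, -1, 4, -1, 5];
    // A bare `without_negatives(&readings);` would still compile,
    // but the compiler would warn about the unused result.
    let cleaned = without_negatives(&readings);
    println!("{:?}", cleaned);
}
```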

The bad

Missing features

Varargs - macros are a bandaid, and an ugly one at that, forcing you to learn a different syntax - and default args. I can’t express how much I like this design pattern in Python:

def foo(bar = "baz"):

I can’t think of a better way to communicate: this is a suggested value. The function works, and works well, with this input in most scenarios. But you’re free to change it. Maybe you do want a network timeout of 5000 ms instead of 50. Great! Try it. Results your problem.

It’s a shame we don’t have similar in Rust. I can’t think of a reason fn foo(bar = "baz") wouldn’t work.
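The closest workaround I know of is an options struct with a Default implementation, which is a lot more ceremony for the same message. A sketch, where every name and value is hypothetical:

```rust
#[derive(Debug, Clone, PartialEq)]
struct FetchOptions {
    timeout_ms: u64,
    retries: u32,
}

// The "suggested values" live here instead of in the function signature.
impl Default for FetchOptions {
    fn default() -> Self {
        FetchOptions { timeout_ms: 50, retries: 3 }
    }
}

// Stand-in for a real network call: it only reports what it would do.
fn describe_fetch(url: &str, options: FetchOptions) -> String {
    format!(
        "GET {} (timeout {} ms, {} retries)",
        url, options.timeout_ms, options.retries
    )
}

fn main() {
    println!("{}", describe_fetch("example.org", FetchOptions::default()));

    // Override one field and keep the rest of the defaults.
    let patient = FetchOptions { timeout_ms: 5000, ..FetchOptions::default() };
    println!("{}", describe_fetch("example.org", patient));
}
```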

Regex with backrefs

Function polymorphism. Similar to default args, and with variable shadowing. Sometimes I don’t want traits, I want to treat different inputs differently. For example, Advent of Code’s “intcode” interpreter - I want three functions with two different input types:

    fn run(inputs: Vec<i32>) //run to halt
    fn step(inputs: (i32,i32)) //some problems pass in inputs in pairs
    fn step(input: i32) //and some don't...

It’s a shame that’s not possible, and I’m back to writing step_tuple(), step_single(), etc.
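The trait-based workaround looks roughly like this. It is exactly the thing the paragraph above would rather avoid, but it does give one `step` name for both input shapes. The trait name and the body of `step` are placeholders, not a real intcode interpreter:

```rust
// Anything that can be turned into a list of inputs.
trait StepInput {
    fn into_values(self) -> Vec<i32>;
}

impl StepInput for i32 {
    fn into_values(self) -> Vec<i32> {
        vec![self]
    }
}

impl StepInput for (i32, i32) {
    fn into_values(self) -> Vec<i32> {
        vec![self.0, self.1]
    }
}

// Placeholder for one interpreter step: here it just sums its inputs.
fn step<I: StepInput>(input: I) -> i32 {
    input.into_values().iter().sum()
}

fn main() {
    println!("{}", step(5_i32));
    println!("{}", step((2_i32, 3_i32)));
}
```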

It would also be really nice to have build aliases. I don’t really want to type build --target="arm7muslabiehf" or whatever whenever I want to build for a Raspberry Pi. I want to build --target="pi" or even have it as default.

Floating-point ranges

I can sort of understand, and it’s common to many languages to handle floating-point a little weirdly (Error types and IEEE 754 just don’t play). It’s still a shame that sorting floats doesn’t work. Nor does that wonderful match statement with ranges. What I’d give for a language that does

    match (i) {
        < 1.0 => foo(),
        > 1.0, < 10.0 => bar(),
        > 10.0 => baz()
    }
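The nearest thing I can get to compile uses match guards, and sorting works if you spell out the comparison. A sketch with made-up function names; it assumes a toolchain recent enough to have `f64::total_cmp`, and unlike the wish-list syntax above it has to decide what happens at exactly 1.0, at exactly 10.0, and for NaN:

```rust
fn classify(value: f64) -> &'static str {
    match value {
        v if v < 1.0 => "low",
        v if v < 10.0 => "medium",
        v if v >= 10.0 => "high",
        _ => "not a number", // NaN fails every comparison above
    }
}

// f64 is not Ord, so `sort()` is unavailable; `total_cmp` supplies a total order.
fn sorted(mut values: Vec<f64>) -> Vec<f64> {
    values.sort_by(|a, b| a.total_cmp(b));
    values
}

fn main() {
    for value in [0.5, 1.0, 9.9, 10.0, f64::NAN] {
        println!("{} is {}", value, classify(value));
    }
    println!("{:?}", sorted(vec![3.0, 1.5, 2.0]));
}
```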

Whilst we’re on it, matrices. A Vec<Vec<T>> is not the same thing; you don’t know that if row i has j elements, so does row i+1. Yes, there are third-party packages. They suck, or at least the documentation does, and I know to multiply by at least twelve the time estimate for any problem that requires them. Guess what images are?
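The do-it-yourself version is a flat Vec plus the dimensions, which at least guarantees every row has the same length. A minimal sketch using only the standard library; the `Matrix` type and its methods are hypothetical, not any crate's API:

```rust
#[derive(Debug, Clone, PartialEq)]
struct Matrix {
    rows: usize,
    cols: usize,
    data: Vec<f64>, // row-major, always rows * cols elements
}

impl Matrix {
    fn zeros(rows: usize, cols: usize) -> Self {
        Matrix { rows, cols, data: vec![0.0; rows * cols] }
    }

    // Returns None instead of panicking when the index is out of range.
    fn get(&self, row: usize, col: usize) -> Option<f64> {
        if row < self.rows && col < self.cols {
            Some(self.data[row * self.cols + col])
        } else {
            None
        }
    }

    // Returns false if the index is out of range and nothing was written.
    fn set(&mut self, row: usize, col: usize, value: f64) -> bool {
        if row < self.rows && col < self.cols {
            self.data[row * self.cols + col] = value;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut image = Matrix::zeros(2, 3);
    image.set(1, 2, 7.5);
    println!("{:?}", image.get(1, 2));
    println!("{:?}", image.get(2, 0));
}
```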

Maybe sometimes I do want exceptions

Hence the existence of Anyhow.

Who actually needs memory safety without a GC?

Not web dev, nor application dev. Not… most stuff?

That’s important, because the cognitive overhead of keeping track of memory is large. If a rival team can do it in Go, or Java or C# or Python, they’ll do so ten times quicker. I can write something in Python, have only a decent guess at how much RAM it’s using, and focus on other things. Like how to solve real problems, like epidemiology. The same holds broadly true for ANY GC language. Rust necessarily sacrifices that. That isn’t bad, but it is a real cost, and I need a reason to pay it.

One of the best suggestions I heard for Rust is “add a mode for quick prototyping where memory isn’t reclaimed”. That really would be nice. It also says a lot about the fundamental thesis of the language.

Side note, I’ve easily managed to induce runtime panics on Rust with out-of-bounds memory access on similar problems. So, yay for memory safety.
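For what it's worth, the panic is the bounds check firing rather than the program reading past the end of the buffer, and there is a non-panicking spelling. A sketch, with a hypothetical helper name:

```rust
// `get` returns None for an out-of-range index; `values[index]` would panic.
fn checked_lookup(values: &[i32], index: usize) -> Option<i32> {
    values.get(index).copied()
}

fn main() {
    let values = vec![10, 20, 30];
    println!("{:?}", checked_lookup(&values, 1));
    println!("{:?}", checked_lookup(&values, 7));
}
```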

Static allocation and arrays are awful

Just about every language I’ve used at least has good support for simple, static, stack-allocated arrays. Safe C is usually based on no malloc(). Try doing so in Rust, where the size of the array is > 32.

struct Foo {
    foo : usize
}

fn make_foo_array() {
    //this was my best guess...
    let foo_array : [Foo; N] = [0..N].iter().map(|i| Foo::new(i)).collect(); //????
}

(Side note: thank god I don’t have to deal with those messy for loops!) (Side note: variant approaches to this suggested to me used up to EIGHT different syntactic constructs. Two involved a third-party package. For static allocation of arrays.)

This may error, or not, depending on whether N > 32. That has got to be one of the hardest violations of the principle of least astonishment I’ve seen.
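On newer toolchains this has improved. As far as I know, `std::array::from_fn` (stabilised around Rust 1.63) builds a fixed-size array from a closure for any length, with no collect and no heap allocation. A sketch, assuming Foo gets the obvious `new` constructor and an arbitrary N of 64:

```rust
#[derive(Debug, Clone, PartialEq)]
struct Foo {
    foo: usize,
}

impl Foo {
    fn new(foo: usize) -> Self {
        Foo { foo }
    }
}

// Deliberately larger than 32.
const N: usize = 64;

// The closure is called once per index, 0..N, to fill the array in place.
fn make_foo_array() -> [Foo; N] {
    std::array::from_fn(|i| Foo::new(i))
}

fn main() {
    let foo_array = make_foo_array();
    println!("{} elements, last = {:?}", foo_array.len(), foo_array[N - 1]);
}
```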

I actually quite like the idea that a [T; N] is a different type from a [T; N+1]. They DO sometimes have different properties, and frankly developers sometimes seem blind to that possibility. That a functional-inspired language does so badly at this is disappointing.

2D arrays are also surprisingly poor. Here’s how to initialise a 2D array with random values in Python, using idiomatic third-party libraries (these matter):

import numpy as np
np.random.rand(4096, 4096)

Here it is with Rust:

use nalgebra::{DMatrix};
let dm = DMatrix::new_random(4096, 4096);

Except, no it isn’t. That’s the official doc example, and it fails type inference. But you can’t pass it a type, either. ???

Needless to say, Fortran and python+numpy are just so much nicer to use it’s unreal.

Imports

WTF is a crate? Yes, I know what it is. NOW. When learning a new language - learning new, unfamiliar terms for common concepts is not cute, it’s annoying.

Cargo, Binary Size, and Updates

Let’s say - madness, I know - that I don’t have a gigabit internet connection, terabyte SSD reserved for coding, and a spare hour. What if I want to test a piece of Rust syntax?

Well, you’re screwed. You’re dependent on Cargo and third party crates for glue, and Cargo will insist on downloading 300 MB of fresh libraries. My projects folder contains about 50% data science projects, and the rest is twenty-line Rust projects at hundreds of MB apiece. Yes, you can obsessively run cargo clean. No, you shouldn’t have to.

Finally, binary size (yes, it’s been done before). As a reference, I wrote solutions to Project Euler problem 1 (“print the sum of all integers between 1 and 1000 that are divisible by 3 or 5”) in C, Python and Rust. The C program weighed in at 20K bytes. The Rust version? 2M. 2 MEGAbytes. I’ve written a < 2 kloc module - that imports that rand module again - and the binary size was FIFTY megabytes. Nothing fancy, just some vectors. The equivalent Fortran code was 60 kilobytes.
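For scale, the whole problem fits in a few lines. This is a sketch rather than the exact program that was measured, and it assumes the usual "below 1000" reading of the problem statement:

```rust
// Sum of all positive integers below `limit` that are divisible by 3 or 5.
fn sum_of_multiples(limit: u32) -> u32 {
    (1..limit).filter(|n| n % 3 == 0 || n % 5 == 0).sum()
}

fn main() {
    println!("{}", sum_of_multiples(1000));
}
```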

Oh, and what’s that OTHER thing developers constantly berate users with? Always, always have the latest version, even when the update is complete harebrained lunacy that forces you to spend hours relearning awful interfaces? Well, for programmers, because we’re special, let’s implement entire features to ignore that practice.

I’m sorry, but having seen legions of smart non-programmers have to deal with this “because updates good”, this sort of behaviour just angers me. Every problem you have with an updated lib, non-programmers have ten times as bad, with half as many tools.

Summary

You know Lean engineering? Well, if not, the principle is to check how much of your business’s effort goes into a) solving the core problem, b) unavoidable non-core problems, c) waste.

I don’t want to spend one second - not one brain cycle - on c). You know what c) is, every time you use Java.

Yes, Rust isn’t Python. It’s explicitly for problems where I can’t just declare “ah, fuck it, I have 8 GB of RAM. A couple of unnecessary allocations and waiting for a GC won’t matter.” But there are still things that waste my time:

scrolling through lines of irrelevant closing braces.
worse, reading badly-formatted code that wouldn’t pass a Python interpreter. If it can… thank you for proving my point! Your braces are irrelevant, because your code was formatted properly.
six different types of syntax.
arguably, unfortunately, memory management and compile-time type safety.

I don’t mean that dynamically-typed is better - if you want correctness guarantees, it isn’t. But Chesterton’s fence comes up here - if you dismiss a feature of widely used, successful tools without understanding it, you’re not smart. You’re a fool. (For what it’s worth, I’m a real fan of duck typing - I should not have to care what type a variable is if it has the property I require)

Python has clear use cases - namely, hacking. In both senses of the word. I’ve written scripts that I ran once, and I’m 99 per cent certain they will break the very next time, but that’s fine. Their core purpose is complete. It has a nice clean syntax, where if you can’t read the code, it doesn’t run. Remember this?