../aoc24day3

Advent of Code Day 3: Overengineering

So I’m doing Advent of Code this year, and hopefully we can stick through all 25 days. Day 3’s problem was particularly interesting to me, and I did some silly (and hopefully somewhat clever) stuff that I think would be fun to write about.

Here we are.

I likely won’t redescribe the puzzles in full - for that, just head over to Advent of Code - but I will summarize the parts briefly.

Part 1: This is where the fun begins

So the main goal of this problem is to, within a single line/string, find matches of mul(num1, num2) and sum up all of the multiplication operations, giving you your answer. The sample text they provide for this is:

xmul(2,4)%&mul[3,7]!@^do_not_mul(5,5)+mul(32,64]then(mul(11,8)mul(8,5))

which will eventually output (2*4 + 5*5 + 11*8 + 8*5) = 161

Initially, I thought I could do it with Regex, and you certainly could, but that didn’t feel like a satisfying enough solution to me. One of my friends also challenged me to use just Rust’s standard library and nothing else, so that removed the regex crate from contention entirely. My next idea, which is what I ended up going with, was to basically handwrite a lexer and do some nifty sliding window magic.

Step 1: Defining Tokens

First thing’s first, we have to define the tokens we’re going to be using. With a little bit of inspiration from the fantastic Crafting Interpreters, I ended up with the following:

enum Token {
    LPAREN,
    RPAREN,
    COMMA,
    NUMBER(u32),
    MUL,
    OTHER, // throwaway
}

(To be clear, Crafting Interpreters also helped elsewhere)

Clearly, the only tokens that make up something like mul(num1, num2) are mul, (, ,, ), and numbers. Anything else is captured as OTHER and thrown away (handled, admittedly, in quite a disgraceful way. I wonder what my future employers would think).

Step 2: Lexing the damn thing

Warning: there’s gonna be a shitload of code below. I will explain it in a moment

First, we set up the file reading and the iterator over our characters:

    let line = read_to_string(file).unwrap();
    let mut iter = line.chars().peekable();
    let mut tokens: Vec<Token> = Vec::new();

We call peekable() on our iterator because the Peekable struct provides us the peek method, which allows us to look at the next character without consuming it. This is particularly advantageous when we want to look ahead just once or twice in a controlled manner.

Now, here’s where the magic happens!

while let Some(ch) = iter.next() {
    match ch {
        '(' => tokens.push(Token::LPAREN),
        ')' => tokens.push(Token::RPAREN),
        ',' => tokens.push(Token::COMMA),
        '1'..='9' => {
            let n: u32 = once(ch)
                .chain(from_fn(|| iter.by_ref()
                .next_if(|s| s.is_ascii_digit())))
                .collect::<String>()
                .parse()
                .unwrap();

            tokens.push(Token::NUMBER(n));
        }
        'm' => {
            let rest = next_chunk(&mut iter, 2);
            if rest == vec!['u', 'l'] {
                tokens.push(Token::MUL);
            }
        }
        _ => tokens.push(Token::OTHER),
    }
}

This is a lot, so I’ll break it down for the relevant chunks. We’ve also removed the part 2 code from here as well (and it will return soon). Naturally, we start by iterating until there are no more characters to yield. In each match arm, we handle a specific character case.

The parentheses and the comma are pretty self-explanatory - just add the corresponding token variant to the vector - but the number and mul keyword handling are slightly more involved.

'1'..='9' => {
    let n: u32 = once(ch)
        .chain(from_fn(|| iter.by_ref()
        .next_if(|s| s.is_ascii_digit())))
        .collect::<String>()
        .parse()
        .unwrap();

    tokens.push(Token::NUMBER(n));
}

This seems like a bit of word vomit, but is nicer with a few choice words explaining it, I hope. It’s also largely lifted from this post by Bruno Calza, so thank you Bruno. Here, we essentially chain together iter::once(ch) (i.e., a single repetition of the first/matched character, to make sure it’s included in the resulting String), followed by all of the following characters that are digits, using Peekable’s next_if function, which takes in a function that returns a boolean and allows us to continue based on a condition. I won’t go into much more depth, but this is an interesting construction that I did not entirely come up with.

Finally, there’s the mul keyword match. The idea is simple, if we see an m, let’s check the two following characters and see if they are u and l! I don’t think this is a safe assumption to make all the time, but this design choice was selected in order to optimize developer implementation efficiency (it was 2 AM).

'm' => {
    let rest = next_chunk(&mut iter, 2);
    if rest == vec!['u', 'l'] {
        tokens.push(Token::MUL);
    }
}

The Rust standard library has a next_chunk function in nightly, and when I was originally implementing this, I used many a iter.peek(), is_some(), next_if_eq(), and other nested gems. However, after Ved correctly pointed out that nothing was stopping me from writing my own wannabe replacement, I got to work. It’s quite simple (with a more complicated extension), and I’ll talk about them more towards the end.

Step 3: Whatever shall we do with this token stream?? (parse it [kinda])

The last major piece of the puzzle is as follows; now that we have our very cool token stream, how the hell do we make it useful? The fun way I thought of doing this was by using a sliding window over the tokens; since we know the structure of the mul operation quite well, we can just pass a window of the necessary size and check if it contains the necessary elements! For each slice, if there’s a match, we add the product of the values contained within, otherwise just add 0.

let sum:u32 = tokens[..].windows(6).map(|slice| {
    if let [
            Token::MUL,
            Token::LPAREN,
            Token::NUMBER(a),
            Token::COMMA,
            Token::NUMBER(b),
            Token::RPAREN
        ] = slice { 
        a * b
    } else {
        0
    }
}).sum();
println!("{sum}");

This was quite fun to solve, and for all its flaws and overengineering, it was an enjoyable experience.

Part 2: To Multiply or Not to Multiply

(it’s a toss-up, who knows)

(we know, see below)

The second problem centers around do() and don't() within a code fragment:

xmul(2,4)&mul[3,7]!^don't()_mul(5,5)+mul(32,64](mul(11,8)undo()?mul(8,5))

When a don't() is encountered, until a do() is encountered after, all multiplications in between are invalidated. In order to support this, we make 3 major additions:

  1. Add DO and DONT as variants to the Token enum
  2. Add lexing support for DO and DONT
  3. Find out a way to remove the correct muls from consideration before using our sliding window technique

(1) is trivial, so let’s look at (2) just a bit closer.

'd' => {
    let next = iter.by_ref().next_if_eq(&'o');
    if next.is_some() {
        iter.peek();
        let next = iter.by_ref().next_if_eq(&'n');
        if next.is_some() {
            let rest = next_chunk(&mut iter, 4);
            if rest == vec!['\'', 't', '(', ')'] {
                tokens.push(Token::DONT);
            }
        } else {
            let rest = next_chunk(&mut iter, 2);
            if rest == vec!['(', ')'] {
                tokens.push(Token::DO);
            }
        }
    }
}

We add this to the match statement from before, and it follows the same principle as mul. If we see a d, then we check if we have an o. After that, we branch based on whether or not the next value is n or not, and Bob’s your uncle or something. Before we explain next_chunk (finally), let’s look at (3), which was also cool.

// Goal: Remove all entries "between" a Token::DONT and a Token::DO
let mut delete = false;
tokens.retain(|token| {
    match token {
        Token::DONT => {
            delete = true;
            false // don't want to retain
        },
        // Check if we have encountered a DONT before
        Token::DO if delete => {
            delete = false;
            false // don't want to retain
        }
        _ => !delete, // keep elts if not in delete mode
    }
});

I hadn’t really used retain before, but it’s conceptually quite simple; we pass in a function on an iterator, and it will retain all elements that return true. The way this relates to our construction is as follows: if we hit a DONT, we want to begin deleting, and we also want to delete DONT so we return false. If we hit DO, we want to set delete to false, but we want to clear away the DO as well. For any other elements, keep them if not in delete mode. This action allows us to maintain the invariant of not having DO or DONT at sliding window matching time (see part 1 step 3).

I found this quite elegant, and the whole thing works as well.

next_chunk :D

Now it’s finally time to showcase my darling boy; next_chunk:

fn next_chunk<I>(iter: &mut Peekable<I>, num: usize) -> Vec<I::Item>
    where I: Iterator
{
    let mut chunk = Vec::with_capacity(num);
    for _ in 0..num {
        if let Some(item) = iter.next() {
            chunk.push(item);
        } else {
            break
        }
    }
    chunk
}

Pretty self-explanatory, but the real kicker is when we try to use const generics and return something with type [I::Item; N]:

fn next_chunk<I, const N:usize>(iter: &mut Peekable<I>) -> Option<[I::Item; N]>
    where I: Iterator, <I>::Item: Copy,
{
    // SAFETY: We are initializing an N elt length array, using MaybeUninit
    // for now
    let mut chunk = unsafe { [MaybeUninit::uninit().assume_init();N] };
    for i in 0..N {
        if let Some(item) = iter.next() {
            chunk[i] = MaybeUninit::new(item);
        } else {
            return None
        }
    }
    // SAFETY: Convert uninit array to regular with transmute,
    // since they are the same size
    let chunk = unsafe {
      std::mem::transmute_copy::<_, [I::Item;N]>(&chunk) 
    };
    Some(chunk)
}

This bad boy right here is a bit crazier, although after digesting it, it becomes much more manageable.

  1. We use const generic const N: usize for our sizing, and return an array
  2. Because we can’t initialize an empty array, we use std::mem::MaybeUninit
  3. We populate said array
  4. We can now “safely” convert the array of MaybeUninits to an array of I::Items instead, with std::mem::transmute (see this article for a great discussion on transmute)
  5. We can rest easy knowing we can now call the function like
next_chunk::<_, some_num>(&mut iter)

instead of

next_chunk(&mut iter, 2)

(nice work buddy, the array comparison is more space efficient than the vector equivalence check, at least :D)

That’s all, folks!

I hope you guys enjoyed my first real blog and this assault on the English language. Thanks for reading!

/aoc2024/ /rust/