Some random fun things you can do with DFAs/NFAs:

1. From two machines you can build a single machine that corresponds to running them in lockstep/parallel. Most language classes are closed under union, but regular languages stand out by also being closed under intersection, and that relies on this product construction. You can't replicate the construction with pushdown automata to show that context-free languages are closed under intersection (they aren't), because the product automaton would need two stacks. But the intersection of a context-free language with a regular language is context-free via this construction, since pairing a PDA with a DFA still needs only the one stack.
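The product construction above can be sketched in a few lines. This is a minimal illustration, not any library's API; the `Dfa` class, `intersect`, and the two example machines are all invented here:

```python
# Sketch of the product construction: given two DFAs over the same
# alphabet, build one machine that runs both in lockstep. States of
# the product are pairs of states; it accepts iff both components do.

class Dfa:
    def __init__(self, start, accept, delta):
        self.start = start      # start state
        self.accept = accept    # set of accepting states
        self.delta = delta      # dict: (state, symbol) -> state

    def accepts(self, s):
        q = self.start
        for c in s:
            q = self.delta[(q, c)]
        return q in self.accept

def intersect(a, b):
    delta = {}
    for (qa, c), ra in a.delta.items():
        for (qb, c2), rb in b.delta.items():
            if c == c2:
                delta[((qa, qb), c)] = (ra, rb)
    return Dfa((a.start, b.start),
               {(qa, qb) for qa in a.accept for qb in b.accept},
               delta)

# Example over {a, b}: even number of a's AND ends in b.
even_a = Dfa(0, {0}, {(0, 'a'): 1, (1, 'a'): 0,
                      (0, 'b'): 0, (1, 'b'): 1})
ends_b = Dfa(0, {1}, {(0, 'a'): 0, (1, 'a'): 0,
                      (0, 'b'): 1, (1, 'b'): 1})
both = intersect(even_a, ends_b)
```

The same pairing works with a PDA on one side because only one of the two components carries a stack.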

2. You can run them backwards. This usually works best with NFAs since reversing a DFA can seriously inflate the state space. E.g. if you're trying to match a regular expression of the form 're1 string re2', where 'string' is a fixed literal, you can search for 'string' first and then match re1 backward and re2 forward from the two ends of the hit. Running machines backwards can also be used for computing capture groups after a match has been found, since the fastest methods for matching don't let you compute capture groups on the fly. Decoupling the re1 and re2 machines also means you end up with two independent machines with smaller state spaces, which means smaller, cache-friendlier lookup tables. See the next item for a less conventional example of what you can do with small state spaces.
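The reversal itself is mechanical: flip every transition and swap the start and accept sets, and the new NFA accepts exactly the reversed language. A toy sketch with an ad hoc representation (all names invented here):

```python
# Reverse an NFA: delta maps (state, symbol) -> set of successor
# states. The reversed machine runs the original edges backwards,
# starting from the old accept states.

def reverse_nfa(starts, accepts, delta):
    rdelta = {}
    for (q, c), rs in delta.items():
        for r in rs:
            rdelta.setdefault((r, c), set()).add(q)
    return accepts, starts, rdelta  # old accepts become new starts

def run_nfa(starts, accepts, delta, s):
    cur = set(starts)
    for c in s:
        cur = set().union(*(delta.get((q, c), set()) for q in cur))
    return bool(cur & set(accepts))

# NFA for strings over {a, b} ending in 'ab'.
delta = {(0, 'a'): {0, 1}, (0, 'b'): {0}, (1, 'b'): {2}}
# Its reversal accepts strings *beginning* with 'ba'.
rstarts, raccepts, rdelta = reverse_nfa({0}, {2}, delta)
```

To match re1 backward from a literal hit, you'd run the reversed re1 machine over the text leftward from the hit's start position.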

3. While DFA execution seems inherently serial, it can actually be parallelized very effectively if the state space is small enough, through speculative execution. Suppose you want to lex a string in parallel by chopping it in half. You don't know what state the DFA will be in when it reaches the halfway point, but you can process the second half for all possible states at that point. Then when both halves are finished you throw away all but one of the speculative answers for the second half based on the final state for the first half. This is extremely fast and practical with PSHUFB/TBL-like instructions if your state space is small enough (at most 16 states for a single PSHUFB). You can mux together multiple PSHUFBs to go beyond that limit but the cost increases substantially. (This is a parallel prefix algorithm when applied recursively. As far as I know the application to parallel lexing goes back to the Hillis-Steele paper on data-parallel algorithms from 1986.)
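Here's a scalar toy version of the speculation (no SIMD); the helper names and the 4-state DFA are made up for illustration. The second half is run once per possible start state, yielding a state-to-state map, and the first half's final state selects the right entry:

```python
# Speculative DFA execution: split the input, run the second half for
# every possible midpoint state, then discard all but the guess that
# matches the first half's final state.

def run(delta, q, s):
    for c in s:
        q = delta[(q, c)]
    return q

def run_speculative(delta, states, start, text):
    mid = len(text) // 2
    first, second = text[:mid], text[mid:]
    # These two computations are independent and could run on
    # separate cores:
    q_mid = run(delta, start, first)
    table = {q: run(delta, q, second) for q in states}  # speculate
    return table[q_mid]  # keep only the correct speculative answer

# DFA counting 'a's mod 4 over the alphabet {a, b}.
delta = {(q, c): ((q + 1) % 4 if c == 'a' else q)
         for q in range(4) for c in 'ab'}
```

With PSHUFB, `table` for a whole chunk is a 16-byte state-to-state vector, and combining chunks is one shuffle, which is what makes the recursive parallel-prefix version fast.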

4. There's a critical optimization for the last algorithm which is sometimes applicable. A log file processor might have numerous internal states while processing a line, so you'd rather not speculate on all of them. Instead you can chop the file in half and from the second half seek forward to the next newline and start there. This avoids speculation entirely if there's a unique state following a newline character regardless of what state you were in before the newline.

Thinking about point 1 makes me wonder why no popular regex engines have an intersection (logical AND) operator. It seems like it could allow writing complex regexes in a much clearer way.

Most popular regex engines use backtracking and support non-regular features. They aren't based on automata, so implementing AND is perhaps not as theoretically straightforward in that context.

There are some popular finite-automata-based engines, though. RE2 comes to mind. In theory, things like AND and complement could be implemented, but in practice they blow up the size of the state machine.[1, 2] The current maintainer of RE2 did build a tool that permits intersection and complement, though.[3]

[1] - https://dl.acm.org/doi/10.1145/2071368.2071372

[2] - https://github.com/google/re2/issues/156#issuecomment-354068...

[3] - https://github.com/google/redgrep