Lexer

A lexer reads the input source code and produces a stream of tokens. Each token is a pair consisting of a token class and the matched substring itself (called the lexeme).
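
To make the (token class, lexeme) pairing concrete, here is a minimal sketch in Python. The token classes (NUMBER, IDENT, OP, SKIP) and the `tokenize` function are made up for illustration and describe a tiny expression language, not any particular real one.

```python
import re

# Hypothetical token classes for a tiny expression language (assumption).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
    ("SKIP", r"\s+"),  # whitespace is matched but not emitted
]

# One combined pattern; each alternative is a named group,
# so the name of the group that matched gives the token class.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (token_class, lexeme) pairs for `source`."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("x = 42 + y"))
# [('IDENT', 'x'), ('OP', '='), ('NUMBER', '42'), ('OP', '+'), ('IDENT', 'y')]
```

Note that this sketch leans on Python's backtracking `re` engine rather than an explicit automaton; it only shows what the token stream looks like.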

The correspondence between a set of strings and a token class can be defined with regular expressions and implemented using finite automata. An NFA (nondeterministic finite automaton) can be converted to an equivalent DFA (deterministic finite automaton), which can then be used to generate an efficient table-driven implementation.
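
As a rough illustration of what "table-driven" means, the sketch below hand-writes a small DFA as a transition table and runs it with the usual longest-match rule. The states, the `char_class` helper, and the two token classes are assumptions chosen for the example; a real generator would derive the table from the regular expressions rather than have it written by hand.

```python
def char_class(ch):
    """Map a character to a coarse class so the table stays small."""
    if ch.isascii() and ch.isdigit():
        return "digit"
    if ch.isascii() and (ch.isalpha() or ch == "_"):
        return "letter"
    return "other"


# Hypothetical DFA (assumption): state 0 is the start state,
# state 1 recognizes \d+, state 2 recognizes [A-Za-z_][A-Za-z_0-9]*.
TRANSITIONS = {
    (0, "digit"): 1,
    (0, "letter"): 2,
    (1, "digit"): 1,
    (2, "letter"): 2,
    (2, "digit"): 2,
}
ACCEPTING = {1: "NUMBER", 2: "IDENT"}


def longest_match(source, start):
    """Run the DFA from offset `start`.

    Returns (token_class, end_offset) for the longest accepted prefix,
    or None if no prefix is accepted.
    """
    state = 0
    last_accept = None
    pos = start
    while pos < len(source):
        state = TRANSITIONS.get((state, char_class(source[pos])))
        if state is None:  # no transition: the automaton is stuck
            break
        pos += 1
        if state in ACCEPTING:
            last_accept = (ACCEPTING[state], pos)
    return last_accept


print(longest_match("abc123 + 7", 0))  # ('IDENT', 6)
print(longest_match("42x", 0))         # ('NUMBER', 2)
print(longest_match("+", 0))           # None
```

The inner loop does one table lookup per input character and never backtracks over the table itself, which is where the efficiency of the DFA approach comes from.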

Tools like Flex can generate lexer code from a specification that pairs regular expressions with actions.

Links