
The best book to answer your question would probably be: Cooper and Torczon, "Engineering a Compiler," 2003. If you have access to a university library you should be able to borrow a copy.

In a production compiler like LLVM or GCC the designers make every effort to keep all the algorithms below $O(n^2)$, where $n$ is the size of the input. For some of the analyses in the "optimization" phases this means you need to use heuristics rather than producing truly optimal code.

The lexer is a finite state machine, so $O(n)$ in the size of the input (in characters) and produces a stream of $O(n)$ tokens that is passed to the parser.
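
As a minimal sketch (with a hypothetical token set of just numbers, identifiers, and single-character operators), such a lexer can be written as a hand-coded state machine that touches each character a constant number of times:

```python
# Sketch of a hand-coded DFA-style lexer for a hypothetical token set.
# Each character is examined a bounded number of times, so one scan of
# the input is O(n) in the number of characters.
def lex(src):
    tokens = []
    i, n = 0, len(src)
    while i < n:
        c = src[i]
        if c.isspace():                          # skip whitespace
            i += 1
        elif c.isdigit():                        # state: scanning a number
            j = i
            while j < n and src[j].isdigit():
                j += 1
            tokens.append(("NUM", src[i:j]))
            i = j
        elif c.isalpha() or c == "_":            # state: scanning an identifier
            j = i
            while j < n and (src[j].isalnum() or src[j] == "_"):
                j += 1
            tokens.append(("ID", src[i:j]))
            i = j
        else:                                    # single-character operator
            tokens.append(("OP", c))
            i += 1
    return tokens
```

For example, `lex("x1 + 42")` yields the token stream `[("ID", "x1"), ("OP", "+"), ("NUM", "42")]`.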

For many compilers for many languages the parser is LALR(1) and thus processes the token stream in time $O(n)$ in the number of input tokens. During parsing you typically have to keep track of a symbol table, but, for many languages, that can be handled with a stack of hash tables ("dictionaries"). Each dictionary access is $O(1)$, but you may occasionally have to walk the stack to look up a symbol. The depth of the stack is $O(s)$ where $s$ is the nesting depth of the scopes. (So in C-like languages, how many layers of curly braces you are inside.)
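
The stack-of-dictionaries symbol table can be sketched like this (the `define`/`lookup` API is hypothetical, not taken from any particular compiler):

```python
# Sketch of a scoped symbol table as a stack of dictionaries.
# define() is O(1) expected; lookup() may walk the stack, which is
# O(s) in the nesting depth s of the scopes.
class SymbolTable:
    def __init__(self):
        self.scopes = [{}]                 # start with the global scope

    def enter_scope(self):                 # e.g. on '{' in a C-like language
        self.scopes.append({})

    def exit_scope(self):                  # e.g. on '}'
        self.scopes.pop()

    def define(self, name, info):
        self.scopes[-1][name] = info       # bind in the innermost scope

    def lookup(self, name):
        for scope in reversed(self.scopes):  # innermost scope first
            if name in scope:
                return scope[name]
        raise KeyError(name)
```

An inner binding shadows an outer one until its scope is exited, which matches the usual scoping rules of C-like languages.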

Then the parse tree is typically "flattened" into a control flow graph. The nodes of the control flow graph might be 3-address instructions (similar to a RISC assembly language), and the size of the control flow graph will typically be linear in the size of the parse tree.
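
As an illustration (using a hypothetical tuple-based expression tree rather than a full parse tree), flattening emits one 3-address instruction per interior node, so the output is linear in the size of the tree:

```python
import itertools

# Sketch: flatten an expression tree into 3-address instructions.
# A tree node is either a leaf (a variable name) or a tuple (op, left, right).
# Each instruction is a tuple (dest, op, arg1, arg2), i.e. dest := arg1 op arg2.
def flatten(node, code, fresh=None):
    if fresh is None:
        fresh = itertools.count(1)         # generator of fresh temp names
    if isinstance(node, str):              # leaf: already a named value
        return node
    op, left, right = node
    a = flatten(left, code, fresh)
    b = flatten(right, code, fresh)
    tmp = f"t{next(fresh)}"
    code.append((tmp, op, a, b))           # one instruction per interior node
    return tmp
```

Flattening `a * b + c` produces `t1 := a * b` followed by `t2 := t1 + c`.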

Then a series of redundancy elimination steps is typically applied (common subexpression elimination, loop-invariant code motion, constant propagation, ...). (This is often called "optimization," although there is rarely anything optimal about the result; the real goal is to improve the code as much as possible within the time and space constraints placed on the compiler.) Each redundancy elimination step typically requires proofs of some facts about the control flow graph. These proofs are typically done using data-flow analysis. Most data-flow analyses are designed so that they will converge in $O(d)$ passes over the flow graph, where $d$ is (roughly speaking) the loop nesting depth and a pass over the flow graph takes time $O(n)$, where $n$ is the number of 3-address instructions.
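
For a concrete example, a backward liveness analysis (a standard data-flow analysis; the CFG representation below is a hypothetical `block -> (use, def, successors)` mapping) just repeats passes over the graph until the live sets stop changing:

```python
# Sketch of iterative backward liveness analysis.
# blocks maps a block name to (use set, def set, successor list).
# Each pass over the graph is O(n); for reducible flow graphs the loop
# converges in roughly O(d) passes, d being the loop nesting depth.
def liveness(blocks):
    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b, (use, defs, succs) in blocks.items():
            # out[b] = union of in[s] over successors s
            out = set().union(*(live_in[s] for s in succs)) if succs else set()
            # in[b] = use[b] ∪ (out[b] - def[b])
            inn = use | (out - defs)
            if out != live_out[b] or inn != live_in[b]:
                live_out[b], live_in[b] = out, inn
                changed = True
    return live_in, live_out
```

On a two-block graph where `B1` defines `x` and its successor `B2` uses it, the analysis correctly reports `x` live out of `B1` and live into `B2`.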

For more sophisticated optimizations you might want to do more sophisticated analyses. At this point you start running into tradeoffs. You want your analysis algorithms to take much less than $O(n^2)$ time in the size of the whole program's flow graph, but this means you need to do without information (and program-improving transformations) that might be expensive to prove. A classic example of this is alias analysis, where for some pair of memory writes you would like to prove that the two writes can never target the same memory location. (You might want to do an alias analysis to see if you could move one instruction above the other.) But to get accurate information about aliases you might need to analyze every possible control path through the program, which is exponential in the number of branches in the program (and thus exponential in the number of nodes in the control flow graph).

Next you get into register allocation. Register allocation can be phrased as a graph-coloring problem, and coloring a graph with a minimal number of colors is known to be NP-hard. So most compilers use some kind of greedy heuristic combined with register spilling, with the goal of reducing the number of spills as much as possible within reasonable time bounds.
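
A sketch of one such greedy heuristic (assuming a hypothetical interference graph given as adjacency sets, and $k$ available registers; a real allocator would also rebuild the graph after each spill rather than just marking the node):

```python
# Sketch of greedy register allocation by graph coloring.
# graph maps each virtual register to the set of registers it interferes
# with; k is the number of physical registers. Highest-degree nodes are
# colored first; a node with no free color is spilled to memory.
def color(graph, k):
    assignment, spilled = {}, []
    for node in sorted(graph, key=lambda n: len(graph[n]), reverse=True):
        used = {assignment[n] for n in graph[node] if n in assignment}
        free = [r for r in range(k) if r not in used]
        if free:
            assignment[node] = free[0]
        else:
            spilled.append(node)          # no register left: spill
    return assignment, spilled
```

On a triangle of three mutually interfering values, two registers force one spill, while three registers suffice with none.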

Finally you get into code generation. Code generation is typically done one maximal basic block at a time, where a basic block is a sequence of linearly connected control flow graph nodes with a single entry and a single exit. This can be reformulated as a graph-covering problem where the graph you are trying to cover is the dependence graph of the set of 3-address instructions in the basic block, and you are trying to cover it with a set of graphs that represent the available machine instructions. This problem is exponential in the size of the largest basic block (which could, in principle, be the same order as the size of the entire program), so again it is typically done with heuristics that examine only a small subset of the possible coverings.
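
One common heuristic of this kind is "maximal munch": at each node of the dependence tree, greedily pick the largest machine-instruction tile that matches. A sketch with a hypothetical two-tile instruction set (plain `ADD`/`MUL`, plus a fused multiply-add `MADD` that covers two tree nodes at once):

```python
# Sketch of greedy "maximal munch" instruction selection over an
# expression tree: leaf = value name, interior node = (op, left, right).
# Emits instructions into `out` and returns the name holding the result.
def select(node, out):
    if isinstance(node, str):             # leaf: value already available
        return node
    op, l, r = node
    # Try the biggest tile first: MADD matches an add whose left
    # operand is a multiply, covering two nodes with one instruction.
    if op == "+" and isinstance(l, tuple) and l[0] == "*":
        a, b = select(l[1], out), select(l[2], out)
        c = select(r, out)
        out.append(f"MADD {a}, {b}, {c}")
        return f"t{len(out)}"
    # Fall back to single-node tiles.
    a, b = select(l, out), select(r, out)
    out.append(f"{'ADD' if op == '+' else 'MUL'} {a}, {b}")
    return f"t{len(out)}"
```

For `a * b + c` the greedy choice emits a single `MADD` instead of a `MUL` followed by an `ADD`; it commits to one covering rather than enumerating the exponentially many alternatives.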