A long time ago I wrote programs for the RCA 1802 (I'm not quite that old, but there were rad-hard versions of it which were interesting to us).
The 1802 was primitive. In particular it didn't have support for subroutine call and return. Instead, if you wanted to have subroutines, you'd decide which register was the stack pointer and then, to call a routine, you'd
- stick the next PC address at the address pointed to by the register you'd decided was the stack pointer;
- increment that register;
- go to the address of the subroutine.
To return you then just
- decrement the stack pointer;
- load the address one beyond it into the PC.
(I may have the details wrong, it has been a long time.)
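The ritual above can be sketched as a toy model. This is not 1802 code: `*memory*`, `*sp*`, `*pc*` and both function names are made up for illustration, each "instruction" is assumed to occupy one cell, and the return step reads the slot the stack pointer lands on after the decrement, which is one self-consistent reading of the steps above.

```lisp
;; Toy model of the call/return convention; all names are hypothetical.
(defparameter *memory* (make-array 16 :initial-element 0))
(defparameter *sp* 0)   ; index of the next free stack slot
(defparameter *pc* 0)   ; address of the instruction being executed

(defun call-routine (target)
  "Save the return address, bump the stack pointer, go to TARGET."
  (setf (aref *memory* *sp*) (1+ *pc*)) ; next PC address goes on the stack
  (incf *sp*)
  (setf *pc* target))

(defun return-from-routine ()
  "Drop the stack pointer and resume at the saved address."
  (decf *sp*)
  (setf *pc* (aref *memory* *sp*)))

;; Usage: executing at address 3, call the routine at address 10.
(setf *sp* 0 *pc* 3)
(call-routine 10)        ; *pc* is now 10, *sp* is 1, cell 0 holds 4
(return-from-routine)    ; *pc* is now 4, *sp* is back to 0
```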
Everyone working on the code for the thing had to agree on which register was the SP (and indeed which register was the PC) and where the stack was and so on.
The system we were working on had very, very little memory. (At some point people from the US turned up wanting to buy our thing and said that all software had to be in Ada or FORTRAN. We laughed.) And some clever person had worked out the following trick:
If your subroutine was going to call another routine and then just return you could avoid the whole ritual described above: you could just go to the address of the subroutine you wanted to call. This saved all of instructions, stack space and time.
We didn't know this, but we'd just reinvented tail call elimination.
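To make the stack saving concrete, here is a small symbolic model, assuming each routine ends in exactly one of three ways. The operation names and `max-stack-depth` are invented for this sketch, and it only counts return-stack entries rather than modelling any real machine.

```lisp
;; Each routine is (NAME OP [TARGET]) where OP is one of:
;;   :call-then-return  do a full call to TARGET, then return to our caller
;;   :jump              the trick: just go to TARGET
;;   :return            plain return
(defun max-stack-depth (routines start)
  "Run from START and report the deepest the return stack gets."
  (let ((stack (list :halt))   ; one entry for whoever called START
        (deepest 1)
        (current start))
    (loop
      (destructuring-bind (op &optional target)
          (cdr (assoc current routines))
        (ecase op
          (:call-then-return
           (push :then-return stack)
           (setf deepest (max deepest (length stack))
                 current target))
          (:jump
           (setf current target))
          (:return
           ;; Unwind every caller whose only remaining job is to return.
           (loop until (eq (pop stack) :halt))
           (return deepest)))))))

(max-stack-depth '((a :call-then-return b)
                   (b :call-then-return c)
                   (c :return))
                 'a)   ; => 3

(max-stack-depth '((a :jump b)
                   (b :jump c)
                   (c :return))
                 'a)   ; => 1
```

With jumps the stack never grows beyond the single entry for the original caller, however long the chain of routines is.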
Well, of course, you can teach a compiler to do this: when it realises that it's about to compile a call to a function which is the very last thing that happens in the function it is compiling, it can instead just compile a jump.
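As a small illustration of what "the very last thing that happens" means, compare the two callers below. The function names are made up, and whether a given Common Lisp compiler actually turns the first call into a jump depends on the implementation and its optimisation settings.

```lisp
(defun helper (x)
  (* x 2))

(defun tail-caller (x)
  ;; The call to HELPER is the last thing that happens here:
  ;; its value is returned as is, so a jump would do.
  (helper (1+ x)))

(defun non-tail-caller (x)
  ;; Not a tail call: control has to come back so 1 can be added.
  (1+ (helper x)))

(tail-caller 3)       ; => 8
(non-tail-caller 3)   ; => 7
```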
In a language like Common Lisp there are caveats around this. For instance consider compiling something like
    (defun foo (x)
      (let ((*v* x))
        (declare (special *v*))
        (bar (1+ x))))
The call to bar may or may not be a tail call, because it may or may not be the last thing that foo does, depending on how special variables are implemented. Similar things apply, for instance, to
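Here is a runnable variant that shows why there may be work left after the call. It uses `defvar` rather than a local `special` declaration, and this `bar` is made up for the example. In an implementation that saves the old value and restores it when the `let` exits, that restoration happens after `bar` returns, so the call is not the last thing `foo` does.

```lisp
(defvar *v* 0)            ; DEFVAR proclaims *v* special

(defun bar (y)
  (list *v* y))           ; sees whichever binding is current

(defun foo (x)
  (let ((*v* x))          ; dynamic binding, in effect while BAR runs
    (bar (1+ x))))

(foo 5)   ; => (5 6)
*v*       ; => 0, the outer value is visible again once FOO has returned
```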
    (defun foo (x)
      (handler-case
          (bar x)
        ...))
Quite likely this call to bar is not a tail call because the condition handlers need to be undone on its return.
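A filled-in version makes the point visible. The body of `bar` and the `error` clause are assumptions for the sake of the example. The handler established in `foo` has to stay in force for the whole of the call to `bar`, which is why `foo` cannot simply be replaced by a jump.

```lisp
(defun bar (x)
  (if (minusp x)
      (error "negative argument: ~A" x)
      (* x 2)))

(defun foo (x)
  (handler-case
      (bar x)
    ;; Control comes back here if BAR signals an error.
    (error () :handled)))

(foo 2)    ; => 4
(foo -1)   ; => :HANDLED
```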
Finally, quite a lot of Lisp compilers have traditionally only dealt with tail call elimination when the call is directly recursive, because the compiler can then know much more about how the function being called expects to work. Scheme, on the other hand, requires that all tail calls be eliminated.
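For the directly recursive case the transformation amounts to turning the function into a loop. The pair below is a hand-written sketch of that idea with invented names, not the output of any particular compiler.

```lisp
(defun count-items-rec (items &optional (acc 0))
  (if (endp items)
      acc
      ;; Self call in tail position.
      (count-items-rec (rest items) (1+ acc))))

(defun count-items-loop (items &optional (acc 0))
  ;; Same computation: the self call becomes "update the
  ;; arguments and go back to the top".
  (loop
    (when (endp items)
      (return acc))
    (setf items (rest items)
          acc (1+ acc))))

(count-items-rec '(a b c))    ; => 3
(count-items-loop '(a b c))   ; => 3
```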
let creates variable bindings. Variable bindings may exhibit stack-like behavior, but AFAIK there is no requirement that a stack is used for this, or that let should implement its own stack. But the linked article is talking about the call stack in assembly language. It seems that you are really asking about how tail call optimization is implemented at the assembly level for Lisps. That's pretty broad, but I am adding a couple of tags which might attract someone who can answer.