Improving stack performance
Project: SmallForth
One of the obstacles to making this project capable of compiling standalone code is that the data stack (and the temporary stack) are not 'in-place'. Rather than keeping an area of memory that holds StackElement objects, each stack is a std::stack.
This means that memory is constantly allocated and freed as data is pushed onto and popped off the stack. This is a bigger issue when you realise that the C++ parts of the program also use the stack to pass data around, because they could be communicating with either Forth or C++. For instance, converting objects to a string makes heavy use of the stack, even though most of that code is C++.
First, I ran some performance tests to get a baseline and see whether a small optimisation worked:
: t elapsedSeconds 25000 0 do i 1 + drop loop elapsedSeconds swap - dup . cr ;
t t t t t + + + + 5 / .
: t_with_op elapsedSeconds 25000 0 do i 1 + drop i 25 % 0 = if 65 tochar . then loop elapsedSeconds swap - dup cr . cr ;
t_with_op t_with_op t_with_op t_with_op t_with_op + + + + 5 / .
The first test averaged 1.27987 seconds over its five runs, and the second averaged 1.95018 seconds.
Initial optimisation
In DataStack.h I replaced the std::stack with a std::vector, initialised to the requested stack size. I also added a topOfStack member variable and two methods: MoveToNextSP(), which moves the stack pointer on to the next element, and ShrinkStack(), which moves the stack pointer back.
All of the standard Push() methods could easily be replaced with code that, instead of allocating a new stack element, moves to the next element in the vector and sets it to the type and value being pushed.
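A minimal sketch of the vector-based stack with an in-place Push(), under stated assumptions. Only the names topOfStack, MoveToNextSP() and ShrinkStack() come from the post; the class name DataStack is assumed from DataStack.h, StackElement is a hypothetical one-field stand-in, and topOfStack is assumed to count the used slots (the post does not say whether it indexes the top element or the next free one).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified element: one integer payload is enough to show
// the stack-pointer mechanics (the real StackElement holds several types).
struct StackElement {
    long long value = 0;
};

class DataStack {
public:
    // Every slot is allocated once, up front, at the requested stack size.
    explicit DataStack(std::size_t size) : elements(size) {}

    // Push writes into the next pre-allocated slot; nothing is allocated.
    void Push(long long value) { MoveToNextSP().value = value; }

    // Move the stack pointer back one slot. Nothing is freed: the slot stays
    // in the vector and is overwritten by the next push.
    void ShrinkStack() {
        assert(topOfStack > 0 && "data stack underflow");
        --topOfStack;
    }

    // The element currently on top of the stack.
    const StackElement& Top() const {
        assert(topOfStack > 0 && "data stack is empty");
        return elements[topOfStack - 1];
    }

    std::size_t Depth() const { return topOfStack; }

private:
    // Advance the stack pointer and return the slot that is now on top.
    // In this sketch topOfStack counts used slots, so 0 means empty.
    StackElement& MoveToNextSP() {
        assert(topOfStack < elements.size() && "data stack overflow");
        return elements[topOfStack++];
    }

    std::vector<StackElement> elements;
    std::size_t topOfStack = 0;
};
```

Once the vector exists, a push or a drop is just an index increment or decrement plus a write, which is where the saving over per-push allocation would be expected to come from.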
The StackElement class needed updating so that its underlying data can be changed after construction, which now makes it much more like a variant.
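One way a variant-like StackElement could look, assuming a std::variant underneath. The four types (integer, float, bool, char) and the SetX()/GetX() method names are hypothetical; the point is only that an existing element can be overwritten with a new type and value, so a stack slot can be reused instead of reallocated.

```cpp
#include <cstdint>
#include <variant>

// Hypothetical variant-like StackElement: the same object can be overwritten
// with a new type and value, so a stack slot can be reused in place.
class StackElement {
public:
    // Enumerator order must match the alternative order of `data` below.
    enum class Type { Int, Float, Bool, Char };

    void SetInt(std::int64_t v) { data.emplace<std::int64_t>(v); }
    void SetFloat(double v) { data.emplace<double>(v); }
    void SetBool(bool v) { data.emplace<bool>(v); }
    void SetChar(char v) { data.emplace<char>(v); }

    Type GetType() const { return static_cast<Type>(data.index()); }

    // Each getter uses std::get, which throws std::bad_variant_access
    // if the element currently holds a different type.
    std::int64_t GetInt() const { return std::get<std::int64_t>(data); }
    double GetFloat() const { return std::get<double>(data); }
    bool GetBool() const { return std::get<bool>(data); }
    char GetChar() const { return std::get<char>(data); }

private:
    // Default-constructs to an Int holding 0.
    std::variant<std::int64_t, double, bool, char> data;
};
```

A Push() overload can then call the matching setter on the next slot, so the element's type changes without any allocation.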
The code that pushes a StackElement* needed altering so that it updates the next stack element in the vector instead. This push code should then delete the StackElement*, but existing code re-uses the object after it has been pushed, so the stack code adds it to a to-be-deleted vector instead. The tests above grow this vector to several million entries.
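A sketch of the StackElement* push with deferred deletion. StackElement is again a hypothetical one-field stand-in, and freeing the deferred objects only when the stack is destroyed is an assumption: the post does not say when the to-be-deleted vector is emptied. PendingDeletes() is a hypothetical helper for inspecting its size.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, simplified StackElement (a single integer payload).
struct StackElement {
    long long value = 0;
};

class DataStack {
public:
    explicit DataStack(std::size_t size) : elements(size) {}

    // Copy the pushed element into the next pre-allocated slot. The caller's
    // object cannot be deleted yet because other code may still use it after
    // pushing, so ownership moves to a to-be-deleted list instead.
    void Push(StackElement* element) {
        assert(topOfStack < elements.size() && "data stack overflow");
        elements[topOfStack++] = *element;
        toBeDeleted.emplace_back(element);
    }

    const StackElement& Top() const {
        assert(topOfStack > 0 && "data stack is empty");
        return elements[topOfStack - 1];
    }

    std::size_t PendingDeletes() const { return toBeDeleted.size(); }

private:
    std::vector<StackElement> elements;
    std::size_t topOfStack = 0;
    // Freed only when the stack itself is destroyed (an assumption), so this
    // list keeps growing for as long as StackElement* pushes continue.
    std::vector<std::unique_ptr<StackElement>> toBeDeleted;
};
```

Because every StackElement* push adds one entry, the list grows with the number of such pushes rather than with the stack depth.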
Pulling an element from the stack now makes a copy of the element at the current top of the vector and returns that copy.
Initial results
This initial optimisation reduced the first test from 1.27987 to 1.05781 seconds, and the second from 1.95018 to 1.6431 seconds.
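Expressed as relative reductions in run time (arithmetic on the averages above only, no new measurements):

$$
\frac{1.27987 - 1.05781}{1.27987} = \frac{0.22206}{1.27987} \approx 17.4\%
$$

$$
\frac{1.95018 - 1.6431}{1.95018} = \frac{0.30708}{1.95018} \approx 15.7\%
$$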
More work is needed so that stack elements are pulled from the stack by their type rather than as a raw StackElement*. This is not only for potential speed improvements, but also so that a compiling version of this codebase will not have to do so much memory allocation (and so that the to-be-deleted vector shrinks to 0).
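One possible shape for pulling by type, sketched under assumptions rather than taken from SmallForth: the slots hold a std::variant directly, and Pull<T>() is a hypothetical template that returns the top value as a plain T. Nothing is heap-allocated on a pull, so there is nothing to add to a to-be-deleted list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <variant>
#include <vector>

// Hypothetical slot type: the stack stores values directly, so pulling by
// type can return a plain value instead of a heap-allocated StackElement*.
using Slot = std::variant<std::int64_t, double, char>;

class DataStack {
public:
    explicit DataStack(std::size_t size) : slots(size) {}

    void Push(Slot value) {
        assert(topOfStack < slots.size() && "data stack overflow");
        slots[topOfStack++] = value;
    }

    // Pull the top value as type T. std::get throws std::bad_variant_access
    // if the top is not a T, in which case the stack is left unchanged.
    template <typename T>
    T Pull() {
        assert(topOfStack > 0 && "data stack underflow");
        T value = std::get<T>(slots[topOfStack - 1]);
        --topOfStack;
        return value;
    }

    std::size_t Depth() const { return topOfStack; }

private:
    std::vector<Slot> slots;
    std::size_t topOfStack = 0;
};
```

Callers pass explicit types (for example std::int64_t{3}) because, depending on the standard library version, a plain int literal may not convert unambiguously to this variant.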