It is not clear from the question whether you are after theoretical or practical algorithms.
For theoretical results, Le Gall and Urrutia (2018), mentioned in Shannon Starr's comment, is probably the current record. (From a quick browse of the many works that cite them, I did not spot any that improve this particular aspect.) Their Table 3 claims exponent $2.044183$ when the inner dimension is $n^{0.5}$.
For practical results, implementations of Strassen-like algorithms for square matrix multiplication do exist (a useful search term is "practical"). You could start from the answers to the MO question How fast can we really multiply matrices?, which in particular cite Huang's 2018 dissertation Practical fast matrix multiplication algorithms.
As a very crude example: if you have a practical Strassen-like implementation that does $n \times n$ square multiplication in $O(n^{2.8074})$ time, then you can split your $\langle n,\sqrt{n},n\rangle$ task into a grid of $\sqrt{n} \cdot \sqrt{n} = n$ square subtasks of size $\sqrt{n} \times \sqrt{n}$, each solved by the practical Strassen in $O((\sqrt{n})^{2.8074}) = O(n^{1.4037})$ time. The total time is then $O(n \cdot n^{1.4037}) = O(n^{2.4037})$, which fits your requirement of beating $n^{2.5}$ but falls far short of the theoretical results.
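A minimal sketch of this blocking, assuming NumPy and a user-supplied routine `square_multiply` (a hypothetical stand-in for whatever fast square implementation you have):

```python
import numpy as np

def multiply_n_sqrtn_n(A, B, square_multiply):
    """Multiply an (n x s) matrix A by an (s x n) matrix B, where s = sqrt(n),
    by tiling the problem into n square s x s products, each delegated to
    square_multiply (e.g. a Strassen-like routine)."""
    n, s = A.shape
    assert B.shape == (s, n) and s * s == n, "inner dimension must be sqrt(n)"
    C = np.empty((n, n), dtype=A.dtype)
    for i in range(s):                      # s row blocks of A, each s x s
        Ai = A[i * s:(i + 1) * s, :]
        for j in range(s):                  # s column blocks of B, each s x s
            Bj = B[:, j * s:(j + 1) * s]
            # one of the s * s = n square subtasks
            C[i * s:(i + 1) * s, j * s:(j + 1) * s] = square_multiply(Ai, Bj)
    return C

# For instance, with the classical product standing in for a fast routine:
# C = multiply_n_sqrtn_n(A, B, np.matmul)
```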
Instead of square subtasks, you could play with rectangular subtasks. Benson and Ballard (2015) performed a practical study, A Framework for Practical Parallel Fast Matrix Multiplication; despite the name, they also study serial implementations. I did not find the $\sqrt{n}$ inner-dimension case mentioned, but they do study a constant inner dimension, namely $\langle n, 1600, n\rangle$ with $2000 \le n \le 12000$; see their Figure 6. They experimented with algorithms built on base cases such as $\langle 4,2,4\rangle$ (see the sketch after the quote below) and outperformed MKL (Intel's Math Kernel Library). Note that even if your method is asymptotically superior, it is not at all trivial to outperform fine-tuned linear algebra libraries that use the classical method. One of their conclusions is:
> [F]or multiplying rectangular matrices (which occurs more frequently than square in practice), there is a rich space for improvements. In particular, fast algorithms with base cases that match the shape of the matrices tend to have the highest performance.
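To get a quantitative feel for such base cases, here is a small sketch of the standard exponent bound obtained from a $\langle m,k,n\rangle$ base case that uses $R$ multiplications. (The 26-multiplication count for $\langle 4,2,4\rangle$ is my recollection of the schemes in Benson and Ballard's work; treat it as an assumption.)

```python
import math

def square_exponent(m, k, n, R):
    """Exponent of the square algorithm obtained from a <m,k,n> base case
    with R multiplications: tensoring the three cyclic rotations yields a
    <mkn,mkn,mkn> scheme with R**3 multiplications, hence the standard bound
    omega <= 3*log(R)/log(m*k*n)."""
    return 3 * math.log(R) / math.log(m * k * n)

print(square_exponent(2, 2, 2, 7))   # Strassen <2,2,2> with 7: ~2.8074
print(square_exponent(4, 2, 4, 26))  # assumed <4,2,4> with 26: ~2.82
```

Of course, this bound concerns square multiplication obtained by symmetrizing the base case; Benson and Ballard's point is precisely that, in practice, applying a base case whose shape matches the rectangular task performs better than taking this detour.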