I ran benchmarks on this repo and found some revealing results. I posted the findings in detail here: https://github.com/milla-jovovich/mempalace/issues/125
I felt something was sus about the repo, and I was right.
Yeah, my openclaw picked up on that too, most likely thanks to your findings.
I find it really hard to choose between the benchmark they used and Locomo.
I read your issue. Very deep evaluation, thanks for pointing me to this!
Will update the post