At first glance, the benchmarks and their construction looked sound (i.e. no cheating), and the results are much faster than working with UMAP in Python. To test further, I asked the agents to implement other useful machine learning algorithms, such as HDBSCAN, as individual projects, with each repo starting with this 8-prompt plan in sequence: