Turkish gap
Most global benchmarks are English-only. Turkish hallucination, fairness and reasoning performance is not systematically measured.
LLMTurkey Network is Turkey's AI evaluation community. Join us.
Thousands of organizations are putting AI into production — but no independent source measures which model is accurate, safe and consistent in Turkish.
Scores published by model providers are not independent. Enterprises want to trust a third party, not the manufacturer.
Once a model is in production, most organizations have no infrastructure to continuously measure how it behaves in Turkish.
Over the next three years, we aim to be the independent measurement reference for Turkish AI — establishing a shared evaluation language for academia, industry and government.
Continuously updated Turkish-first scoreboards, free from vendor, project or political bias.
Continuous evaluation operations enterprises can integrate into production to measure their own models.
A research network producing open reports on Turkish LLM safety, fairness and robustness.
Members are credited by name on these outputs; we are remembered for measurement, not manifestos.
An open leaderboard refreshed every quarter, measured across 9 parameters and 12 scenarios.
Specialized evaluation reports for banking, public sector, health and education.
Open methodology and template kits enabling enterprises to set up an internal evaluation pipeline.
Turkish-language evaluation scenarios extended by the community and published on GitHub.
A curated network of academics, researchers and industry leaders shaping the AI evaluation culture in Turkey.
The Network is not a symbolic membership; we are looking for real contributions to open projects. If you fit one of the roles below, your application will be prioritized.
To extend the Turkish datasets behind the Bias & Fairness and Truthfulness scenarios.
To build the continuous benchmark pipeline and own API integrations.
To audit sectoral scenarios for real-world fidelity.
To run events, open calls and partner outreach.
Anyone wanting to grow in AI evaluation, benchmark methodology and EvalOps.
Researchers contributing to benchmarks, AI safety, ethics and model evaluation.
Practitioners using AI in their workflows or specializing in this field.
Executives shaping reliable AI transformation in their organizations.
Universities, companies, technology ventures and communities.
A 12-week EvalOps Specialist program, plus applied work on live projects.
Contribute to evaluation studies published on Judex; results carry your name.
Field experience on evaluation projects with banks, public sector and tech companies.
Direct collaboration with researchers working on Turkish LLM safety and fairness.
Network-only referral channel into partner organizations' job and consulting listings.
Monthly closed sessions, access to founding members, intros to partner organizations.
We tackle the most current topics in AI evaluation in small, focused groups.
Long-term collaborations with academia, industry and the community form the backbone of LLMTurkey Network.
Joint benchmark studies, publications and curriculum collaboration with universities, labs and research institutes.
Integration and evaluation partnerships with model providers, infrastructure companies and AI ventures.
Custom benchmark and EvalOps programs for organizations measuring their AI transformation.
Co-created content, events and visibility with communities, associations and meetups.