LLM Turkey

Judex

8 parameters, 12 scenarios for Turkish LLMs.

EvalOps Framework: an evaluation platform that compares models side by side across 8 parameters and 12 scenarios. Define your own scenarios, run them continuously, and share the results.

EvalOps Framework

8 Evaluation Parameters

  • P01 Instruction Following

    Adherence to complex multi-step instructions.

  • P02 Truthfulness (Factual Alignment)

    Factual answers and reference-grounded consistency.

  • P03 Safety & Compliance

    Detection of harmful, manipulative, or non-compliant content.

  • P04 Bias & Fairness

    Demographic, cultural, and social bias analysis.

  • P05 Depth & Reasoning

    Multi-step inference, symbolic logic, deep analysis.

  • P06 Clarity & Communication (Clarity of Communication)

    Quality of expression, structure, audience fit.

  • P07 Robustness (Consistency & Robustness)

    Stability under prompt variation.

  • P08 Explainability

    Ability to show reasoning and source.
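P07 Robustness is described only as "stability under prompt variation". One simple way to put a number on that is an agreement rate: ask the same question in several paraphrased forms and measure how often the model gives its most common answer. The sketch below is an illustration under that assumption, not Judex's documented metric; the function name `agreement_rate` and the sample answers are hypothetical, and the whitespace/lowercase normalization stands in for a real answer comparator.

```python
from collections import Counter


def agreement_rate(answers: list[str]) -> float:
    """Share of prompt variants whose normalized answer equals the most common one.

    1.0 means every paraphrase produced the same answer; lower values mean
    the output drifted as the prompt wording changed. (Hypothetical metric.)
    """
    if not answers:
        raise ValueError("need at least one answer")
    # Trivial normalization: strip whitespace and lowercase.
    normalized = [a.strip().lower() for a in answers]
    _, top_count = Counter(normalized).most_common(1)[0]
    return top_count / len(normalized)


# Hypothetical answers from one model to four paraphrases of the same question.
answers = ["Ankara", "ankara ", "Ankara", "İstanbul"]
print(agreement_rate(answers))  # 0.75
```

For Turkish text, note that Python's `str.lower()` is not locale-aware: it maps "I" to "i" rather than to dotless "ı", so a production comparator for Turkish answers would need Turkish-specific case folding.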

Judex Scenario Set

12 Evaluation Scenarios

  1. S01 General Knowledge & Q&A (Knowledge)
  2. S02 Technical Explanation & Expert Content (Technical)
  3. S03 Educational & Instructional Content (Education)
  4. S04 Health & Sensitive Advice (Critical)
  5. S05 Legal & Official Information (Critical)
  6. S06 Finance & Decision Support (Critical)
  7. S07 Creative Content Generation (Creative)
  8. S08 Harmful Content & Safety Boundary (Safety)
  9. S09 Social Topics & Bias (Ethics)
  10. S10 Multilingual & Cross-cultural Use (Language)
  11. S11 Prompt Variation & Consistency (Robustness)
  12. S12 Justification & Explainability (Explainability)
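The page says you can define your own scenarios alongside the 12 built-in ones, but does not show the format. As a minimal sketch, assuming a scenario is just a code, a title, and a category tag, the set could be held as a small registry that rejects duplicate codes. `Scenario`, `BUILTIN_SCENARIOS`, `add_scenario`, and the "S13" entry are hypothetical names for illustration, not the Judex API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    code: str      # e.g. "S01"
    name: str      # human-readable title
    category: str  # tag shown next to the title


# The 12 built-in scenarios from the list above.
BUILTIN_SCENARIOS = [
    Scenario("S01", "General Knowledge & Q&A", "Knowledge"),
    Scenario("S02", "Technical Explanation & Expert Content", "Technical"),
    Scenario("S03", "Educational & Instructional Content", "Education"),
    Scenario("S04", "Health & Sensitive Advice", "Critical"),
    Scenario("S05", "Legal & Official Information", "Critical"),
    Scenario("S06", "Finance & Decision Support", "Critical"),
    Scenario("S07", "Creative Content Generation", "Creative"),
    Scenario("S08", "Harmful Content & Safety Boundary", "Safety"),
    Scenario("S09", "Social Topics & Bias", "Ethics"),
    Scenario("S10", "Multilingual & Cross-cultural Use", "Language"),
    Scenario("S11", "Prompt Variation & Consistency", "Robustness"),
    Scenario("S12", "Justification & Explainability", "Explainability"),
]


def add_scenario(registry: list[Scenario], scenario: Scenario) -> list[Scenario]:
    """Return a new registry with a custom scenario appended; codes must be unique."""
    if any(s.code == scenario.code for s in registry):
        raise ValueError(f"duplicate scenario code: {scenario.code}")
    return [*registry, scenario]


# Hypothetical user-defined scenario (not part of the Judex set).
custom = add_scenario(BUILTIN_SCENARIOS, Scenario("S13", "In-house Support Q&A", "Custom"))
print(len(custom))  # 13
print([s.code for s in custom if s.category == "Critical"])  # ['S04', 'S05', 'S06']
```

Returning a new list instead of mutating the built-in one keeps the reference set unchanged, so runs on the standard 12 scenarios stay comparable across users.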
Sample comparison

Model            Instruction   Truthfulness   Safety   Overall
model-tr-large   86            92             78       85
model-x-7b       74            81             70       75
model-y-70b      91            88             85       88

* Sample demo data · real results in the Judex panel
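The page does not say how the Overall column is derived. In the demo rows it happens to equal the unweighted mean of the three visible columns, rounded to the nearest integer (for example, (86 + 92 + 78) / 3 ≈ 85.3, shown as 85). The sketch below reproduces that arithmetic and ranks the sample models. It is only a consistency check on the demo numbers, not Judex's scoring formula; `scores` and `overall` are hypothetical names.

```python
# Sample demo scores from the comparison table (not real results).
scores = {
    "model-tr-large": {"Instruction": 86, "Truthfulness": 92, "Safety": 78},
    "model-x-7b": {"Instruction": 74, "Truthfulness": 81, "Safety": 70},
    "model-y-70b": {"Instruction": 91, "Truthfulness": 88, "Safety": 85},
}


def overall(row: dict[str, int]) -> int:
    """Unweighted mean of the shown columns, rounded to the nearest integer (assumed formula)."""
    return round(sum(row.values()) / len(row))


# Rank models by the assumed overall score, highest first.
ranking = sorted(scores, key=lambda m: overall(scores[m]), reverse=True)
for model in ranking:
    print(model, overall(scores[model]))
# model-y-70b 88
# model-tr-large 85
# model-x-7b 75
```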