
BIG-bench

Evaluation · infrastructure · open · #615 of 884 · +49 · Rising

66.0

Low

High confidence

Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models

Pillar Breakdown

Adoption (weight 35%): 73.3

Maintenance (weight 30%): 57.9

Friction (weight 20%): 94.0

Ecosystem (weight 15%): 44.4

Momentum

0.53 Rising

7d change -0.24

High confidence

In Evaluation

Ranked #31 of 57



Similar Tools

WhyLabs · Evaluation · 66.3

benchmark · Evaluation · 65.7

ClawWork · Evaluation · 66.5

AutoRAG · Evaluation · 66.6