- Type: Task
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: None
- Component/s: GAI
- None
- 3
- Iteration Minmi, Iteration Nodosaurus
- Not Needed
We'd like to know when prompt changes or other regressions affect the accuracy of the generative AI results. To do this, we'll run the accuracy tests that were recently added to Compass (scripts/ai-accuracy-tests.js) on a nightly basis. The nightly run should fail when accuracy falls below a set threshold.
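As a rough illustration of the threshold check described above, here is a minimal sketch. The result shape (`{ passed: boolean }`), the function names `computeAccuracy` and `gate`, and the 0.8 threshold are all assumptions made for the example; the actual output of scripts/ai-accuracy-tests.js may look different.

```javascript
// Hypothetical result shape: one { passed: boolean } entry per accuracy test case.
// The real script's output format is not known here; this is an assumption.
function computeAccuracy(results) {
  if (results.length === 0) return 0; // treat "no results" as a failure signal
  const passed = results.filter((r) => r.passed).length;
  return passed / results.length;
}

// Returns an exit code: 0 when accuracy meets the threshold, 1 otherwise.
function gate(results, threshold) {
  const accuracy = computeAccuracy(results);
  console.log(
    `accuracy ${(accuracy * 100).toFixed(1)}% (threshold ${(threshold * 100).toFixed(1)}%)`
  );
  return accuracy >= threshold ? 0 : 1;
}

// Made-up sample data and an assumed threshold of 0.8.
const sample = [{ passed: true }, { passed: true }, { passed: false }, { passed: true }];
const exitCode = gate(sample, 0.8);
console.log(`nightly job would exit with code ${exitCode}`);
```

A nightly job could assign the returned value to `process.exitCode` so that CI marks the run as failed when accuracy drops.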
These tests may currently run synchronously; if the nightly run turns out to be really slow, we might need to parallelize them.
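If parallelizing becomes necessary, one common approach is to run the test cases with a bounded number in flight at once. The sketch below is not taken from the Compass script; `runWithConcurrency` and `fakeTest` are hypothetical names, and the limit of 2 is arbitrary.

```javascript
// Runs async task functions with at most `limit` in flight at a time.
// Results are returned in the same order as the input tasks.
async function runWithConcurrency(tasks, limit) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }
  const results = new Array(tasks.length);
  let next = 0; // index of the next task to start

  // Each worker pulls the next unstarted task until none remain.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workers = Array.from({ length: Math.min(limit, tasks.length) }, () => worker());
  await Promise.all(workers);
  return results;
}

// Hypothetical stand-in for a single accuracy test case.
const fakeTest = (name) => async () => {
  await new Promise((resolve) => setTimeout(resolve, 10));
  return { name, passed: true };
};

runWithConcurrency([fakeTest('a'), fakeTest('b'), fakeTest('c')], 2).then((results) => {
  console.log(results.map((r) => r.name).join(','));
});
```

A bounded limit, rather than starting everything with a single `Promise.all`, keeps the number of simultaneous requests to the AI backend under control, which may matter if that service is rate limited.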