Research Questions

Q0: What are the best rubrics for human evaluation, and how scalable is this evaluation (e.g., can an LLM judge take over part of it)?

Q1: Can automatically generated situation awareness notes replace a portion of a human analyst's work, and how effectively?

Q2: Open-source vs. proprietary LLM solutions: which performs best, and what are the trade-offs?

Q3: Which prompting strategy provides the best results?

Q4: Can we build a system that can be flexibly adapted to similar tasks? [Data-to-Text task]

Evaluation criteria:

Contribution: