How to Evaluate Your LLM Quality in n8n - Automated Upwork Proposal Demo
When building AI or agentic systems, one of the most important things you can do to ensure reliability, quality, and accuracy is to evaluate your LLM responses. In this video we demo a complete workflow that automatically finds Upwork job posts, classifies whether each job is a good fit, extracts the required skills, and generates tailored proposals, using an n8n orchestration layer combined with LLMs and small custom microservices. The goal is to produce proposal drafts we can confidently send with minimal editing, so we built a diverse test set of real Upwork postings, human-written reference proposals as labels, and pipeline steps that mirror real-world decision points: qualify/reject, skill extraction, proposal generation, and notification. You’ll see how the system runs the test set end-to-end and stores every artifact for inspection.
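For orientation, here is a minimal sketch of what the qualify/reject and skill-extraction decision points could look like if implemented as a small service called from n8n. The model name, prompt, and JSON fields are illustrative assumptions, not the exact setup shown in the video.

```python
# Minimal sketch (not the exact workflow from the video) of the qualify/reject
# and skill-extraction steps, as they might run in a small microservice called
# from an n8n HTTP node. Model, prompt, and JSON schema are assumptions.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def qualify_job(job_post: str) -> dict:
    """Ask the LLM whether a job post is a good fit and which skills it requires."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[
            {"role": "system", "content": (
                "You screen Upwork job posts for an AI consulting team. "
                "Return JSON with keys: fit (yes/no), reason, skills (list of strings)."
            )},
            {"role": "user", "content": job_post},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

result = qualify_job("Build an n8n automation that drafts client proposals with an LLM...")
if result["fit"] == "yes":
    print("Qualified, required skills:", result["skills"])  # next step: generate a proposal
else:
    print("Rejected:", result["reason"])                    # next step: notify and skip
```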
You will also see how we evaluate and score the outputs at scale: an LLM-as-judge step compares generated proposals to the human-written references and returns a qualitative judgment for each one (same, missing info, extra info, or completely different), while simple similarity heuristics and manual sampling convert those judgments into numeric scores for the classification and generation tasks. This practical pattern lets you measure quality, detect hallucinations, and iterate quickly on prompts and model choices so your automation gets progressively better.
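A minimal sketch of that LLM-as-judge step follows: it compares a generated proposal against its human reference, returns one of the four labels above, and maps the label to a numeric score for aggregation. The prompt wording, judge model, and score mapping are assumptions for illustration, not the exact implementation from the video.

```python
# Minimal sketch of the LLM-as-judge pattern. The label set matches the video
# (same / missing info / extra info / completely different); the prompt wording
# and the numeric score mapping are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = (
    "Compare the GENERATED proposal to the REFERENCE proposal written by a human.\n"
    "Answer with exactly one label: same, missing info, extra info, or completely different.\n"
    "Then give a one-sentence explanation."
)

# Hypothetical mapping from qualitative judgments to numeric scores.
SCORES = {"same": 1.0, "missing info": 0.5, "extra info": 0.5, "completely different": 0.0}

def judge(generated: str, reference: str) -> tuple[str, float]:
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder judge model
        temperature=0,   # keep judgments as repeatable as possible
        messages=[
            {"role": "system", "content": JUDGE_PROMPT},
            {"role": "user", "content": f"GENERATED:\n{generated}\n\nREFERENCE:\n{reference}"},
        ],
    )
    verdict = response.choices[0].message.content.strip().lower()
    label = next((k for k in SCORES if verdict.startswith(k)), "completely different")
    return label, SCORES[label]

# Averaging these scores over the whole test set gives one number to track
# while iterating on prompts and model choices.
```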
Where else to find us:
https://www.linkedin.com/in/amirfzpr/
https://aisc.substack.com/
@ai-science (YouTube)
https://lu.ma/aisc-llm-school
https://maven.com/aggregate-intellect/
#UpworkAutomation #ProposalGenerator #n8n #LLMevaluation #AIAgents #RAG #PromptEngineering #AIWorkflow #LLMops #GenerativeAI #AIProductivity #AutomatedProposals #ModelEvaluation #GPT4o #AIinFreelancing #upwork