Harry Coppock

10 Downing Street Fellow · Research Scientist · Visiting Lecturer

HEA Fellow, PhD, PGCert, MSc, MEng

CV · harrygcoppock [at] gmail [dot] com

Let’s chat!

Hey, welcome to my website 🌱

I am an AI researcher passionate about ensuring AI benefits humanity. To achieve this I work on a broad range of challenges including AI evaluation, development of robust AI capabilities, and reform within Government.

Currently, I am a Research Scientist at the UK AI Security Institute (AISI), where I was the first technical hire. I am also a 10 Downing Street Fellow, where I advise on and lead AI projects addressing the Prime Minister's priority objectives. Alongside this, I am a Visiting Lecturer at Imperial College London, where for one semester a year I teach the Deep Learning course.

Previously, I completed a PhD in AI at Imperial College London and spent time on secondment at The Alan Turing Institute. During my doctorate, I founded Maat, an AI due diligence consultancy that assessed the AI capabilities of companies involved in M&A and VC transactions, working with high-profile clients including Pfizer. Prior to this, I completed an MSc in AI at Imperial College London, following an undergraduate degree in Materials Science and Engineering.

In my spare time, I like to be in nature, working on my partner’s farm in Somerset, playing sport with friends and travelling to places that feel different to the UK.

UK AI Security Institute Research Scientist


Since joining AISI as its first technical hire in November 2023, I’ve enjoyed building AISI’s AI evaluation capabilities. In this role, I’ve built out AISI’s autonomous and cyber evaluations, primarily involving capture-the-flag-style agentic evals. I’ve also been directly involved in 20+ frontier model pre-deployment evaluation cycles, working closely with labs such as Anthropic, OpenAI, and Google DeepMind to evaluate harmful capabilities of their models before public release (see our 2025 Frontier AI Trends Report for a redacted overview of these evaluations). To enable this line of work, I have also been involved with open-source evaluation software such as Inspect AI, Hibayes, and Inspect Scout. I’ve been delighted to see AISI become a world-leading lab, attracting great talent — now 170+ technical individuals — and demonstrating new ways of operating effectively within government.

A significant amount of my time has also been spent improving the science of AI evaluation, ensuring our measures are good indicators of latent capability. A subset of this work is now public: a rigorous framework for agentic benchmarks, the use of generalised linear models to investigate LLM judge biases, and, more recently, best practices for scalable oversight and transcript analysis. Internally, I have worked on inference-time scaling laws, real-world human uplift trials, automatic jailbreak methods, and measuring model robustness properties.
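To give a flavour of the generalised linear model approach to judge bias, here is a minimal, self-contained sketch. It is a toy simulation, not the actual AISI analysis: the data, the injected bias, and all parameter values are illustrative. A logistic GLM (fit here by Newton-Raphson) recovers a position bias deliberately injected into simulated judge verdicts:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: on each trial an LLM judge picks between two answers.
# The judge's choice depends on a latent quality gap plus a position
# bias favouring whichever answer is shown first (injected below).
n = 5000
quality_diff = rng.normal(0.0, 1.0, n)  # latent quality gap (logit scale)
position_bias = 1.0                      # injected bias toward the first answer
logits = quality_diff + position_bias
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(float)  # 1 = picked first

# Fit a binomial GLM (logistic regression) by Newton-Raphson:
#   P(pick first) = sigmoid(beta0 + beta1 * quality_diff)
# so beta0 estimates the position bias, beta1 the weight on quality.
X = np.column_stack([np.ones(n), quality_diff])
beta = np.zeros(2)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    grad = X.T @ (y - p)                     # score (gradient of log-likelihood)
    hess = X.T @ (X * (p * (1.0 - p))[:, None])  # observed information
    beta += np.linalg.solve(hess, grad)

print(f"estimated position bias: {beta[0]:.2f}")  # close to the injected 1.0
```

In the real analyses the covariates would be properties of the judged responses (position, length, style, source model), with the fitted coefficients quantifying how strongly each property sways the judge independently of quality.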

Sandbox escape evaluation architecture and scenarios
Left: the sandbox-within-a-sandbox architecture, which allows us to safely evaluate container escape capabilities. Right: the different vulnerability scenarios we inject, along with difficulty ratings from 1 to 5.
A recent paper I enjoyed working on released one of our agentic evals: Quantifying Frontier LM Capabilities for Container Sandbox Escape. SandboxEscapeBench is a benchmark that measures LLM capacity to breach container sandboxes across a range of vulnerabilities — from misconfigurations and privilege errors to kernel flaws and runtime weaknesses.

The AISI team

10 Downing Street Innovation Fellow


In this Deputy Director role, I work closely with ministers, advising on and leading AI-related opportunities and projects. Alongside AISI, I contributed to the early work of the Incubator for Artificial Intelligence, which focused on building technical capability within government and reducing reliance on external procurement to deliver better outcomes for the UK through tech. I have had the privilege of serving across two governments, gaining significant political experience, including through the transition of power from a Conservative to a Labour government following the general election.

Harry at 10 Downing Street · Working with the Prime Minister · Cabinet Office · Demonstrating AI capabilities

Imperial College London Visiting Lecturer


Teaching is something I enjoy, and I’m fortunate to have taught the Deep Learning course in the Department of Computing at Imperial College London for the past three years. I cover topics including efficient deep learning, the science of AI evaluation, scaling laws, generative models, and transformer architectures. You can find recordings of some of my lectures (Lecture 1, Lecture 2) and a set of lecture notes. To support my pedagogical journey, I have completed a PGCert in Higher Education, am a Fellow of the Higher Education Academy (HEA), and am actively involved in faculty working groups to ensure the department’s teaching methods adapt appropriately to AI advancements.

AI for Health

Ensuring the public has access to the best possible healthcare through the NHS is a key objective of mine. I actively work towards this goal by pursuing advancements in AI for healthcare, rigorously validating their clinical utility, and working to position the NHS to maximise its chances of adopting these technological advances for public benefit.

An example contribution of mine was a paper in Nature Machine Intelligence demonstrating that a widely accepted technology — with hundreds of millions of pounds invested — purporting to diagnose COVID status from respiratory audio was in fact driven by confounders and provided no additional clinical utility over simple symptom checkers.

I have also led a study in the NHS investigating barriers to using frontier LLMs for medication safety reviews (demo of use case). Medication errors account for half of all avoidable medication-related harm, making this an important area of work. However, we found poor translation from synthetic and medical-style QA benchmarks to real-world clinical workflows, as illustrated in the figure below. Next, I hope to work on post-training methods that address the spiky capability landscape of current AI, ensuring better and more robust real-world outcomes.

Currently, I am campaigning within the NHS for it to strengthen its technical capability — both to develop its own AI technologies and to enable it to procure external solutions in a way that avoids vendor lock-in and ensures only the highest quality products are deployed for patient care.

LLM performance on medication safety reviews
Taxonomy and examples of system failures. Classification of the 178 failure instances into five failure reasons and their corresponding failure modes. Representative vignettes for each failure type showing the clinical scenario, system output, and clinician assessment.