Skip to main content
AI safety in practice

How we ensure AI reliability and alignment with educators whilst preventing harm to students.

Tom O'Donahoo avatar
Written by Tom O'Donahoo
Updated over a month ago

Atomi uses a range of techniques to maximise the helpfulness and validate the harmlessness of our AI models, both pre-deployment and during their ongoing use.

These techniques include:

RLHF (Re-enforcement Learning from Human Feedback)

This uses preference data and user feedback to steer models towards behaviours that align with the Atomi community's expectations.

CAI (‘Constitutional AI’)

A technique pioneered by Anthropic in which Atomi writes a ‘constitution’ that codifies the principles for how Atomi AI models should behave, including negative behaviours that models should avoid. Model responses can then be programmatically assessed for their alignment with these principles. This technique enables safety at scale during training, prior to model deployment, as well as ongoing monitoring of model performance in production. The Atomi constitution is based on the principles set out in our AI Policy.

Moderation APIs

Atomi utilises moderation APIs that flag responses against key risk categories, including harassment and hate, illicit or sexual content and content pertaining to self-harm or violence.

Adversarial testing

Atomi red-teams its AI models prior to deployment. This validates that they are robust to user behaviours that may, either intentionally or unintentionally, cause a model to behave in an undesirable way. This includes deliberately trying to get models to go off track, return details about their prompt, hallucinate or otherwise generate a harmful response.

Did this answer your question?