Filter and sanitize all incoming prompts.
Secure system instructions and context.
Define and enforce AI behavior limits.
Validate responses before delivery.
Real-time threat identification.
Incident handling and adaptation.
Language Layer Defense Framework—Operational mapping of techniques with plain language definitions and defenses.
AI models can be deceived by carefully crafted prompts designed to manipulate their instructions, distort their reasoning, or slip harmful content past safety mechanisms. These attacks don’t involve conventional hacking; instead, they exploit how language models process text, context, memory, and examples. The techniques below outline common strategies for bypassing safeguards, extracting sensitive data, or provoking unsafe behavior. Recognizing these patterns enables developers, security teams, and newcomers to identify risky prompts and strengthen the resilience and safety of AI systems.
| ID | Technique | Tactic | Definition | Defense | Author |
|---|---|---|---|---|---|
| LLDF-T001 | Prompt Injection | Initial Access | Inserting malicious instructions into user prompts to override system behavior | LLDF Team | |
| LLDF-T002 | Context Poisoning | Persistence | Injecting malicious context that persists across conversation turns | LLDF Team | |
| LLDF-T003 | Jailbreak Techniques | Evasion | Bypassing safety guardrails through creative prompt engineering | LLDF Team | |
| LLDF-T004 | Role-Play Exploitation | Execution | Using fictional scenarios to elicit prohibited responses | LLDF Team | |
| LLDF-T005 | Token Smuggling | Evasion | Hiding malicious content in token sequences that bypass filters | LLDF Team | |
| LLDF-T006 | System Prompt Leakage | Reconnaissance | Extracting system instructions through targeted queries | LLDF Team | |
| LLDF-T007 | Chain-of-Thought Manipulation | Execution | Exploiting reasoning chains to reach unintended conclusions | LLDF Team | |
| LLDF-T008 | Few-Shot Poisoning | Initial Access | Providing malicious examples to influence model behavior | LLDF Team | |
| LLDF-T009 | Encoding Obfuscation | Evasion | Using alternative encodings to bypass content filters | LLDF Team | |
| LLDF-T010 | Instruction Hierarchy Bypass | Evasion | Overriding system instructions with user-level commands | LLDF Team | |
| LLDF-T011 | Memory Exploitation | Persistence | Manipulating conversation memory to maintain malicious state | LLDF Team | |
| LLDF-T012 | Output Manipulation | Execution | Crafting inputs that produce specific malicious outputs | LLDF Team | |
| LLDF-T013 | Semantic Drift | Evasion | Gradually shifting conversation context toward prohibited topics | LLDF Team | |
| LLDF-T014 | Multi-Turn Attacks | Persistence | Building malicious payloads across multiple conversation turns | LLDF Team | |
| LLDF-T015 | Function Calling Abuse | Execution | Exploiting tool/function calling capabilities for unauthorized actions | LLDF Team | |
| LLDF-T016 | Retrieval Poisoning | Initial Access | Injecting malicious content into retrieval sources | LLDF Team | |
| LLDF-T017 | Adversarial Suffixes | Evasion | Appending crafted tokens that trigger unintended behaviors | LLDF Team | |
| LLDF-T018 | Prompt Leaking | Reconnaissance | Extracting training data or system prompts through queries | LLDF Team | |
| LLDF-T019 | Instruction Confusion | Execution | Creating ambiguous instructions that exploit parsing logic | LLDF Team | |
| LLDF-T020 | Context Window Overflow | Evasion | Exceeding context limits to drop security instructions | LLDF Team | |
| LLDF-T021 | Delimiter Injection | Initial Access | Injecting special delimiters to break prompt structure | LLDF Team | |
| LLDF-T022 | Refusal Suppression | Evasion | Techniques to prevent model from refusing requests | LLDF Team | |
| LLDF-T023 | Persona Injection | Execution | Forcing model to adopt malicious personas or identities | LLDF Team | |
| LLDF-T024 | Indirect Prompt Injection | Initial Access | Injecting prompts through external data sources | LLDF Team | |
| LLDF-T025 | Gradient-Based Attacks | Reconnaissance | Using model gradients to craft adversarial inputs | LLDF Team | |
| LLDF-T026 | Tokenization Exploits | Evasion | Exploiting tokenization quirks to bypass filters | LLDF Team | |
| LLDF-T027 | Instruction Negation | Evasion | Using negation to reverse safety instructions | LLDF Team | |
| LLDF-T028 | Multilingual Evasion | Evasion | Using non-English languages to bypass filters | LLDF Team | |
| LLDF-T029 | Code Injection | Execution | Injecting executable code through prompts | LLDF Team | |
| LLDF-T030 | Metadata Manipulation | Initial Access | Exploiting metadata fields to inject instructions | LLDF Team | |
| LLDF-T031 | Attention Manipulation | Execution | Crafting inputs that exploit attention mechanisms | LLDF Team | |
| LLDF-T032 | Embedding Poisoning | Persistence | Poisoning vector embeddings to influence retrieval | LLDF Team | |
| LLDF-T033 | Prompt Chaining | Execution | Chaining multiple prompts to achieve complex attacks | LLDF Team | |
| LLDF-T034 | Safety Alignment Bypass | Evasion | Circumventing RLHF and safety fine-tuning | LLDF Team | |
| LLDF-T035 | Template Injection | Initial Access | Injecting malicious content into prompt templates | LLDF Team | |
| LLDF-T036 | Reasoning Exploitation | Execution | Exploiting chain-of-thought to reach harmful conclusions | LLDF Team | |
| LLDF-T037 | Tool Misuse | Execution | Misusing integrated tools for unauthorized purposes | LLDF Team | |
| LLDF-T038 | Context Injection | Initial Access | Injecting malicious context through external sources | LLDF Team | |
| LLDF-T039 | Behavioral Cloning | Persistence | Training model to mimic malicious behaviors | LLDF Team | |
| LLDF-T040 | Adversarial Examples | Evasion | Crafting inputs that cause misclassification | LLDF Team | |
| LLDF-T041 | Prompt Smuggling | Initial Access | Hiding prompts in seemingly benign content | LLDF Team | |
| LLDF-T042 | Output Steering | Execution | Steering model outputs toward specific harmful content | LLDF Team | |
| LLDF-T043 | Instruction Injection | Initial Access | Injecting new instructions mid-conversation | LLDF Team | |
| LLDF-T044 | Capability Probing | Reconnaissance | Systematically testing model capabilities and limits | LLDF Team | |
| LLDF-T045 | Reward Hacking | Evasion | Exploiting reward functions to bypass safety | LLDF Team | |
| LLDF-T046 | Prompt Wrapping | Evasion | Wrapping malicious prompts in benign context | LLDF Team | |
| LLDF-T047 | Instruction Overload | Evasion | Overwhelming model with conflicting instructions | LLDF Team | |
| LLDF-T048 | Semantic Injection | Initial Access | Injecting malicious semantics through subtle phrasing | LLDF Team | |
| LLDF-T049 | Model Extraction | Reconnaissance | Extracting model parameters or architecture details | LLDF Team | |
| LLDF-T050 | Backdoor Activation | Execution | Triggering hidden backdoors in model behavior | LLDF Team |