LLDF Logo
Language Layer Defense Framework

Defending AI, One Word at a Time

Multi-layered defense against prompt injection attacks
Six Layer Defense ▼
AI Red Teaming Demos Coming Soon
LLDE Lab Coming Soon
LLDE Live Workshops Coming Soon

L1

Input Validation

Filter and sanitize all incoming prompts.

L2

Context Protection

Secure system instructions and context.

L3

Behavioral Boundaries

Define and enforce AI behavior limits.

L4

Output Filtering

Validate responses before delivery.

L5

Monitoring & Detection

Real-time threat identification.

L6

Response & Recovery

Incident handling and adaptation.

LLDF

Language Layer Defense Framework—Operational mapping of techniques with plain language definitions and defenses.

AI models can be deceived by carefully crafted prompts designed to manipulate their instructions, distort their reasoning, or slip harmful content past safety mechanisms. These attacks don’t involve conventional hacking; instead, they exploit how language models process text, context, memory, and examples. The techniques below outline common strategies for bypassing safeguards, extracting sensitive data, or provoking unsafe behavior. Recognizing these patterns enables developers, security teams, and newcomers to identify risky prompts and strengthen the resilience and safety of AI systems.

Join the LLDF Community

Submit New Technique

50 techniques
ID Technique Tactic Definition Defense Author
LLDF-T001 Prompt Injection Initial Access Inserting malicious instructions into user prompts to override system behavior LLDF Team
LLDF-T002 Context Poisoning Persistence Injecting malicious context that persists across conversation turns LLDF Team
LLDF-T003 Jailbreak Techniques Evasion Bypassing safety guardrails through creative prompt engineering LLDF Team
LLDF-T004 Role-Play Exploitation Execution Using fictional scenarios to elicit prohibited responses LLDF Team
LLDF-T005 Token Smuggling Evasion Hiding malicious content in token sequences that bypass filters LLDF Team
LLDF-T006 System Prompt Leakage Reconnaissance Extracting system instructions through targeted queries LLDF Team
LLDF-T007 Chain-of-Thought Manipulation Execution Exploiting reasoning chains to reach unintended conclusions LLDF Team
LLDF-T008 Few-Shot Poisoning Initial Access Providing malicious examples to influence model behavior LLDF Team
LLDF-T009 Encoding Obfuscation Evasion Using alternative encodings to bypass content filters LLDF Team
LLDF-T010 Instruction Hierarchy Bypass Evasion Overriding system instructions with user-level commands LLDF Team
LLDF-T011 Memory Exploitation Persistence Manipulating conversation memory to maintain malicious state LLDF Team
LLDF-T012 Output Manipulation Execution Crafting inputs that produce specific malicious outputs LLDF Team
LLDF-T013 Semantic Drift Evasion Gradually shifting conversation context toward prohibited topics LLDF Team
LLDF-T014 Multi-Turn Attacks Persistence Building malicious payloads across multiple conversation turns LLDF Team
LLDF-T015 Function Calling Abuse Execution Exploiting tool/function calling capabilities for unauthorized actions LLDF Team
LLDF-T016 Retrieval Poisoning Initial Access Injecting malicious content into retrieval sources LLDF Team
LLDF-T017 Adversarial Suffixes Evasion Appending crafted tokens that trigger unintended behaviors LLDF Team
LLDF-T018 Prompt Leaking Reconnaissance Extracting training data or system prompts through queries LLDF Team
LLDF-T019 Instruction Confusion Execution Creating ambiguous instructions that exploit parsing logic LLDF Team
LLDF-T020 Context Window Overflow Evasion Exceeding context limits to drop security instructions LLDF Team
LLDF-T021 Delimiter Injection Initial Access Injecting special delimiters to break prompt structure LLDF Team
LLDF-T022 Refusal Suppression Evasion Techniques to prevent model from refusing requests LLDF Team
LLDF-T023 Persona Injection Execution Forcing model to adopt malicious personas or identities LLDF Team
LLDF-T024 Indirect Prompt Injection Initial Access Injecting prompts through external data sources LLDF Team
LLDF-T025 Gradient-Based Attacks Reconnaissance Using model gradients to craft adversarial inputs LLDF Team
LLDF-T026 Tokenization Exploits Evasion Exploiting tokenization quirks to bypass filters LLDF Team
LLDF-T027 Instruction Negation Evasion Using negation to reverse safety instructions LLDF Team
LLDF-T028 Multilingual Evasion Evasion Using non-English languages to bypass filters LLDF Team
LLDF-T029 Code Injection Execution Injecting executable code through prompts LLDF Team
LLDF-T030 Metadata Manipulation Initial Access Exploiting metadata fields to inject instructions LLDF Team
LLDF-T031 Attention Manipulation Execution Crafting inputs that exploit attention mechanisms LLDF Team
LLDF-T032 Embedding Poisoning Persistence Poisoning vector embeddings to influence retrieval LLDF Team
LLDF-T033 Prompt Chaining Execution Chaining multiple prompts to achieve complex attacks LLDF Team
LLDF-T034 Safety Alignment Bypass Evasion Circumventing RLHF and safety fine-tuning LLDF Team
LLDF-T035 Template Injection Initial Access Injecting malicious content into prompt templates LLDF Team
LLDF-T036 Reasoning Exploitation Execution Exploiting chain-of-thought to reach harmful conclusions LLDF Team
LLDF-T037 Tool Misuse Execution Misusing integrated tools for unauthorized purposes LLDF Team
LLDF-T038 Context Injection Initial Access Injecting malicious context through external sources LLDF Team
LLDF-T039 Behavioral Cloning Persistence Training model to mimic malicious behaviors LLDF Team
LLDF-T040 Adversarial Examples Evasion Crafting inputs that cause misclassification LLDF Team
LLDF-T041 Prompt Smuggling Initial Access Hiding prompts in seemingly benign content LLDF Team
LLDF-T042 Output Steering Execution Steering model outputs toward specific harmful content LLDF Team
LLDF-T043 Instruction Injection Initial Access Injecting new instructions mid-conversation LLDF Team
LLDF-T044 Capability Probing Reconnaissance Systematically testing model capabilities and limits LLDF Team
LLDF-T045 Reward Hacking Evasion Exploiting reward functions to bypass safety LLDF Team
LLDF-T046 Prompt Wrapping Evasion Wrapping malicious prompts in benign context LLDF Team
LLDF-T047 Instruction Overload Evasion Overwhelming model with conflicting instructions LLDF Team
LLDF-T048 Semantic Injection Initial Access Injecting malicious semantics through subtle phrasing LLDF Team
LLDF-T049 Model Extraction Reconnaissance Extracting model parameters or architecture details LLDF Team
LLDF-T050 Backdoor Activation Execution Triggering hidden backdoors in model behavior LLDF Team
×

Technique Details

© 2025 Language Layer Defense Framework    Privacy & Legal