# Language Layer Defense Framework

Operational mapping of language-layer attack techniques, with plain-language definitions and corresponding defenses. The framework is organized around six defensive layers:

- Filter and sanitize all incoming prompts.
- Secure system instructions and context.
- Define and enforce AI behavior limits.
- Validate responses before delivery.
- Identify threats in real time.
- Handle incidents and adapt defenses.
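The first and fourth layers (input filtering and output validation) can be sketched as a minimal guard pipeline. The function names, blocklist patterns, and length bound below are illustrative assumptions, not part of the framework itself; a production filter would use far richer detection than keyword patterns.

```python
import re

# Illustrative injection patterns only; real deployments need broader,
# continuously updated detection (see the technique table below).
BLOCKED_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"reveal (the )?system prompt", re.IGNORECASE),
]

def filter_input(prompt: str) -> str:
    """Layer 1: reject prompts matching known injection patterns."""
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(prompt):
            raise ValueError("prompt rejected by input filter")
    return prompt.strip()

def validate_output(response: str, max_len: int = 4096) -> str:
    """Layer 4: bound the model's response before delivery."""
    # Truncation is a placeholder; content-level checks would go here too.
    return response[:max_len]
```

In use, `filter_input` runs before the prompt reaches the model and `validate_output` runs on the model's reply, so both the input and output boundaries are checked independently.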
| ID | Technique | Tactic | Definition | Defense | Author |
|---|---|---|---|---|---|
| LLDF-T001 | Prompt Injection | Initial Access | Inserting malicious instructions into user prompts to override system behavior | Input validation, prompt templating, instruction hierarchy | LLDF Team |
| LLDF-T002 | Context Poisoning | Persistence | Injecting malicious context that persists across conversation turns | Context sanitization, session isolation, memory bounds | LLDF Team |
| LLDF-T003 | Jailbreak Techniques | Evasion | Bypassing safety guardrails through creative prompt engineering | Multi-layer filtering, behavioral analysis, output validation | LLDF Team |
| LLDF-T004 | Role-Play Exploitation | Execution | Using fictional scenarios to elicit prohibited responses | Intent classification, scenario detection, response filtering | LLDF Team |
| LLDF-T005 | Token Smuggling | Evasion | Hiding malicious content in token sequences that bypass filters | Token-level analysis, semantic validation, pattern detection | LLDF Team |
| LLDF-T006 | System Prompt Leakage | Reconnaissance | Extracting system instructions through targeted queries | Prompt isolation, response sanitization, instruction obfuscation | LLDF Team |
| LLDF-T007 | Chain-of-Thought Manipulation | Execution | Exploiting reasoning chains to reach unintended conclusions | Reasoning validation, logic bounds, output verification | LLDF Team |
| LLDF-T008 | Few-Shot Poisoning | Initial Access | Providing malicious examples to influence model behavior | Example validation, source verification, pattern analysis | LLDF Team |
| LLDF-T009 | Encoding Obfuscation | Evasion | Using alternative encodings to bypass content filters | Multi-encoding detection, normalization, semantic analysis | LLDF Team |
| LLDF-T010 | Instruction Hierarchy Bypass | Evasion | Overriding system instructions with user-level commands | Privilege separation, instruction priority, command validation | LLDF Team |
| LLDF-T011 | Memory Exploitation | Persistence | Manipulating conversation memory to maintain malicious state | Memory sanitization, state validation, session limits | LLDF Team |
| LLDF-T012 | Output Manipulation | Execution | Crafting inputs that produce specific malicious outputs | Output filtering, content validation, response monitoring | LLDF Team |
| LLDF-T013 | Semantic Drift | Evasion | Gradually shifting conversation context toward prohibited topics | Topic tracking, drift detection, context boundaries | LLDF Team |
| LLDF-T014 | Multi-Turn Attacks | Persistence | Building malicious payloads across multiple conversation turns | Cross-turn analysis, state tracking, cumulative filtering | LLDF Team |
| LLDF-T015 | Function Calling Abuse | Execution | Exploiting tool/function calling capabilities for unauthorized actions | Function whitelisting, parameter validation, execution monitoring | LLDF Team |
| LLDF-T016 | Retrieval Poisoning | Initial Access | Injecting malicious content into retrieval sources | Source validation, content sanitization, retrieval filtering | LLDF Team |
| LLDF-T017 | Adversarial Suffixes | Evasion | Appending crafted tokens that trigger unintended behaviors | Suffix detection, token analysis, behavioral monitoring | LLDF Team |
| LLDF-T018 | Prompt Leaking | Reconnaissance | Extracting training data or system prompts through queries | Data isolation, response filtering, leakage detection | LLDF Team |
| LLDF-T019 | Instruction Confusion | Execution | Creating ambiguous instructions that exploit parsing logic | Instruction clarification, parsing validation, ambiguity detection | LLDF Team |
| LLDF-T020 | Context Window Overflow | Evasion | Exceeding context limits to drop security instructions | Context management, instruction pinning, overflow detection | LLDF Team |
| LLDF-T021 | Delimiter Injection | Initial Access | Injecting special delimiters to break prompt structure | Delimiter escaping, structure validation, parsing hardening | LLDF Team |
| LLDF-T022 | Refusal Suppression | Evasion | Techniques that prevent the model from refusing requests | Refusal reinforcement, safety layer redundancy, response validation | LLDF Team |
| LLDF-T023 | Persona Injection | Execution | Forcing model to adopt malicious personas or identities | Persona validation, identity constraints, behavior monitoring | LLDF Team |
| LLDF-T024 | Indirect Prompt Injection | Initial Access | Injecting prompts through external data sources | Source isolation, data sanitization, indirect detection | LLDF Team |
| LLDF-T025 | Gradient-Based Attacks | Reconnaissance | Using model gradients to craft adversarial inputs | Gradient masking, input perturbation, adversarial training | LLDF Team |
| LLDF-T026 | Tokenization Exploits | Evasion | Exploiting tokenization quirks to bypass filters | Tokenization normalization, boundary detection, semantic validation | LLDF Team |
| LLDF-T027 | Instruction Negation | Evasion | Using negation to reverse safety instructions | Negation detection, instruction reinforcement, logic validation | LLDF Team |
| LLDF-T028 | Multilingual Evasion | Evasion | Using non-English languages to bypass filters | Multilingual filtering, translation validation, language detection | LLDF Team |
| LLDF-T029 | Code Injection | Execution | Injecting executable code through prompts | Code detection, execution prevention, sandbox isolation | LLDF Team |
| LLDF-T030 | Metadata Manipulation | Initial Access | Exploiting metadata fields to inject instructions | Metadata validation, field sanitization, structure enforcement | LLDF Team |
| LLDF-T031 | Attention Manipulation | Execution | Crafting inputs that exploit attention mechanisms | Attention monitoring, pattern detection, mechanism hardening | LLDF Team |
| LLDF-T032 | Embedding Poisoning | Persistence | Poisoning vector embeddings to influence retrieval | Embedding validation, anomaly detection, source verification | LLDF Team |
| LLDF-T033 | Prompt Chaining | Execution | Chaining multiple prompts to achieve complex attacks | Chain detection, cumulative analysis, sequence validation | LLDF Team |
| LLDF-T034 | Safety Alignment Bypass | Evasion | Circumventing RLHF and safety fine-tuning | Alignment reinforcement, multi-layer safety, behavioral monitoring | LLDF Team |
| LLDF-T035 | Template Injection | Initial Access | Injecting malicious content into prompt templates | Template validation, variable sanitization, structure enforcement | LLDF Team |
| LLDF-T036 | Reasoning Exploitation | Execution | Exploiting chain-of-thought to reach harmful conclusions | Reasoning validation, logic bounds, conclusion filtering | LLDF Team |
| LLDF-T037 | Tool Misuse | Execution | Misusing integrated tools for unauthorized purposes | Tool authorization, usage monitoring, capability limits | LLDF Team |
| LLDF-T038 | Context Injection | Initial Access | Injecting malicious context through external sources | Context validation, source verification, injection detection | LLDF Team |
| LLDF-T039 | Behavioral Cloning | Persistence | Training model to mimic malicious behaviors | Behavior monitoring, anomaly detection, training validation | LLDF Team |
| LLDF-T040 | Adversarial Examples | Evasion | Crafting inputs that cause misclassification | Adversarial training, input validation, robustness testing | LLDF Team |
| LLDF-T041 | Prompt Smuggling | Initial Access | Hiding prompts in seemingly benign content | Content analysis, hidden instruction detection, semantic validation | LLDF Team |
| LLDF-T042 | Output Steering | Execution | Steering model outputs toward specific harmful content | Output monitoring, steering detection, content validation | LLDF Team |
| LLDF-T043 | Instruction Injection | Initial Access | Injecting new instructions mid-conversation | Instruction tracking, injection detection, command validation | LLDF Team |
| LLDF-T044 | Capability Probing | Reconnaissance | Systematically testing model capabilities and limits | Probing detection, rate limiting, capability obfuscation | LLDF Team |
| LLDF-T045 | Reward Hacking | Evasion | Exploiting reward functions to bypass safety | Reward validation, objective alignment, behavior monitoring | LLDF Team |
| LLDF-T046 | Prompt Wrapping | Evasion | Wrapping malicious prompts in benign context | Context analysis, wrapping detection, intent classification | LLDF Team |
| LLDF-T047 | Instruction Overload | Evasion | Overwhelming model with conflicting instructions | Instruction prioritization, conflict resolution, load management | LLDF Team |
| LLDF-T048 | Semantic Injection | Initial Access | Injecting malicious semantics through subtle phrasing | Semantic analysis, intent detection, phrasing validation | LLDF Team |
| LLDF-T049 | Model Extraction | Reconnaissance | Extracting model parameters or architecture details | Query monitoring, extraction detection, response limiting | LLDF Team |
| LLDF-T050 | Backdoor Activation | Execution | Triggering hidden backdoors in model behavior | Backdoor detection, trigger monitoring, behavioral analysis | LLDF Team |