Email Regex Cheatsheet (with Worked Examples)
Six email regex patterns, each tested against the same 15 addresses. The simple one that does 95% of the work, the strict one that's RFC-shaped, and the obfuscation-aware one for scraping protection. With the gotchas spelled out.
The simple regex (the one to use)
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
This is the pattern most production extractors use in some form. It accepts nearly every address format seen in practice and rejects strings that obviously aren't email addresses. It isn't RFC-strict, but RFC-strictness isn't what most real systems need.
Three character classes, in order:
- Local part: letters, digits, dot, underscore, percent, plus, hyphen — repeated one or more times.
- Domain (everything before the last dot): letters, digits, dot, hyphen — repeated one or more times.
- TLD (after the last dot): letters only, two or more.
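The three parts can be made visible by writing the same pattern in verbose mode. This is a minimal sketch: the named groups (`local`, `domain`, `tld`) and the `SIMPLE_EMAIL` name are added here for illustration and are not part of the original pattern.

```python
import re

# The simple pattern, split into its three parts with re.VERBOSE.
# Named groups are an illustrative addition; the character classes are unchanged.
SIMPLE_EMAIL = re.compile(
    r"""
    (?P<local>[a-zA-Z0-9._%+-]+)   # local part: letters, digits, . _ % + -
    @
    (?P<domain>[a-zA-Z0-9.-]+)     # everything before the last dot
    \.
    (?P<tld>[a-zA-Z]{2,})          # TLD: letters only, two or more
    """,
    re.VERBOSE,
)

parts = SIMPLE_EMAIL.fullmatch("bob.smith+filter@example.co.uk")
print(parts.group("local"), parts.group("domain"), parts.group("tld"))
```

Because the domain class is greedy and also allows dots, it takes `example.co` and leaves only `uk` for the TLD group.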
Test cases — what each pattern catches
| Address | Should match? | Simple regex |
|---|---|---|
| alice@example.com | yes | match |
| bob.smith+filter@example.co.uk | yes | match |
| support_team@subdomain.company.io | yes | match |
| x@y.museum | yes | match |
| "weird local"@example.com | yes (RFC) | no match |
| alice@ | no | no match |
| @example.com | no | no match |
| alice@example | no | no match |
| not-an-email | no | no match |
The simple regex misses RFC-legal but never-seen-in-practice patterns (quoted locals, embedded comments). Skip those — chasing them adds complexity without catching real addresses.
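The table can be reproduced with a small harness. This sketch assumes each address is checked as a whole string with `fullmatch`; the `CASES` list and the `is_simple_match` helper are names made up for this example.

```python
import re

SIMPLE_EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

# (address, whether the table lists "match" for the simple regex)
CASES = [
    ("alice@example.com", True),
    ("bob.smith+filter@example.co.uk", True),
    ("support_team@subdomain.company.io", True),
    ("x@y.museum", True),
    ('"weird local"@example.com', False),
    ("alice@", False),
    ("@example.com", False),
    ("alice@example", False),
    ("not-an-email", False),
]


def is_simple_match(address: str) -> bool:
    """True when the whole string is matched by the simple pattern."""
    return SIMPLE_EMAIL.fullmatch(address) is not None


results = {address: is_simple_match(address) for address, _ in CASES}
mismatches = [a for a, expected in CASES if results[a] != expected]
print(mismatches)  # an empty list means the table and the regex agree
```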
The RFC-shaped regex (the one not to use)
The full RFC 5322 email regex is around 6,400 characters. The shorter "RFC-shaped" version most people reach for:
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
This pattern technically accepts more legal addresses. In practice, it's slower, harder to read, harder to debug, and catches almost no addresses the simple pattern misses. Use it only when the project's legal posture explicitly demands RFC compliance.
The obfuscation-aware regex
For text where addresses are deliberately obfuscated, run a preprocessing pass before the standard regex:
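// JavaScript
const deobfuscated = text
  .replace(/\s*\[\s*at\s*\]\s*/gi, '@')   // name [at] domain
  .replace(/\s*\(\s*at\s*\)\s*/gi, '@')   // name (at) domain
  // Bare " at " also rewrites ordinary prose ("look at example.com"),
  // so expect some false positives from this rule.
  .replace(/\s+at\s+/gi, '@')
  .replace(/\s*\[\s*dot\s*\]\s*/gi, '.')  // domain [dot] com
  .replace(/\s*\(\s*dot\s*\)\s*/gi, '.')  // domain (dot) com
  .replace(/\s+dot\s+/gi, '.');           // domain DOT com
// Now run the standard email regex over `deobfuscated`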
This catches the common patterns: name [at] domain [dot] com, name (at) domain (dot) com, name AT domain DOT com. It won't catch JavaScript-assembled emails or image-rendered ones — those need different tools.
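For comparison, here is a rough Python port of the same preprocessing pass, chained with the simple regex so the whole pipeline runs end to end. The `REWRITES` table and the `deobfuscate` and `extract_obfuscated` helpers are hypothetical names for this sketch, not part of any library.

```python
import re

SIMPLE_EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

# (pattern, replacement) pairs, applied in the same order as the JavaScript chain
REWRITES = [
    (re.compile(r"\s*\[\s*at\s*\]\s*", re.IGNORECASE), "@"),
    (re.compile(r"\s*\(\s*at\s*\)\s*", re.IGNORECASE), "@"),
    (re.compile(r"\s+at\s+", re.IGNORECASE), "@"),
    (re.compile(r"\s*\[\s*dot\s*\]\s*", re.IGNORECASE), "."),
    (re.compile(r"\s*\(\s*dot\s*\)\s*", re.IGNORECASE), "."),
    (re.compile(r"\s+dot\s+", re.IGNORECASE), "."),
]


def deobfuscate(text: str) -> str:
    """Rewrite spelled-out 'at' and 'dot' markers into '@' and '.'."""
    for pattern, replacement in REWRITES:
        text = pattern.sub(replacement, text)
    return text


def extract_obfuscated(text: str) -> list[str]:
    """Deobfuscate first, then run the simple email regex."""
    return SIMPLE_EMAIL.findall(deobfuscate(text))


print(extract_obfuscated("write to name [at] domain [dot] com today"))
print(extract_obfuscated("name AT domain DOT com"))
```

One caveat worth testing against your own data: the bare `at` rule also fires on ordinary prose, so `extract_obfuscated("look at example.com")` returns `look@example.com`. Dropping that rule, or applying it only near a spelled-out `dot`, trades recall for fewer false positives.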
Language-specific syntax
The same pattern in five languages:
JavaScript:
const re = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;
const matches = text.match(re) || [];
Python:
import re
re.findall(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}', text)
Go:
re := regexp.MustCompile(`[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`)
matches := re.FindAllString(text, -1)
Ruby:
matches = text.scan(/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/)
grep -E (POSIX):
grep -Eoi '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file.txt
All five produce the same result on the same input. Backslash escaping is the only meaningful difference — POSIX uses single-backslash; some languages use double-backslash inside string literals.
Common gotchas
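- Trailing punctuation. The pattern ends on letters, so on its own it does not capture a sentence-final period. The domain class does allow dots and hyphens, though, so text that runs straight on (alice@example.com.Thanks) is swallowed whole, and looser variants of the pattern do eat trailing periods. Strip trailing .,;:)>}] after extraction as a defensive step.
- Greedy multi-line match. If the input has no whitespace between two addresses, a poorly bounded regex can match across them. Always use word boundaries or anchor to non-email characters.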
- Case sensitivity. The RFC allows the local part to be case-sensitive; the domain is not. Real systems almost always treat the whole address case-insensitively. Lowercase on extraction unless you have a specific reason not to.
- Locale. The regex uses ASCII a-zA-Z. For internationalized addresses, switch to Unicode property classes: \p{L} for letters, \p{N} for digits.
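The first three gotchas can be handled in one small post-processing step. This is a sketch under stated assumptions: `\b` is used as the word boundary, the punctuation set is the one listed above, and `BOUNDED_EMAIL` and `extract_clean` are hypothetical names. With this exact pattern the strip is mostly a safety net; it matters more if the pattern is later loosened.

```python
import re

# Word boundaries on both sides keep a match from starting on leading
# punctuation or ending in the middle of a longer alphanumeric run.
BOUNDED_EMAIL = re.compile(
    r"\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b"
)
TRAILING_PUNCTUATION = ".,;:)>}]"


def extract_clean(text: str) -> list[str]:
    """Extract addresses, strip trailing punctuation, lowercase, de-duplicate."""
    seen = set()
    cleaned = []
    for raw in BOUNDED_EMAIL.findall(text):
        address = raw.rstrip(TRAILING_PUNCTUATION).lower()
        if address not in seen:
            seen.add(address)
            cleaned.append(address)
    return cleaned


print(extract_clean("Mail Alice@Example.com, or (alice@example.com)."))
print(extract_clean("...bob@example.org."))
```

Lowercasing before de-duplication means `Alice@Example.com` and `alice@example.com` collapse into a single entry, and the first-seen order is preserved.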
For the broader reference on email extraction including the legal layer, see Email Extraction: The Complete Guide. For the workflow walkthrough, see How to Extract Emails from Any Text.
Frequently asked questions
Which email regex should I use?
Almost always the simple one: [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}. It catches nearly every address you'll encounter in practice and rejects obvious non-addresses. The RFC-shaped version accepts more legal addresses, but it is slower and harder to read and debug, for almost no practical gain.
Why does the regex miss email addresses with + in them?
It shouldn't — the standard pattern includes + in the local-part character class. If your regex misses alice+filter@example.com, the regex is missing the + in [a-zA-Z0-9._%+-]. Add it.
Does this regex work in JavaScript, Python, and Go?
Yes — it uses only POSIX-standard regex features. Each language has minor differences in escape syntax (the \. stays the same; some languages need \\). See the language-specific section.
Why use {2,} instead of just {2,4} for the TLD?
Modern TLDs go well beyond four characters: .museum is six, .solutions is nine, .international is thirteen. {2,} handles all of them. Hard-coded length limits will miss real domains.
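A quick comparison shows what a capped TLD length does to a long TLD. The `SHORT_TLD` and `OPEN_TLD` names are illustrative; the only difference between the two patterns is the final quantifier.

```python
import re

SHORT_TLD = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}")
OPEN_TLD = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

address = "x@y.museum"

# Whole-string validation: the capped pattern rejects the address outright.
short_result = SHORT_TLD.fullmatch(address)   # None
open_result = OPEN_TLD.fullmatch(address)     # a match object

# Unanchored extraction: the capped pattern silently truncates instead.
truncated = SHORT_TLD.search(address).group(0)
print(truncated)  # x@y.muse
```

The truncation case is arguably worse than a miss, because the extractor returns something that looks like a valid address but isn't the one in the text.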
How do I use this regex in Excel or Google Sheets?
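Google Sheets supports it directly through REGEXEXTRACT: pass the pattern as an ordinary quoted string, where the backslash in \. does not need doubling. In Excel, recent Microsoft 365 builds provide a REGEXEXTRACT function as well; older versions and Power Query have no built-in regex functions and need a workaround such as VBA.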
Written by the TextKit team. We build the tools we write about — try the Email Extractor used in this post.