Convert text to binary and binary to text. UTF-8 supported.
Convert any text into its binary representation and any binary string back to readable text. The encoder defaults to UTF-8, which means every English character, every accented letter, every CJK glyph, and every emoji round-trips correctly. Decoding accepts space-separated, dash-separated, comma-separated, newline-separated, and run-on binary input. Everything happens locally in your browser; nothing is uploaded.
Binary is base 2. The number system humans use day-to-day is base 10, with ten digits (0 through 9) and place values that are powers of ten: the ones, the tens, the hundreds, the thousands. Binary uses only two digits (0 and 1), and the place values are powers of two: the ones, the twos, the fours, the eights, the sixteens. The decimal number 13 is one eight, plus one four, plus zero twos, plus one one, which in binary is 1101.
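The place-value arithmetic above can be sketched in a few lines of Python. This is only an illustration of the math, not the tool's own code; the function names `to_binary` and `from_binary` are made up for this example.

```python
def to_binary(n: int) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # the remainder is the next bit, lowest place first
        n //= 2
    return "".join(reversed(digits))


def from_binary(bits: str) -> int:
    """Add up the place value (a power of two) of every 1 bit."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        if bit == "1":
            total += 2 ** position
    return total


print(to_binary(13))        # 1101
print(from_binary("1101"))  # 13  (8 + 4 + 0 + 1)
```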
Computers prefer base 2 because the underlying hardware does. A transistor has two reliable electrical states (on and off, high voltage and low voltage), and mapping those to the digits 0 and 1 gives an unambiguous physical implementation. Building hardware that distinguishes ten distinct voltage levels reliably is much harder than building hardware that distinguishes two, which is why base 10 stayed in calculators and base 2 won inside the machine.
A binary code translator is not just doing math. It is also doing translation. Numbers in binary are unambiguous. Characters are not. To convert the letter A to binary, the tool first looks up which number represents A. That lookup is a character encoding, and the choice of encoding decides what bytes you get out.
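To see how the choice of encoding changes the bytes, here is a small Python sketch that encodes the same character two ways. Latin-1 is used purely as a contrasting example of a single-byte encoding (an assumption for illustration; the tool itself defaults to UTF-8), and the helper name `bits` is hypothetical.

```python
def bits(data: bytes) -> str:
    """Render each byte as 8 binary digits, separated by spaces."""
    return " ".join(format(b, "08b") for b in data)


char = "é"
print(ord(char))                     # 233, the number the lookup assigns to é
print(bits(char.encode("utf-8")))    # 11000011 10101001  (two bytes)
print(bits(char.encode("latin-1")))  # 11101001           (one byte)
```

Same character, same codepoint, different bytes: the binary output only makes sense once you know which encoding produced it.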
ASCII (American Standard Code for Information Interchange, first published in 1963) assigns a number from 0 to 127 to each of the unaccented Latin letters, digits, common punctuation, and a handful of control codes. The letter A is 65, lowercase a is 97 (lowercase letters arrived with the 1967 revision), and the digit 0 is 48. ASCII covers everything an English-language teletype needed, but it has no Spanish ñ, no French é, no Russian Cyrillic, no Chinese, and definitely no emoji.
UTF-8 (Unicode Transformation Format, 8-bit, 1992) is the modern superset. For the first 128 codepoints it is byte-identical to ASCII, so plain English text encodes the same. Above 127, UTF-8 uses 2, 3, or 4 bytes per character to cover every script in active use plus emoji and symbols. This tool defaults to UTF-8, which is why pasting café or 🚀 produces correct output rather than garbled bytes.
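A quick way to check the 1-to-4-byte behavior is to encode a few characters and count the bytes. The Python sketch below is illustrative only; the sample characters and the helper name `utf8_length` are choices made for this example.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 needs for a single character."""
    return len(ch.encode("utf-8"))


for ch in ["A", "é", "中", "🚀"]:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {utf8_length(ch)} byte(s): {encoded.hex(' ')}")

# U+0041 -> 1 byte(s): 41
# U+00E9 -> 2 byte(s): c3 a9
# U+4E2D -> 3 byte(s): e4 b8 ad
# U+1F680 -> 4 byte(s): f0 9f 9a 80

# Plain English text is byte-identical in ASCII and UTF-8.
print("Hi".encode("ascii") == "Hi".encode("utf-8"))  # True
```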
The byte was not always 8 bits. Early machines used 6-bit, 7-bit, and 9-bit bytes depending on the manufacturer. The IBM System/360 in 1964 standardized on 8 bits, and the rest of the industry followed. Eight bits gives 256 distinct values, which was enough for the 128 ASCII characters plus a national-language extension. ASCII itself stops at 127; the "high half" (128 through 255) was filled in by 8-bit extensions with accented Latin in Western Europe, Cyrillic in Russia, and so on.
Because ASCII only needed 7 bits, you will sometimes see ASCII written in 7-bit binary (the leading zero dropped). For consistency with how computers actually store text in memory, this tool always writes each byte as a full 8 bits, with a leading zero for the ASCII range. The letter A (decimal 65) is 01000001, not 1000001.
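The difference between the 7-bit and 8-bit spellings comes down to zero-padding, as this short Python sketch shows. It illustrates the formatting rule only, and the variable names are hypothetical.

```python
code = ord("A")                  # 65

seven_bit = format(code, "b")    # no padding: shortest form that fits
eight_bit = format(code, "08b")  # padded with leading zeros to a full byte

print(seven_bit)  # 1000001
print(eight_bit)  # 01000001
```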
The two-letter string Hi is the simplest non-trivial example. The capital H is ASCII codepoint 72, the lowercase i is ASCII codepoint 105. Both are below 128, so UTF-8 encodes each as a single byte. Converting 72 to binary: 64 + 8 = 72, so the bits at positions 6 and 3 are set, giving 01001000. Converting 105: 64 + 32 + 8 + 1 = 105, so the bits at positions 6, 5, 3, and 0 are set, giving 01101001. The final output is 01001000 01101001.
For multi-byte UTF-8, the process is the same with more steps. The string café is four characters, but five UTF-8 bytes: c (0x63), a (0x61), f (0x66), and é which encodes as 0xC3 0xA9. The full binary output is 01100011 01100001 01100110 11000011 10101001, which is exactly what this tool produces.
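Both worked examples can be reproduced with a minimal Python sketch of the encoding direction. This is not the tool's implementation (which runs in the browser); `text_to_binary` and its `separator` parameter are names invented for this example.

```python
def text_to_binary(text: str, separator: str = " ") -> str:
    """Encode text as UTF-8, then write every byte as 8 binary digits."""
    return separator.join(format(byte, "08b") for byte in text.encode("utf-8"))


print(text_to_binary("Hi"))
# 01001000 01101001

print(text_to_binary("café"))
# 01100011 01100001 01100110 11000011 10101001
```

Four characters go in and five bytes come out for café, because é is the only character above codepoint 127.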
Real-world binary strings come in many shapes. Some have spaces between bytes (01001000 01101001). Some use dashes (01001000-01101001). Some use commas. Some run together with no separator at all (0100100001101001). This decoder auto-detects separators by checking for any whitespace, dashes, commas, or newlines in the input. If none are found, it falls back to splitting from the left into 8-bit chunks.
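The separator detection described above might look roughly like the following Python sketch. It is one possible reading of that behavior, not the tool's actual code: the names `split_binary` and `binary_to_text` are hypothetical, and replacing invalid UTF-8 with a placeholder character (`errors="replace"`) is an assumption.

```python
import re
from typing import List


def split_binary(raw: str) -> List[str]:
    """Split on whitespace, dashes, or commas; otherwise cut into 8-bit chunks."""
    raw = raw.strip()
    if re.search(r"[\s,\-]", raw):
        return [chunk for chunk in re.split(r"[\s,\-]+", raw) if chunk]
    # No separator found: split from the left into groups of 8 digits.
    return [raw[i:i + 8] for i in range(0, len(raw), 8)]


def binary_to_text(raw: str) -> str:
    """Turn each chunk into a byte, then decode the bytes as UTF-8."""
    # int(chunk, 2) raises ValueError on anything other than 0s and 1s,
    # and bytes() raises ValueError if a chunk is worth more than 255.
    data = bytes(int(chunk, 2) for chunk in split_binary(raw))
    return data.decode("utf-8", errors="replace")  # assumption: replace bad bytes


print(binary_to_text("01001000 01101001"))  # Hi
print(binary_to_text("01001000-01101001"))  # Hi
print(binary_to_text("0100100001101001"))   # Hi
```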
If the total bit count is not a multiple of 8, the tool surfaces a warning. It still attempts a best-effort decode by zero-padding the final chunk, but the output is likely wrong if the input was actually truncated mid-byte. The warning is a hint to check for a missing 0 or 1 before trusting the result.
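Here is a hedged Python sketch of that warn-then-pad behavior for run-on input. The function name `chunk_with_warning` is hypothetical, and padding on the right-hand side of the final chunk is an assumption; the text above only says the chunk is zero-padded.

```python
from typing import List, Optional, Tuple


def chunk_with_warning(bits: str) -> Tuple[List[str], Optional[str]]:
    """Split run-on bits into bytes, padding and warning if the count is off."""
    warning = None
    if len(bits) % 8 != 0:
        warning = f"{len(bits)} bits is not a multiple of 8; final chunk was zero-padded"
    chunks = [bits[i:i + 8] for i in range(0, len(bits), 8)]
    if chunks and len(chunks[-1]) < 8:
        chunks[-1] = chunks[-1].ljust(8, "0")  # assumption: zeros appended on the right
    return chunks, warning


# "Hi" with its final bit missing: 15 bits instead of 16.
chunks, warning = chunk_with_warning("010010000110100")
print(chunks)   # ['01001000', '01101000']
print(warning)  # 15 bits is not a multiple of 8; final chunk was zero-padded
print(bytes(int(c, 2) for c in chunks).decode("utf-8"))  # Hh
```

The decode succeeds but yields Hh rather than Hi, which is exactly why the warning matters.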
The conversion runs in your browser using TextEncoder and TextDecoder. Nothing is uploaded, logged, or transmitted.
Seeing that Hi is really 01001000 01101001 inside the machine sticks better than the abstract claim that computers store everything as numbers.
A byte has 8 bits largely because the IBM System/360 settled on that size in 1964 and the rest of the hardware industry followed. Eight bits is also enough to hold one 7-bit ASCII character plus a parity bit, or one character of an 8-bit extended character set. Modern UTF-8 still uses 8 bits as its atomic unit, but individual characters can occupy 1 to 4 bytes depending on the codepoint.
The tool uses UTF-8 by default. UTF-8 is backward compatible with 7-bit ASCII, so plain English text produces the same binary either way. For accented Latin, Cyrillic, Greek, Chinese, Japanese, and emoji, UTF-8 uses 2, 3, or 4 bytes per character.
Yes, the decoder accepts binary with or without separators, because it auto-detects them. If your binary has spaces, dashes, commas, or newlines between bytes, those are used as delimiters. If your input is a continuous string of 0s and 1s, the decoder splits it into 8-bit chunks from the left.
Most emoji live outside the Basic Multilingual Plane. Their UTF-8 representation requires 4 bytes (32 bits). For example, the rocket emoji encodes as the bytes 0xF0 0x9F 0x9A 0x80, which in binary is 11110000 10011111 10011010 10000000.
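The four bytes follow from UTF-8's bit layout for codepoints from U+10000 to U+10FFFF: a lead byte of the form 11110xxx followed by three continuation bytes of the form 10xxxxxx. The Python sketch below does the packing by hand and checks it against the built-in encoder; `encode_four_byte` is a hypothetical name used only for this illustration.

```python
def encode_four_byte(codepoint: int) -> bytes:
    """Hand-pack a codepoint in U+10000..U+10FFFF into 4 UTF-8 bytes."""
    if not 0x10000 <= codepoint <= 0x10FFFF:
        raise ValueError("codepoint is outside the 4-byte UTF-8 range")
    return bytes([
        0xF0 | (codepoint >> 18),           # 11110xxx: top 3 bits
        0x80 | ((codepoint >> 12) & 0x3F),  # 10xxxxxx: next 6 bits
        0x80 | ((codepoint >> 6) & 0x3F),   # 10xxxxxx: next 6 bits
        0x80 | (codepoint & 0x3F),          # 10xxxxxx: lowest 6 bits
    ])


rocket = encode_four_byte(0x1F680)
print(rocket.hex(" "))                            # f0 9f 9a 80
print(" ".join(format(b, "08b") for b in rocket))
# 11110000 10011111 10011010 10000000
print(rocket == "🚀".encode("utf-8"))             # True
```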
If the bit count is not a multiple of 8, the tool shows a warning, then pads the final chunk with zeros to complete the byte. The result is often nonsense if the input was actually truncated, so the warning is a signal to check for missing digits before trusting the output.
No, binary is not the same thing as machine code. Binary is just a way of writing numbers in base 2. Machine code is binary that happens to encode CPU instructions. When you translate the word Hi to binary here, you get the UTF-8 byte representation of those characters, not a CPU instruction.