1. What does this tool do
This free online text tokenizer splits text into words, characters, or lines and shows how often each appears: instant word count, unique count, and a frequency table. Use it as a word counter, for text analysis, or to get a token count for documents and data preparation. No sign-up, no upload; all tokenization runs in your browser. Copy tokens or the frequency table, or send counts to the Statistics Calculator for further analysis. Ideal for word counts, text analysis, NLP, or a statistics pipeline.
2. How to use it
Quick start: Choose Words, Characters, or Lines mode, paste your text, click Tokenize, then view count and frequency table. Copy results or click "Analyze in Statistics" to open the Statistics Calculator with counts pre-filled.
- Select mode — Choose Words, Characters, or Lines depending on how you want to split the text.
- Enter or paste text — Type or paste into the input area. Use Generate dummy text to quickly fill with sample content.
- Click Tokenize — The tool splits the text and displays token count, unique count, and a frequency table.
- Copy results — Copy tokens in comma or newline format, or copy the frequency table (token, tab, count per line).
- Analyze further — Click Analyze in Statistics to open the Statistics Calculator with the frequency counts pre-filled.
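The copy formats described above (comma- or newline-separated tokens, and tab-separated frequency rows) can be sketched in a few lines of JavaScript. This is an illustrative sketch only; the variable and format names are assumptions, not the tool's internals:

```javascript
// Example token list and frequency pairs (as produced by the tool).
const tokens = ["hello", "world", "hello"];
const freq = [["hello", 2], ["world", 1]];

// Comma format: tokens joined by commas.
const commaFormat = tokens.join(",");        // "hello,world,hello"

// Newline format: one token per line.
const newlineFormat = tokens.join("\n");

// Frequency table format: token, tab, count per line.
const tableFormat = freq.map(([t, n]) => `${t}\t${n}`).join("\n");
```

Pasting the tab-separated table into a spreadsheet places tokens and counts in adjacent columns.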
3. How it works
- Words mode — Splits on whitespace, strips leading and trailing punctuation from each word, and filters empty strings.
- Characters mode — Each character is a token; spaces, tabs, newlines, and punctuation are excluded.
- Lines mode — Splits on newlines (handles both \n and \r\n), trims each line, strips trailing punctuation, and filters empty lines.
Frequency is computed by counting each token's occurrences and sorting by count descending. All computation runs entirely in your browser. No data is sent to any server.
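The Words-mode pipeline and frequency count described above can be sketched in plain JavaScript. The function names (tokenizeWords, countFrequencies) and the exact punctuation-stripping regex are illustrative assumptions, not the tool's actual code:

```javascript
// Words mode (sketch): split on whitespace, strip leading/trailing
// punctuation from each word, drop empty strings.
function tokenizeWords(text) {
  return text
    .split(/\s+/)
    .map(w => w.replace(/^[^\p{L}\p{N}]+|[^\p{L}\p{N}]+$/gu, ""))
    .filter(w => w.length > 0);
}

// Count each token's occurrences and sort by count, descending.
function countFrequencies(tokens) {
  const counts = new Map();
  for (const t of tokens) counts.set(t, (counts.get(t) || 0) + 1);
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

const tokens = tokenizeWords("hello world hello.");
// → ["hello", "world", "hello"]  (trailing period stripped)
const freq = countFrequencies(tokens);
// → [["hello", 2], ["world", 1]]
```

Since everything happens in ordinary browser JavaScript like this, no network request is needed.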
4. Use cases & examples
- Word count — Get the total number of words and unique words in a document.
- Text analysis — See which words or characters appear most often.
- Data preparation — Export tokens to comma or newline format for use in spreadsheets or other tools.
- Statistics pipeline — Use "Analyze in Statistics" to compute mean, median, distribution, and percentiles on token counts.
- NLP and corpus work — Quick tokenization for small to medium texts before further processing.
Example
For input: "hello world hello." in Words mode:
- Tokens: hello, world, hello (the trailing period is stripped)
- Frequency: hello (2), world (1)
5. Limitations & known constraints
- Input cap — Maximum 512KB (~512,000 characters). Larger input returns an error.
- Client-side only — No server; processing runs in the browser. Very large inputs may cause brief UI lag on slower devices.
- Simple tokenization — Words mode splits on whitespace only; no stemming, lemmatization, or language-specific tokenization. Punctuation is stripped from token boundaries.
- Characters exclude spaces and punctuation — Spaces, tabs, newlines, and punctuation characters are not counted as character tokens.
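A client-side size check like the one implied by the input cap can be sketched as follows. This is a hypothetical sketch; the tool's actual limit logic (and whether it measures bytes or characters) is not documented here:

```javascript
// Hypothetical input-cap check (assumed, not the tool's actual code).
const MAX_BYTES = 512 * 1024; // the 512KB cap described above

function exceedsCap(text) {
  // Measure UTF-8 byte length, which can exceed the character count
  // for non-ASCII text.
  return new TextEncoder().encode(text).length > MAX_BYTES;
}
```

For plain ASCII input, bytes and characters coincide, which matches the "~512,000 characters" figure above.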