Common JSONL Mistakes
10 frequent errors that break JSONL files - with before/after examples and fixes for each one
Wrapping All Lines in an Array
This is the single most common mistake. JSONL is not a JSON array of objects. Each line must be an independent JSON value with no enclosing array brackets or commas between lines.
Wrong - This is a JSON array, not JSONL
[
{"name": "Alice", "age": 30},
{"name": "Bob", "age": 25},
{"name": "Charlie", "age": 35}
]
The outer [ ] brackets and commas between objects make this standard JSON, not JSONL.
Correct - One JSON object per line
{"name": "Alice", "age": 30}
{"name": "Bob", "age": 25}
{"name": "Charlie", "age": 35}
Each line is a standalone, valid JSON object. No array brackets, no commas between lines.
How to fix: Remove the opening [ and closing ], remove trailing commas after each object, and make sure each JSON object sits on its own line.
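For files small enough to load into memory, the fix above can be automated. The following is a minimal sketch, not a definitive tool; the function name array_to_jsonl is made up for illustration and is not part of any library.

```python
import json

def array_to_jsonl(json_text: str) -> str:
    """Convert a JSON array of records into JSONL text (one record per line)."""
    records = json.loads(json_text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # json.dumps without indent emits each record on a single line.
    return "".join(json.dumps(record) + "\n" for record in records)

wrapped = '[{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]'
print(array_to_jsonl(wrapped), end="")
```

Parsing and re-serializing is slower than editing the text by hand, but it avoids mistakes such as leaving a stray comma behind.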
Using Single Quotes Instead of Double Quotes
JSON requires double quotes for both keys and string values. Single quotes are valid in JavaScript and Python, but they are not valid JSON. This mistake often happens when developers copy output from a Python REPL or JavaScript console.
Wrong - Single quotes are not valid JSON
{'name': 'Alice', 'age': 30}
{'name': 'Bob', 'age': 25}
Correct - Double quotes required
{"name": "Alice", "age": 30}
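{"name": "Bob", "age": 25}
How to fix: Replace the single quotes that delimit keys and string values with double quotes, taking care not to alter apostrophes inside the values themselves. In Python, use json.dumps() to serialize objects instead of str() or repr().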
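If the broken lines came from Python's str() or repr(), one possible repair is to parse each line as a Python literal and re-serialize it as JSON. This sketch assumes every line is a valid Python literal; the helper name python_repr_to_json is hypothetical.

```python
import ast
import json

def python_repr_to_json(line: str) -> str:
    """Turn one Python-literal line (e.g. a single-quoted dict) into a JSON line."""
    # ast.literal_eval parses literals only and does not run arbitrary code.
    # Unlike a blind quote replacement, it keeps apostrophes inside strings intact.
    value = ast.literal_eval(line)
    return json.dumps(value)

print(python_repr_to_json("{'name': 'Alice', 'age': 30}"))
print(python_repr_to_json("{'name': \"O'Brien\", 'active': True}"))
```

Note that Python's True, False, and None are converted to JSON's true, false, and null along the way.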
Trailing Commas After the Last Property
JavaScript and many other languages allow trailing commas, but JSON does not. A comma after the last property in an object or the last element in an array makes the JSON invalid.
Wrong - Trailing comma after last property
{"name": "Alice", "age": 30,}
{"name": "Bob", "tags": ["admin", "user",],}
The commas before } and ] are invalid.
Correct - No trailing commas
{"name": "Alice", "age": 30}
{"name": "Bob", "tags": ["admin", "user"]}
How to fix: Remove any comma that appears immediately before a closing } or ]. Use the JSONL Validator to catch these automatically.
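A plain search-and-replace for ",}" can damage string values that happen to contain those characters. As a rough sketch of a safer approach, the hypothetical helper below walks the line character by character and only drops commas that sit outside strings. It assumes the line is otherwise well-formed JSON.

```python
def strip_trailing_commas(line: str) -> str:
    """Drop commas that directly precede a closing } or ], ignoring string contents."""
    out = []
    in_string = False
    escaped = False
    for i, ch in enumerate(line):
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False          # this character was escaped; nothing special
            elif ch == "\\":
                escaped = True           # next character is escaped
            elif ch == '"':
                in_string = False        # closing quote
            continue
        if ch == '"':
            in_string = True
        elif ch == ",":
            rest = line[i + 1:].lstrip()
            if rest[:1] in ("}", "]"):
                continue                 # trailing comma: skip it
        out.append(ch)
    return "".join(out)

print(strip_trailing_commas('{"name": "Bob", "tags": ["admin", "user",],}'))
```

It is still worth running the cleaned lines through a real JSON parser afterwards, since this only addresses one kind of error.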
Unquoted Property Keys
In JavaScript, object keys do not need quotes if they are valid identifiers. In JSON, all keys must be double-quoted strings. This is a strict requirement of the JSON specification.
Wrong - Keys must be quoted
{name: "Alice", age: 30}
{name: "Bob", active: true}
Correct - All keys double-quoted
{"name": "Alice", "age": 30}
{"name": "Bob", "active": true}
How to fix: Wrap every key in double quotes. If generating JSONL programmatically, always use a proper JSON serializer (e.g., json.dumps() in Python, JSON.stringify() in JavaScript) rather than manual string concatenation.
Multi-line JSON Objects
In JSONL, each record must occupy exactly one line. Pretty-printed or multi-line JSON objects break parsers because each line is read independently. This mistake often occurs when using json.dumps(obj, indent=2) or similar formatting options.
Wrong - Object split across multiple lines
{
"name": "Alice",
"age": 30
}
{
"name": "Bob",
"age": 25
}
A JSONL parser reads line-by-line. Line 1 sees just { which is incomplete JSON.
Correct - Each record on a single line
{"name": "Alice", "age": 30}
{"name": "Bob", "age": 25}
Each complete JSON object fits on one line, no matter how long it is.
How to fix: In Python, use json.dumps(obj) without the indent parameter. In JavaScript, use JSON.stringify(obj) without the space argument. The output may look dense, but that is the correct JSONL format.
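To repair a file that already contains pretty-printed objects, one option is to decode the values one after another and write each back on a single line. The sketch below relies on the standard library's json.JSONDecoder.raw_decode; the generator name iter_json_values is an assumption for illustration, and the whole text is held in memory.

```python
import json

def iter_json_values(text: str):
    """Yield each top-level JSON value from text, however it is laid out."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # Skip whitespace (including newlines) between values.
        if text[pos].isspace():
            pos += 1
            continue
        # raw_decode returns the parsed value and the index where it ended.
        value, pos = decoder.raw_decode(text, pos)
        yield value

pretty = '{\n  "name": "Alice",\n  "age": 30\n}\n{\n  "name": "Bob",\n  "age": 25\n}\n'
for record in iter_json_values(pretty):
    print(json.dumps(record))
```

The same loop also copes with several objects run together on one line, because it never depends on where the line breaks fall.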
Missing Newlines Between Records
Every record must be separated by a newline character (\n). When multiple JSON objects are concatenated on the same line, parsers will fail because the combined text is not valid JSON.
Wrong - Multiple objects on one line
{"name": "Alice", "age": 30}{"name": "Bob", "age": 25}{"name": "Charlie", "age": 35}
The parser tries to parse the entire line as one JSON value and fails.
Correct - One object per line
{"name": "Alice", "age": 30}
{"name": "Bob", "age": 25}
{"name": "Charlie", "age": 35}
How to fix: When writing JSONL, always append \n after each JSON object. In Python: f.write(json.dumps(obj) + '\n'). In Node.js: stream.write(JSON.stringify(obj) + '\n').
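Wrapping the write in a small helper makes it harder to forget the newline. This is a minimal sketch; write_jsonl is a hypothetical name, and an in-memory buffer stands in for a real file.

```python
import io
import json

def write_jsonl(records, stream) -> int:
    """Write each record as one JSON line; return the number of lines written."""
    count = 0
    for record in records:
        # One serialized object plus exactly one newline per record.
        # ensure_ascii=False keeps non-ASCII text as-is; open real files with encoding="utf-8".
        stream.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1
    return count

buffer = io.StringIO()
write_jsonl([{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}], buffer)
print(buffer.getvalue(), end="")
```

With a real file, the call would look like write_jsonl(records, f) inside a with open('data.jsonl', 'w', encoding='utf-8') as f: block.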
BOM Characters at Start of File
A Byte Order Mark (BOM) is an invisible character (U+FEFF) that some editors - especially on Windows - insert at the beginning of UTF-8 files. The BOM is three bytes (EF BB BF) that sit before your first {, causing JSON parsers to choke on the very first line.
Wrong - Hidden BOM before first character
# What your editor shows:
{"name": "Alice", "age": 30}
# What the parser actually sees (hex):
EF BB BF 7B 22 6E 61 6D 65 22 ...
^^^BOM^^^ { starts here
Only the first line fails. All other lines parse correctly, making this error confusing to debug.
Correct - UTF-8 without BOM
# Clean hex - file starts with { directly:
7B 22 6E 61 6D 65 22 ...
{ starts immediately
Save files as "UTF-8" (not "UTF-8 with BOM").
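When the data arrives as raw bytes, for example from an upload, the three BOM bytes can be checked for and removed before decoding. A minimal sketch using the standard library's codecs.BOM_UTF8 constant; the helper name strip_utf8_bom is made up for this example.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

with_bom = codecs.BOM_UTF8 + b'{"name": "Alice", "age": 30}\n'
print(strip_utf8_bom(with_bom))
```

Data without a BOM passes through unchanged, so the function is safe to apply unconditionally.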
How to detect and fix:
# Detect BOM (Linux/Mac)
hexdump -C data.jsonl | head -n 1
# Look for: ef bb bf at the start
# Remove BOM (GNU sed; macOS/BSD sed needs -i '' and may not understand \x escapes)
sed -i '1s/^\xEF\xBB\xBF//' data.jsonl
# Python: Read with BOM-aware encoding
import json
with open('data.jsonl', 'r', encoding='utf-8-sig') as f:
    for line in f:
        obj = json.loads(line)
Non-UTF-8 Encoding
The JSON specification (RFC 8259) requires UTF-8 encoding. Files saved in Latin-1, Windows-1252, Shift-JIS, or other encodings will cause parse errors whenever a non-ASCII character appears. Characters like accented letters, currency symbols, or emoji will be corrupted or rejected.
Wrong - File saved as Latin-1
# Latin-1 byte for e-acute: E9
# Parser expects UTF-8 byte: C3 A9
{"name": "René", "city": "Montréal"}
# Fails with: invalid UTF-8 byte sequence
Correct - File saved as UTF-8
# Two valid options for special characters:
# Option 1: Raw UTF-8 characters
{"name": "René", "city": "Montréal"}
# Option 2: Unicode escape sequences
{"name": "Ren\u00e9", "city": "Montr\u00e9al"}
How to fix: Convert files to UTF-8 before processing. In Python: open(f, 'r', encoding='utf-8'). To convert from another encoding: read with the source encoding, then write as UTF-8. Use chardet to auto-detect the source encoding if unknown.
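The read-then-write conversion can be sketched on raw bytes as follows. It assumes the source encoding is already known (Latin-1 here); transcode_to_utf8 is a hypothetical helper name.

```python
def transcode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a known source encoding and re-encode them as UTF-8."""
    text = data.decode(source_encoding)
    return text.encode("utf-8")

# Build a Latin-1 encoded line: the e-acute is the single byte E9.
latin1_line = '{"name": "Ren\u00e9"}\n'.encode("latin-1")
utf8_line = transcode_to_utf8(latin1_line, "latin-1")
print(utf8_line)  # the e-acute is now the two bytes C3 A9
```

If the encoding is guessed wrong, the result may decode without error yet contain the wrong characters, so it is worth spot-checking a few records after converting.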
Comments in JSONL
JSON does not support comments of any kind. Neither // single-line comments, /* */ block comments, nor # hash comments are valid. This trips up developers who are used to JSONC (JSON with Comments) or configuration file formats that do support comments.
Wrong - Comments are not valid JSON
// User records
{"name": "Alice", "age": 30}
/* Admin users below */
{"name": "Bob", "role": "admin"}
# End of file
Correct - No comments, just data
{"name": "Alice", "age": 30}
{"name": "Bob", "role": "admin"}
If you need metadata, add a "_comment" field inside your JSON objects, or use a separate metadata file.
How to fix: Remove all comment lines. If you need annotations, include them as data fields: {"_meta": "admin users", "name": "Bob"}. Some teams use a header record with metadata instead.
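Removing whole-line comments can be scripted, because no valid JSON value starts with //, #, or /*. The sketch below handles only comments that occupy a full line; comments spanning several lines or trailing after data on the same line are out of scope. The function name drop_comment_lines is an assumption.

```python
def drop_comment_lines(lines):
    """Yield only lines that are not whole-line //, #, or /* ... */ comments."""
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(("//", "#")):
            continue
        if stripped.startswith("/*") and stripped.endswith("*/"):
            continue
        yield line

raw = [
    '// User records\n',
    '{"name": "Alice", "age": 30}\n',
    '/* Admin users below */\n',
    '{"name": "Bob", "role": "admin"}\n',
    '# End of file\n',
]
print("".join(drop_comment_lines(raw)), end="")
```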
Empty Lines Between Records
While some JSONL-style specifications (NDJSON, for example) say parsers may ignore empty lines, many real-world parsers do not. An empty line between records can cause parse failures, off-by-one errors in line counts, or unexpected null values. It is safest to avoid them entirely.
Risky - Empty lines between records
{"name": "Alice", "age": 30}

{"name": "Bob", "age": 25}

{"name": "Charlie", "age": 35}
Some parsers throw "unexpected end of input" on the blank lines. Others silently produce null records.
Correct - No empty lines
{"name": "Alice", "age": 30}
{"name": "Bob", "age": 25}
{"name": "Charlie", "age": 35}
A trailing newline after the last record is fine and common, but blank lines between records should be avoided.
How to fix: Remove blank lines with sed '/^$/d' data.jsonl > clean.jsonl or in Python: skip lines where line.strip() is empty before parsing.
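The Python variant of this fix might look like the sketch below. The function name parse_jsonl_skipping_blanks is hypothetical; lines can be any iterable of strings, such as an open file.

```python
import json

def parse_jsonl_skipping_blanks(lines):
    """Parse JSONL lines into a list, ignoring empty or whitespace-only lines."""
    records = []
    for line in lines:
        if not line.strip():
            continue  # blank line: nothing to parse
        records.append(json.loads(line))
    return records

sample = ['{"name": "Alice", "age": 30}\n', '\n', '{"name": "Bob", "age": 25}\n', '   \n']
print(parse_jsonl_skipping_blanks(sample))
```

Unlike sed '/^$/d', the strip() check also skips lines that contain only spaces or tabs.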
Quick Reference Table
| # | Mistake | Example | Fix |
|---|---|---|---|
| 1 | Array wrapper | [{"a":1},{"b":2}] | Remove [ ] and commas between objects |
| 2 | Single quotes | {'name': 'Alice'} | Use double quotes: "name" |
| 3 | Trailing commas | {"a": 1,} | Remove comma before } |
| 4 | Unquoted keys | {name: "Alice"} | Quote all keys: {"name": ...} |
| 5 | Multi-line objects | {\n "a": 1\n} | Flatten to one line per object |
| 6 | Missing newlines | {"a":1}{"b":2} | Add \n between objects |
| 7 | BOM character | EF BB BF {"a":1} | Save as UTF-8 without BOM |
| 8 | Wrong encoding | Latin-1, Shift-JIS, etc. | Convert to UTF-8 |
| 9 | Comments | // comment | Remove all comment lines |
| 10 | Empty lines | Blank lines between records | Remove with sed '/^$/d' |
Programmatic Validation
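Catch most of these mistakes automatically with code. Here are ready-to-use validation snippets in Python, Node.js, and jq. The Python and Node.js validators check each line on its own; the jq commands are looser because jq does not require one value per line.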
Python
import json
import sys

def validate_jsonl(filepath):
    """Validate a JSONL file and report errors by line."""
    errors = []
    valid = 0
    with open(filepath, 'r', encoding='utf-8-sig') as f:
        for line_num, line in enumerate(f, 1):
            # Skip empty lines (but warn about them)
            if not line.strip():
                errors.append(f"Line {line_num}: Empty line (remove it)")
                continue
            try:
                obj = json.loads(line)
                valid += 1
            except json.JSONDecodeError as e:
                errors.append(f"Line {line_num}: {e}")
    # Print results
    print(f"Valid records: {valid}")
    print(f"Errors: {len(errors)}")
    for err in errors:
        print(f"  {err}")
    return len(errors) == 0

# Usage
if __name__ == '__main__':
    filepath = sys.argv[1] if len(sys.argv) > 1 else 'data.jsonl'
    is_valid = validate_jsonl(filepath)
    sys.exit(0 if is_valid else 1)
Usage: python validate.py data.jsonl
Uses utf-8-sig encoding to automatically handle BOM characters (Mistake #7).
Node.js
const fs = require('fs');
const readline = require('readline');

async function validateJsonl(filepath) {
  const fileStream = fs.createReadStream(filepath, { encoding: 'utf-8' });
  const rl = readline.createInterface({
    input: fileStream,
    crlfDelay: Infinity
  });
  let lineNum = 0;
  let valid = 0;
  const errors = [];
  for await (const line of rl) {
    lineNum++;
    // Strip BOM from first line if present
    const cleanLine = lineNum === 1
      ? line.replace(/^\uFEFF/, '')
      : line;
    // Skip empty lines (but warn)
    if (cleanLine.trim() === '') {
      errors.push(`Line ${lineNum}: Empty line (remove it)`);
      continue;
    }
    try {
      JSON.parse(cleanLine);
      valid++;
    } catch (e) {
      errors.push(`Line ${lineNum}: ${e.message}`);
    }
  }
  console.log(`Valid records: ${valid}`);
  console.log(`Errors: ${errors.length}`);
  errors.forEach(err => console.log(`  ${err}`));
  return errors.length === 0;
}

// Usage
const filepath = process.argv[2] || 'data.jsonl';
validateJsonl(filepath).then(isValid => {
  process.exit(isValid ? 0 : 1);
});
Usage: node validate.js data.jsonl
Uses readline for memory-efficient streaming - works on multi-GB files.
jq (Command Line)
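# Quick validation - check that the file is a stream of valid JSON values
# (jq does not require one value per line, so multi-line or concatenated objects still pass;
# -e is omitted because it makes jq exit non-zero when the last value is false or null)
jq . data.jsonl > /dev/null 2>&1 && echo "Valid JSONL" || echo "Invalid JSONL"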
# Show line numbers of invalid lines
awk '{print NR": "$0}' data.jsonl | while IFS= read -r line; do
num=$(echo "$line" | cut -d: -f1)
content=$(echo "$line" | cut -d: -f2-)
echo "$content" | jq . > /dev/null 2>&1 || echo "Error on line $num"
done
# Count valid vs total lines (excluding empty lines)
# jq stops at the first parse error, so "valid" only counts values before the first bad line
total=$(grep -c '.' data.jsonl)
valid=$(jq -c . data.jsonl 2>/dev/null | wc -l)
echo "Valid: $valid / $total"
# Remove empty lines and re-validate
sed '/^$/d' data.jsonl | jq -c . > cleaned.jsonl
Tip: Install jq from stedolan.github.io/jq. It is the most popular command-line tool for working with JSON and JSONL files.
Validate Your JSONL Instantly
Paste your JSONL data into our free online validator. It catches all 10 mistakes listed above and shows you exactly what to fix, with line numbers and error descriptions.