Frequently asked questions about JSON Lines format
JSONL (JSON Lines) is a text format for structured data where each line is a valid JSON object. It is also known as newline-delimited JSON (NDJSON) or JSON Lines. Each line represents a separate, self-contained JSON record.
{"name": "Alice", "age": 30}
{"name": "Bob", "age": 25}
{"name": "Charlie", "age": 35}
JSONL, JSON Lines, and NDJSON are essentially the same format with minor variations in naming conventions. All follow the same core principle: one JSON object per line. Use whichever name is most common in your ecosystem.
Use JSONL when you are streaming records one at a time, appending to log files, or processing datasets too large to load into memory at once. Use regular JSON when you need nested arrays/objects as the top-level structure or when the entire dataset represents a single cohesive object.
The most common file extensions are .jsonl and .ndjson. For consistency and maximum compatibility, use .jsonl. If working with legacy systems that expect .ndjson, use that instead.
The JSON Lines specification recommends LF (\n) line endings, but most parsers also accept CRLF (\r\n). Most modern JSONL parsers handle both transparently. For maximum compatibility, use LF (\n).
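In Python, for instance, opening the file in text mode already normalizes both endings, so no special handling is needed (a small sketch):

import json

# Text mode uses universal newlines: both '\n' and '\r\n' arrive as '\n'
with open('data.jsonl', 'r') as f:
    for line in f:
        record = json.loads(line)  # json.loads ignores the trailing newline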
JSONL's line-based format makes error handling straightforward. Best practices: parse line by line, catch decode errors per record, report the failing line number, and decide whether to skip the bad record or stop, as in the Python example below.
# Python example with error handling
import json

with open('data.jsonl', 'r') as f:
    for line_num, line in enumerate(f, 1):
        try:
            data = json.loads(line)
            process(data)  # your per-record handler
        except json.JSONDecodeError as e:
            print(f"Error on line {line_num}: {e}")
            # Continue or break depending on requirements
No. Each line must contain exactly one complete, valid JSON object. The JSON object itself must not contain literal newlines - all newlines within strings must be escaped as \n.
Invalid
{
"name": "Alice",
"age": 30
}
Valid
{"name": "Alice", "age": 30}
If you need pretty-printed JSON for debugging, use regular JSON format. JSONL is designed for machine processing, not human reading.
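Serializers handle the escaping for you; in Python, json.dumps() always emits a single line, as this quick sketch shows:

import json

record = {"name": "Alice", "bio": "Line one\nLine two"}
line = json.dumps(record)
assert '\n' not in line  # the newline became the two characters \ and n
print(line)              # {"name": "Alice", "bio": "Line one\nLine two"}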
According to the official specification, every line must be a valid JSON value, so blank lines and comments are not valid JSONL (though some tolerant parsers skip empty lines). For maximum compatibility, avoid empty lines and comments. If you need metadata, include it as JSON objects with a special type field:
{"type": "metadata", "version": "1.0", "created": "2025-11-11"}
{"type": "record", "id": 1, "name": "Alice"}
Always use UTF-8 encoding. This is the standard for JSON and JSONL.
UTF-8 is backward compatible with ASCII and handles emoji, international characters, and special symbols correctly.
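In Python, pass encoding='utf-8' explicitly and consider ensure_ascii=False so non-ASCII text is stored as readable UTF-8 rather than \uXXXX escapes (a sketch; both forms are valid JSON):

import json

record = {"name": "José", "note": "café ☕"}

with open('data.jsonl', 'w', encoding='utf-8') as f:
    # ensure_ascii=False writes raw UTF-8 instead of \uXXXX escapes;
    # the output is smaller and human-readable either way it parses the same
    f.write(json.dumps(record, ensure_ascii=False) + '\n')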
Validation involves checking both format (every line parses as JSON) and content (records match the schema you expect):
# Command-line validation with jq
cat data.jsonl | jq -c . > /dev/null && echo "Valid JSONL" || echo "Invalid"
# Python validation
import json

def validate_jsonl(filepath):
    line_count = 0
    with open(filepath, 'r') as f:
        for line_num, line in enumerate(f, 1):
            line = line.strip()
            if not line:  # Skip empty lines if tolerant
                continue
            try:
                json.loads(line)
                line_count += 1
            except json.JSONDecodeError as e:
                print(f"Invalid JSON on line {line_num}: {e}")
                return False
    print(f"Valid JSONL: {line_count} records")
    return True
It depends on your use case. For large datasets (>10MB) or streaming use cases, JSONL significantly outperforms JSON arrays; for small, static datasets, the difference is negligible.
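The difference comes down to how the two formats are read: a JSON array must be parsed in full before the first record is available, while JSONL yields records as lines arrive. A sketch (handle is a hypothetical per-record function):

import json

def handle(record):  # hypothetical per-record work
    pass

# JSON array: the whole document is parsed before the first record is usable
with open('data.json', 'r') as f:
    for record in json.load(f):  # memory grows with file size
        handle(record)

# JSONL: one record in memory at a time; results start immediately
with open('data.jsonl', 'r') as f:
    for line in f:
        handle(json.loads(line))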
Yes, JSONL compresses extremely well. Common strategies include gzip (widely supported and streamable), zstd (faster, with comparable ratios), and xz (higher compression at more CPU cost). Gzip is recommended for most use cases. Many tools (Python, Go, JavaScript) can stream-decompress .gz files, maintaining JSONL's memory efficiency.
# Compress JSONL with gzip
gzip data.jsonl # Creates data.jsonl.gz
# Read compressed JSONL in Python
import gzip
import json

with gzip.open('data.jsonl.gz', 'rt') as f:
    for line in f:
        data = json.loads(line)
Large file strategies: split the file into manageable chunks, process chunks in parallel, and always stream line by line rather than loading the whole file:
# Split large JSONL file into 10,000 line chunks
split -l 10000 --additional-suffix=.jsonl huge.jsonl chunk-
# Process in parallel with GNU Parallel
parallel "cat {} | jq '.field' > {}.output" ::: chunk-*.jsonl
Yes! JSONL is excellent for streaming APIs and network protocols. Set Content-Type to application/x-ndjson or application/jsonl, and clients can process records as they arrive.
// Node.js HTTP streaming example
const http = require('http');

http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'application/x-ndjson' });
  // Stream records one at a time
  for (let i = 0; i < 1000; i++) {
    res.write(JSON.stringify({ id: i, data: 'value' }) + '\n');
  }
  res.end();
}).listen(8080);
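On the client side, a stream like this can be consumed line by line as it arrives; here is a sketch using Python's third-party requests library against the server above (the URL is a placeholder):

# Stream an NDJSON response without buffering the whole body
import json
import requests  # third-party: pip install requests

resp = requests.get('http://localhost:8080/', stream=True)
for line in resp.iter_lines():  # yields one line (bytes) per record
    if line:  # skip blank lines
        record = json.loads(line)
        print(record['id'])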
JSONL is widely adopted across the industry: OpenAI's fine-tuning API takes training data as JSONL, Google BigQuery imports and exports newline-delimited JSON, the Elasticsearch bulk API speaks NDJSON, and big-data tools like Apache Spark read JSON Lines natively. It is also the de facto format for structured application logs.
Yes! JSONL easily converts to/from many formats:
# JSONL to CSV with jq
cat data.jsonl | jq -r '[.name, .age, .email] | @csv' > data.csv
# CSV to JSONL with Python pandas
import pandas as pd
df = pd.read_csv('data.csv')
df.to_json('data.jsonl', orient='records', lines=True)
# JSONL to JSON array with jq
jq -s '.' data.jsonl > data.json
# JSON array to JSONL with jq
jq -c '.[]' data.json > data.jsonl
-- Import JSONL into PostgreSQL (CSV mode with unused control characters as
-- quote and delimiter, so backslash escapes inside the JSON survive intact)
CREATE TABLE data (doc jsonb);
COPY data FROM '/path/to/data.jsonl' WITH (FORMAT csv, QUOTE E'\x01', DELIMITER E'\x02');
# Export from SQLite to JSONL (.mode json emits a JSON array, so flatten it with jq)
sqlite3 db.sqlite ".mode json" "SELECT * FROM mytable" | jq -c '.[]' > data.jsonl
Common data operations on JSONL:
# Sort by field with jq (the trailing [] unwraps the sorted array back into JSONL)
jq -c -s 'sort_by(.age)[]' data.jsonl > sorted.jsonl
# Sort by multiple fields
jq -c -s 'sort_by(.lastName, .firstName)[]' data.jsonl > sorted.jsonl
# Remove duplicates by ID with jq
jq -c -s 'unique_by(.id)[]' data.jsonl > deduped.jsonl
# Remove exact duplicates (entire object)
sort data.jsonl | uniq > deduped.jsonl
# Deduplicate in Python (memory efficient: only the IDs are held in memory)
import json

seen_ids = set()
with open('data.jsonl', 'r') as fin, open('deduped.jsonl', 'w') as fout:
    for line in fin:
        obj = json.loads(line)
        if obj['id'] not in seen_ids:
            seen_ids.add(obj['id'])
            fout.write(line)
JSONL is not ideal when the dataset is a single, deeply nested object rather than a sequence of records, when files must be pretty-printed or hand-edited by humans, or when consumers expect a standard JSON document. Use the right tool for the job. JSONL excels at large-scale, record-oriented data, but it is not a universal replacement for JSON.
Debugging and inspection tools:
# Pretty-print first 10 records with jq
head -10 data.jsonl | jq '.'
# Count records
wc -l data.jsonl
# Show unique keys across all records
cat data.jsonl | jq -r 'keys[]' | sort | uniq
# Sample random records
shuf -n 5 data.jsonl | jq '.'
# Check for parse errors
cat data.jsonl | jq -c . > /dev/null
# Convert to pretty JSON array for viewing (careful with large files!)
jq -s '.' data.jsonl > pretty.json
While not officially registered with IANA, the MIME types commonly used in practice are application/x-ndjson and application/jsonl. For HTTP APIs, application/x-ndjson is recommended. Always specify charset:
Content-Type: application/x-ndjson; charset=utf-8