Real-world use cases and code samples showing when and why to use JSONL
A traditional JSON array stores every record in a single document:

[
  {
    "id": 1,
    "name": "Alice",
    "email": "alice@example.com"
  },
  {
    "id": 2,
    "name": "Bob",
    "email": "bob@example.com"
  }
]
Drawbacks of the array form:
- The entire file must be parsed before any record can be used
- Appending a record means rewriting the file (the closing bracket is in the way)
- Memory-intensive for large datasets
{"id": 1, "name": "Alice", "email": "[email protected]"}
{"id": 2, "name": "Bob", "email": "[email protected]"}
Advantages:
- Can be streamed and parsed line by line
- Append-friendly: new records are simply written to the end
- Memory-efficient at any scale
JSONL is widely used for passing training data to ML models (e.g., OpenAI, Google Vertex AI). Each line holds one training example, which makes it easy to stream massive datasets.
{"prompt": "What is AI?", "response": "Artificial Intelligence is..."}
{"prompt": "Explain ML", "response": "Machine Learning is..."}
JSONL also suits real-time stream processing: each event can be handled the moment its line arrives, without loading everything into memory.
{"event": "click", "ts": 1234567890}
{"event": "view", "ts": 1234567891}
JSONL is a natural fit for log files. Each entry is a structured JSON object, and new entries are simply appended to the end of the file; there is no need to parse and rewrite what is already there.
{"ts": 1699123456, "level": "info", "msg": "Server started", "port": 8080}
{"ts": 1699123460, "level": "error", "msg": "DB connection failed", "retry": 3}
JSONL is also a very common format for ingesting, exporting, and processing data in big-data systems such as Apache Spark, Hadoop, and data warehouses.
{"user_id": 1, "status": "active", "last_login": "2025-01-10"}
{"user_id": 2, "status": "inactive", "last_login": "2024-12-15"}
For analytics, JSONL makes it easy to track user events, one event per line, and to aggregate them later in a single pass (sketched below).
{"action": "signup", "user": "alice"}
{"action": "purchase", "user": "bob"}
APIs that return a large or indeterminate number of results can stream them as JSONL, letting the client process each result as it arrives instead of waiting for the complete response.
{"product_id": 1, "name": "Widget A", "price": 99.99, "stock": 45}
{"product_id": 2, "name": "Widget B", "price": 149.99, "stock": 12}
Working with JSONL in Python requires nothing beyond the standard library:

import json

# Read a JSONL file line by line (memory efficient)
with open('data.jsonl', 'r') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        data = json.loads(line)
        print(data['name'])

# Write a JSONL file: one serialized object plus a newline per record
# (items: any iterable of JSON-serializable objects)
with open('output.jsonl', 'w') as f:
    for item in items:
        f.write(json.dumps(item) + '\n')
In Node.js, the built-in readline module streams a JSONL file line by line. Note that for await must run inside an async function (or an ES module with top-level await):

const fs = require('fs');
const readline = require('readline');

// Read a JSONL file with streams (memory efficient)
async function readJsonl(path) {
  const fileStream = fs.createReadStream(path);
  const rl = readline.createInterface({
    input: fileStream,
    crlfDelay: Infinity  // treat \r\n as a single line break
  });

  for await (const line of rl) {
    if (!line.trim()) continue;  // skip blank lines
    const data = JSON.parse(line);
    console.log(data.name);
  }
}

readJsonl('data.jsonl');
On the command line, jq and standard Unix tools handle JSONL naturally:

# Filter JSONL records with jq (age is an example field)
jq 'select(.age > 30)' data.jsonl

# Convert JSONL to CSV
jq -r '[.name, .age] | @csv' data.jsonl

# Count records: one JSON object per line, so wc -l counts them
wc -l data.jsonl