JSONL Tools & Utilities
Master the complete toolkit for working with JSON Lines data - from command-line utilities to IDE extensions, validators, converters, and more.
jq - The Swiss Army Knife for JSONL
jq is the de facto standard command-line JSON processor, well suited to JSONL data manipulation, filtering, and transformation.
Basic jq Usage with JSONL
jq processes each line of JSONL independently, making it ideal for streaming large datasets.
Pretty Print Each Line
cat data.jsonl | jq '.'
Extract Specific Field
cat users.jsonl | jq '.email'
Output: One email per line
Filter by Condition
cat users.jsonl | jq 'select(.age > 21)'
Returns only records where age is greater than 21
Compact Output
cat data.jsonl | jq -c '.'
Ensures each JSON object is on a single line (JSONL format)
Advanced jq Queries
Transform Objects
cat users.jsonl | jq '{name: .name, email: .email, active: (.status == "active")}'
Create new objects with selected and computed fields (computed values such as comparisons must be parenthesized in jq object construction)
Array Operations
cat orders.jsonl | jq '.items | length'
Count items in an array field
Nested Field Access
cat data.jsonl | jq '.user.profile.avatar_url'
Access deeply nested fields with dot notation
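Handle Missing Fields
When some records lack part of a nested path, jq's // (alternative) operator substitutes a fallback instead of emitting null; the fallback value here is illustrative:
cat data.jsonl | jq '.user.profile.avatar_url // "missing"'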
Multiple Filters
cat events.jsonl | jq 'select(.type == "purchase" and .amount > 100)'
Combine multiple conditions with and/or
String Manipulation
cat users.jsonl | jq '.email | split("@") | .[1]'
Extract domain from email addresses
Date Formatting
cat events.jsonl | jq '.timestamp | gmtime | strftime("%Y-%m-%d")'
Convert Unix timestamps to readable dates (strftime expects the broken-down time that gmtime produces; todate is a shortcut for full ISO 8601)
Aggregation with jq Slurp Mode
Use -s or --slurp to read an entire JSONL file into a single array for aggregation.
Count Total Records
cat data.jsonl | jq -s 'length'
Sum Values
cat sales.jsonl | jq -s 'map(.amount) | add'
Calculate total sales amount
Average Calculation
cat scores.jsonl | jq -s 'map(.score) | add/length'
Group By Field
cat events.jsonl | jq -s 'group_by(.category) | map({category: .[0].category, count: length})'
Count events by category
Find Maximum
cat data.jsonl | jq -s 'max_by(.value)'
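Streaming Aggregation Without Slurp
Slurp mode holds the whole file in memory; for large files the same totals can be folded over the stream. A minimal sketch using jq's -n flag with inputs, reusing the sales.jsonl fields from above:
cat sales.jsonl | jq -n 'reduce inputs as $rec (0; . + $rec.amount)'
Computes the same total as the slurped map(.amount) | add while keeping only one record in memory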
jq Best Practices for JSONL
- Always use the -c flag for compact output when generating JSONL
- Process line by line for large files to avoid memory issues
- Use select() early in the pipeline to reduce data volume
- Test queries on a small sample before running on the full dataset
- Use --slurp only when aggregation is necessary
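Putting these together, a minimal sketch of a typical pipeline (the users.jsonl field names are assumptions for illustration):
cat users.jsonl | jq -c 'select(.status == "active") | {id, email}' > active.jsonl
select() runs first to cut data volume, and -c keeps each output object on a single line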
Command Line Tools for JSONL
Unix command-line utilities provide powerful ways to process JSONL files efficiently.
grep - Search JSONL Files
Find Lines Containing Text
grep "error" logs.jsonl
Return all lines containing "error"
Case-Insensitive Search
grep -i "warning" logs.jsonl
Count Matching Lines
grep -c "status.*active" users.jsonl
Invert Match (Exclude Lines)
grep -v "test" data.jsonl > production.jsonl
Remove all lines containing "test"
Regular Expression Search
grep -E '"age":\s*[5-9][0-9]' users.jsonl
Find users with age 50-99
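Pre-filter with grep, Pinpoint with jq
Plain-text matching is fast but approximate (field order and spacing vary), so a common pattern is to let grep narrow the stream cheaply and jq make the exact check; a sketch:
grep '"type":"purchase"' events.jsonl | jq 'select(.type == "purchase")'
grep discards most lines quickly; jq guarantees the match is a real field value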
awk - Pattern Processing
Print Specific Lines
awk 'NR >= 10 && NR <= 20' data.jsonl
Print lines 10 through 20
Filter by Line Length
awk 'length($0) < 1000' data.jsonl
Keep only lines shorter than 1000 characters
Sample Every Nth Line
awk 'NR % 10 == 0' data.jsonl
Take every 10th line (10% sample)
Add Line Numbers
awk '{print NR": "$0}' data.jsonl
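Random Sampling
For a random rather than every-Nth sample, awk's rand() can keep each line with a fixed probability (roughly 10% in this sketch):
awk 'BEGIN{srand()} rand() < 0.1' data.jsonl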
sed - Stream Editor
Replace Text in Place
sed -i 's/"status":"pending"/"status":"active"/g' data.jsonl
Edits data.jsonl in place with GNU sed (BSD/macOS sed needs -i ''); drop -i to preview the changes on stdout
Delete Specific Lines
sed '/test_user/d' users.jsonl
Remove lines containing "test_user"
Extract Range of Lines
sed -n '100,200p' data.jsonl
Standard Unix Tools
head - First N Lines
head -n 100 data.jsonl
tail - Last N Lines
tail -n 100 data.jsonl
wc - Count Lines
wc -l data.jsonl
sort - Sort Lines
sort data.jsonl > sorted.jsonl
Alphabetically sort JSONL lines
uniq - Remove Duplicates
sort data.jsonl | uniq > unique.jsonl
split - Divide Large Files
split -l 10000 large.jsonl chunk_
Split into files of 10,000 lines each
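Reassemble Chunks
Because split cuts only on line boundaries, each chunk is itself valid JSONL and simple concatenation restores the original:
cat chunk_* > recombined.jsonl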
JSONL Validators
Ensure your JSONL files are properly formatted and valid before processing.
Command Line Validation
jq Validation
jq empty data.jsonl && echo "Valid JSONL" || echo "Invalid JSONL"
Exit code is 0 only if every line parses as JSON (jq -e . would also fail when the last value happens to be false or null)
Line-by-Line Validation
n=0; while IFS= read -r line; do n=$((n+1)); printf '%s\n' "$line" | jq empty 2>/dev/null || echo "Invalid JSON on line $n"; done < data.jsonl
Reports the line number of each invalid line (a plain shell loop avoids the quoting pitfalls of piping JSON through awk's system())
Python json.tool
n=0; while IFS= read -r line; do n=$((n+1)); printf '%s\n' "$line" | python -m json.tool > /dev/null || echo "Error on line $n: $line"; done < data.jsonl
printf is used instead of echo so backslashes in the data pass through unchanged
Python Validation Script
import json
import sys

def validate_jsonl(filepath):
    errors = []
    with open(filepath, 'r', encoding='utf-8') as f:
        for line_num, line in enumerate(f, 1):
            line = line.strip()
            if not line:
                continue
            try:
                json.loads(line)
            except json.JSONDecodeError as e:
                errors.append(f"Line {line_num}: {e.msg}")
    if errors:
        print(f"Found {len(errors)} errors:")
        for error in errors:
            print(error)
        return False
    else:
        print("Valid JSONL file!")
        return True

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python validate_jsonl.py <file.jsonl>")
        sys.exit(1)
    is_valid = validate_jsonl(sys.argv[1])
    sys.exit(0 if is_valid else 1)
Node.js Validation Script
const fs = require('fs');
const readline = require('readline');

async function validateJSONL(filepath) {
  const fileStream = fs.createReadStream(filepath);
  const rl = readline.createInterface({
    input: fileStream,
    crlfDelay: Infinity
  });

  let lineNum = 0;
  const errors = [];

  for await (const line of rl) {
    lineNum++;
    if (!line.trim()) continue;
    try {
      JSON.parse(line);
    } catch (e) {
      errors.push(`Line ${lineNum}: ${e.message}`);
    }
  }

  if (errors.length > 0) {
    console.log(`Found ${errors.length} errors:`);
    errors.forEach(err => console.log(err));
    process.exit(1);
  } else {
    console.log('Valid JSONL file!');
    process.exit(0);
  }
}

const filepath = process.argv[2];
if (!filepath) {
  console.log('Usage: node validate-jsonl.js <file.jsonl>');
  process.exit(1);
}
validateJSONL(filepath);
Schema Validation
Validate JSONL against a JSON Schema to ensure data structure compliance.
Python with jsonschema
import json
from jsonschema import validate, ValidationError

schema = {
    "type": "object",
    "properties": {
        "id": {"type": "number"},
        "name": {"type": "string"},
        "email": {"type": "string", "format": "email"}
    },
    "required": ["id", "name"]
}

# note: "format" constraints are only enforced when a FormatChecker
# is passed to validate(); by default they are annotations only
with open('data.jsonl', 'r') as f:
    for line_num, line in enumerate(f, 1):
        try:
            obj = json.loads(line)
            validate(instance=obj, schema=schema)
        except ValidationError as e:
            print(f"Line {line_num}: Schema validation failed - {e.message}")
Format Converters
Convert between JSONL and other popular data formats seamlessly.
JSON Array to JSONL
Using jq
jq -c '.[]' data.json > data.jsonl
Converts JSON array to JSONL format
Python Script
import json

with open('data.json', 'r') as fin, open('data.jsonl', 'w') as fout:
    data = json.load(fin)
    for item in data:
        fout.write(json.dumps(item) + '\n')
JSONL to JSON Array
Using jq
jq -s '.' data.jsonl > data.json
Python Script
import json

items = []
with open('data.jsonl', 'r') as f:
    for line in f:
        items.append(json.loads(line))

with open('data.json', 'w') as f:
    json.dump(items, f, indent=2)
CSV to JSONL
Python with csv module
import csv
import json

with open('data.csv', 'r') as fin, open('data.jsonl', 'w') as fout:
    reader = csv.DictReader(fin)
    for row in reader:
        # note: csv.DictReader yields every value as a string
        fout.write(json.dumps(row) + '\n')
Using pandas
import pandas as pd
df = pd.read_csv('data.csv')
df.to_json('data.jsonl', orient='records', lines=True)
JSONL to CSV
Using jq's @csv filter
jq -r '[.name, .email, .age] | @csv' data.jsonl > data.csv
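The one-liner above emits no header row; jq can prepend one by producing a header array before streaming the records (a sketch using -n with inputs):
jq -rn '["name","email","age"], (inputs | [.name, .email, .age]) | @csv' data.jsonl > data.csv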
Python Script
import json
import csv

# Read all records to get all possible fields
records = []
with open('data.jsonl', 'r') as f:
    for line in f:
        records.append(json.loads(line))

# Get all unique keys
all_keys = set()
for record in records:
    all_keys.update(record.keys())

# Write CSV
with open('data.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=sorted(all_keys))
    writer.writeheader()
    writer.writerows(records)
Using pandas
import pandas as pd
df = pd.read_json('data.jsonl', lines=True)
df.to_csv('data.csv', index=False)
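Nested objects do not map onto flat CSV columns by themselves; pandas' json_normalize expands them into dotted column names. A minimal sketch, assuming the records contain nested objects:
import json
import pandas as pd

with open('data.jsonl') as f:
    records = [json.loads(line) for line in f]

# nested keys become flattened columns such as "user.profile.name"
df = pd.json_normalize(records)
df.to_csv('data.csv', index=False)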
XML to JSONL
import xml.etree.ElementTree as ET
import json

tree = ET.parse('data.xml')
root = tree.getroot()

with open('data.jsonl', 'w') as f:
    for item in root.findall('.//item'):
        obj = {child.tag: child.text for child in item}
        f.write(json.dumps(obj) + '\n')
Parquet to JSONL
import pandas as pd
df = pd.read_parquet('data.parquet')
df.to_json('data.jsonl', orient='records', lines=True)
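JSONL to Parquet
The reverse conversion is symmetric; pandas needs a Parquet engine such as pyarrow installed:
import pandas as pd

df = pd.read_json('data.jsonl', lines=True)
df.to_parquet('data.parquet', index=False)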
Python Tools for JSONL
Leverage Python's rich ecosystem for powerful JSONL processing.
Standard Library - json module
Read JSONL File
import json

with open('data.jsonl', 'r') as f:
    for line in f:
        obj = json.loads(line)
        print(obj)
Write JSONL File
import json

data = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"}
]

with open('output.jsonl', 'w') as f:
    for item in data:
        f.write(json.dumps(item) + '\n')
Stream Processing
import json

def process_large_jsonl(filepath):
    with open(filepath, 'r') as f:
        for line in f:
            obj = json.loads(line)
            # Process each object without loading the entire file
            yield obj

for item in process_large_jsonl('large_data.jsonl'):
    # Process item
    pass
pandas - Data Analysis
Read JSONL into DataFrame
import pandas as pd
df = pd.read_json('data.jsonl', lines=True)
Write DataFrame to JSONL
import pandas as pd
df.to_json('output.jsonl', orient='records', lines=True)
Chunked Reading for Large Files
import pandas as pd

chunk_size = 10000
for chunk in pd.read_json('large.jsonl', lines=True, chunksize=chunk_size):
    # Process each chunk
    print(f"Processing {len(chunk)} records")
    # Your processing logic here
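A chunked read pairs naturally with incremental writes, so a filter-and-rewrite job never holds more than one chunk in memory. A sketch; the score column and threshold are assumptions:
import pandas as pd

with open('filtered.jsonl', 'w') as out:
    for chunk in pd.read_json('large.jsonl', lines=True, chunksize=10000):
        kept = chunk[chunk['score'] > 0.5]  # assumed filter column
        if not kept.empty:
            # normalize the trailing newline between chunks
            out.write(kept.to_json(orient='records', lines=True).rstrip('\n') + '\n')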
jsonlines Library
Dedicated library for JSONL with clean API. Install: pip install jsonlines
Read JSONL
import jsonlines

with jsonlines.open('data.jsonl') as reader:
    for obj in reader:
        print(obj)
Write JSONL
import jsonlines

data = [{"id": 1}, {"id": 2}]
with jsonlines.open('output.jsonl', mode='w') as writer:
    writer.write_all(data)
Append to JSONL
import jsonlines

with jsonlines.open('data.jsonl', mode='a') as writer:
    writer.write({"id": 3, "name": "Charlie"})
Node.js Tools for JSONL
Process JSONL efficiently in JavaScript and Node.js environments.
Node.js Built-in Modules
Read JSONL with readline
const fs = require('fs');
const readline = require('readline');

const rl = readline.createInterface({
  input: fs.createReadStream('data.jsonl'),
  crlfDelay: Infinity
});

rl.on('line', (line) => {
  const obj = JSON.parse(line);
  console.log(obj);
});
Write JSONL
const fs = require('fs');

const data = [
  {id: 1, name: 'Alice'},
  {id: 2, name: 'Bob'}
];

const stream = fs.createWriteStream('output.jsonl');
data.forEach(item => {
  stream.write(JSON.stringify(item) + '\n');
});
stream.end();
Async Iterator (Node 10+)
const fs = require('fs');
const readline = require('readline');

async function processLineByLine() {
  const fileStream = fs.createReadStream('data.jsonl');
  const rl = readline.createInterface({
    input: fileStream,
    crlfDelay: Infinity
  });
  for await (const line of rl) {
    const obj = JSON.parse(line);
    // Process obj
  }
}

processLineByLine();
ndjson Package
Streaming JSONL parser. Install: npm install ndjson
Parse Stream
const ndjson = require('ndjson');
const fs = require('fs');

fs.createReadStream('data.jsonl')
  .pipe(ndjson.parse())
  .on('data', obj => {
    console.log(obj);
  });
Stringify to JSONL
const ndjson = require('ndjson');
const fs = require('fs');

const stringify = ndjson.stringify();
stringify.pipe(fs.createWriteStream('output.jsonl'));
stringify.write({id: 1, name: 'Alice'});
stringify.write({id: 2, name: 'Bob'});
stringify.end();
split2 - Line Splitting
Stream-based line splitter. Install: npm install split2
const fs = require('fs');
const split = require('split2');

fs.createReadStream('data.jsonl')
  .pipe(split(JSON.parse))
  .on('data', obj => {
    console.log(obj);
  });
IDE Extensions & Editors
Enhance your development workflow with IDE support for JSONL files.
Visual Studio Code
Recommended Extensions
- JSONL Formatter - Auto-format JSONL files
- JSONLines Support - Syntax highlighting for .jsonl files
- Rainbow CSV - View JSONL in table format
- Prettify JSON - Format individual JSON lines
settings.json Configuration
{
  "files.associations": {
    "*.jsonl": "jsonl",
    "*.ndjson": "jsonl"
  },
  "[jsonl]": {
    "editor.defaultFormatter": "esbenp.prettier-vscode",
    "editor.formatOnSave": false
  }
}
Associate .jsonl files and disable auto-format (to preserve line format)
Custom Tasks
{
  "version": "2.0.0",
  "tasks": [
    {
      "label": "Validate JSONL",
      "type": "shell",
      "command": "jq empty ${file}",
      "problemMatcher": []
    }
  ]
}
Sublime Text
Packages
- Pretty JSON - Format JSON lines individually
- JSONL Syntax - Syntax highlighting support
Custom Syntax Definition
Create JSONL.sublime-syntax:
%YAML 1.2
---
name: JSON Lines
file_extensions: [jsonl, ndjson]
scope: source.jsonl
contexts:
  main:
    - include: 'scope:source.json'
IntelliJ IDEA / PyCharm
File Type Association
Settings → Editor → File Types → JSON → Add *.jsonl and *.ndjson patterns
Plugins
- JSON Helper - Enhanced JSON editing
- String Manipulation - Quick JSON formatting
Online JSONL Tools
JSONL Formatter Online
Web-based formatter for quick validation and pretty-printing
jqplay.org
Test jq queries in your browser with JSONL input
CSV to JSONL Converter
Browser-based conversion tools for common formats
Best Practices
Do's
- Use streaming processing for large files
- Validate JSONL before processing in production
- Use jq -c flag for compact output
- Test tools on sample data first
- Keep backups before transformation
Don'ts
- Don't load entire JSONL into memory if streaming is possible
- Don't use jq --slurp on multi-GB files
- Don't forget to handle encoding (always use UTF-8)
- Don't ignore validation errors in production pipelines
Ready to Master JSONL Tools?
Explore more resources and examples to become a JSONL expert.