JSONL Tools & Utilities

Master the complete toolkit for working with JSON Lines data - from command-line utilities to IDE extensions, validators, converters, and more.

jq - The Swiss Army Knife for JSONL

jq is the de facto standard command-line JSON processor, and it is well suited to JSONL manipulation, filtering, and transformation.

Basic jq Usage with JSONL

jq processes each line of JSONL independently, making it ideal for streaming large datasets.

Pretty Print Each Line

cat data.jsonl | jq '.'

Extract Specific Field

cat users.jsonl | jq '.email'

Output: one JSON-quoted email per line (add -r for raw strings without quotes)
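
For example, given a hypothetical users.jsonl:

{"name": "Alice", "email": "alice@example.com"}
{"name": "Bob", "email": "bob@example.com"}

the command prints:

"alice@example.com"
"bob@example.com"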

Filter by Condition

cat users.jsonl | jq 'select(.age > 21)'

Returns only records where age is greater than 21

Compact Output

cat data.jsonl | jq -c '.'

Ensures each JSON object is on a single line (JSONL format)

Advanced jq Queries

Transform Objects

cat users.jsonl | jq '{name: .name, email: .email, active: .status == "active"}'

Create new objects with selected and computed fields
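
For example, with -c added, a hypothetical input line

{"name": "Alice", "email": "alice@example.com", "status": "active"}

becomes

{"name": "Alice", "email": "alice@example.com", "active": true}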

Array Operations

cat orders.jsonl | jq '.items | length'

Count items in an array field

Nested Field Access

cat data.jsonl | jq '.user.profile.avatar_url'

Access deeply nested fields with dot notation; missing fields simply yield null, and wrapping the path as (.user.profile.avatar_url)? suppresses the error raised when an intermediate value is not an object

Multiple Filters

cat events.jsonl | jq 'select(.type == "purchase" and .amount > 100)'

Combine multiple conditions with and/or

String Manipulation

cat users.jsonl | jq '.email | split("@") | .[1]'

Extract domain from email addresses

Date Formatting

cat events.jsonl | jq '.timestamp | gmtime | strftime("%Y-%m-%d")'

Convert Unix timestamps to readable dates; strftime expects broken-down time, so pipe the number through gmtime first (or use todate for a full ISO 8601 string)

Aggregation with jq Slurp Mode

Use -s or --slurp to read an entire JSONL file into a single array for aggregation.

Count Total Records

cat data.jsonl | jq -s 'length'

Sum Values

cat sales.jsonl | jq -s 'map(.amount) | add'

Calculate total sales amount

Average Calculation

cat scores.jsonl | jq -s 'map(.score) | add/length'

Group By Field

cat events.jsonl | jq -s 'group_by(.category) | map({category: .[0].category, count: length})'

Count events by category
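
For hypothetical click and purchase events, the output is a single JSON array (shown here compacted):

[{"category":"click","count":120},{"category":"purchase","count":42}]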

Find Maximum

cat data.jsonl | jq -s 'max_by(.value)'

jq Best Practices for JSONL

  • Always use -c flag for compact output when generating JSONL
  • Process line-by-line for large files to avoid memory issues
  • Use select() early in pipeline to reduce data volume
  • Test queries on small sample before running on full dataset
  • Use --slurp only when aggregation is necessary; for very large files, jq -n with inputs can aggregate while streaming

Command Line Tools for JSONL

Unix command-line utilities provide powerful ways to process JSONL files efficiently.

grep - Search JSONL Files

Find Lines Containing Text

grep "error" logs.jsonl

Return all lines containing "error"

Case-Insensitive Search

grep -i "warning" logs.jsonl

Count Matching Lines

grep -c "status.*active" users.jsonl

Invert Match (Exclude Lines)

grep -v "test" data.jsonl > production.jsonl

Remove all lines containing "test"

Regular Expression Search

grep -E '"age":\s*[5-9][0-9]' users.jsonl

Find users with age 50-99

awk - Pattern Processing

Print Specific Lines

awk 'NR >= 10 && NR <= 20' data.jsonl

Print lines 10 through 20

Filter by Line Length

awk 'length($0) < 1000' data.jsonl

Keep only lines shorter than 1000 characters

Sample Every Nth Line

awk 'NR % 10 == 0' data.jsonl

Take every 10th line (10% sample)

Add Line Numbers

awk '{print NR": "$0}' data.jsonl

sed - Stream Editor

Replace Text

sed 's/"status":"pending"/"status":"active"/g' data.jsonl

Substitutes on every line and writes the result to stdout; add -i to edit the file in place

Delete Specific Lines

sed '/test_user/d' users.jsonl

Remove lines containing "test_user"

Extract Range of Lines

sed -n '100,200p' data.jsonl

Standard Unix Tools

head - First N Lines

head -n 100 data.jsonl

tail - Last N Lines

tail -n 100 data.jsonl

wc - Count Lines

wc -l data.jsonl

sort - Sort Lines

sort data.jsonl > sorted.jsonl

Alphabetically sort JSONL lines

uniq - Remove Duplicates

sort data.jsonl | uniq > unique.jsonl

split - Divide Large Files

split -l 10000 large.jsonl chunk_

Split into files of 10,000 lines each (named chunk_aa, chunk_ab, ...); because JSONL is line-delimited, every chunk is itself valid JSONL

JSONL Validators

Ensure your JSONL files are properly formatted and valid before processing.

Command Line Validation

jq Validation

cat data.jsonl | jq -e . > /dev/null && echo "Valid JSONL" || echo "Invalid JSONL"

Exit code 0 if all lines are valid JSON (note that -e also returns non-zero if the last value is false or null)

Line-by-Line Validation

n=0; while IFS= read -r line; do n=$((n+1)); printf '%s\n' "$line" | jq empty 2>/dev/null || echo "Invalid JSON on line $n"; done < data.jsonl

Python json.tool

n=0; while IFS= read -r line; do n=$((n+1)); printf '%s\n' "$line" | python -m json.tool > /dev/null 2>&1 || echo "Error on line $n"; done < data.jsonl

Python Validation Script

import json
import sys

def validate_jsonl(filepath):
    errors = []
    with open(filepath, 'r', encoding='utf-8') as f:
        for line_num, line in enumerate(f, 1):
            line = line.strip()
            if not line:
                continue
            try:
                json.loads(line)
            except json.JSONDecodeError as e:
                errors.append(f"Line {line_num}: {e.msg}")

    if errors:
        print(f"Found {len(errors)} errors:")
        for error in errors:
            print(error)
        return False
    else:
        print("Valid JSONL file!")
        return True

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python validate_jsonl.py ")
        sys.exit(1)

    is_valid = validate_jsonl(sys.argv[1])
    sys.exit(0 if is_valid else 1)

Node.js Validation Script

const fs = require('fs');
const readline = require('readline');

async function validateJSONL(filepath) {
    const fileStream = fs.createReadStream(filepath);
    const rl = readline.createInterface({
        input: fileStream,
        crlfDelay: Infinity
    });

    let lineNum = 0;
    let errors = [];

    for await (const line of rl) {
        lineNum++;
        if (!line.trim()) continue;

        try {
            JSON.parse(line);
        } catch (e) {
            errors.push(`Line ${lineNum}: ${e.message}`);
        }
    }

    if (errors.length > 0) {
        console.log(`Found ${errors.length} errors:`);
        errors.forEach(err => console.log(err));
        process.exit(1);
    } else {
        console.log('Valid JSONL file!');
        process.exit(0);
    }
}

const filepath = process.argv[2];
if (!filepath) {
    console.log('Usage: node validate-jsonl.js <file>');
    process.exit(1);
}

validateJSONL(filepath);

Schema Validation

Validate JSONL against a JSON Schema to ensure data structure compliance.

Python with jsonschema

import json
from jsonschema import FormatChecker, ValidationError, validate

schema = {
    "type": "object",
    "properties": {
        "id": {"type": "number"},
        "name": {"type": "string"},
        "email": {"type": "string", "format": "email"}
    },
    "required": ["id", "name"]
}

with open('data.jsonl', 'r') as f:
    for line_num, line in enumerate(f, 1):
        try:
            obj = json.loads(line)
            # "format" keywords are only enforced when a FormatChecker is passed
            validate(instance=obj, schema=schema, format_checker=FormatChecker())
        except json.JSONDecodeError as e:
            print(f"Line {line_num}: Invalid JSON - {e.msg}")
        except ValidationError as e:
            print(f"Line {line_num}: Schema validation failed - {e.message}")

Format Converters

Convert between JSONL and other popular data formats seamlessly.

JSON Array to JSONL

Using jq

jq -c '.[]' data.json > data.jsonl

Converts JSON array to JSONL format

Python Script

import json

with open('data.json', 'r') as fin, open('data.jsonl', 'w') as fout:
    data = json.load(fin)
    for item in data:
        fout.write(json.dumps(item) + '\n')

JSONL to JSON Array

Using jq

jq -s '.' data.jsonl > data.json

Python Script

import json

items = []
with open('data.jsonl', 'r') as f:
    for line in f:
        items.append(json.loads(line))

with open('data.json', 'w') as f:
    json.dump(items, f, indent=2)

CSV to JSONL

Python with csv module

import csv
import json

with open('data.csv', 'r') as fin, open('data.jsonl', 'w') as fout:
    reader = csv.DictReader(fin)
    for row in reader:
        fout.write(json.dumps(row) + '\n')
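
Note that csv.DictReader yields every value as a string, so numbers stay quoted in the output. A minimal sketch of explicit casting, assuming hypothetical age and score columns:

import csv
import json

with open('data.csv', 'r') as fin, open('data.jsonl', 'w') as fout:
    reader = csv.DictReader(fin)
    for row in reader:
        row['age'] = int(row['age'])        # hypothetical integer column
        row['score'] = float(row['score'])  # hypothetical float column
        fout.write(json.dumps(row) + '\n')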

Using pandas

import pandas as pd

df = pd.read_csv('data.csv')
df.to_json('data.jsonl', orient='records', lines=True)

JSONL to CSV

Using jq and @csv

jq -r '[.name, .email, .age] | @csv' data.jsonl > data.csv

Emits one CSV row per record; @csv does not write a header row, so add one separately if needed

Python Script

import json
import csv

# Read all records to get all possible fields
records = []
with open('data.jsonl', 'r') as f:
    for line in f:
        records.append(json.loads(line))

# Get all unique keys
all_keys = set()
for record in records:
    all_keys.update(record.keys())

# Write CSV
with open('data.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=sorted(all_keys))
    writer.writeheader()
    writer.writerows(records)

Using pandas

import pandas as pd

df = pd.read_json('data.jsonl', lines=True)
df.to_csv('data.csv', index=False)

XML to JSONL

import xml.etree.ElementTree as ET
import json

tree = ET.parse('data.xml')
root = tree.getroot()

with open('data.jsonl', 'w') as f:
    for item in root.findall('.//item'):
        obj = {child.tag: child.text for child in item}
        f.write(json.dumps(obj) + '\n')
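
The dict comprehension above keeps only child-element text. If your elements also carry XML attributes, they can be merged into the same object; a minimal sketch:

import xml.etree.ElementTree as ET
import json

tree = ET.parse('data.xml')
root = tree.getroot()

with open('data.jsonl', 'w') as f:
    for item in root.findall('.//item'):
        # merge attributes and child text; child elements win on key clashes
        obj = {**item.attrib, **{child.tag: child.text for child in item}}
        f.write(json.dumps(obj) + '\n')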

Parquet to JSONL

import pandas as pd

df = pd.read_parquet('data.parquet')
df.to_json('data.jsonl', orient='records', lines=True)
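
JSONL to Parquet

The reverse conversion is symmetric; a minimal sketch (pandas requires pyarrow or fastparquet for Parquet support):

import pandas as pd

df = pd.read_json('data.jsonl', lines=True)
df.to_parquet('data.parquet', index=False)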

Python Tools for JSONL

Leverage Python's rich ecosystem for powerful JSONL processing.

Standard Library - json module

Read JSONL File

import json

with open('data.jsonl', 'r') as f:
    for line in f:
        obj = json.loads(line)
        print(obj)

Write JSONL File

import json

data = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"}
]

with open('output.jsonl', 'w') as f:
    for item in data:
        f.write(json.dumps(item) + '\n')

Stream Processing

import json

def process_large_jsonl(filepath):
    with open(filepath, 'r') as f:
        for line in f:
            obj = json.loads(line)
            # Process each object without loading entire file
            yield obj

for item in process_large_jsonl('large_data.jsonl'):
    # Process item
    pass

pandas - Data Analysis

Read JSONL into DataFrame

import pandas as pd

df = pd.read_json('data.jsonl', lines=True)

Write DataFrame to JSONL

import pandas as pd

df.to_json('output.jsonl', orient='records', lines=True)

Chunked Reading for Large Files

import pandas as pd

chunk_size = 10000
for chunk in pd.read_json('large.jsonl', lines=True, chunksize=chunk_size):
    # Process each chunk
    print(f"Processing {len(chunk)} records")
    # Your processing logic here

jsonlines Library

Dedicated library for JSONL with a clean API. Install: pip install jsonlines

Read JSONL

import jsonlines

with jsonlines.open('data.jsonl') as reader:
    for obj in reader:
        print(obj)
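
Skip Malformed Lines

The reader can also skip bad records instead of raising; a short sketch using the library's iter() options:

import jsonlines

with jsonlines.open('data.jsonl') as reader:
    # skip_invalid drops lines that fail to parse instead of raising
    for obj in reader.iter(type=dict, skip_invalid=True):
        print(obj)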

Write JSONL

import jsonlines

data = [{"id": 1}, {"id": 2}]

with jsonlines.open('output.jsonl', mode='w') as writer:
    writer.write_all(data)

Append to JSONL

import jsonlines

with jsonlines.open('data.jsonl', mode='a') as writer:
    writer.write({"id": 3, "name": "Charlie"})

Node.js Tools for JSONL

Process JSONL efficiently in JavaScript and Node.js environments.

Node.js Built-in Modules

Read JSONL with readline

const fs = require('fs');
const readline = require('readline');

const rl = readline.createInterface({
    input: fs.createReadStream('data.jsonl'),
    crlfDelay: Infinity
});

rl.on('line', (line) => {
    const obj = JSON.parse(line);
    console.log(obj);
});

Write JSONL

const fs = require('fs');

const data = [
    {id: 1, name: 'Alice'},
    {id: 2, name: 'Bob'}
];

const stream = fs.createWriteStream('output.jsonl');
data.forEach(item => {
    stream.write(JSON.stringify(item) + '\n');
});
stream.end();

Async Iterator (Node 10+)

const fs = require('fs');
const readline = require('readline');

async function processLineByLine() {
    const fileStream = fs.createReadStream('data.jsonl');
    const rl = readline.createInterface({
        input: fileStream,
        crlfDelay: Infinity
    });

    for await (const line of rl) {
        const obj = JSON.parse(line);
        // Process obj
    }
}

processLineByLine();

ndjson Package

Streaming JSONL parser. Install: npm install ndjson

Parse Stream

const ndjson = require('ndjson');
const fs = require('fs');

fs.createReadStream('data.jsonl')
    .pipe(ndjson.parse())
    .on('data', obj => {
        console.log(obj);
    });

Stringify to JSONL

const ndjson = require('ndjson');
const fs = require('fs');

const stringify = ndjson.stringify();
stringify.pipe(fs.createWriteStream('output.jsonl'));

stringify.write({id: 1, name: 'Alice'});
stringify.write({id: 2, name: 'Bob'});
stringify.end();

split2 - Line Splitting

Stream-based line splitter. Install: npm install split2

const fs = require('fs');
const split = require('split2');

fs.createReadStream('data.jsonl')
    .pipe(split(JSON.parse))
    .on('data', obj => {
        console.log(obj);
    });

IDE Extensions & Editors

Enhance your development workflow with IDE support for JSONL files.

Visual Studio Code

Recommended Extensions

  • JSONL Formatter - Auto-format JSONL files
  • JSONLines Support - Syntax highlighting for .jsonl files
  • Rainbow CSV - View JSONL in table format
  • Prettify JSON - Format individual JSON lines

settings.json Configuration

{
    "files.associations": {
        "*.jsonl": "jsonl",
        "*.ndjson": "jsonl"
    },
    "[jsonl]": {
        "editor.defaultFormatter": "esbenp.prettier-vscode",
        "editor.formatOnSave": false
    }
}

Associate .jsonl files and disable auto-format (to preserve line format)

Custom Tasks

{
    "version": "2.0.0",
    "tasks": [
        {
            "label": "Validate JSONL",
            "type": "shell",
            "command": "cat ${file} | jq -e . > /dev/null",
            "problemMatcher": []
        }
    ]
}

Sublime Text

Packages

  • Pretty JSON - Format JSON lines individually
  • JSONL Syntax - Syntax highlighting support

Custom Syntax Definition

Create JSONL.sublime-syntax:

%YAML 1.2
---
name: JSON Lines
file_extensions: [jsonl, ndjson]
scope: source.jsonl

contexts:
  main:
    - include: 'scope:source.json'

IntelliJ IDEA / PyCharm

File Type Association

Settings → Editor → File Types → JSON → Add *.jsonl and *.ndjson patterns

Plugins

  • JSON Helper - Enhanced JSON editing
  • String Manipulation - Quick JSON formatting

Online JSONL Tools

JSONL Formatter Online

Web-based formatter for quick validation and pretty-printing

jqplay.org

Test jq queries in your browser with JSONL input

CSV to JSONL Converter

Browser-based conversion tools for common formats

Best Practices

Do's

  • Use streaming processing for large files
  • Validate JSONL before processing in production
  • Use jq -c flag for compact output
  • Test tools on sample data first
  • Keep backups before transformation

Don'ts

  • Don't load an entire JSONL file into memory when streaming is possible (see the sketch after this list)
  • Don't use jq --slurp on multi-GB files
  • Don't forget to handle encoding (always use UTF-8)
  • Don't ignore validation errors in production pipelines
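
A minimal sketch tying these rules together: stream line by line, validate as you go, and emit compact output (the events.jsonl input and its type and amount fields are hypothetical):

import json

def filter_jsonl(src, dst, predicate):
    """Stream src line by line, keeping records that pass predicate."""
    with open(src, 'r', encoding='utf-8') as fin, \
         open(dst, 'w', encoding='utf-8') as fout:
        for line_num, line in enumerate(fin, 1):
            line = line.strip()
            if not line:
                continue
            try:
                obj = json.loads(line)
            except json.JSONDecodeError:
                print(f"Skipping invalid line {line_num}")
                continue
            if predicate(obj):
                # separators=(',', ':') keeps each record compact on one line
                fout.write(json.dumps(obj, separators=(',', ':')) + '\n')

filter_jsonl('events.jsonl', 'purchases.jsonl',
             lambda o: o.get('type') == 'purchase' and o.get('amount', 0) > 100)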

Ready to Master JSONL Tools?

Explore more resources and examples to become a JSONL expert.