The History of JSONL

From humble beginnings to widespread adoption: discover how JSON Lines became the standard for streaming structured data across the modern web.

The Pre-JSONL Era (1990s-2009)

The challenges that led to the creation of JSON Lines format.

The Data Streaming Problem

Before JSONL, developers faced significant challenges when working with streaming JSON data. Traditional JSON arrays required complete parsing before any processing could begin, creating bottlenecks in data pipelines.
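The bottleneck is easy to see in code. A minimal Python sketch (the file name is hypothetical): the standard library's json.load cannot hand back a single record until the entire array has been parsed.

import json

# json.load must read and parse the complete array before returning;
# nothing downstream can start until the whole file is in memory.
with open("events.json") as f:    # hypothetical multi-gigabyte array file
    events = json.load(f)         # blocks until the full parse finishes
for event in events:
    print(event)                  # processing begins only after the parse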

Memory Constraints

Loading multi-gigabyte JSON arrays into memory was impractical and often impossible on production servers.

Slow Processing

Waiting for entire datasets to load before processing meant delayed insights and slow pipelines.

Format Fragmentation

Each organization created custom delimited formats, leading to incompatibility and vendor lock-in.

Existing Solutions and Their Limitations

CSV (Comma-Separated Values)

The de facto standard for tabular data since the 1970s.

Pros:

  • Simple and widely supported
  • Human-readable
  • Streamable line-by-line

Cons:

  • No nested structures
  • No data types (everything is text)
  • Escaping issues with delimiters

XML (Extensible Markup Language)

Dominant structured data format in the 1990s and early 2000s.

Pros:

  • Nested structures
  • Schema validation (XSD)
  • Mature tooling

Cons:

  • Verbose and bloated
  • Difficult to stream
  • Complex parsing requirements

JSON Arrays

The emerging standard for web APIs in the late 2000s.

Pros:

  • Lightweight syntax
  • Native JavaScript support
  • Nested structures

Cons:

  • Not streamable
  • Requires complete parse
  • Memory intensive for large datasets

The JSON Revolution (2001-2009)

2001

Douglas Crockford specifies JSON format, derived from JavaScript object literal syntax.

2005-2006

AJAX (Asynchronous JavaScript and XML) popularizes JSON for web APIs, gradually replacing XML.

2007-2009

JSON becomes the dominant format for web APIs. Companies struggle with large JSON datasets in log processing and data pipelines.

By 2009, JSON had won the data interchange format war for web APIs, but the question remained: how do we efficiently stream large collections of JSON objects?

Birth of JSON Lines (2010-2012)

The emergence of a simple yet powerful solution to the streaming JSON problem.

The Brilliant Simplicity

Around 2010-2011, multiple developers independently arrived at the same elegant solution: what if we simply put one JSON object per line?

The Core Principle

{"id": 1, "name": "Alice"}
{"id": 2, "name": "Bob"}
{"id": 3, "name": "Charlie"}

Each line is valid JSON. Each line is independent. Process line-by-line. That's it.
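A minimal reader sketch in Python (file name hypothetical) shows why the principle works: the standard json module plus ordinary line iteration is already a complete streaming parser.

import json

# Each line parses independently, so only one record is in memory at a time.
with open("users.jsonl", encoding="utf-8") as f:
    for line in f:
        if line.strip():                  # skip any blank lines defensively
            record = json.loads(line)
            print(record["name"])         # Alice, Bob, Charlie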

A Format by Many Names

The format emerged organically across different communities, each giving it their own name.

JSON Lines (JSONL)

The name that would eventually become the standard. Simple, descriptive, and self-explanatory.

Used by: Early adopters, documentation sites, general community

Newline-Delimited JSON (NDJSON)

Emphasized the delimiter mechanism. More technically precise terminology.

Used by: Data engineering communities, ETL tool developers

Line-Delimited JSON (LDJSON)

Similar to NDJSON, but a less commonly used name.

Used by: Some database vendors, scattered tooling

JSON Streaming / JSON Sequence

Used informally before standardization, often confused with other streaming JSON approaches.

Used by: Early blog posts, internal documentation

While "JSON Lines" and "NDJSON" are often used interchangeably today, they refer to the same format. JSONL became the more popular shorthand.

Early Adopters (2011-2013)

Log Processing Systems

The first major use case. Log aggregation systems needed to process millions of log entries per second without loading everything into memory.

Tools like Logstash (later part of the Elastic Stack) embraced JSONL for log shipping and processing pipelines.

Data Science Community

Python data scientists working with large datasets found JSONL perfect for streaming data processing.

The format worked seamlessly with Python's line-by-line file reading patterns, making it a natural fit.
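Pandas eventually folded the pattern into a one-liner. A brief sketch, assuming a hypothetical users.jsonl file with a name field:

import pandas as pd

# lines=True treats the file as JSONL; chunksize returns an iterator of
# DataFrames, so a large file never has to fit in memory all at once.
for chunk in pd.read_json("users.jsonl", lines=True, chunksize=100_000):
    print(chunk["name"].head())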

Database Import/Export Tools

NoSQL databases like MongoDB needed efficient bulk import/export formats.

MongoDB's mongoimport and mongoexport tools added JSONL support, allowing for efficient data migration.

Formalization & Standardization (2013-2015)

The community comes together to document and standardize the format.

The Need for Specification

As adoption grew, inconsistencies emerged. Some implementations used different line endings (LF vs CRLF), others had different rules about empty lines or whitespace. The community needed a clear specification.

UTF-8 Encoding

Mandate UTF-8 as the standard character encoding, ensuring international compatibility.

Line Separator Clarity

Define LF (\n) as the line separator, with a trailing CR (\r) tolerated so that CRLF (\r\n) files produced on Windows platforms also parse cleanly.

No BOM (Byte Order Mark)

Explicitly disallow BOM to prevent parsing issues and maintain clean line-by-line processing.

One JSON Per Line

Clarify that each line must contain exactly one valid JSON value (typically an object).
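Taken together, the rules are simple enough to check mechanically. A rough Python validator sketch, illustrative rather than normative:

import json

def validate_jsonl(path):
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(b"\xef\xbb\xbf"):
        raise ValueError("BOM is not allowed")
    text = data.decode("utf-8")           # raises UnicodeDecodeError on bad UTF-8
    for number, line in enumerate(text.split("\n"), start=1):
        line = line.rstrip("\r")          # tolerate CRLF line endings
        if not line:
            continue                      # empty tail after a final newline
        try:
            json.loads(line)              # must be exactly one JSON value
        except json.JSONDecodeError as err:
            raise ValueError(f"line {number}: {err}") from err
    return True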

Community Documentation Efforts

jsonlines.org (2013)

The community-created website that became the de facto documentation source. Provided clear examples and explained the format's benefits.

ndjson.org

Parallel documentation effort under the NDJSON name, essentially describing the same format with slightly different terminology.

GitHub Specifications

Various GitHub repositories provided reference implementations and test suites, helping implementations stay consistent.

Unlike JSON (which went through formal ECMA standardization), JSONL remained a community-driven format. This flexibility allowed rapid evolution and adoption.

RFC Discussions and Related Standards

RFC 7464 - JSON Text Sequences (2015)

The IETF published RFC 7464, describing a similar but distinct format using the RS (Record Separator, ASCII 0x1E) character.

Key Difference:

RFC 7464 uses <RS>JSON<LF> framing, while JSONL uses simple JSON<LF>. The RFC format never gained widespread adoption, largely because the invisible RS control character is awkward to handle with standard text tools.
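The contrast fits in a few lines of Python; the writer functions below are illustrative:

import json

RS = "\x1e"  # Record Separator control character, per RFC 7464

def write_jsonl(records, f):
    for record in records:
        f.write(json.dumps(record) + "\n")         # JSONL: JSON<LF>

def write_json_seq(records, f):
    for record in records:
        f.write(RS + json.dumps(record) + "\n")    # RFC 7464: <RS>JSON<LF>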

Why JSONL Won

  • Simpler format - works with standard Unix tools (grep, sed, awk)
  • Already had widespread adoption before RFC 7464 was published
  • Human-readable line separators (visible newlines vs invisible RS character)
  • Compatible with existing text editors and version control systems

Widespread Adoption (2015-2020)

JSONL becomes the standard across industries and platforms.

Cloud Platform Adoption

Amazon Web Services (AWS)

AWS adopted JSONL across multiple services:

  • Amazon Athena: JSONL as a native format for querying S3 data
  • AWS Kinesis: Stream processing with JSONL records
  • CloudWatch Logs: JSONL export format for log analysis
  • Amazon SageMaker: JSONL for training data in machine learning

Google Cloud Platform (GCP)

Google embraced JSONL for big data:

  • BigQuery: JSONL (newline-delimited JSON) as a core import and export format
  • Cloud Storage: Optimized handling for JSONL files
  • Dataflow: Native JSONL support in streaming pipelines
  • Cloud Logging: JSONL export for log analytics

Microsoft Azure

Azure integrated JSONL across analytics:

  • Azure Data Lake: JSONL storage and querying
  • Azure Stream Analytics: Real-time JSONL processing
  • Azure Monitor: JSONL log exports

Database System Integration

MongoDB

mongoimport/mongoexport utilities with native JSONL support for bulk operations

PostgreSQL

COPY command commonly used to bulk-load JSONL lines into json and jsonb columns

Elasticsearch

Bulk API using JSONL for high-performance indexing; see the example after this list

Apache Kafka

JSON widely used for message payloads, with JSONL the natural on-disk form for topic dumps and file-based connectors

ClickHouse

JSONEachRow format (JSONL) for fast analytics ingestion

Redis

Redis Modules using JSONL for bulk data operations
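As a concrete taste of JSONL inside a wire protocol, an Elasticsearch Bulk API request body alternates an action line and a document line, each a standalone JSON object (the index name and documents here are illustrative):

{"index": {"_index": "logs"}}
{"level": "error", "message": "disk full"}
{"index": {"_index": "logs"}}
{"level": "info", "message": "backup complete"}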

Programming Language Libraries

By 2018, every major programming language had mature JSONL libraries.

Python

jsonlines, pandas, ndjson packages

JavaScript

ndjson, split2, JSONStream

Go

encoding/json with bufio

Java

Jackson, Gson with streaming

Ruby

jsonl gem

Rust

serde_json with BufReader
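In Python, for instance, the jsonlines package wraps the read and write patterns shown earlier. A brief sketch with hypothetical file names:

import jsonlines

# Reading: iterating the reader yields one decoded object per line.
with jsonlines.open("users.jsonl") as reader:
    for obj in reader:
        print(obj)

# Writing: each write() appends one object as one line.
with jsonlines.open("output.jsonl", mode="w") as writer:
    writer.write({"id": 1, "name": "Alice"})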

Industry Sector Adoption

Machine Learning & AI

JSONL became standard for training datasets, model outputs, and MLOps pipelines. Hugging Face, OpenAI, and others standardized on JSONL for dataset distribution.

Analytics & Business Intelligence

Data warehouses and BI tools adopted JSONL for ETL processes, making semi-structured data analysis more accessible.

Observability & Monitoring

Log aggregation platforms (Splunk, Datadog, New Relic) embraced JSONL for structured logging and metrics ingestion.

DevOps & CI/CD

Build logs, deployment records, and infrastructure events standardized on JSONL for parsing and analysis.

Modern Era & Future (2020-Present)

JSONL as the universal standard for streaming structured data.

The AI & LLM Revolution (2022-Present)

Large Language Models (LLMs) and modern AI systems dramatically increased JSONL adoption. Training datasets containing billions of examples needed efficient storage and streaming, and JSONL was the perfect fit.

OpenAI & ChatGPT

OpenAI standardized on JSONL for:

  • Fine-tuning datasets for GPT models
  • Batch API requests and responses
  • Training data preparation and validation
  • Model evaluation and benchmarking
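A chat fine-tuning dataset, for example, carries one training example per line; the records below are illustrative, abbreviated to the general shape OpenAI's documentation describes:

{"messages": [{"role": "user", "content": "What is JSONL?"}, {"role": "assistant", "content": "One JSON object per line."}]}
{"messages": [{"role": "user", "content": "Why use it?"}, {"role": "assistant", "content": "It streams and scales."}]}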

Hugging Face

The leading ML model hub uses JSONL for:

  • Dataset distribution (thousands of datasets in JSONL format)
  • Streaming datasets API for efficient data loading
  • Model training pipelines and fine-tuning
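The streaming API in particular leans on JSONL's line-at-a-time nature. A minimal sketch with the datasets library (file name hypothetical):

from datasets import load_dataset

# streaming=True yields examples lazily instead of materializing the dataset;
# the "json" builder reads JSONL files natively.
dataset = load_dataset("json", data_files="train.jsonl", streaming=True)
for example in dataset["train"]:
    print(example)
    break  # peek at the first record only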

Anthropic (Claude)

JSONL for training data, evaluations, and batch API requests and results

The AI boom of 2022-2025 made JSONL ubiquitous in machine learning workflows, cementing its status as the standard format for large-scale structured data.

Current State of JSONL

Universal Adoption

  • Supported by all major cloud platforms
  • Standard in data engineering workflows
  • Default for ML/AI datasets
  • First-class support in analytics tools

Ecosystem Maturity

  • Libraries in every major language
  • First-class handling in command-line tools such as jq
  • IDE extensions and syntax highlighting
  • Extensive documentation and tutorials

Future Directions

Real-Time Analytics

JSONL becoming the standard for real-time data pipelines and stream processing frameworks.

Edge Computing

IoT devices and edge systems adopting JSONL for efficient data transmission to cloud systems.

Distributed Systems

Microservices architectures using JSONL for inter-service communication and event streaming.

Schema Evolution

Emerging standards for schema versioning and validation within JSONL files, combining flexibility with structure.

JSONL Timeline at a Glance

Pre-2010

Streaming JSON problem identified

2010-2012

Format emerges organically, early adopters

2013-2015

Community documentation, specifications published

2015-2020

Widespread adoption - cloud platforms, databases, tools

2020-Present

Universal standard - AI/ML boom, LLMs, modern data systems

Why JSONL Succeeded

Simplicity

One JSON object per line. That's the entire specification. Simple enough to implement in minutes, powerful enough for petabytes of data.

Tool Compatibility

Works with standard Unix tools (grep, sed, awk), text editors, version control, and every JSON parser ever written.

Performance

Stream processing without loading entire datasets into memory. Process millions of records per second on commodity hardware.
