The History of JSONL
From humble beginnings to universal adoption - discover how JSON Lines became the standard for streaming structured data across the modern web.
The Pre-JSONL Era (1990s-2009)
The challenges that led to the creation of JSON Lines format.
The Data Streaming Problem
Before JSONL, developers faced significant challenges when working with streaming JSON data. Traditional JSON arrays required complete parsing before any processing could begin, creating bottlenecks in data pipelines.
Memory Constraints
Loading multi-gigabyte JSON arrays into memory was impractical and often impossible on production servers.
Slow Processing
Waiting for entire datasets to load before processing meant delayed insights and slow pipelines.
Format Fragmentation
Each organization created custom delimited formats, leading to incompatibility and vendor lock-in.
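The contrast is easy to see in code. Here is a minimal Python sketch (the file names and the process() helper are illustrative): a JSON array must be parsed in full before the first record is available, while line-delimited records can be handled one at a time in constant memory.

```python
import json

def process(record):
    """Stand-in for any per-record work."""
    print(record)

# JSON array: the whole document is read and parsed before
# the first record can be processed.
with open("events.json") as f:        # hypothetical input file
    for record in json.load(f):       # entire dataset held in memory
        process(record)

# Line-delimited JSON: each line parses independently, so
# memory use stays flat no matter how large the file grows.
with open("events.jsonl") as f:       # hypothetical input file
    for line in f:
        process(json.loads(line))
```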
Existing Solutions and Their Limitations
CSV (Comma-Separated Values)
The de facto standard for tabular data since the 1970s.
Pros:
- Simple and widely supported
- Human-readable
- Streamable line-by-line
Cons:
- No nested structures
- No data types (everything is text)
- Escaping issues with delimiters
XML (Extensible Markup Language)
Dominant structured data format in the 1990s and early 2000s.
Pros:
- Nested structures
- Schema validation (XSD)
- Mature tooling
Cons:
- Verbose and bloated
- Difficult to stream
- Complex parsing requirements
JSON Arrays
The emerging standard for web APIs in the late 2000s.
Pros:
- Lightweight syntax
- Native JavaScript support
- Nested structures
Cons:
- Not streamable
- Requires complete parse
- Memory intensive for large datasets
The JSON Revolution (2001-2009)
2001
Douglas Crockford specifies the JSON format, derived from JavaScript object literal syntax.
2005-2006
AJAX (Asynchronous JavaScript and XML) popularizes JSON for web APIs, gradually replacing XML.
2007-2009
JSON becomes the dominant format for web APIs. Companies struggle with large JSON datasets in log processing and data pipelines.
By 2009, JSON had won the data interchange format war for web APIs, but the question remained: how do we efficiently stream large collections of JSON objects?
Birth of JSON Lines (2010-2012)
The emergence of a simple yet powerful solution to the streaming JSON problem.
The Brilliant Simplicity
Around 2010-2011, multiple developers independently arrived at the same elegant solution: what if we simply put one JSON object per line?
The Core Principle
Each line is valid JSON. Each line is independent. Process line-by-line. That's it.
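For example, a complete three-record JSONL document is just three self-contained JSON objects separated by newlines (the field names are illustrative):

```jsonl
{"id": 1, "name": "Alice", "active": true}
{"id": 2, "name": "Bob", "active": false}
{"id": 3, "name": "Carol", "tags": ["admin", "ops"]}
```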
A Format by Many Names
The format emerged organically across different communities, each giving it their own name.
JSON Lines (JSONL)
The name that would eventually become the standard. Simple, descriptive, and self-explanatory.
Used by: Early adopters, documentation sites, general community
Newline-Delimited JSON (NDJSON)
Emphasized the delimiter mechanism. More technically precise terminology.
Used by: Data engineering communities, ETL tool developers
Line-Delimited JSON (LDJSON)
Similar to NDJSON, but a less commonly used variant of the name.
Used by: Some database vendors, scattered tooling
JSON Streaming / JSON Sequence
Used informally before standardization, often confused with other streaming JSON approaches.
Used by: Early blog posts, internal documentation
While "JSON Lines" and "NDJSON" are often used interchangeably today, they refer to the same format. JSONL became the more popular shorthand.
Early Adopters (2011-2013)
Log Processing Systems
The first major use case. Log aggregation systems needed to process millions of log entries per second without loading everything into memory.
Tools like Logstash (later part of the Elastic Stack) embraced JSONL for log shipping and processing pipelines.
Data Science Community
Python data scientists working with large datasets found JSONL perfect for streaming data processing.
The format worked seamlessly with Python's line-by-line file reading patterns, making it a natural fit.
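A sketch of that fit using pandas (the file and column names are illustrative): lines=True reads JSONL directly into a DataFrame, and chunksize turns the read into a stream of DataFrames instead of one giant load.

```python
import pandas as pd

# Load a JSONL file straight into a DataFrame.
df = pd.read_json("measurements.jsonl", lines=True)   # hypothetical file

# For files too large for memory, stream in fixed-size chunks;
# with chunksize set, read_json returns an iterator of DataFrames.
reader = pd.read_json("measurements.jsonl", lines=True, chunksize=10_000)
for chunk in reader:
    print(chunk["value"].mean())   # per-chunk aggregate; column is illustrative
```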
Database Import/Export Tools
NoSQL databases like MongoDB needed efficient bulk import/export formats.
MongoDB's mongoimport and mongoexport tools added JSONL support, allowing for efficient data migration.
Formalization & Standardization (2013-2015)
The community comes together to document and standardize the format.
The Need for Specification
As adoption grew, inconsistencies emerged. Some implementations used different line endings (LF vs CRLF), others had different rules about empty lines or whitespace. The community needed a clear specification.
UTF-8 Encoding
Mandate UTF-8 as the standard character encoding, ensuring international compatibility.
Line Separator Clarity
Define LF (\n) as the line separator, with a trailing CR (\r) tolerated so CRLF files from other platforms still parse.
No BOM (Byte Order Mark)
Explicitly disallow BOM to prevent parsing issues and maintain clean line-by-line processing.
One JSON Per Line
Clarify that each line must contain exactly one valid JSON value (typically an object).
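Taken together, the rules are small enough to check in a few lines of Python. A minimal sketch (the check_jsonl helper is hypothetical; blank lines are skipped here, though stricter validators may reject them):

```python
import json

def check_jsonl(path):
    """Check the four rules above: UTF-8, no BOM,
    LF or CRLF line endings, one JSON value per line."""
    with open(path, "rb") as f:
        raw = f.read()
    if raw.startswith(b"\xef\xbb\xbf"):
        raise ValueError("BOM is not allowed in JSONL")
    text = raw.decode("utf-8")            # raises UnicodeDecodeError on bad bytes
    for lineno, line in enumerate(text.split("\n"), start=1):
        line = line.rstrip("\r")          # tolerate CRLF line endings
        if not line:
            continue                      # skip blank/trailing lines
        try:
            json.loads(line)              # each line must be one JSON value
        except json.JSONDecodeError as e:
            raise ValueError(f"line {lineno}: {e}") from e
```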
Community Documentation Efforts
jsonlines.org (2013)
The community-created website that became the de facto documentation source. Provided clear examples and explained the format's benefits.
ndjson.org
Parallel documentation effort under the NDJSON name, essentially describing the same format with slightly different terminology.
GitHub Specifications
Various GitHub repositories provided reference implementations and test suites, helping implementations stay consistent.
Unlike JSON (which went through formal ECMA standardization), JSONL remained a community-driven format. This flexibility allowed rapid evolution and adoption.
RFC Discussions and Related Standards
RFC 7464 - JSON Text Sequences (2015)
The IETF published RFC 7464, describing a similar but distinct format using the RS (Record Separator, ASCII 0x1E) character.
Key Difference:
RFC 7464 uses <RS>JSON<LF> framing, while JSONL uses simple JSON<LF>. The RFC format never gained widespread adoption because the invisible RS control character is awkward to handle with standard text tools.
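A small sketch of the difference in Python (the parser names are illustrative): both framings carry identical records, but RFC 7464 prefixes each one with an invisible RS byte that line-oriented tools do not expect.

```python
import json

RS = "\x1e"  # ASCII Record Separator (0x1E) used by RFC 7464

def parse_json_seq(text):
    """RFC 7464 JSON text sequence: records framed as <RS>JSON<LF>."""
    return [json.loads(chunk) for chunk in text.split(RS) if chunk.strip()]

def parse_jsonl(text):
    """JSON Lines: records framed as plain JSON<LF>."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

seq_text = '\x1e{"a": 1}\n\x1e{"a": 2}\n'   # RS byte hides at the start of each record
jsonl_text = '{"a": 1}\n{"a": 2}\n'         # plain text, friendly to grep/sed/awk

assert parse_json_seq(seq_text) == parse_jsonl(jsonl_text) == [{"a": 1}, {"a": 2}]
```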
Why JSONL Won
- Simpler format - works with standard Unix tools (grep, sed, awk)
- Already had widespread adoption before RFC 7464 was published
- Human-readable line separators (visible newlines vs invisible RS character)
- Compatible with existing text editors and version control systems
Widespread Adoption (2015-2020)
JSONL becomes the standard across industries and platforms.
Cloud Platform Adoption
Amazon Web Services (AWS)
AWS adopted JSONL across multiple services:
- Amazon Athena: JSONL as a native format for querying S3 data
- AWS Kinesis: Stream processing with JSONL records
- CloudWatch Logs: JSONL export format for log analysis
- Amazon SageMaker: JSONL for training data in machine learning
Google Cloud Platform (GCP)
Google embraced JSONL for big data:
- BigQuery: JSONL (newline-delimited JSON) as a primary import/export format
- Cloud Storage: Optimized handling for JSONL files
- Dataflow: Native JSONL support in streaming pipelines
- Cloud Logging: JSONL export for log analytics
Microsoft Azure
Azure integrated JSONL across analytics:
- Azure Data Lake: JSONL storage and querying
- Azure Stream Analytics: Real-time JSONL processing
- Azure Monitor: JSONL log exports
Database System Integration
MongoDB
mongoimport/mongoexport utilities with native JSONL support for bulk operations
PostgreSQL
COPY command can bulk-load line-delimited JSON into json/jsonb columns
Elasticsearch
Bulk API using JSONL for high-performance indexing
Apache Kafka
JSON widely used for message serialization, with JSONL a common format for topic dumps and replay
ClickHouse
JSONEachRow format (JSONL) for fast analytics ingestion
Redis
Redis Modules using JSONL for bulk data operations
Programming Language Libraries
By 2018, every major programming language had mature JSONL libraries.
Python
jsonlines, pandas, ndjson packages
JavaScript
ndjson, split2, JSONStream
Go
encoding/json with bufio
Java
Jackson, Gson with streaming
Ruby
jsonl gem
Rust
serde_json with BufReader
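As one example, Python's jsonlines package wraps both directions in a small API (the file name is illustrative):

```python
import jsonlines

# Write records one per line.
with jsonlines.open("events.jsonl", mode="w") as writer:   # hypothetical file
    writer.write({"event": "login", "user": "alice"})
    writer.write({"event": "logout", "user": "alice"})

# Read them back lazily, one object per iteration.
with jsonlines.open("events.jsonl") as reader:
    for obj in reader:
        print(obj["event"])
```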
Industry Sector Adoption
Machine Learning & AI
JSONL became standard for training datasets, model outputs, and MLOps pipelines. Hugging Face, OpenAI, and others standardized on JSONL for dataset distribution.
Analytics & Business Intelligence
Data warehouses and BI tools adopted JSONL for ETL processes, making semi-structured data analysis more accessible.
Observability & Monitoring
Log aggregation platforms (Splunk, Datadog, New Relic) embraced JSONL for structured logging and metrics ingestion.
DevOps & CI/CD
Build logs, deployment records, and infrastructure events standardized on JSONL for parsing and analysis.
Modern Era & Future (2020-Present)
JSONL as the universal standard for streaming structured data.
The AI & LLM Revolution (2022-Present)
Large Language Models (LLMs) and modern AI systems dramatically increased JSONL adoption. Training datasets containing billions of examples needed efficient storage and streaming - JSONL was the perfect fit.
OpenAI & ChatGPT
OpenAI standardized on JSONL for:
- Fine-tuning datasets for GPT models
- Batch API requests and responses
- Training data preparation and validation
- Model evaluation and benchmarking
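An illustrative record in the chat-style fine-tuning format OpenAI documents, one training example per line (the content here is made up):

```jsonl
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is JSONL?"}, {"role": "assistant", "content": "JSON Lines: one JSON object per line."}]}
```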
Hugging Face
The leading ML model hub uses JSONL for:
- Dataset distribution (thousands of datasets in JSONL format)
- Streaming datasets API for efficient data loading
- Model training pipelines and fine-tuning
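A minimal sketch with the datasets library, assuming a local train.jsonl file: streaming=True yields examples lazily instead of materializing the whole dataset.

```python
from datasets import load_dataset

# Stream a JSONL dataset without loading it fully into memory;
# the file path is illustrative.
dataset = load_dataset("json", data_files="train.jsonl",
                       split="train", streaming=True)

for example in dataset.take(3):   # take(n) yields only the first n examples
    print(example)
```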
Anthropic (Claude)
JSONL for batch API requests and results, training data, and evaluation workflows
The AI boom of 2022-2025 made JSONL ubiquitous in machine learning workflows, cementing its status as the standard format for large-scale structured data.
Current State of JSONL
Universal Adoption
- Supported by all major cloud platforms
- Standard in data engineering workflows
- Default for ML/AI datasets
- First-class support in analytics tools
Ecosystem Maturity
- Libraries in every major language
- First-class handling in command-line tools such as jq
- IDE extensions and syntax highlighting
- Extensive documentation and tutorials
Future Directions
Real-Time Analytics
JSONL becoming the standard for real-time data pipelines and stream processing frameworks.
Edge Computing
IoT devices and edge systems adopting JSONL for efficient data transmission to cloud systems.
Distributed Systems
Microservices architectures using JSONL for inter-service communication and event streaming.
Schema Evolution
Emerging standards for schema versioning and validation within JSONL files, combining flexibility with structure.
JSONL Timeline at a Glance
1990s-2009: Streaming JSON problem identified
2010-2012: Format emerges organically, early adopters
2013-2015: Community documentation, specifications published
2015-2020: Widespread adoption - cloud platforms, databases, tools
2020-Present: Universal standard - AI/ML boom, LLMs, modern data systems
Why JSONL Succeeded
Simplicity
One JSON object per line. That's the entire specification. Simple enough to implement in minutes, powerful enough for petabytes of data.
Tool Compatibility
Works with standard Unix tools (grep, sed, awk), text editors, version control, and every JSON parser ever written.
Performance
Stream processing without loading entire datasets into memory. Process millions of records per second on commodity hardware.
Explore More About JSONL
Dive deeper into the format that powers modern data infrastructure.