JSONL Glossary
Your comprehensive A-Z reference for JSON Lines terminology, data format concepts, and technical definitions.
Aggregation
The process of combining multiple JSONL records to compute summary statistics or derived values. Common aggregations include counting records, summing values, calculating averages, and finding min/max values.
Example:
jq -s 'map(.amount) | add' sales.jsonl # Sum all sales amounts
See also: Streaming, Slurp Mode, Batch Processing
API Response Format
JSONL used as a response format for APIs to stream large result sets. Allows clients to start processing results before the entire response completes, improving time-to-first-byte and reducing memory usage.
HTTP Headers:
Content-Type: application/x-ndjson
Transfer-Encoding: chunked
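A minimal consumption sketch in Python, assuming the requests library and a hypothetical /export endpoint that streams NDJSON:
import json
import requests  # pip install requests

# stream=True avoids buffering the whole response body in memory
with requests.get('https://api.example.com/export', stream=True) as resp:
    for line in resp.iter_lines():
        if line:  # Skip keep-alive blank lines
            record = json.loads(line)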
See also: Streaming, HTTP Streaming, Transfer-Encoding
Append-Only
A common pattern where new records are appended to the end of a JSONL file without modifying existing lines. This simplifies concurrent writes and creates an immutable audit trail. Particularly useful for event logs and time-series data.
Advantages: Thread-safe appends, simple backup strategies, natural time ordering, easy rollback to any point in history.
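A minimal sketch of the pattern in Python (the file name and event fields are illustrative):
import json

def append_event(path, event):
    # Append mode: each call adds one complete line; existing lines are never touched
    with open(path, 'a', encoding='utf-8') as f:
        f.write(json.dumps(event) + '\n')

append_event('audit.jsonl', {'action': 'login', 'user': 'alice'})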
See also: Event Sourcing, Immutable Data, Log-Structured Storage
Backpressure
A flow control mechanism in streaming systems where a slow consumer signals to a fast producer to reduce the rate of data generation. Critical for processing large JSONL files without overwhelming memory or downstream systems.
In JSONL streaming pipelines, backpressure prevents the producer from generating lines faster than the consumer can process them, avoiding out-of-memory errors.
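One simple way to get backpressure in Python is a bounded queue: when the queue is full, the producer's put() blocks until the consumer catches up. A sketch (file name and worker logic are illustrative):
import json
import queue
import threading

q = queue.Queue(maxsize=1000)  # Bounded queue: the producer blocks when it is full

def producer():
    with open('large.jsonl') as f:
        for line in f:
            q.put(line)  # Blocks while the consumer is behind - this is the backpressure
    q.put(None)  # Sentinel marking end of stream

def consumer():
    while True:
        line = q.get()
        if line is None:
            break
        record = json.loads(line)
        # ... process record here ...

threading.Thread(target=producer).start()
consumer()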
See also: Streaming, Flow Control, Buffering
Batch Processing
Processing JSONL files in groups of records rather than individually or all at once. Balances memory efficiency with processing performance. Common batch sizes range from 100 to 10,000 records depending on record size and available memory.
Python Example:
batch = []
for line in file:
    batch.append(json.loads(line))
    if len(batch) >= 1000:
        process_batch(batch)
        batch = []
if batch:
    process_batch(batch)  # Don't drop the final partial batch
See also: Chunking, Streaming, Windowing
BOM (Byte Order Mark)
A special character sequence (EF BB BF in UTF-8) sometimes added to the beginning of text files. JSONL explicitly prohibits BOM as it would corrupt the first JSON object. Always save JSONL files as "UTF-8 without BOM."
Warning: Many text editors (especially on Windows) add BOM by default. This will break JSONL parsing. Configure your editor to save as "UTF-8 without BOM."
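A small Python sketch that tolerates a BOM when reading, using the standard utf-8-sig codec:
import json

# The 'utf-8-sig' codec reads plain UTF-8 and silently strips a leading BOM if present
with open('data.jsonl', encoding='utf-8-sig') as f:
    for line in f:
        record = json.loads(line)  # Safe even if the file was saved with a BOM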
See also: UTF-8, Encoding, Text Editors
Buffering
Temporarily storing JSONL lines in memory before writing to disk or sending over network. Reduces I/O operations and improves performance. Common strategies: line buffering (one line), block buffering (fixed size), or full buffering (entire file).
For JSONL generation, buffering multiple lines before flushing to disk can improve write performance by 10-100x, depending on the system.
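A sketch of block buffering in Python; the buffering argument to open() is a real parameter, while the 1 MB size is just an illustrative choice and records is assumed to be an iterable of dicts:
import json

# buffering=1024*1024 requests a ~1 MB write buffer instead of per-line I/O
with open('output.jsonl', 'w', encoding='utf-8', buffering=1024 * 1024) as f:
    for item in records:
        f.write(json.dumps(item) + '\n')
# The buffer is flushed automatically when the with-block exits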
See also: I/O Performance, Flushing, Write Throughput
Chunking
Dividing large JSONL files into smaller pieces (chunks) for parallel processing or distributed storage. Chunks typically contain a fixed number of lines or a maximum file size. Essential for processing files larger than available memory.
Split into 10,000-line chunks:
split -l 10000 large.jsonl chunk_
See also: Partitioning, Sharding, Parallel Processing
Compact Output
JSON serialized without whitespace, ensuring each object fits on a single line. Required for valid JSONL. Most JSON libraries provide a "compact" or "minify" option. In jq, use the -c flag.
Comparison:
Pretty: {"name": "Alice", "age": 30}
Compact: {"name":"Alice","age":30}
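In Python, json.dumps adds spaces after separators by default; passing separators=(',', ':') produces the compact form shown above:
import json

obj = {"name": "Alice", "age": 30}
print(json.dumps(obj))                         # {"name": "Alice", "age": 30}
print(json.dumps(obj, separators=(',', ':')))  # {"name":"Alice","age":30}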
See also: Minification, Serialization, Pretty Print
Compression
JSONL text files compress extremely well with standard algorithms. Gzip typically achieves 5-10x compression on JSONL. Common practice: store .jsonl.gz files and decompress on the fly during processing.
Best Practices:
- Use gzip for general-purpose compression (widely supported)
- Consider zstd for better compression ratios and speed
- Compress after generating JSONL, not line-by-line
- Use zcat or zgrep to stream .gz files into tools like jq and grep
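For example, Python's gzip module can stream a compressed file line-by-line without writing a decompressed copy to disk:
import gzip
import json

# Decompression happens in the stream; memory use stays constant
with gzip.open('large.jsonl.gz', 'rt', encoding='utf-8') as f:
    for line in f:
        record = json.loads(line)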
See also: Gzip, Storage Efficiency, Archive Formats
Concatenation
Combining multiple JSONL files by simply appending them together. Unlike JSON arrays, JSONL files can be concatenated with basic file operations. This makes merging datasets trivial.
cat file1.jsonl file2.jsonl file3.jsonl > combined.jsonl
See also: Merging, Append-Only, File Operations
CRLF (Carriage Return Line Feed)
Windows-style line ending using two characters: CR (\r) followed by LF (\n). JSONL specification allows both LF and CRLF as line separators to accommodate different operating systems. Tools should accept both formats.
When generating JSONL, prefer LF-only for consistency and smaller file size. Most modern tools handle both transparently.
See also: LF (Line Feed), Line Separator, Unix vs Windows
Data Pipeline
A series of data processing steps where output from one stage feeds into the next. JSONL is ideal for pipelines due to its streaming nature. Each stage can process records independently without buffering the entire dataset.
Example Pipeline:
cat raw.jsonl | jq -c 'select(.status == "active")' | jq -c '{id, name, email}' > processed.jsonl
See also: ETL, Stream Processing, Unix Pipes
Delimiter
The character or sequence separating records. In JSONL, the delimiter is the newline character (LF or CRLF). Unlike CSV which uses commas, JSONL's newline delimiter is unambiguous and doesn't require escaping.
The newline delimiter is why JSONL requires compact JSON output - a raw, unescaped newline inside a serialized value would break record boundaries.
See also: Line Separator, Newline, Record Boundary
Deserialization
Converting JSONL text into programming language data structures. Each line is deserialized independently using a standard JSON parser. Deserialization happens line-by-line during streaming, avoiding the need to load the entire file.
Python Example:
with open('data.jsonl') as f:
    for line in f:
        obj = json.loads(line)  # Deserialization
        print(obj['name'])
See also: Parsing, Serialization, JSON.parse
Encoding
JSONL files must use UTF-8 character encoding. This is non-negotiable per the specification. UTF-8 provides universal character support, efficient ASCII compatibility, and is the standard encoding for JSON.
Critical: Using other encodings (Latin-1, UTF-16, etc.) will create invalid JSONL files that may appear to work but will fail on international characters.
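In Python, pass encoding='utf-8' explicitly rather than relying on the platform default, which may be Latin-1 or a Windows code page:
import json

# Explicit UTF-8; never rely on the platform default encoding
with open('output.jsonl', 'w', encoding='utf-8') as f:
    f.write(json.dumps({'city': 'São Paulo'}, ensure_ascii=False) + '\n')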
See also: UTF-8, BOM, Character Set
ETL (Extract, Transform, Load)
A data integration pattern where data is extracted from sources, transformed into a target format, and loaded into a destination system. JSONL is commonly used as the intermediate format in ETL pipelines due to its streamability and tool compatibility.
JSONL enables "streaming ETL" where transformation and loading happen continuously without waiting for complete extraction, reducing end-to-end latency.
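A minimal streaming-ETL sketch in Python; the file names and the transformation are illustrative:
import json

def transform(record):
    # Illustrative transform: keep two fields and normalize one of them
    return {'id': record['id'], 'name': record['name'].strip().title()}

with open('extracted.jsonl') as src, open('loaded.jsonl', 'w') as dst:
    for line in src:                          # Extract one record at a time
        record = transform(json.loads(line))  # Transform
        dst.write(json.dumps(record) + '\n')  # Load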
See also: Data Pipeline, Transformation, Data Integration
Event Sourcing
An architectural pattern where state changes are stored as a sequence of events rather than current state. JSONL is perfect for event stores - each line represents an immutable event, and the file becomes a complete audit trail.
By replaying JSONL event logs, you can reconstruct system state at any point in time. Common in financial systems, audit logging, and distributed systems.
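A sketch of replaying an event log to rebuild state; the event shape (type, account, amount) is hypothetical:
import json

def replay(path):
    balances = {}  # State reconstructed purely from the event log
    with open(path) as f:
        for line in f:
            event = json.loads(line)
            account = event['account']
            if event['type'] == 'deposit':
                balances[account] = balances.get(account, 0) + event['amount']
            elif event['type'] == 'withdrawal':
                balances[account] = balances.get(account, 0) - event['amount']
    return balances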
See also: Append-Only, Immutable Data, Audit Trail
Filtering
Selecting a subset of JSONL records based on criteria. Unlike SQL databases, JSONL filtering happens during streaming - each line is evaluated independently. Tools like jq provide powerful filtering capabilities without loading the entire file.
Example - Filter active users:
jq -c 'select(.status == "active")' users.jsonl
See also: jq, Streaming, Data Pipeline
Gzip
The most common compression algorithm for JSONL files. Gzip can reduce JSONL file sizes by 80-90% due to JSON's repetitive structure. Files are typically saved as .jsonl.gz and most JSONL tools can read gzipped files directly.
gzip large.jsonl # Creates large.jsonl.gz
zcat large.jsonl.gz | jq '.field' # Stream-process without writing a decompressed file
See also: Compression, Storage Efficiency, Archive Formats
HTTP Streaming
Delivering JSONL data over HTTP with chunked transfer encoding, allowing clients to process records as they arrive. Common in APIs that return large result sets. The server sends one JSON object per line, and the client processes each line incrementally.
Response Headers:
Content-Type: application/x-ndjson
Transfer-Encoding: chunked
Cache-Control: no-cache
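On the server side, one illustrative option is Flask, where returning a generator streams the response with chunked transfer encoding (the endpoint and payload here are hypothetical):
import json
from flask import Flask, Response

app = Flask(__name__)

@app.route('/events')
def events():
    def generate():
        for i in range(10000):
            yield json.dumps({'id': i}) + '\n'  # One compact JSON object per line
    return Response(generate(), mimetype='application/x-ndjson')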
See also: API Response Format, Streaming, Transfer-Encoding
Immutable Data
Data that cannot be changed after creation. JSONL naturally supports immutable patterns through append-only writes. Each line represents a permanent record, and historical data is preserved by adding new lines rather than modifying existing ones.
Immutability simplifies concurrent access, enables time-travel debugging, and creates reliable audit trails. Common in event sourcing and blockchain systems.
See also: Append-Only, Event Sourcing, Audit Trail
Interoperability
The ability to exchange data between different systems and programming languages. JSONL excels at interoperability because it's text-based, uses the universal JSON format, and doesn't require specialized libraries or binary parsers.
Every modern programming language can read and write JSONL using built-in JSON support and basic file I/O. No protocol buffers, schemas, or code generation required.
See also: UTF-8, Platform Independence, Data Exchange
jq
The de facto command-line JSON processor and the most powerful tool for working with JSONL. Created by Stephen Dolan, jq provides a domain-specific language for filtering, transforming, and aggregating JSON data with blazing speed.
Essential jq Flags for JSONL:
- -c: Compact output (required for JSONL generation)
- -s: Slurp mode (read all lines into array for aggregation)
- -r: Raw output (extract string values without quotes)
- -e: Exit with error on invalid JSON (for validation)
See also: Command-Line Tools, Filtering, Transformation
JSON (JavaScript Object Notation)
The underlying data format for JSONL. Each line in a JSONL file must be valid JSON. JSON was specified by Douglas Crockford in 2001 and became the universal data interchange format for web APIs.
JSONL inherits all JSON data types: objects, arrays, strings, numbers, booleans, and null. However, JSONL lines typically contain objects rather than primitive values.
See also: JSON Lines, NDJSON, RFC 8259
JSON Lines (JSONL)
A text format where each line is a valid JSON value, typically an object. The name emphasizes "lines" as the key differentiator from standard JSON. Also known as NDJSON (Newline Delimited JSON), LDJSON (Line Delimited JSON), or simply "newline-delimited JSON."
Three Simple Rules:
- UTF-8 encoding
- Each line is valid JSON
- Line separator is \n or \r\n
Naming Convention: All these terms refer to the same format: JSONL, NDJSON, LDJSON, JSON Lines, Newline-Delimited JSON, and Line-Delimited JSON. The community has largely standardized on "JSON Lines" (JSONL) as the preferred name.
See also: NDJSON, LDJSON, Newline-Delimited JSON, Line-Delimited JSON
JSON Sequences (RFC 7464)
A related but distinct standard for streaming JSON, defined in RFC 7464. Uses the ASCII Record Separator character (0x1E) before each JSON value instead of newlines. While similar in concept to JSONL, JSON Sequences are less commonly used due to the non-printable delimiter.
JSON Sequences allow embedded newlines in pretty-printed JSON, but JSONL's simpler newline delimiter has become more popular in practice.
See also: RFC 7464, Record Separator, JSONL Alternatives
JSON Streaming
A general term for processing JSON data incrementally without loading it entirely into memory. JSONL is the most popular format for JSON streaming, but the term can also refer to streaming parsers for regular JSON arrays or other streaming JSON formats.
While "JSON streaming" is broader than JSONL, most implementations use JSONL/NDJSON as the wire format due to its simplicity and line-based nature.
See also: Streaming, JSONL, HTTP Streaming, Incremental Processing
LDJSON (Line-Delimited JSON)
Another name for the same format as JSONL and NDJSON. The term emphasizes "line-delimited" to clarify that lines (not commas or other delimiters) separate records. Less commonly used than JSONL or NDJSON, but occasionally seen in library names and documentation.
Historical Context:
The multiple names emerged independently as different communities developed line-based JSON formats. They all converged on the same specification, making JSONL, NDJSON, and LDJSON completely interchangeable terms.
See also: JSONL, NDJSON, Line-Delimited JSON, Naming Conventions
LF (Line Feed)
The Unix/Mac line ending character (\n, ASCII 10). LF is the preferred line separator for JSONL files. While CRLF is also valid per the specification, LF-only endings result in smaller files and are the standard in modern development.
Modern text editors and version control systems handle LF consistently across platforms, making it the de facto standard even on Windows.
See also: CRLF, Newline Character, Line Separator
Line-by-Line Processing
Processing JSONL files one line at a time, the fundamental pattern that enables JSONL's memory efficiency. Each line is read, parsed, processed, and discarded before moving to the next line. Memory usage stays constant regardless of file size.
Python Pattern:
with open('large.jsonl') as f:
    for line in f:  # Streaming, not loading entire file
        record = json.loads(line)
        process(record)  # Memory freed after each iteration
See also: Streaming, Memory Efficiency, Incremental Processing
Line-Delimited JSON
A descriptive term for the JSONL format that emphasizes the use of line breaks as delimiters between JSON records. Unlike CSV's comma delimiter or TSV's tab delimiter, line-delimited formats avoid the complexity of escaping delimiters within data.
The line-delimited approach is what makes JSONL so powerful - you can process it with standard Unix line-based tools like grep, sed, head, tail, and wc without any JSON-aware parsing.
See also: JSONL, LDJSON, Newline-Delimited JSON, Delimiter
Memory Efficiency
JSONL's primary advantage over JSON arrays - the ability to process arbitrarily large datasets with constant memory usage. By processing one line at a time, JSONL applications can handle files larger than available RAM.
Example: A 100GB JSONL file can be processed with just a few MB of memory by reading and processing lines sequentially. The same data as a JSON array would require 100GB+ of RAM just to parse.
See also: Streaming, Line-by-Line Processing, Scalability
MIME Type
The media type identifier used in HTTP Content-Type headers and email systems. JSONL has several commonly used MIME types, though none are officially registered with IANA. The most common are application/x-ndjson and application/jsonl.
Common MIME Types:
- application/x-ndjson: Most widely supported
- application/jsonl: Cleaner but less common
- application/x-jsonlines: Explicit alternative
- text/jsonl: Text-based variant
See also: HTTP Headers, Content-Type, API Response Format
NDJSON (Newline Delimited JSON)
Alternative name for the same format as JSON Lines. NDJSON emphasizes the newline delimiter aspect. In practice, JSONL and NDJSON refer to identical formats and are fully interchangeable. Also sometimes written as "Newline-Delimited JSON."
Common Usage:
Many programming libraries use "ndjson" in their package names (e.g., ndjson-parse, python-ndjson), while documentation often refers to "JSON Lines" or "JSONL" file extensions. Both terms are widely recognized and used interchangeably.
The community has largely converged on "JSON Lines" (JSONL) as the preferred name, but NDJSON remains popular, especially in library names and MIME types (application/x-ndjson).
See also: JSON Lines, LDJSON, Newline-Delimited JSON, Naming Conventions
Newline Character
The line separator in JSONL files. Can be LF (\n, ASCII 10) on Unix/Mac or CRLF (\r\n, ASCII 13+10) on Windows. JSONL parsers must accept both formats to ensure cross-platform compatibility.
Within JSON strings, a literal newline must be written as the escape sequence \n (backslash + n), so string content never creates a real line break and can't be confused with the record delimiter.
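A quick Python demonstration: the serialized output stays on one line because the embedded newline becomes the two-character escape \n:
import json

obj = {'note': 'line one\nline two'}
print(json.dumps(obj))  # {"note": "line one\nline two"} - still a single line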
See also: LF, CRLF, Line Separator, Delimiter
Newline-Delimited JSON
The full, descriptive name for NDJSON format. Explicitly states that newline characters serve as delimiters between JSON records. This term is often used in formal documentation and specifications to be maximally clear about the format structure.
While verbose, "Newline-Delimited JSON" leaves no ambiguity about how the format works, making it useful in educational contexts and formal specifications.
See also: NDJSON, JSONL, Line-Delimited JSON
One-Line JSON
A common way to describe the requirement that each JSON object in JSONL must be serialized without internal line breaks. Each complete JSON value must fit on a single line, which is why compact (minified) JSON output is required.
This one-line constraint is what enables JSONL to be processed line-by-line with simple text tools, as line boundaries unambiguously indicate record boundaries.
See also: Compact Output, Minification, Line Separator
Parallel Processing
Processing multiple JSONL lines simultaneously across multiple CPU cores or machines. JSONL's line-independence makes it perfectly suited for parallelization - each line can be processed without knowledge of other lines.
GNU Parallel Example:
cat large.jsonl | parallel --pipe -N1000 process_batch.py
See also: MapReduce, Distributed Processing, Chunking
Parsing
Converting JSONL text into data structures. Each line is parsed independently using a standard JSON parser. Parsing happens incrementally during streaming, allowing processing of files larger than available memory.
Common parsing errors: malformed JSON on a line, non-UTF-8 encoding, embedded newlines in strings (should be escaped as \\n), or BOM at file start.
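A sketch of defensive line-by-line parsing that reports bad lines instead of aborting (the logging choice is illustrative):
import json

with open('data.jsonl', encoding='utf-8-sig') as f:  # utf-8-sig tolerates a BOM
    for lineno, line in enumerate(f, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as e:
            print(f'Skipping malformed line {lineno}: {e}')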
See also: Deserialization, Validation, Error Handling
Partitioning
Dividing JSONL data into separate files based on a key (e.g., date, category, user ID). Common in data lakes and warehouses. Time-based partitioning (year/month/day/hour) enables efficient range queries without scanning irrelevant data.
Example structure: logs/2025/11/11/events.jsonl
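A minimal partition-writing sketch in Python, assuming records is an iterable of parsed events with an ISO timestamp field; the directory layout mirrors the example above:
import json
import os

def partition_path(record):
    # Derive logs/YYYY/MM/DD/events.jsonl from a timestamp like "2025-11-11T09:30:00Z"
    year, month, day = record['timestamp'][:10].split('-')
    return os.path.join('logs', year, month, day, 'events.jsonl')

for record in records:
    path = partition_path(record)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, 'a') as f:  # Append within the partition
        f.write(json.dumps(record) + '\n')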
See also: Sharding, Data Organization, Query Optimization
Record-Oriented JSON
A descriptive term for JSONL that emphasizes its structure as a sequence of independent records. Unlike document-oriented JSON (a single object or array), record-oriented formats treat each line as a discrete, self-contained record.
This record-oriented structure is what makes JSONL ideal for databases, log files, and data pipelines where each record can be processed independently.
See also: JSONL, Line-by-Line Processing, Independent Records
RFC (Request for Comments)
A document series used to define internet standards. Relevant RFCs for JSONL include RFC 8259 (JSON specification) and RFC 7464 (JSON Sequences). While JSONL itself doesn't have a formal RFC, it follows the JSON standard defined in RFC 8259.
Related RFCs:
- RFC 8259: The JSON Data Interchange Format (2017)
- RFC 7464: JavaScript Object Notation (JSON) Text Sequences (2015)
- RFC 4627: Previous JSON specification (superseded by RFC 8259)
See also: JSON Sequences, Standards, Specification
RFC 7464 (JSON Sequences)
An official IETF standard for streaming JSON that uses ASCII Record Separator (RS, 0x1E) characters instead of newlines. While similar in purpose to JSONL, RFC 7464 is less widely adopted due to its non-printable delimiter and more complex parsing requirements.
JSONL's simpler newline-based approach has proven more popular despite lacking a formal RFC, demonstrating that practical simplicity often trumps official standardization.
See also: JSON Sequences, RFC, Alternative Formats
Schema
A definition of the expected structure and data types in JSONL records. While JSONL itself is schema-less (each line can have different fields), many applications enforce schemas for data quality. JSON Schema is commonly used for JSONL validation.
Schema-less flexibility allows JSONL to handle evolving data models, but validation ensures data quality in production systems.
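A sketch using the third-party jsonschema package (one common choice) to validate each record; the schema itself is illustrative:
import json
from jsonschema import ValidationError, validate  # pip install jsonschema

schema = {
    'type': 'object',
    'required': ['id', 'email'],
    'properties': {
        'id': {'type': 'integer'},
        'email': {'type': 'string'},
    },
}

with open('users.jsonl') as f:
    for lineno, line in enumerate(f, start=1):
        try:
            validate(instance=json.loads(line), schema=schema)
        except ValidationError as e:
            print(f'Line {lineno} failed schema validation: {e.message}')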
See also: Validation, JSON Schema, Data Quality
Serialization
Converting data structures into JSONL text format. Each object is serialized to compact JSON and written as a separate line. The inverse operation of deserialization/parsing.
Python Serialization:
with open('output.jsonl', 'w') as f:
    for item in data:
        f.write(json.dumps(item) + '\n')
See also: Deserialization, JSON.stringify, Encoding
Slurp Mode
A processing mode (especially in jq) where all JSONL lines are read into memory as an array before processing. Enables aggregations and operations across all records but requires sufficient memory. Use the -s flag in jq.
Warning: Only use slurp mode on files that fit comfortably in memory. For large files, prefer streaming operations or batch processing.
See also: jq, Aggregation, Memory Usage
Streaming
Processing JSONL data incrementally as it's read, without loading the entire dataset into memory. The fundamental advantage of JSONL over JSON arrays. Enables processing of arbitrarily large files with constant memory usage.
Streaming is why JSONL scales to petabyte-sized datasets - you process one line at a time, so memory usage stays constant regardless of file size.
See also: Line-by-Line Processing, Memory Efficiency, Scalability
Text-Based Format
JSONL is a text format (not binary), making it human-readable, debuggable, and processable with standard text tools. Unlike binary formats like Protocol Buffers or MessagePack, JSONL files can be viewed in any text editor and manipulated with tools like grep, sed, and awk.
While binary formats may be more compact or faster to parse, text-based formats excel at debugging, tooling compatibility, and long-term archival where readability matters.
See also: UTF-8, Human-Readable, Debugging
Transformation
Converting JSONL data from one structure to another. Transformations can filter fields, rename properties, compute derived values, or reshape objects. Tools like jq excel at JSONL transformations with their streaming-friendly design.
Example Transformation:
# Extract specific fields and rename
jq -c '{userId: .id, fullName: (.firstName + " " + .lastName), active: (.status == "active")}' users.jsonl
See also: jq, Data Pipeline, ETL, Mapping
Transfer-Encoding
An HTTP header that specifies how response data is encoded during transmission. JSONL APIs commonly use "Transfer-Encoding: chunked" to stream results without knowing the total size upfront. This enables infinite streams and reduces latency.
HTTP Streaming Response:
HTTP/1.1 200 OK
Content-Type: application/x-ndjson
Transfer-Encoding: chunked
{"id":1,"name":"Alice"}
{"id":2,"name":"Bob"}
...
See also: HTTP Streaming, API Response Format, Chunked Transfer
UTF-8
The mandatory character encoding for JSONL files. UTF-8 is a variable-width encoding that represents all Unicode characters while maintaining ASCII compatibility. Developed by Ken Thompson and Rob Pike in 1992.
Why UTF-8 for JSONL:
- Universal character support (all languages)
- Backward compatible with ASCII
- Self-synchronizing (errors don't propagate)
- Standard for JSON specification
See also: Encoding, BOM, Character Set
Unix Pipes
A mechanism for connecting the output of one command to the input of another. JSONL works beautifully with Unix pipes because line-based text is the native data format for Unix tools. Enables composable data processing pipelines.
cat data.jsonl | grep "error" | jq '.timestamp' | sort | uniq -c
See also: Data Pipeline, Command-Line Tools, Composability
Validation
Verifying that a JSONL file is correctly formatted. Two levels: syntax validation (each line is valid JSON) and schema validation (data matches expected structure). Critical for production data pipelines to catch errors early.
Quick Validation with jq:
jq -e . data.jsonl > /dev/null && echo "Valid" || echo "Invalid"
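The jq check above stops at the first error; a small Python sketch that reports every invalid line instead:
import json
import sys

def validate_jsonl(path):
    ok = True
    with open(path, encoding='utf-8') as f:
        for lineno, line in enumerate(f, start=1):
            try:
                json.loads(line)
            except json.JSONDecodeError as e:
                ok = False
                print(f'{path}:{lineno}: {e}')
    return ok

if __name__ == '__main__':
    sys.exit(0 if validate_jsonl(sys.argv[1]) else 1)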
See also: Schema, JSON Schema, Data Quality
Whitespace
Spaces, tabs, and other non-visible characters. JSONL requires compact JSON output without internal newlines, but spaces and tabs are allowed within a line. However, for optimal file size, most implementations remove all unnecessary whitespace.
Leading and trailing whitespace on lines is typically ignored by parsers, though it's best practice to avoid it.
See also: Compact Output, Minification, File Size
Windowing
Processing JSONL data in time-based or count-based windows. Common in stream processing frameworks. For example, computing statistics over 5-minute windows of events or processing every 1000 records as a batch.
Tumbling windows: non-overlapping fixed-size segments. Sliding windows: overlapping segments. Session windows: variable-size based on activity gaps.
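A sketch of count-based tumbling windows in Python; the window size and the aggregation are illustrative:
import json

def tumbling_windows(path, size=1000):
    window = []
    with open(path) as f:
        for line in f:
            window.append(json.loads(line))
            if len(window) == size:
                yield window  # Full, non-overlapping window
                window = []
    if window:
        yield window  # Final partial window

for window in tumbling_windows('events.jsonl'):
    total = sum(record.get('amount', 0) for record in window)
    print(f'{len(window)} records, total amount {total}')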
See also: Stream Processing, Batch Processing, Time Series
Quick Reference
Essential Terms
- JSONL: JSON Lines format
- NDJSON: Newline Delimited JSON (same as JSONL)
- LDJSON: Line Delimited JSON (same as JSONL)
- Streaming: Processing data incrementally
- UTF-8: Mandatory character encoding
- jq: Command-line JSON processor
File Extensions
- .jsonl: Standard extension
- .ndjson: Alternative extension
- .jsonl.gz: Compressed JSONL
- .json: Sometimes used (not recommended)
MIME Types
- application/jsonl
- application/x-jsonlines
- application/x-ndjson
- text/jsonl
Common Commands
- jq -c '.[]' file.json > file.jsonl: Convert JSON array to JSONL
- jq -s '.' file.jsonl > file.json: Convert JSONL to JSON array
- wc -l file.jsonl: Count records
- gzip file.jsonl: Compress file