Data Formats & Content Negotiation
JSON won, but the battle isn't over
In a nutshell
APIs need to send data in some format, and that format affects speed, size, and who can read it. JSON is the default because it's easy for humans and machines. But sometimes you need something smaller (Protocol Buffers), something stricter (XML), or something optimized for huge datasets (CSV, Parquet) -- and HTTP headers let the client and server agree on which format to use.
The situation
Your API returns JSON. Everyone's happy — until the IoT team says their devices can't afford the parsing overhead. The data pipeline team wants CSV for bulk exports. A partner sends you XML because their system was built in 2008. And the mobile team asks if you can shave bytes off the response because users in emerging markets are on metered connections.
Same data, different constraints, different formats.
One user, four formats
Here's the same data — a user with a nested address — in each major format.
JSON
{
  "id": "usr_8a3f",
  "name": "Alice Chen",
  "email": "alice@example.com",
  "verified": true,
  "address": {
    "street": "42 Oak Avenue",
    "city": "Portland",
    "state": "OR",
    "zip": "97201"
  }
}

Size: ~210 bytes. Human-readable, universally supported, native to JavaScript. The default for almost every API built after 2010.
XML
<?xml version="1.0" encoding="UTF-8"?>
<user>
  <id>usr_8a3f</id>
  <name>Alice Chen</name>
  <email>alice@example.com</email>
  <verified>true</verified>
  <address>
    <street>42 Oak Avenue</street>
    <city>Portland</city>
    <state>OR</state>
    <zip>97201</zip>
  </address>
</user>

Size: ~320 bytes. More verbose, but supports schemas (XSD), namespaces, and attributes. Still dominant in enterprise, healthcare (HL7/FHIR), and financial services (ISO 20022).
Protocol Buffers (binary)
// user.proto: schema definition
syntax = "proto3";

message Address {
  string street = 1;
  string city = 2;
  string state = 3;
  string zip = 4;
}

message User {
  string id = 1;
  string name = 2;
  string email = 3;
  bool verified = 4;
  Address address = 5;
}

Size on the wire: roughly 80 bytes for this record (binary encoding). Not human-readable, but dramatically smaller and faster to parse. The .proto file is the schema AND the documentation. Used by gRPC and for performance-sensitive internal communication.
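To see where the wire size comes from, here is a minimal sketch that hand-encodes the sample user following the protobuf wire format: each field is a tag byte, then a length, then the raw bytes. The helper names (varint, length_delimited, string_field, bool_field) are made up for illustration; real code would use classes generated by protoc rather than encoding by hand.

```python
def varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def length_delimited(field_number: int, payload: bytes) -> bytes:
    """Wire type 2: tag, payload length, payload (strings, nested messages)."""
    tag = (field_number << 3) | 2
    return varint(tag) + varint(len(payload)) + payload


def string_field(field_number: int, text: str) -> bytes:
    return length_delimited(field_number, text.encode("utf-8"))


def bool_field(field_number: int, value: bool) -> bytes:
    """Wire type 0 (varint). proto3 leaves out fields at their default (false)."""
    if not value:
        return b""
    return varint((field_number << 3) | 0) + varint(1)


# Field numbers follow user.proto above.
address = (
    string_field(1, "42 Oak Avenue")
    + string_field(2, "Portland")
    + string_field(3, "OR")
    + string_field(4, "97201")
)
user = (
    string_field(1, "usr_8a3f")
    + string_field(2, "Alice Chen")
    + string_field(3, "alice@example.com")
    + bool_field(4, True)
    + length_delimited(5, address)
)

print(len(address), len(user))  # prints: 36 81
```

Field names never appear on the wire, only field numbers, which is most of the saving over JSON and XML for a record like this one.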
YAML
id: usr_8a3f
name: Alice Chen
email: alice@example.com
verified: true
address:
  street: 42 Oak Avenue
  city: Portland
  state: OR
  zip: "97201"

Size: ~170 bytes. More readable than JSON (no braces, no quotes on most strings), but whitespace-significant and surprisingly tricky to parse correctly. Used for configuration files and API specifications (OpenAPI, AsyncAPI), rarely for wire formats.
Format is a trade-off, not a preference
JSON optimizes for developer experience. Protobuf optimizes for performance. XML optimizes for schema rigor. YAML optimizes for human readability. There's no universal winner — the right format depends on who (or what) is reading the data and how much bandwidth and CPU you can afford.
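If you want to measure the trade-off instead of taking it on faith, a rough sketch like the one below serializes the sample user with Python's standard library and compares byte counts. The to_xml helper is an assumption made for this example (one element per key, no attributes or namespaces), and exact sizes will vary with whitespace and with whether an XML declaration is included.

```python
import json
import xml.etree.ElementTree as ET

user = {
    "id": "usr_8a3f",
    "name": "Alice Chen",
    "email": "alice@example.com",
    "verified": True,
    "address": {
        "street": "42 Oak Avenue",
        "city": "Portland",
        "state": "OR",
        "zip": "97201",
    },
}


def to_xml(tag, value):
    """Build one element per key; nested dicts become nested elements."""
    element = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            element.append(to_xml(key, child))
    elif isinstance(value, bool):
        element.text = "true" if value else "false"
    else:
        element.text = str(value)
    return element


compact_json = json.dumps(user, separators=(",", ":")).encode("utf-8")
pretty_json = json.dumps(user, indent=2).encode("utf-8")
xml_bytes = ET.tostring(to_xml("user", user), encoding="utf-8")

for label, payload in [
    ("JSON (compact)", compact_json),
    ("JSON (pretty)", pretty_json),
    ("XML (no whitespace)", xml_bytes),
]:
    print(f"{label}: {len(payload)} bytes")
```

Run it against your own payloads before deciding anything: a single small record says little about how a format behaves on large, repetitive responses.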
Content negotiation: letting clients choose
HTTP has a built-in mechanism for format negotiation. The client says what it wants; the server says what it's sending.
The headers
- Accept: the client tells the server which formats it can handle
- Content-Type: the sender declares the format of the body (request or response)
In practice
# Client requests JSON (the most common case)
curl -H "Accept: application/json" \
  https://api.example.com/users/usr_8a3f

# Client requests XML
curl -H "Accept: application/xml" \
  https://api.example.com/users/usr_8a3f

# Client sends JSON, wants JSON back
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{"name": "Alice Chen", "email": "alice@example.com"}' \
  https://api.example.com/users

# Client says "I prefer JSON, but XML is fine too"
curl -H "Accept: application/json, application/xml;q=0.9" \
  https://api.example.com/users/usr_8a3f

The q=0.9 is a quality factor: a preference weight from 0 to 1. A type listed without a q value defaults to 1, so here JSON outranks XML. The server uses the weights to pick the best match.
What the server does
- Parse the Accept header
- Compare against the formats it supports
- Return the best match with the corresponding Content-Type
- If no match: return 406 Not Acceptable
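The steps above can be sketched in a few lines of Python. This is a simplified illustration, not a complete implementation of HTTP's negotiation rules: it honors q weights and wildcards, but ignores the precedence of more specific media ranges over wildcards. SUPPORTED, parse_accept, matches, and negotiate are names invented for this example; most web frameworks ship their own negotiation helpers.

```python
from typing import List, Optional, Tuple

SUPPORTED = ("application/json", "application/xml")  # assumed server formats


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media_range, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_range = pieces[0].lower()
        if not media_range:
            continue
        q = 1.0  # q defaults to 1 when omitted
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        ranges.append((media_range, q))
    # sorted() is stable, so equal weights keep the client's order
    return sorted(ranges, key=lambda item: item[1], reverse=True)


def matches(media_range: str, media_type: str) -> bool:
    """True if a range such as application/* or */* covers media_type."""
    if media_range == "*/*":
        return True
    range_type, _, range_subtype = media_range.partition("/")
    type_, _, subtype = media_type.partition("/")
    return range_type == type_ and range_subtype in ("*", subtype)


def negotiate(accept: Optional[str], supported=SUPPORTED) -> Optional[str]:
    """Return the chosen media type, or None when the caller should send 406."""
    if not accept:
        return supported[0]  # no Accept header: any format is acceptable
    for media_range, q in parse_accept(accept):
        if q <= 0:
            continue  # q=0 means the client refuses this range
        for media_type in supported:
            if matches(media_range, media_type):
                return media_type
    return None
```

A handler would call negotiate(request_accept_header), respond with 406 Not Acceptable when it returns None, and otherwise serialize the body in the chosen format and echo that format in Content-Type.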
Keep it simple
Most APIs only support JSON. That's fine. If you only support one format, still set the Content-Type: application/json response header explicitly. Don't leave it to the framework's default behavior — be deliberate about your contract.
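Being deliberate can be as small as one header line. Here is a minimal sketch using the WSGI interface from Python's standard library; the app function and its hard-coded payload are placeholders for illustration, and your framework will have its own way of setting response headers.

```python
import json


def app(environ, start_response):
    """Tiny WSGI app that always answers with JSON and says so explicitly."""
    body = json.dumps({"id": "usr_8a3f", "name": "Alice Chen"}).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),  # stated, not left to a default
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, one option is wsgiref.simple_server.make_server("", 8000, app).serve_forever() from the standard library.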
When to go beyond JSON
| Scenario | Better format | Why |
|---|---|---|
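| Internal microservice calls (high throughput) | Protocol Buffers | 3-10x smaller and 20-100x faster to parse by Google's original figures, which compare against XML; gains over JSON vary by payload and library |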
| IoT / embedded devices | MessagePack / CBOR | Binary JSON: same structure, smaller size, cheaper to parse |
| Bulk data export | CSV / Parquet | Tabular data for analytics pipelines, much smaller than JSON arrays |
| Configuration files | YAML / TOML | Human-readable, easy to edit by hand |
| Enterprise / government integration | XML | Required by partner systems, schema validation via XSD |
| Browser-to-server | JSON | Native to JavaScript, lowest adoption cost |
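For the bulk export row, the main wrinkle is that CSV is flat while the user record is nested. A minimal sketch, assuming every record has the same shape and the list is not empty: flatten nested keys into dotted column names, then write rows with the standard csv module. The second user is made up for the example, as are the flatten and to_csv helpers.

```python
import csv
import io
import json

users = [
    {
        "id": "usr_8a3f",
        "name": "Alice Chen",
        "email": "alice@example.com",
        "verified": True,
        "address": {"street": "42 Oak Avenue", "city": "Portland",
                    "state": "OR", "zip": "97201"},
    },
    {  # hypothetical second record, invented for this example
        "id": "usr_1b7c",
        "name": "Bob Ortiz",
        "email": "bob@example.com",
        "verified": False,
        "address": {"street": "7 Pine Street", "city": "Salem",
                    "state": "OR", "zip": "97301"},
    },
]


def flatten(record, prefix=""):
    """Turn {"address": {"city": ...}} into {"address.city": ...}."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def to_csv(records):
    """Serialize same-shaped records to CSV text with a header row."""
    rows = [flatten(record) for record in records]
    buffer = io.StringIO()
    # Column order comes from the first record; assumes a non-empty list.
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(users))
print(len(to_csv(users * 100)), "bytes as CSV vs",
      len(json.dumps(users * 100)), "bytes as JSON")
```

The saving comes from writing the column names once in the header instead of repeating every key in every record. Note that the csv module writes Python booleans as True and False, so agree on a convention with whoever consumes the export.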
YAML edge cases will bite you
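YAML looks simple but has surprising behavior. Under YAML 1.1 rules, which many parsers still follow, an unquoted NO becomes the boolean false, so Norway's country code silently turns into false. An unquoted 3.10 becomes the float 3.1, which ruins version numbers. Quoting the value ("NO", "3.10") keeps it a string. This is why most APIs use JSON on the wire and YAML only for configuration that humans edit.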
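To make the mechanism concrete, here is a small sketch that imitates how a YAML 1.1 style parser guesses the type of an unquoted value. It is not a YAML parser: it only covers a subset of the YAML 1.1 boolean spellings and plain decimal floats, and resolve_plain_scalar is a name invented for this example.

```python
import re

# A subset of the boolean spellings YAML 1.1 recognizes for unquoted values.
YAML11_BOOL = re.compile(
    r"^(?:yes|Yes|YES|no|No|NO|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF)$"
)
TRUTHY = {"yes", "true", "on"}

# Simplified decimal float; real resolvers also accept exponents, .inf, etc.
SIMPLE_FLOAT = re.compile(r"^[-+]?[0-9]+\.[0-9]*$")


def resolve_plain_scalar(text):
    """Guess a type for an unquoted value, the way implicit typing does."""
    if YAML11_BOOL.match(text):
        return text.lower() in TRUTHY
    if SIMPLE_FLOAT.match(text):
        return float(text)
    return text  # everything else stays a string in this sketch


for raw in ["NO", "3.10", "Portland"]:
    print(repr(raw), "->", repr(resolve_plain_scalar(raw)))
```

The values never looked wrong in the file; the surprise happens at load time, which is what makes these bugs hard to spot in review.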
The pragmatic approach
- Default to JSON for all external and human-facing APIs
- Use Protocol Buffers when performance matters (internal services, mobile on slow networks)
- Support XML only when a partner requires it
- Always set Content-Type explicitly in your responses
- Version your content types if you need to evolve formats: application/vnd.myapi.v2+json
Custom media types like application/vnd.myapi.v2+json are a powerful but underused feature. They let you version your response format independently from your URL structure — something we'll revisit when we talk about schema evolution.
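On the server side, a versioned media type is just a string to take apart. A minimal sketch, assuming the naming scheme application/vnd.<name>.v<N>+<suffix> used above; VendorType and parse_vendor_type are hypothetical helpers, and the pattern is deliberately stricter than what media type syntax allows in general.

```python
import re
from typing import NamedTuple, Optional


class VendorType(NamedTuple):
    vendor: str
    version: int
    suffix: str


# Matches only the scheme used in this article: application/vnd.<name>.v<N>+<suffix>
VENDOR_PATTERN = re.compile(
    r"^application/vnd\.(?P<vendor>[a-z0-9-]+)"
    r"\.v(?P<version>[0-9]+)\+(?P<suffix>[a-z0-9]+)$"
)


def parse_vendor_type(media_type: str) -> Optional[VendorType]:
    """Extract vendor, version, and suffix, or None for other media types."""
    # Drop parameters such as ";charset=utf-8" and normalize case.
    bare = media_type.split(";", 1)[0].strip().lower()
    match = VENDOR_PATTERN.match(bare)
    if match is None:
        return None
    return VendorType(match["vendor"], int(match["version"]), match["suffix"])


print(parse_vendor_type("application/vnd.myapi.v2+json"))
```

A handler could branch on the version field to pick a serializer, and fall back to its default representation when the client sends plain application/json.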
Next up: specs as source of truth — how OpenAPI, AsyncAPI, and protobuf files turn your API contract from an idea into an enforceable artifact.