Building Real-Time Data Pipelines from Hyperliquid

A good Hyperliquid data pipeline is less about “AI trading” and more about keeping clean market records before you ever ask Claude or GPT what they think.

Hyperliquid gives builders a free, fast API: public market data through https://api.hyperliquid.xyz/info, signed trading actions through https://api.hyperliquid.xyz/exchange, and live streams through wss://api.hyperliquid.xyz/ws. That is enough to build a useful local feed for order books, trades, candles, and AI-readable market summaries.

The part beginners usually miss is reconciliation. A WebSocket stream is not magic. Connections drop. Messages can be missed. Local order books can drift. If you feed messy data into an LLM, the answer may sound confident while being based on a broken book.

Quick takeaway: Use Hyperliquid WebSockets for live data, REST for snapshots and history, store compressed metrics locally, and only send Claude or GPT small summaries instead of raw order books.

Start with the right Hyperliquid endpoints

Hyperliquid’s public data API is simple. You POST JSON to https://api.hyperliquid.xyz/info with a type field. No API key is needed for public reads, per the Hyperliquid API docs verified in the research file on May 21, 2026.

For a basic data pipeline, the useful read endpoints are:

allMids: returns mid prices for all perpetual markets. The sample response had 230 coins as of May 2026. Weight: 2.
l2Book: returns the order book for one coin, with price px, size sz, and number of orders n at each level. Weight: 2.
candleSnapshot: returns OHLCV candles for intervals like 1m, 5m, 15m, 1h, 4h, and 1d. Weight: 20 plus more for larger responses.
fundingHistory: returns funding rates and premiums. It requires startTime. Weight: 20 plus more for larger responses.
meta: returns the trading universe, including szDecimals, name, maxLeverage, and marginTableId. The research sample listed 230 assets in the universe as of May 2026.
metaAndAssetCtxs: combines market metadata with context like funding, open interest, mark price, premium, and daily notional volume. Weight: 20.
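Here is what those reads look like from plain Python. This is a minimal standard-library sketch, not the official SDK. The names build_info_request, post_info, and the payload constants are hypothetical. The nested "req" object for candleSnapshot follows the shape shown in Hyperliquid's info docs, so verify it against the current docs before relying on it.

```python
import json
import urllib.request

INFO_URL = "https://api.hyperliquid.xyz/info"

def build_info_request(payload: dict) -> urllib.request.Request:
    """Build a POST to the public /info endpoint. No API key is needed."""
    return urllib.request.Request(
        INFO_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def post_info(payload: dict, timeout: float = 10.0):
    """Send the request and decode the JSON body (requires network access)."""
    with urllib.request.urlopen(build_info_request(payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Example payloads: each one is a JSON object with a "type" field.
ALL_MIDS = {"type": "allMids"}                 # weight 2
BTC_BOOK = {"type": "l2Book", "coin": "BTC"}   # weight 2
META = {"type": "meta"}
# fundingHistory requires startTime (epoch milliseconds).
BTC_FUNDING = {"type": "fundingHistory", "coin": "BTC", "startTime": 1779321600000}
BTC_CANDLES = {
    "type": "candleSnapshot",
    # Assumed shape: parameters nested under "req", times in epoch milliseconds.
    "req": {"coin": "BTC", "interval": "1h",
            "startTime": 1779321600000, "endTime": 1779408000000},
}
```

post_info(ALL_MIDS) returns a dict keyed by coin, with prices as strings. The timestamps in the funding and candle payloads are placeholder epoch-millisecond values covering May 21 to 22, 2026.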

For live data, use the WebSocket endpoint:

wss://api.hyperliquid.xyz/ws

The documented subscription format is:

{"method":"subscribe","subscription":{"type":"l2Book","coin":"BTC"}}

The main public channels from the research are:

l2Book: live L2 book for one coin.
trades: recent executions for one coin.
allMids: mid prices for all coins.
candle: live candles for one coin and interval.
funding: funding rate data for perpetuals.
activeAssetCtx: active asset context, noted as internal use in the research.

There are authenticated channels too, including userFills, userState, orderUpdates, webData2, notification, userFundings, and userNonFundingLedgerUpdates. Those require signed authentication. If you are only building a market data pipeline, stay public at first.
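Each public channel is one subscription message in the format shown earlier. The sketch below builds them. It assumes the candle channel takes coin and interval fields, as Hyperliquid's WebSocket docs describe. The names subscribe_msg and public_subscriptions are hypothetical helpers.

```python
from __future__ import annotations

import json

WS_URL = "wss://api.hyperliquid.xyz/ws"

def subscribe_msg(channel: str, coin: str | None = None, interval: str | None = None) -> str:
    """Build one public subscription in the documented method/subscription shape."""
    subscription = {"type": channel}
    if coin is not None:
        subscription["coin"] = coin
    if interval is not None:
        subscription["interval"] = interval  # only the candle channel uses this
    # Compact separators reproduce the documented string exactly.
    return json.dumps({"method": "subscribe", "subscription": subscription},
                      separators=(",", ":"))

def public_subscriptions(coins: list[str], candle_interval: str = "1m") -> list[str]:
    """allMids once, then l2Book, trades, and candles for each tracked coin."""
    msgs = [subscribe_msg("allMids")]
    for coin in coins:
        msgs.append(subscribe_msg("l2Book", coin))
        msgs.append(subscribe_msg("trades", coin))
        msgs.append(subscribe_msg("candle", coin, candle_interval))
    return msgs
```

Send each string once after connecting. Every message counts against the 1000-subscription limit, so tracking three coins this way uses 10 subscriptions.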

Use WebSockets for live data, REST for repair

REST and WebSockets solve different problems.

Use WebSockets for live order books, trades, mids, fills, and user state.
Use REST for historical candles, funding history, one-time account queries, and periodic sanity checks.
Use /exchange only for signed trading actions like placing or canceling orders.

Hyperliquid’s WebSocket server sends {"channel":"pong"} roughly every 5 seconds, per the research material. Your client can also send {"method":"ping"}. That heartbeat matters. If your process stops seeing messages, you want to reconnect before your local state gets stale.
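Below is a minimal heartbeat tracker. It assumes you call it for every inbound frame. The 10-second ping threshold and the 20-second stale threshold are arbitrary starting values, not Hyperliquid numbers, so tune them against the pong cadence you actually observe. The clock parameter exists only so the logic can be tested without waiting.

```python
from __future__ import annotations

import json
import time
from typing import Callable

PING = json.dumps({"method": "ping"})

class Heartbeat:
    """Tracks time since the last inbound message (market data or pong)."""

    def __init__(self, ping_after: float = 10.0, stale_after: float = 20.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ping_after = ping_after    # assumed: send a ping after 10 s of silence
        self.stale_after = stale_after  # assumed: reconnect after 20 s of silence
        self.clock = clock
        self.last_seen = clock()

    def saw_message(self) -> None:
        self.last_seen = self.clock()

    def idle(self) -> float:
        return self.clock() - self.last_seen

    def should_ping(self) -> bool:
        return self.idle() >= self.ping_after

    def is_stale(self) -> bool:
        return self.idle() >= self.stale_after

def is_pong(raw: str) -> bool:
    """True for the {"channel":"pong"} heartbeat message."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(msg, dict) and msg.get("channel") == "pong"
```

In the read loop, call saw_message() on every frame. Send PING when should_ping() is true, and close and reconnect when is_stale() is true.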

The rate limits are generous for a small local builder, but they are still real. Hyperliquid’s GitBook docs dated April 28, 2026 list:

1200 REST weight per minute per IP.
10 WebSocket connections per IP.
30 new WebSocket connections per minute.
1000 WebSocket subscriptions.
2000 messages per minute sent across all WebSocket connections.
100 simultaneous inflight post messages across all WebSocket connections.
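Weights turn the REST limit into a simple budget. At weight 2, allMids or l2Book can be polled up to 600 times a minute from one IP, while a weight-20 call such as metaAndAssetCtxs fits only 60 times. The sketch below is a client-side sliding-window tracker built on that assumption. The server's own accounting may differ, so treat it as a guardrail rather than a guarantee. WeightBudget is a hypothetical name.

```python
from __future__ import annotations

import time
from collections import deque
from typing import Callable

class WeightBudget:
    """Client-side sliding-window tracker for REST weight (default 1200 per 60 s)."""

    def __init__(self, limit: int = 1200, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._spent: deque[tuple[float, int]] = deque()  # (timestamp, weight)
        self._total = 0

    def _expire(self, now: float) -> None:
        # Forget spending that has aged out of the window.
        while self._spent and now - self._spent[0][0] >= self.window:
            _, weight = self._spent.popleft()
            self._total -= weight

    def remaining(self) -> int:
        self._expire(self.clock())
        return self.limit - self._total

    def try_spend(self, weight: int) -> bool:
        """Record the request and return True if it fits; otherwise return False."""
        now = self.clock()
        self._expire(now)
        if self._total + weight > self.limit:
            return False
        self._spent.append((now, weight))
        self._total += weight
        return True
```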

Do not reconnect in a tight loop. Use exponential backoff: wait 1 second, then 2, then 4, doubling up to a cap of around 30 seconds. That keeps your script from making a bad network moment worse.
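Here is the doubling schedule, with a reconnect loop around it. The session argument is a hypothetical coroutine you would write yourself: it connects, subscribes, streams, and returns True if data arrived. Resetting the schedule after a healthy session is a design choice, not something Hyperliquid requires.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, Iterator

def backoff_delays(base: float = 1.0, cap: float = 30.0) -> Iterator[float]:
    """Yield 1, 2, 4, 8, ... seconds, never more than cap."""
    delay = base
    while True:
        yield delay
        delay = min(delay * 2, cap)

async def reconnect_loop(
    session: Callable[[], Awaitable[bool]],
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
    max_sessions: int | None = None,
) -> None:
    """Run session repeatedly, sleeping with backoff between attempts."""
    delays = backoff_delays()
    runs = 0
    while max_sessions is None or runs < max_sessions:
        runs += 1
        try:
            healthy = await session()
        except Exception:  # network errors, seqNum gaps, bad frames: all mean "reconnect"
            healthy = False
        if healthy:
            delays = backoff_delays()  # a session that worked resets the schedule
        await sleep(next(delays))
```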

The l2Book rule: snapshots first, deltas after

The l2Book channel is the core of a Hyperliquid order book pipeline.

On subscribe, the first l2Book message is a full snapshot. You can identify it because it has a levels array. The research sample shows the shape:

{
  "coin": "BTC",
  "time": 1779406026396,
  "levels": [
    [{"px": "77632.0", "sz": "24.44037", "n": 60}],
    [{"px": "77633.0", "sz": "1.25621", "n": 19}]
  ]
}

After that, Hyperliquid sends incremental updates that carry a seqNum. The contract is simple: for a given coin's subscription, each update's seqNum should be exactly 1 higher than the previous one.

If your last sequence number was 1200, the next one should be 1201. If you see 1204, you missed data. Your local book is no longer trustworthy.

The fix is not to guess. The research notes that Hyperliquid does not provide a WebSocket RPC call to request a fresh book snapshot. You reconnect. On reconnect, Hyperliquid sends a new full snapshot. Replace your local book with that snapshot and continue.

That gives you a clean loop:

Connect to wss://api.hyperliquid.xyz/ws.
Subscribe to l2Book for BTC, ETH, SOL, or whichever coins you track.
Store the first full snapshot.
Apply each incremental update in sequence.
If seqNum skips, close the connection.
Reconnect and replace the book with the next full snapshot.
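The loop above centers on one check: is this seqNum exactly one more than the last? The research does not show the shape of an incremental update, so this sketch makes three assumptions:

- Each update has already been parsed into (side, px, sz) tuples.
- A size of "0" deletes a level.
- Levels arrive as [bids, asks].

LocalBook and SequenceGap are hypothetical names. Prices and sizes stay as Decimal because the API sends them as strings.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from decimal import Decimal
from typing import Iterable

class SequenceGap(Exception):
    """seqNum skipped, so the local book can no longer be trusted."""

@dataclass
class LocalBook:
    coin: str
    bids: dict[Decimal, Decimal] = field(default_factory=dict)  # px -> sz
    asks: dict[Decimal, Decimal] = field(default_factory=dict)
    last_seq: int | None = None

    def load_snapshot(self, msg: dict) -> None:
        """Replace everything with a full snapshot (the message that has levels)."""
        bid_levels, ask_levels = msg["levels"]  # assumed order: [bids, asks]
        self.bids = {Decimal(lv["px"]): Decimal(lv["sz"]) for lv in bid_levels}
        self.asks = {Decimal(lv["px"]): Decimal(lv["sz"]) for lv in ask_levels}
        self.last_seq = msg.get("seqNum")  # None means "accept the next update"

    def apply_update(self, seq: int, changes: Iterable[tuple[str, str, str]]) -> None:
        """Apply (side, px, sz) changes after checking that seq is last_seq + 1."""
        if self.last_seq is not None and seq != self.last_seq + 1:
            raise SequenceGap(f"{self.coin}: expected {self.last_seq + 1}, got {seq}")
        for side, px, sz in changes:
            levels = self.bids if side == "bid" else self.asks
            price, size = Decimal(px), Decimal(sz)
            if size == 0:
                levels.pop(price, None)  # assumed convention: size 0 deletes the level
            else:
                levels[price] = size
        self.last_seq = seq

    def best_bid(self) -> Decimal | None:
        return max(self.bids) if self.bids else None

    def best_ask(self) -> Decimal | None:
        return min(self.asks) if self.asks else None
```

When apply_update raises SequenceGap, close the socket and let your reconnect loop deliver a fresh snapshot into load_snapshot().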

Per the research, Hyperliquid does not send a CRC32 book checksum (the kind OKX and Kraken include in their book streams). Data integrity comes from sequence numbers. If you want one more safety check, compare your local book against the REST l2Book endpoint every so often. Just remember that REST snapshots may be slightly delayed relative to the WebSocket stream.
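A periodic REST comparison can be as simple as checking that the local best bid and ask sit close to the REST snapshot. This sketch assumes the REST l2Book response uses the same levels shape as the stream. The 5-basis-point tolerance is an arbitrary allowance for REST lag. The names fetch_rest_book and top_matches_rest are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request
from decimal import Decimal

def fetch_rest_book(coin: str, timeout: float = 10.0) -> dict:
    """POST {"type":"l2Book","coin":...} to /info (requires network access)."""
    req = urllib.request.Request(
        "https://api.hyperliquid.xyz/info",
        data=json.dumps({"type": "l2Book", "coin": coin}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

def top_matches_rest(local_bid: Decimal, local_ask: Decimal, rest_book: dict,
                     tolerance_bps: Decimal = Decimal("5")) -> bool:
    """True if the local best bid and ask sit within tolerance_bps of the REST snapshot.

    The tolerance absorbs the fact that REST can lag the stream slightly.
    """
    bids, asks = rest_book["levels"]  # assumed order: [bids, asks], best first
    if not bids or not asks:
        return False
    rest_bid, rest_ask = Decimal(bids[0]["px"]), Decimal(asks[0]["px"])
    allowed = (rest_bid + rest_ask) / 2 * tolerance_bps / Decimal(10_000)
    return abs(local_bid - rest_bid) <= allowed and abs(local_ask - rest_ask) <= allowed
```

If the check fails repeatedly, handle it like a seqNum gap: reconnect and take the new snapshot.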

Store less than you think

Raw order books get large fast. For most beginners, storing every full book forever is overkill. You want enough data to debug your system, build simple analytics, and reconstruct why an alert fired.

A practical local setup can use three layers:

SQLite for recent operational data.
DuckDB for backtesting and analytics.
A markdown ledger for human trade notes.

SQLite is a good first database because it is boring in the best way. You can run it locally, write from Python with aiosqlite, and query it without setting up a server.

The research suggests these tables:

orderbook_snapshots: coin, sequence number, snapshot JSON, best bid, best ask, spread, mid price, bid volume, ask volume, recorded time.
trades: coin, price, size, side, trade time, hash, recorded time.
candles_1m: coin, bucket time, open, high, low, close, volume, trade count, VWAP.

Index by coin and time. That one choice saves pain later.
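Here is a starting schema for those three tables, using Python's built-in sqlite3. The column names and types are illustrative choices, not a standard. Derived metrics are REAL for easy math, while the raw snapshot JSON keeps the exact strings the API sent. aiosqlite offers an asyncio wrapper around the same operations if you want them inside the ingest loop. init_db is a hypothetical name.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS orderbook_snapshots (
    id            INTEGER PRIMARY KEY,
    coin          TEXT    NOT NULL,
    seq_num       INTEGER,
    snapshot_json TEXT    NOT NULL,   -- raw levels, strings kept exactly as received
    best_bid REAL, best_ask REAL, spread REAL, mid_price REAL,
    bid_volume REAL, ask_volume REAL,
    recorded_at   INTEGER NOT NULL    -- epoch milliseconds
);
CREATE INDEX IF NOT EXISTS idx_snapshots_coin_time ON orderbook_snapshots (coin, recorded_at);

CREATE TABLE IF NOT EXISTS trades (
    id          INTEGER PRIMARY KEY,
    coin        TEXT    NOT NULL,
    price       REAL    NOT NULL,
    size        REAL    NOT NULL,
    side        TEXT    NOT NULL,
    trade_time  INTEGER NOT NULL,     -- exchange timestamp, epoch ms
    hash        TEXT,
    recorded_at INTEGER NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_trades_coin_time ON trades (coin, trade_time);

CREATE TABLE IF NOT EXISTS candles_1m (
    coin        TEXT    NOT NULL,
    bucket_time INTEGER NOT NULL,     -- start of the minute, epoch ms
    open REAL, high REAL, low REAL, close REAL,
    volume      REAL,
    trade_count INTEGER,
    vwap        REAL,
    PRIMARY KEY (coin, bucket_time)   -- doubles as the coin+time index
);
"""

def init_db(path: str = "pipeline.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # readers can query while ingest writes
    conn.executescript(SCHEMA)
    return conn
```

WAL mode should let a separate reader, such as an analytics job, query the file while the ingest process keeps writing.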

DuckDB is better when you want to ask bigger questions. You can load from SQLite with sqlite_scan(), build rolling VWAP views, look at volatility, or compare trades against nearby order book snapshots to study execution quality.
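Rolling VWAP is a window function: sum price times volume over the last N candles, then divide by the summed volume. The query below runs on Python's built-in sqlite3, so the sketch needs no extra packages. In DuckDB, after INSTALL sqlite and LOAD sqlite, the same SELECT should work with FROM sqlite_scan('pipeline.db', 'candles_1m') in place of FROM candles_1m. The sketch assumes candles_1m has coin, bucket_time, close, volume, and vwap columns, as in the table list above. It treats 5 rows as 5 minutes, which only holds if no minutes are missing.

```python
from __future__ import annotations

import sqlite3

# Volume-weighted average over the current and previous 4 one-minute candles.
# Weighting each candle's VWAP by its volume gives the VWAP of the whole window.
ROLLING_VWAP_SQL = """
SELECT
    coin,
    bucket_time,
    close,
    1.0 * SUM(vwap * volume) OVER (
        PARTITION BY coin ORDER BY bucket_time ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
    ) / NULLIF(SUM(volume) OVER (
        PARTITION BY coin ORDER BY bucket_time ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
    ), 0) AS vwap_5m
FROM candles_1m
ORDER BY coin, bucket_time
"""

def rolling_vwap(conn: sqlite3.Connection) -> list[tuple]:
    """Return (coin, bucket_time, close, vwap_5m) rows."""
    return conn.execute(ROLLING_VWAP_SQL).fetchall()
```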

The markdown ledger is not for speed. It is for memory. Each closed trade can write one entry with:

Timestamp.
Direction.
Entry, stop loss, and take profit.
Size.
Signal source.
Claude or GPT confidence score, if used.
PnL history.
Short notes on what happened.

If you Git-track that folder, you get a readable trading diary with diffs. That is more useful than a mystery database row three weeks later.
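A ledger entry can be rendered straight from a small record. The field names and the one-file-per-month layout below are assumptions for illustration. LedgerEntry and append_entry are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path

@dataclass
class LedgerEntry:
    timestamp: datetime
    direction: str            # "long" or "short"
    entry: str                # prices kept as strings, like the API returns them
    stop_loss: str
    take_profit: str
    size: str
    signal_source: str
    confidence: int | None = None  # LLM score 1-10, if an LLM was used
    pnl_history: list[str] = field(default_factory=list)
    notes: str = ""

    def to_markdown(self) -> str:
        confidence = str(self.confidence) if self.confidence is not None else "n/a"
        return "\n".join([
            f"## {self.timestamp.isoformat()} {self.direction.upper()}",
            "",
            f"- Entry: {self.entry}",
            f"- Stop loss: {self.stop_loss}",
            f"- Take profit: {self.take_profit}",
            f"- Size: {self.size}",
            f"- Signal source: {self.signal_source}",
            f"- LLM confidence: {confidence}",
            f"- PnL history: {', '.join(self.pnl_history) or 'n/a'}",
            "",
            self.notes.strip(),
            "",
        ])

def append_entry(folder: Path, entry: LedgerEntry) -> Path:
    """Append to one file per month, e.g. journal/2026-05.md."""
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{entry.timestamp:%Y-%m}.md"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(entry.to_markdown() + "\n")
    return path
```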

Do not put the LLM on the hot path

Claude and GPT should not receive every tick. They are too slow, too expensive, and too easy to confuse with noisy data.

Instead, turn the market into compact summaries.

A one-line ticker summary every 500 milliseconds to 1 second might look like this:

BTC: bid=77633/ask=77632, spread=-1.0, bid_sz=1.26/ask_sz=24.44, 10s delta_bid=-5.0, funding=0.000835%

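That example uses the kind of BTC values shown in the research sample. Notice the weird spread: bid above ask. The research caveat says that crossed spreads in the May 21 sample may come from propagation timing differences. Before accepting that, check which side you read as which. Hyperliquid's l2Book levels array lists bids first and asks second. Read that way, the snapshot sample above shows bid 77632.0 (size 24.44) and ask 77633.0 (size 1.26), a normal 1.0 spread. Either way, treat a crossed spread as a warning. Your pipeline should detect crossed books and either flag them or wait for the next clean update.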
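Crossed-book detection belongs right where the summary is built. This sketch builds the bid, ask, spread, and size part of the ticker line from an l2Book levels array, assuming [bids, asks] with the best price first. It marks the line when bid is greater than or equal to ask. It leaves out the 10-second delta and expects funding as a preformatted string. The function names are hypothetical.

```python
from __future__ import annotations

from decimal import Decimal

def top_of_book(levels: list) -> tuple[Decimal, Decimal, Decimal, Decimal] | None:
    """(bid_px, bid_sz, ask_px, ask_sz) from an l2Book levels array.

    Assumes levels == [bids, asks] with the best price first on each side.
    """
    bids, asks = levels
    if not bids or not asks:
        return None
    return (Decimal(bids[0]["px"]), Decimal(bids[0]["sz"]),
            Decimal(asks[0]["px"]), Decimal(asks[0]["sz"]))

def book_is_clean(levels: list) -> bool:
    """False when a side is empty or the book is crossed or locked (bid >= ask)."""
    top = top_of_book(levels)
    return top is not None and top[0] < top[2]

def ticker_line(coin: str, levels: list, funding: str) -> str:
    top = top_of_book(levels)
    if top is None:
        return f"{coin}: one side of the book is empty"
    bid, bid_sz, ask, ask_sz = top
    flag = "" if bid < ask else " [NOT CLEAN: bid >= ask]"
    return (f"{coin}: bid={bid}/ask={ask}, spread={ask - bid}, "
            f"bid_sz={bid_sz}/ask_sz={ask_sz}, funding={funding}{flag}")
```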

For Claude or GPT, a compact JSON state every 5 to 10 seconds is usually better:

{
  "BTC": {
    "mid": 77632.5,
    "spread": -1,
    "imbalance": 0.05,
    "vwap_1m": 77635,
    "funding": "0.000835%",
    "volume_1m": 145.2
  }
}
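Building that JSON state is mostly arithmetic. The example does not spell out how imbalance is computed, so this sketch assumes top-of-book bid size divided by ask size. That fits the example (1.26 / 24.44 ≈ 0.05) and the 3.0 and 0.33 thresholds listed next. compact_state is a hypothetical name. The 1-minute VWAP and volume are assumed to come from your own candle table.

```python
from __future__ import annotations

import json
from decimal import Decimal

def compact_state(coin: str, bid: Decimal, bid_sz: Decimal, ask: Decimal, ask_sz: Decimal,
                  vwap_1m: Decimal, funding: str, volume_1m: Decimal) -> dict:
    """Small per-coin state for an LLM, built from top of book and 1-minute stats."""
    # Assumed definition: top-of-book bid size divided by ask size.
    # Above 1 means more size on the bid; 3.0 and 0.33 are roughly symmetric around 1.
    imbalance = bid_sz / ask_sz if ask_sz else None
    return {
        coin: {
            "mid": float((bid + ask) / 2),
            "spread": float(ask - bid),
            "imbalance": round(float(imbalance), 2) if imbalance is not None else None,
            "vwap_1m": float(vwap_1m),
            "funding": funding,
            "volume_1m": float(volume_1m),
        }
    }
```

json.dumps(compact_state(...), separators=(",", ":")) keeps the payload as short as possible before it goes into a prompt.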

Then call the LLM only when something changes enough to matter. The research gives useful trigger examples:

Spread widens above 2 times the trailing average.
Order book imbalance rises above 3.0 or falls below 0.33.
Price moves more than 0.5% in under 10 seconds.
A large fill is more than 5% of 10-minute average volume.
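Those four triggers fit in one small check. The thresholds are the ones listed above. How you compute the trailing spread average and the 10-minute volume figure is left to you and is an assumption of this sketch; the volume figure must use the same units as the fill size. Triggers is a hypothetical name.

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass
class Triggers:
    spread_mult: float = 2.0       # spread > 2x trailing average
    imbalance_high: float = 3.0    # bid/ask size ratio above 3.0 ...
    imbalance_low: float = 0.33    # ... or below 0.33
    move_pct: float = 0.5          # > 0.5% move ...
    move_window_s: float = 10.0    # ... in under 10 seconds
    large_fill_frac: float = 0.05  # fill > 5% of 10-minute average volume

    def fired(self, *, spread: float, avg_spread: float, imbalance: float,
              price_now: float, price_then: float, seconds_elapsed: float,
              fill_size: float, avg_volume_10m: float) -> list[str]:
        """Return the names of every trigger that fired; an empty list means stay quiet."""
        reasons = []
        if avg_spread > 0 and spread > self.spread_mult * avg_spread:
            reasons.append("spread_widened")
        if imbalance > self.imbalance_high or imbalance < self.imbalance_low:
            reasons.append("imbalance")
        if price_then > 0 and seconds_elapsed < self.move_window_s:
            if abs(price_now - price_then) / price_then * 100 > self.move_pct:
                reasons.append("fast_move")
        if avg_volume_10m > 0 and fill_size > self.large_fill_frac * avg_volume_10m:
            reasons.append("large_fill")
        return reasons
```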

That keeps the model away from raw noise. It also makes your logs easier to audit.

A safer prompt pattern for Claude or GPT

If you ask an LLM, “Should I buy BTC?”, you are begging for a vague answer.

Give it a narrow job. Feed it current state, recent trades, funding, and the output shape you expect. The research suggests keeping regular calls under 2000 tokens and running an evaluation no more than once every 30 to 60 seconds.

A plain prompt can look like this:

You are a trading analyst. Current market state:

BTC: Mid 77632.5, Spread -1.0, Bid Size 1.26, Ask Size 24.44
5-min delta: +0.15%
Funding: +0.000835% 8h
Recent trades 30s: 2 buys, 5 sells, net sell 27.3 BTC

Decide whether to enter a position.

Output JSON only:
{
  "direction": "long|short|none",
  "entry": price,
  "sl": price,
  "tp": price,
  "size": btc_amount,
  "confidence": 1-10,
  "reasoning": "short explanation"
}
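A prompt like this is easy to assemble from the compact state, and it is worth refusing to send one that has grown past the budget. The 4-characters-per-token estimate below is a rough rule of thumb, not a real tokenizer. build_prompt is a hypothetical helper. The 2000-token default comes from the research guidance above.

```python
import json

OUTPUT_SPEC = """Output JSON only:
{"direction": "long|short|none", "entry": price, "sl": price, "tp": price,
 "size": btc_amount, "confidence": 1-10, "reasoning": "short explanation"}"""

def build_prompt(state: dict, recent_trades: str, max_tokens: int = 2000) -> str:
    """Assemble the narrow analyst prompt and refuse to send oversized ones."""
    prompt = "\n\n".join([
        "You are a trading analyst. Current market state:",
        json.dumps(state, separators=(",", ":")),
        f"Recent trades 30s: {recent_trades}",
        "Decide whether to enter a position.",
        OUTPUT_SPEC,
    ])
    # Rough rule of thumb of ~4 characters per token; use your provider's
    # tokenizer if you need an exact count.
    approx_tokens = len(prompt) // 4
    if approx_tokens > max_tokens:
        raise ValueError(f"prompt is ~{approx_tokens} tokens, over the {max_tokens} budget")
    return prompt
```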

Use structured JSON output mode if your LLM provider supports it. More importantly, treat the output as a suggestion, not permission to trade. If the model says "direction":"long", your system should still check risk limits, account state, open orders, and whether the book is clean.
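The "suggestion, not permission" rule can be enforced in code. This sketch parses the reply, rejects malformed output, and then runs a few basic checks:

- the direction is allowed
- the stop and target sit on the correct sides of entry
- the size is under a cap
- the confidence clears a floor
- the book is clean

max_size and min_confidence are placeholders you set. Account-state and open-order checks would need your own calls and are not shown. parse_suggestion and risk_gate are hypothetical names.

```python
from __future__ import annotations

import json
from decimal import Decimal, InvalidOperation

REQUIRED = {"direction", "entry", "sl", "tp", "size", "confidence", "reasoning"}

def parse_suggestion(raw: str) -> dict:
    """Parse the model's reply and reject anything that is not the requested shape."""
    data = json.loads(raw)
    if not isinstance(data, dict) or not REQUIRED.issubset(data):
        raise ValueError("reply is missing required fields")
    if data["direction"] not in ("long", "short", "none"):
        raise ValueError(f"unexpected direction: {data['direction']!r}")
    return data

def risk_gate(s: dict, *, max_size: Decimal, min_confidence: int,
              book_clean: bool) -> tuple[bool, str]:
    """Decide whether a parsed suggestion may proceed. The limits are yours to set."""
    if s["direction"] == "none":
        return False, "model suggested no trade"
    if not book_clean:
        return False, "order book is crossed or stale"
    try:
        entry, sl, tp, size = (Decimal(str(s[k])) for k in ("entry", "sl", "tp", "size"))
        confidence = int(s["confidence"])
    except (InvalidOperation, ValueError, TypeError):
        return False, "non-numeric price, size, or confidence"
    if not all(x.is_finite() for x in (entry, sl, tp, size)):
        return False, "non-finite price or size"
    if not Decimal(0) < size <= max_size:
        return False, "size outside limits"
    if s["direction"] == "long" and not sl < entry < tp:
        return False, "long needs sl < entry < tp"
    if s["direction"] == "short" and not tp < entry < sl:
        return False, "short needs tp < entry < sl"
    if confidence < min_confidence:
        return False, "confidence below threshold"
    return True, "ok"
```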

For signed trading, Hyperliquid uses EIP-712 typed data signing rather than API keys. The Python SDK handles this through Exchange._post_action() and sign_l1_action() in hyperliquid/utils/signing.py, per the research. API wallets can be generated at app.hyperliquid.xyz/API, so you do not need to expose your main wallet key to a script.

That is a separate step from market data. Build the read-only pipeline first.

Python, TypeScript, and SDK choices

The official Python SDK is the cleanest starting point if you are comfortable with Python. The research lists hyperliquid-python-sdk version 0.23.0:

pip install hyperliquid-python-sdk

The GitHub repo is https://github.com/hyperliquid-dex/hyperliquid-python-sdk. The useful classes are Info for read endpoints, Exchange for trading actions, API for base HTTP, and WebsocketManager for managed WebSocket connections.

You can also use CCXT, which includes a hyperliquid exchange class.

TypeScript is less tidy. The research says Hyperliquid does not maintain an official TypeScript SDK from hyperliquid-dex. Community options include:

@nktkas/hyperliquid v0.32.2, described as the most actively maintained as of March 2026.
hyperliquid v1.7.7 by nomeida, which depends on ethers v6.
hyperliquid-sdk v2.4.2.
hyperliquid-ts-sdk v0.0.38 by elevatordown, noted as older.

If this is your first pipeline, choose Python unless your existing stack is already Node.js. Async Python with websockets, aiosqlite, and DuckDB is enough.

A simple production shape

A clean architecture separates ingest from analysis. The WebSocket reader should not wait on Claude, GPT, Discord, Telegram, or a database hiccup before reading the next message.

The research architecture looks like this:

[Hyperliquid WS] -> [Python/Node.js ingest pipeline]
                          |
                   [Queue: Redis/NATS]
                          |
        +-----------------+----------------+
        |                 |                |
 [SQLite writes]   [DuckDB analytics]  [LLM evaluator]
        |                                  |
 [Markdown journal]              [Telegram/Discord alerts]

Redis or NATS gives you a buffer between the fast stream and slower consumers. SQLite writes can batch and commit periodically. DuckDB can run analytics without touching the hot path. The LLM evaluator can run every 30 to 60 seconds or only when an alert condition fires.
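The same separation works in one process with asyncio.Queue standing in for Redis or NATS. That swap is made here only to keep the sketch self-contained. The reader only parses and enqueues, and the writer batches. The queue is unbounded for brevity, so a real deployment needs either a broker like Redis or NATS, or a maxsize with an explicit drop policy. ingest and batch_writer are hypothetical names.

```python
from __future__ import annotations

import asyncio
import json
from typing import AsyncIterator, Callable

FLUSH = object()  # internal marker: no message arrived within flush_every seconds

async def ingest(frames: AsyncIterator[str], queue: asyncio.Queue) -> None:
    """Reader: parse and hand off. It never waits on storage, LLMs, or alerts."""
    async for raw in frames:
        queue.put_nowait(json.loads(raw))  # unbounded queue, so this never blocks
    queue.put_nowait(None)                 # sentinel: the stream ended

async def batch_writer(queue: asyncio.Queue, write_batch: Callable[[list], None],
                       batch_size: int = 100, flush_every: float = 1.0) -> None:
    """Consumer: commit every batch_size messages, or when the stream goes quiet."""
    batch: list = []
    while True:
        try:
            item = await asyncio.wait_for(queue.get(), timeout=flush_every)
        except asyncio.TimeoutError:
            item = FLUSH
        if item is None:  # stream ended: write what is left and stop
            if batch:
                write_batch(batch)
            return
        if item is not FLUSH:
            batch.append(item)
        if batch and (item is FLUSH or len(batch) >= batch_size):
            write_batch(batch)
            batch = []
```

Hook write_batch up to an executemany on the SQLite tables. The LLM evaluator and alert senders would read from their own queues, so a slow Claude or GPT call never stalls the writer.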

This also makes failures less scary. If Claude is down, your book still updates. If Telegram fails, your trades table still records events. If the WebSocket gaps, the ingest process reconnects and replaces the book.

Risk notes beginners should not skip

A data pipeline can feel safe because it is “just code.” It still touches trading decisions if you use it that way.

All API values come back as strings, per the research. Convert carefully. Do not compare price strings like numbers.
Funding rates use a per-8-hour convention in the research. To annualize, the formula given is rate * 3 * 365 * 100.
Open interest may be in coin units rather than dollars. Multiply by the mark price to get notional. BTC open interest around 28K means roughly 28,000 BTC. At the ~$77,600 price in the research sample, that is about $2.2B notional, not $28K.
Some exposed endpoints are undocumented and may be unstable, including frontendOpenOrders and userNonFundingLedgerUpdates.
No public liquidation history endpoint exists, per the research.
Rate limit numbers can change. The cited limits are from Hyperliquid GitBook docs dated April 28, 2026.
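The string, funding, and open-interest notes reduce to a few conversions. This sketch uses Decimal so string values convert without float rounding. It assumes the funding input is a raw per-8-hour fraction such as "0.0000125", not an already formatted percentage. It also assumes open interest is in coin units. For instance, 28,000 BTC at a mark price of 77,632 comes to about $2.17 billion. The function names are hypothetical.

```python
from decimal import Decimal

# Strings compare character by character, so "9.5" > "10.0" is True.
# Convert first: Decimal("9.5") < Decimal("10.0").

def annualized_funding_pct(rate_8h: str) -> Decimal:
    """Annualize a per-8-hour funding rate given as a fraction: rate * 3 * 365 * 100."""
    return Decimal(rate_8h) * 3 * 365 * 100

def oi_notional_usd(open_interest: str, mark_px: str) -> Decimal:
    """Open interest in coin units times mark price gives dollar notional."""
    return Decimal(open_interest) * Decimal(mark_px)
```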

The safest first version is read-only. Get WebSockets, snapshots, storage, and alerts working before adding signed orders.

Protect your crypto with Ledger

Use NordVPN for a safer connection

Disclosure: Easy as Pie DeFi may earn a commission if you buy through these links, at no extra cost to you. Hardware wallets and VPNs do not remove crypto risk. They only help with specific parts of security.

Bottom line

Build the pipeline before you build the trader: subscribe to Hyperliquid l2Book, watch seqNum, reconnect on gaps, store clean summaries in SQLite or DuckDB, and feed Claude or GPT small, structured market states instead of raw noise.