X For You Feed Algorithm

This repository contains the core recommendation system powering the “For You” feed on X. It combines in-network content (from accounts you follow) with out-of-network content (discovered through ML-based retrieval) and ranks everything using a Grok-based transformer model.

Note: The transformer implementation is ported from the Grok-1 open-source release by xAI and adapted for recommendation system use cases.

Table of Contents

Overview
System Architecture
Components

Home Mixer
Thunder
Phoenix
Candidate Pipeline

How It Works

Pipeline Stages
Scoring and Ranking
Filtering

Key Design Decisions
License

Overview

The For You feed algorithm retrieves, ranks, and filters posts from two sources:

In-Network (Thunder): Posts from accounts you follow
Out-of-Network (Phoenix Retrieval): Posts discovered from a global corpus

Both sources are combined and ranked together using Phoenix, a Grok-based transformer model that predicts engagement probabilities for each post. The final score is a weighted combination of these predicted engagements.

We have eliminated every single hand-engineered feature and most heuristics from the system. The Grok-based transformer does all the heavy lifting by understanding your engagement history (what you liked, replied to, shared, etc.) and using that to determine what content is relevant to you.

System Architecture

┌──────────────────────────────────────────────────────────────────────┐
│                         FOR YOU FEED REQUEST                         │
└──────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────────────────┐
│                              HOME MIXER                              │
│                        (Orchestration Layer)                         │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │  QUERY HYDRATION                                               │  │
│  │  User Action Sequence (engagement history)                     │  │
│  │  User Features (following list, preferences, etc.)             │  │
│  └────────────────────────────────────────────────────────────────┘  │
│                                  │                                   │
│                                  ▼                                   │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │  CANDIDATE SOURCES                                             │  │
│  │  THUNDER (In-Network Posts)                                    │  │
│  │    Posts from accounts you follow                              │  │
│  │  PHOENIX RETRIEVAL (Out-of-Network Posts)                      │  │
│  │    ML-based similarity search across global corpus             │  │
│  └────────────────────────────────────────────────────────────────┘  │
│                                  │                                   │
│                                  ▼                                   │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │  HYDRATION                                                     │  │
│  │  Fetch additional data: core post metadata, author info,       │  │
│  │  media entities, etc.                                          │  │
│  └────────────────────────────────────────────────────────────────┘  │
│                                  │                                   │
│                                  ▼                                   │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │  FILTERING                                                     │  │
│  │  Remove: duplicates, old posts, self-posts, blocked authors,   │  │
│  │  muted keywords, etc.                                          │  │
│  └────────────────────────────────────────────────────────────────┘  │
│                                  │                                   │
│                                  ▼                                   │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │  SCORING                                                       │  │
│  │  Phoenix Scorer (ML Predictions)                               │  │
│  │    Grok-based Transformer predicts:                            │  │
│  │    P(like), P(reply), P(repost), P(click)…                     │  │
│  │     │                                                          │  │
│  │     ▼                                                          │  │
│  │  Weighted Scorer (Combine predictions)                         │  │
│  │    Weighted Score = Σ (weight × P(action))                     │  │
│  │     │                                                          │  │
│  │     ▼                                                          │  │
│  │  Author Diversity Scorer                                       │  │
│  │    Attenuate repeated author scores to ensure feed diversity   │  │
│  └────────────────────────────────────────────────────────────────┘  │
│                                  │                                   │
│                                  ▼                                   │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │  SELECTION                                                     │  │
│  │  Sort by final score, select top K candidates                  │  │
│  └────────────────────────────────────────────────────────────────┘  │
│                                  │                                   │
│                                  ▼                                   │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │  FILTERING (Post-Selection)                                    │  │
│  │  Visibility filtering (deleted/spam/violence/gore, etc.)       │  │
│  └────────────────────────────────────────────────────────────────┘  │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────────────────┐
│                         RANKED FEED RESPONSE                         │
└──────────────────────────────────────────────────────────────────────┘

Components

Home Mixer

Location: home-mixer/

The orchestration layer that assembles the For You feed. It leverages the CandidatePipeline framework with the following stages:

| Stage | Description |
|-------|-------------|
| Query Hydrators | Fetch user context (engagement history, following list) |
| Sources | Retrieve candidates from Thunder and Phoenix |
| Hydrators | Enrich candidates with additional data |
| Filters | Remove ineligible candidates |
| Scorers | Predict engagement and compute final scores |
| Selector | Sort by score and select top K |
| Post-Selection Filters | Final visibility and dedup checks |
| Side Effects | Cache request info for future use |

The server exposes a gRPC endpoint (ScoredPostsService) that returns ranked posts for a given user.

Thunder

Location: thunder/

An in-memory post store and real-time ingestion pipeline that tracks recent posts from all users. It:

Consumes post create/delete events from Kafka
Maintains per-user stores for original posts, replies/reposts, and video posts
Serves “in-network” post candidates from accounts the requesting user follows
Automatically trims posts older than the retention period

Thunder enables sub-millisecond lookups for in-network content without hitting an external database.
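
The behaviour listed above can be illustrated with a minimal sketch of a per-author in-memory store. This is not Thunder's implementation: the names `Post`, `InMemoryPostStore`, `on_create`, `on_delete`, `trim`, and `in_network_candidates` are hypothetical, Kafka consumption is replaced by direct method calls, and the separate stores for replies/reposts and video posts are omitted.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    post_id: int
    author_id: int
    created_at: float  # seconds since epoch


class InMemoryPostStore:
    """Hypothetical per-author store of recent posts, oldest first."""

    def __init__(self, retention_seconds: float):
        self.retention_seconds = retention_seconds
        # author_id -> deque of that author's posts in arrival order
        self._by_author = defaultdict(deque)

    def on_create(self, post: Post) -> None:
        # Assumption: create events for one author arrive in timestamp order.
        self._by_author[post.author_id].append(post)

    def on_delete(self, author_id: int, post_id: int) -> None:
        posts = self._by_author.get(author_id)
        if posts is not None:
            self._by_author[author_id] = deque(
                p for p in posts if p.post_id != post_id
            )

    def trim(self, now: float) -> None:
        """Drop posts older than the retention period."""
        cutoff = now - self.retention_seconds
        for author_id in list(self._by_author):
            posts = self._by_author[author_id]
            while posts and posts[0].created_at < cutoff:
                posts.popleft()
            if not posts:
                del self._by_author[author_id]

    def in_network_candidates(self, following: list[int]) -> list[Post]:
        """Return posts from followed authors, newest first."""
        found = [p for a in following for p in self._by_author.get(a, ())]
        return sorted(found, key=lambda p: p.created_at, reverse=True)
```

Keying the store by author keeps a request to one dictionary lookup per followed account, which is one plausible way to serve in-network candidates without an external database.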

Phoenix

Location: phoenix/

The ML component with two main functions:

1. Retrieval (Two-Tower Model)

Finds relevant out-of-network posts:

User Tower: Encodes user features and engagement history into an embedding
Candidate Tower: Encodes all posts into embeddings
Similarity Search: Retrieves top-K posts via dot product similarity
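
As a rough illustration of the similarity search step, the sketch below scores every post embedding against a user embedding by dot product and keeps the top K. The function names and the brute-force scan are assumptions made for clarity; the source does not describe how the search over the global corpus is indexed, and the towers that produce the embeddings are not shown.

```python
import heapq


def dot(u: list[float], v: list[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_k(
    user_embedding: list[float],
    post_embeddings: dict[int, list[float]],
    k: int,
) -> list[tuple[int, float]]:
    """Return the k (post_id, score) pairs with the highest dot product."""
    scored = (
        (post_id, dot(user_embedding, emb))
        for post_id, emb in post_embeddings.items()
    )
    return heapq.nlargest(k, scored, key=lambda item: item[1])


# Toy embeddings standing in for the outputs of the two towers.
user = [1.0, 0.0]
posts = {1: [0.5, 0.5], 2: [2.0, 0.0], 3: [-1.0, 1.0]}
top = retrieve_top_k(user, posts, k=2)
```

Because the user and candidate towers only meet at the dot product, post embeddings can be computed ahead of time and reused across requests.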

2. Ranking (Transformer with Candidate Isolation)

Predicts engagement probabilities for each candidate:

Takes user context (engagement history) and candidate posts as input
Uses special attention masking so candidates cannot attend to each other
Outputs probabilities for each action type (like, reply, repost, click, etc.)
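
The masking rule can be sketched as a boolean attention mask. The layout is an assumption: context tokens come first, followed by one position per candidate, and context tokens attend to the whole context (the source does not say whether context attention is causal). The function name is hypothetical.

```python
def candidate_isolation_mask(num_context: int, num_candidates: int) -> list[list[bool]]:
    """Build mask[q][k]: True when query position q may attend to key position k.

    Positions 0..num_context-1 hold the user context; the remaining
    positions hold one candidate each.
    """
    n = num_context + num_candidates
    mask = []
    for q in range(n):
        row = []
        for k in range(n):
            # Every position may read the user context; a candidate may
            # additionally read itself, but never another candidate.
            row.append(k < num_context or q == k)
        mask.append(row)
    return mask


mask = candidate_isolation_mask(num_context=2, num_candidates=2)
```

With this mask, the row for a candidate is the same regardless of how many other candidates share the batch, which is what keeps a candidate's score independent of its neighbours.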

See phoenix/README.md for detailed architecture documentation.

Candidate Pipeline

Location: candidate-pipeline/

A reusable framework for building recommendation pipelines. Defines traits for:

| Trait | Purpose |
|-------|---------|
| Source | Fetch candidates from a data source |
| Hydrator | Enrich candidates with additional features |
| Filter | Remove candidates that shouldn’t be shown |
| Scorer | Compute scores for ranking |
| Selector | Sort and select top candidates |
| SideEffect | Run async side effects (caching, logging) |

The framework runs sources and hydrators in parallel where possible, with configurable error handling and logging.
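
To make the stage ordering and the parallel-source behaviour concrete, here is a small Python sketch. It is not the crate's API: the real framework defines traits, while this sketch uses plain callables, and `run_sources`, `run_pipeline`, and the dictionary-shaped candidates are all hypothetical. Error handling is reduced to skipping a failed source.

```python
import asyncio


async def run_sources(sources, query):
    """Run all sources concurrently; a failing source contributes nothing."""
    results = await asyncio.gather(
        *(source(query) for source in sources), return_exceptions=True
    )
    candidates = []
    for result in results:
        if isinstance(result, Exception):
            continue  # a real framework would log or count the failure here
        candidates.extend(result)
    return candidates


async def run_pipeline(query, sources, hydrators, filters, scorer, top_k):
    """Sources -> hydrators -> filters -> scorer -> top-K selection."""
    candidates = await run_sources(sources, query)
    for hydrate in hydrators:
        candidates = [hydrate(query, c) for c in candidates]
    for keep in filters:
        candidates = [c for c in candidates if keep(query, c)]
    scored = [(scorer(query, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]


if __name__ == "__main__":
    async def in_network(query):
        return [{"id": 1, "author": "a"}]

    async def out_of_network(query):
        return [{"id": 2, "author": "b"}]

    async def unavailable(query):
        raise RuntimeError("source unavailable")

    feed = asyncio.run(
        run_pipeline(
            {"user": "me", "following": {"a"}},
            sources=[in_network, out_of_network, unavailable],
            hydrators=[lambda q, c: {**c, "in_network": c["author"] in q["following"]}],
            filters=[lambda q, c: c["author"] != q["user"]],
            scorer=lambda q, c: 1.0 if c["in_network"] else 0.5,
            top_k=10,
        )
    )
    print([c["id"] for c in feed])
```

Passing `return_exceptions=True` to `asyncio.gather` lets one failing source degrade the result instead of failing the whole request, which is one way to read "configurable error handling".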

How It Works

Pipeline Stages

Query Hydration: Fetch the user’s recent engagement history and metadata (e.g., following list)

Candidate Sourcing: Retrieve candidates from:

Thunder: Recent posts from followed accounts (in-network)
Phoenix Retrieval: ML-discovered posts from the global corpus (out-of-network)

Candidate Hydration: Enrich candidates with:

Core post data (text, media, etc.)
Author information (username, verification status)
Video duration (for video posts)
Subscription status

Pre-Scoring Filters: Remove posts that are:

Duplicates
Too old
From the viewer themselves
From blocked/muted accounts
Containing muted keywords
Previously seen or recently served
Ineligible subscription content

Scoring: Apply multiple scorers sequentially:

Phoenix Scorer: Get ML predictions from the Phoenix transformer model
Weighted Scorer: Combine predictions into a final relevance score
Author Diversity Scorer: Attenuate repeated author scores for diversity
OON Scorer: Adjust scores for out-of-network content

Selection: Sort by score and select the top K candidates

Post-Selection Processing: Final validation of post candidates to be served
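
The Author Diversity Scorer and the Selection stage can be illustrated together. The source only says that repeated author scores are attenuated; the geometric decay used below, the `decay` parameter, and the function name are assumptions, and the sketch assumes non-negative scores.

```python
from collections import defaultdict


def apply_author_diversity(candidates, decay=0.5, top_k=None):
    """Attenuate repeated authors, then sort and select the top K.

    candidates: list of (post_id, author_id, score) tuples.
    The n-th post by the same author (n = 0, 1, 2, ...) has its score
    multiplied by decay ** n, visiting posts from highest score down.
    """
    seen = defaultdict(int)
    adjusted = []
    for post_id, author_id, score in sorted(
        candidates, key=lambda c: c[2], reverse=True
    ):
        adjusted.append((post_id, author_id, score * decay ** seen[author_id]))
        seen[author_id] += 1
    adjusted.sort(key=lambda c: c[2], reverse=True)
    return adjusted if top_k is None else adjusted[:top_k]


ranked = apply_author_diversity(
    [(1, "a", 1.0), (2, "a", 0.9), (3, "b", 0.8)], decay=0.5
)
# Post 2 is the second post by author "a", so it drops below post 3.
```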

Scoring and Ranking

The Phoenix Grok-based transformer model predicts probabilities for multiple engagement types:

Predictions:
├── P(favorite)
├── P(reply)
├── P(repost)
├── P(quote)
├── P(click)
├── P(profile_click)
├── P(video_view)
├── P(photo_expand)
├── P(share)
├── P(dwell)
├── P(follow_author)
├── P(not_interested)
├── P(block_author)
├── P(mute_author)
└── P(report)

The Weighted Scorer combines these into a final score:

Final Score = Σ (weight_i × P(action_i))

Positive actions (like, repost, share) have positive weights. Negative actions (block, mute, report) have negative weights, pushing down content the user would likely dislike.
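
A minimal sketch of the weighted combination follows. The weight values are hypothetical: the source states only the sign of each weight, not its magnitude, and the function name is illustrative.

```python
# Hypothetical weights: positive for desirable actions, negative for
# actions that signal the user dislikes the content.
WEIGHTS = {
    "favorite": 1.0,
    "reply": 2.0,
    "repost": 1.5,
    "share": 1.5,
    "not_interested": -2.0,
    "block_author": -5.0,
    "mute_author": -4.0,
    "report": -6.0,
}


def weighted_score(predictions: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Compute sum_i weight_i * P(action_i); unweighted actions contribute 0."""
    return sum(weights.get(action, 0.0) * p for action, p in predictions.items())


score = weighted_score({"favorite": 0.5, "reply": 0.25, "report": 0.125})
# 1.0 * 0.5 + 2.0 * 0.25 - 6.0 * 0.125 = 0.25
```

A likely report can therefore cancel out a fairly likely favorite, which is how negative weights push down content the user would probably dislike.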

Filtering

Filters run at two stages:

Pre-Scoring Filters:

| Filter | Purpose |
|--------|---------|
| DropDuplicatesFilter | Remove duplicate post IDs |
| CoreDataHydrationFilter | Remove posts that failed to hydrate core metadata |
| AgeFilter | Remove posts older than threshold |
| SelfpostFilter | Remove user’s own posts |
| RepostDeduplicationFilter | Dedupe reposts of same content |
| IneligibleSubscriptionFilter | Remove paywalled content user can’t access |
| PreviouslySeenPostsFilter | Remove posts user has already seen |
| PreviouslyServedPostsFilter | Remove posts already served in session |
| MutedKeywordFilter | Remove posts with user’s muted keywords |
| AuthorSocialgraphFilter | Remove posts from blocked/muted authors |

Post-Selection Filters:

| Filter | Purpose |
|--------|---------|
| VFFilter | Remove posts that are deleted/spam/violence/gore etc. |
| DedupConversationFilter | Deduplicate multiple branches of the same conversation thread |
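
A few of the pre-scoring filters can be sketched as a chain of list-to-list functions. The function names mirror the filter names above but are hypothetical, as are the `Candidate` fields; the actual filters are implementations of the Filter trait and may use different inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    post_id: int
    author_id: int
    age_seconds: float


def drop_duplicates(candidates):
    """Keep the first occurrence of each post ID."""
    seen, kept = set(), []
    for c in candidates:
        if c.post_id not in seen:
            seen.add(c.post_id)
            kept.append(c)
    return kept


def age_filter(max_age_seconds):
    """Build a filter that removes posts older than the threshold."""
    return lambda cs: [c for c in cs if c.age_seconds <= max_age_seconds]


def self_post_filter(viewer_id):
    """Build a filter that removes the viewer's own posts."""
    return lambda cs: [c for c in cs if c.author_id != viewer_id]


def author_socialgraph_filter(blocked_or_muted):
    """Build a filter that removes posts from blocked or muted authors."""
    return lambda cs: [c for c in cs if c.author_id not in blocked_or_muted]


def apply_filters(candidates, filters):
    """Apply each filter in order; every filter maps a list to a list."""
    for f in filters:
        candidates = f(candidates)
    return candidates


kept = apply_filters(
    [Candidate(1, 10, 100.0), Candidate(1, 10, 100.0), Candidate(2, 7, 50.0)],
    [drop_duplicates, age_filter(86_400), self_post_filter(viewer_id=7)],
)
```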

Key Design Decisions

1. No Hand-Engineered Features

The system relies entirely on the Grok-based transformer to learn relevance from user engagement sequences, with no manual feature engineering for content relevance. This significantly reduces the complexity of our data pipelines and serving infrastructure.

2. Candidate Isolation in Ranking

During transformer inference, candidates cannot attend to each other—only to the user context. This ensures the score for a post doesn’t depend on which other posts are in the batch, making scores consistent and cacheable.

3. Hash-Based Embeddings

Both retrieval and ranking use multiple hash functions for embedding lookup.
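
The source gives no further detail on this scheme, so the following is only one plausible reading: each hash function maps an ID to a row in its own fixed-size table, and the selected rows are summed. The choice of `blake2b`, the summation, and the names `bucket` and `lookup_embedding` are all assumptions.

```python
import hashlib


def bucket(entity_id: int, seed: int, num_buckets: int) -> int:
    """Deterministically map an ID to a bucket; `seed` selects the hash function."""
    digest = hashlib.blake2b(f"{seed}:{entity_id}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_buckets


def lookup_embedding(entity_id: int, tables: list[list[list[float]]]) -> list[float]:
    """Sum one row per table, each row chosen by a different hash function.

    tables[i] is the embedding table (a list of rows) for hash function i.
    """
    dim = len(tables[0][0])
    out = [0.0] * dim
    for seed, table in enumerate(tables):
        row = table[bucket(entity_id, seed, len(table))]
        out = [a + b for a, b in zip(out, row)]
    return out


# Two toy tables of 8 rows each, with 2-dimensional rows.
tables = [[[float(s * 100 + i)] * 2 for i in range(8)] for s in range(2)]
embedding = lookup_embedding(12345, tables)
```

Under this reading, two IDs that collide under one hash function are unlikely to collide under all of them, so their combined embeddings still differ while the tables stay much smaller than the ID space.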

4. Multi-Action Prediction

Rather than predicting a single “relevance” score, the model predicts probabilities for many actions.

5. Composable Pipeline Architecture

The candidate-pipeline crate provides a flexible framework for building recommendation pipelines with:

Separation of pipeline execution and monitoring from business logic
Parallel execution of independent stages and graceful error handling
Easy addition of new sources, hydrations, filters, and scorers

License

This project is licensed under the Apache License 2.0. See LICENSE for details.