Rally Scoring System
The Rally scoring system uses AI-powered evaluation to ensure fair and transparent reward distribution based on content quality and engagement.
TLDR
- Quality Gates: Submissions must pass 4 AI-evaluated gates (alignment, accuracy, compliance, originality)
- Dual Scoring: Fixed quality component + dynamic engagement metrics determine your Campaign Points
- Metric Weights: Each of the 11 metrics has a weight (0–1); 0 disables it, higher means more impact
- Fair Distribution: Projects choose reward distribution curves (Balanced, Default, or Extreme)
- Refresh Engagement: Update engagement metrics in later periods for additional Campaign Points
- Multi-LLM Consensus: All evaluations use multiple AI models for fairness
Overview
The scoring system evaluates tweets across multiple dimensions, combining intrinsic quality metrics with real-time engagement data to calculate fair rewards. Campaign Points determine a content creator’s share of the rewards pot — higher points mean higher rewards. During the Alpha phase, these rewards are distributed as Rally Points.
Categories of Evaluation
At a glance, Rally evaluates content across 3 categories (11 total metrics):
- Gates (4, scored 0–2 each): Content Alignment, Information Accuracy, Campaign Compliance, Originality & Authenticity. Submissions must pass all gates (score > 0) to be eligible for rewards.
- Quality Metrics (2, scored 0–5): Engagement Potential, Technical Quality. These capture intrinsic quality beyond the gates.
- Engagement Metrics (5, dynamic): Retweets, Likes, Replies, Quality of Replies (AI-evaluated), Followers of Repliers. These update over time based on real audience interaction.
Reward Distribution Curves
Projects select how sharply rewards concentrate among top-performing creators:
- Balanced: ~25% of rewards to the top 10%
- Default: ~90% of rewards to the top 10%
- Extreme: ~99% of rewards to the top 10%
These curves are implemented in the scoring formula via alpha (α). Higher α concentrates rewards more among top performers; lower α spreads them more evenly. See the math section for the exact mapping.
Weights (0–1 per metric)
Each of the 11 metrics can be assigned a weight from 0 to 1:
- 1.0: Maximum importance
- 0.0: Metric is completely turned off (does not affect scoring)
- Values in between scale the metric’s relative impact
Weights are visible to campaign managers in the setup wizard and to content creators in the campaign briefing, so priorities are transparent. Projects can tailor these per campaign. For example:
- To emphasize quality over engagement, lower the weights for engagement metrics (RT, LK, RP, QR, FR) and raise the weights for EP and TQ.
- If uniqueness is not a priority, set the Originality & Authenticity gate weight to 0 to turn it off. Turning a gate off means it will not disqualify submissions and will not influence the gate multiplier.
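For illustration, a quality-focused configuration like the one described above might look like the following sketch. The metric keys and values are hypothetical; the actual format is defined by the campaign setup wizard.

```python
# Hypothetical per-campaign weight configuration (keys and values are
# illustrative only, not the wizard's actual schema).
weights = {
    # Gates (setting a gate's weight to 0 turns it off entirely)
    "content_alignment": 1.0,
    "information_accuracy": 1.0,
    "campaign_compliance": 1.0,
    "originality_authenticity": 0.0,  # uniqueness not a priority here
    # Quality metrics (emphasized)
    "engagement_potential": 1.0,      # EP
    "technical_quality": 1.0,         # TQ
    # Engagement metrics (de-emphasized)
    "retweets": 0.3,                  # RT
    "likes": 0.3,                     # LK
    "replies": 0.3,                   # RP
    "quality_of_replies": 0.5,        # QR
    "followers_of_repliers": 0.2,     # FR
}
```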
Refresh Engagement (Overview)
Refresh Engagement allows content creators to update engagement metrics for prior submissions in later periods to capture ongoing performance.
- After a tweet's initial submission in a period, a refresh transaction can be submitted in subsequent periods to update that submission's engagement metrics
- The quality component remains fixed from the first submission; only engagement metrics are refreshed
- The refresh credits only the positive difference versus the previous baseline, rewarding genuine growth in engagement
Detailed Metrics
Gates (pass/fail with quality)
Submissions must pass four quality gates. Each gate is scored from 0–2:
- 0 = Fail (disqualifies the submission)
- 1–2 = Pass, with the value reflecting quality
1. Content Alignment (0–2)
How well the content aligns with the campaign’s message and values:
- Message accuracy
- Correct terminology usage
- Brand consistency
- Target audience fit
2. Information Accuracy (0–2)
Factual correctness of the content:
- Technical accuracy
- Consistency with official materials
- Accurate data and statistics
- Proper context
3. Campaign Compliance (0–2)
Adherence to campaign rules:
- Required hashtags and mentions
- Format requirements
- Style guidelines
- Necessary disclosures
4. Originality & Authenticity (0–2)
Uniqueness and authentic voice:
- Fresh perspective
- Personal insights
- Natural language
- Creative expression
Quality Metrics
Beyond the gates, two additional quality metrics are evaluated:
Engagement Potential (0–5)
- Hook effectiveness
- Call-to-action quality
- Content structure
- Conversation potential
Technical Quality (0–5)
- Grammar and spelling
- Formatting and structure
- Platform optimization
- Media integration
Engagement Metrics
These metrics update over time based on content performance:
Direct Metrics (dynamic)
- Retweets (RT) - Amplification of the message (log-scaled)
- Likes (LK) - Audience appreciation (log-scaled)
- Replies (RP) - Conversation generation (log-scaled)
Advanced Metrics (dynamic)
- Quality of Replies (QR) - AI analysis of reply quality (0–1)
- Followers of Repliers (FR) - Reach of engaged audience (log-scaled)
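As a rough illustration of the log scaling mentioned above (the exact normalization used in production is not spelled out here), raw counts are dampened so that very large numbers do not dominate:

```python
import math

# Sketch: log scaling dampens large raw counts, so 10,000 likes is not
# worth 100x more than 100 likes.
def log_scale(count: int) -> float:
    return math.log(count + 1)

print(log_scale(10))      # ~2.40
print(log_scale(100))     # ~4.62
print(log_scale(10_000))  # ~9.21
```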
Thread Scoring
For multi-tweet threads, we use the peak performance of any tweet in the thread for each metric. Ranges and scaling:
- RT, LK, RP: counted via log(RT+1), log(LK+1), log(RP+1)
- QR: 0–1 AI score reflecting relevance, civility, informativeness
- FR: log(Followers+1) for accounts that replied
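A minimal sketch of the peak-per-metric rule for threads (field names are hypothetical):

```python
# Sketch: for a multi-tweet thread, take the best value of each engagement
# metric across all tweets in the thread (field names are illustrative).
def thread_peaks(tweets: list[dict]) -> dict:
    metrics = ["retweets", "likes", "replies", "quality_of_replies", "follower_reach"]
    return {m: max(t.get(m, 0) for t in tweets) for m in metrics}

thread = [
    {"retweets": 12, "likes": 80, "replies": 5},
    {"retweets": 3, "likes": 200, "replies": 1},
]
print(thread_peaks(thread))
# {'retweets': 12, 'likes': 200, 'replies': 5, 'quality_of_replies': 0, 'follower_reach': 0}
```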
Score Calculation (Full Math)
The Complete Formula
The scoring system uses a multi-step calculation:
Step 1: Gate Pass & Multiplier
gate_pass = min(G₁, G₂, G₃, G₄) > 0
g_star = avg(G₁, G₂, G₃, G₄)
M_gate = 1 + β × (g_star - 1)
Where G₁–G₄ are the four gate scores (0–2) and β = 0.5.
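A sketch of Step 1 in code (function and variable names are illustrative):

```python
# Sketch of Step 1: a submission passes only if every gate scores above 0;
# the multiplier rewards average gate scores above the baseline of 1.
BETA = 0.5

def gate_multiplier(gates: list[float]) -> float | None:
    if min(gates) <= 0:               # any failed gate disqualifies the submission
        return None
    g_star = sum(gates) / len(gates)
    return 1 + BETA * (g_star - 1)

print(gate_multiplier([2, 2, 1, 2]))  # 1.375
print(gate_multiplier([2, 0, 2, 2]))  # None (failed gate)
```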
Step 2: Campaign Points (Q Score)
Campaign_Points = M_gate × Σ(W[i] × normalized_metrics[i])
Where:
- W is the vector of metric weights in [0,1] for the 11 metrics (W[i]=0 turns a metric off; higher values increase impact)
- Normalized metrics include: EP, TQ, log(RT+1), log(LK+1), log(RP+1), QR, log(FR+1)
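A sketch of Step 2, assuming the weighted sum runs over the seven quality and engagement metrics listed above, with engagement counts log-scaled (the exact normalization of EP, TQ, and QR is an assumption here):

```python
import math

# Sketch of Step 2: gate multiplier times the weighted sum of normalized metrics.
# Weight keys and the treatment of EP/TQ/QR are illustrative assumptions.
def campaign_points(m_gate: float, weights: dict[str, float], metrics: dict[str, float]) -> float:
    normalized = {
        "EP": metrics["EP"],                 # 0-5 quality score
        "TQ": metrics["TQ"],                 # 0-5 quality score
        "RT": math.log(metrics["RT"] + 1),   # log-scaled retweets
        "LK": math.log(metrics["LK"] + 1),   # log-scaled likes
        "RP": math.log(metrics["RP"] + 1),   # log-scaled replies
        "QR": metrics["QR"],                 # 0-1 AI reply-quality score
        "FR": math.log(metrics["FR"] + 1),   # log-scaled follower reach of repliers
    }
    return m_gate * sum(weights[k] * value for k, value in normalized.items())
```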
Step 3: Period Accumulation (Refresh Engagement)
- New submissions: user_Q[period] += Campaign_Points
- Refresh Engagement: user_Q[period] += max(0, Q_current - Q_baseline)
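Step 3 as a sketch (names are illustrative):

```python
# Sketch of Step 3: new submissions add their full Campaign Points; a refresh
# adds only the positive delta versus the baseline stored at first submission.
def accumulate(user_q: float, q_current: float, q_baseline: float | None = None) -> float:
    if q_baseline is None:                    # first submission in the campaign
        return user_q + q_current
    return user_q + max(0.0, q_current - q_baseline)
```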
Step 4: Final Distribution (Distribution Curves)
S_user = max(user_Q[period], 0)^α
share_user = S_user / Σ(S_all_users)
rewards_user = share_user × total_rewards
Where α corresponds to the selected curve:
- Balanced → α = 1.0
- Default → α = 3.0
- Extreme → α = 8.0
Higher α values increase concentration (more to the very top performers), while lower α values distribute rewards more broadly.
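A sketch of Step 4 under the α values listed above (names are illustrative):

```python
# Sketch of Step 4: raise each user's accumulated points to the power alpha,
# then split the reward pot proportionally to the resulting shares.
ALPHA = {"balanced": 1.0, "default": 3.0, "extreme": 8.0}

def distribute(points: dict[str, float], total_rewards: float, curve: str) -> dict[str, float]:
    alpha = ALPHA[curve]
    scores = {user: max(q, 0.0) ** alpha for user, q in points.items()}
    total = sum(scores.values())
    if total == 0:
        return {user: 0.0 for user in points}
    return {user: total_rewards * s / total for user, s in scores.items()}

print(distribute({"alice": 10, "bob": 5}, 1000, "default"))
# {'alice': 888.88..., 'bob': 111.11...}  -- a steeper curve concentrates rewards
```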
The Formula (Simplified)
- Gate Multiplier: Exceptional gate performance (scores >1) provides a bonus
- Campaign Points: Weighted combination of all metrics
- Period Accumulation: Scores accumulate throughout the campaign period
- Final Distribution: At period end, rewards are distributed based on relative performance
Refresh Engagement System
Content creators can refresh engagement in later periods to capture additional performance:
- Quality component remains fixed from first submission
- Only engagement metrics are updated
- You earn only the positive difference between the new and previous Campaign Points
- Prevents gaming through repeated refreshes
Tips for High Scores
Quality Tips
- Research the campaign thoroughly before tweeting
- Be authentic - use a distinctive voice and perspective
- Follow all requirements exactly
- Add value - don’t just repeat talking points
Engagement Tips
- Post at optimal times for the intended audience
- Engage with replies to boost conversation
- Create compelling hooks to grab attention
- Use threads strategically - the best-performing tweet counts for each metric
Distribution Curves (at a glance)
Projects choose the model that best fits their goals. At a high level:
- Balanced: ~25% of rewards to the top 10% of content creators
- Default: ~90% of rewards to the top 10% of content creators
- Extreme: ~99% of rewards to the top 10% of content creators
Important Notes
- Images are not evaluated - Focus on text content
- Quality component is permanent - The first submission is the baseline
- Engagement updates - Check back to resubmit high-performing tweets
- Period-based distribution - Rewards distributed at campaign period end
Technical Details
For developers and projects wanting deeper understanding:
- Scores stored as “atto” values (×10^18) for precision
- Vector similarity used for originality comparison
- Non-deterministic LLM calls use consensus validation
- Maximum 2-point variance allowed between validators
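As a rough sketch of two of these details (interpreting the 2-point limit as the max–min spread between validator scores; this is not the actual implementation):

```python
# Sketch: store scores as integer "atto" values (x 10^18) to avoid floating-point
# precision loss, and reject validator sets whose scores spread by more than 2 points.
ATTO = 10**18

def to_atto(score: float) -> int:
    return int(round(score * ATTO))

def within_consensus(validator_scores: list[float], max_spread: float = 2.0) -> bool:
    return max(validator_scores) - min(validator_scores) <= max_spread

print(to_atto(1.375))                      # 1375000000000000000
print(within_consensus([4.0, 5.0, 3.5]))   # True (spread = 1.5)
```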
This scoring system ensures fair rewards for quality content while preventing spam and low-effort submissions. The goal is to support campaigns and audiences with valuable, engaging content.