Model Card & Safety - RakshaLink

Our AI Models

RakshaLink combines three detection layers (a Conv1D network, TinyBERT, and a rule-based pattern detector) to detect fraud accurately, and runs them entirely on-device so message content stays private.

Conv1D Neural Network v1.4

Primary detection model for real-time fraud analysis

Model Size: 1.5 MB
Inference Time: < 30ms
Accuracy: 94.2% (±2.1%)

TinyBERT v0.9

Advanced contextual analysis for complex messages

Model Size: 4 MB
Inference Time: < 50ms
Precision: 96.5% (±1.8%)

Pattern Detector v2.2

Rule-based system for known fraud patterns

Rules Updated: Monthly
Categories: 15+
Coverage: Known Patterns

Performance Metrics

Detection Capabilities

Evaluated on a test set of 5,200 messages (Aug-Sep 2025) with a Hindi/English/Hinglish mix.²

Fraud Type	Detection Rate (Recall at Operating Point)	False Positive Rate	Test Set
UPI/Payment Frauds	94-97% (±2.1%)	0.8-1.6%	N=1,850
Phishing Links	92-95% (±2.8%)	1.2-2.8%	N=1,200
KYC/Identity Scams	91-94% (±3.1%)	1.5-3.5%	N=980
Digital Arrest Threats	96-99% (±1.8%)	0.5-1.2%	N=420
Job/Investment Scams	89-92% (±3.5%)	2.1-4.8%	N=550
Lottery/Prize Frauds	93-96%	0.9-2.1%	N=200

Language Support

English (Primary)
Hindi (Including Hinglish/Romanized)
Mixed language messages and code-switching
High-volume WhatsApp patterns in Marathi & Bengali
Early support for Kannada & Telugu (work-in-progress)

Resource Usage

Tested on Snapdragon 6/7/8-series devices running Android 11-14 with 4-8 GB RAM.¹

Memory

Median RAM Usage: 38-62 MB
95th Percentile: 78 MB
Storage: ~6 MB

Battery

Active Scanning: 0.6-1.8%/day
Background: < 0.5%/day
Measurement: 7-day median

How We Measure

Dataset Summary

Training & Evaluation Data

Total Messages: 50,000+ labeled samples (anonymized)
Sources: 70% WhatsApp, 25% SMS, 5% Other
Language Split: 40% English, 35% Hindi/Hinglish, 25% Mixed/Regional
Time Window: June 2024 - September 2025
Balance Strategy: Oversampling rare fraud types, undersampling common legitimate messages
Validation: Time-based split (80/10/10 train/val/test)
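
The time-based split listed above can be sketched as follows. This is a minimal illustration, not the actual pipeline: the `received` field name, the `time_based_split` helper, and the synthetic messages are assumptions. The idea is that the newest messages end up in the test set, so evaluation reflects fraud patterns the model has not seen during training.

```python
from datetime import date


def time_based_split(messages, train_frac=0.8, val_frac=0.1):
    """Split messages chronologically into (train, val, test) lists.

    Assumes each message is a dict with a sortable "received" field
    (a hypothetical field name used only for this sketch).
    """
    ordered = sorted(messages, key=lambda m: m["received"])
    n = len(ordered)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    # Oldest messages train the model; the newest are held out for testing.
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]


# Ten synthetic messages, one per day, deliberately supplied out of order.
sample = [{"id": i, "received": date(2025, 1, 1 + i)} for i in reversed(range(10))]
train, val, test = time_based_split(sample)
```

Unlike a random split, this ordering prevents near-duplicate messages from the same scam campaign from appearing on both sides of the train/test boundary.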

Evaluation Protocol

Operating Point: Optimized for F2 score (emphasizing recall)
Cross-validation: 5-fold with stratified sampling
Adversarial Tests: Unicode tricks (ZWSP, RLO), homoglyphs, compressed URLs
Confidence Intervals: Bootstrap with 1000 iterations
Drift Monitoring: Monthly review of false positives/negatives
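
As an illustration of the bootstrap confidence intervals mentioned above, here is a minimal sketch assuming a percentile bootstrap over the test set with 1000 resamples. The percentile variant, the helper names, and the synthetic labels are assumptions; the protocol above does not specify them.

```python
import random


def recall(labels, preds):
    """Fraction of true fraud messages (label 1) that were flagged (pred 1)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def bootstrap_ci(labels, preds, metric, iterations=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `metric` (an assumed variant)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(labels)
    scores = []
    for _ in range(iterations):
        # Resample message indices with replacement, then re-score.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(metric([labels[i] for i in idx], [preds[i] for i in idx]))
    scores.sort()
    lower = scores[int((alpha / 2) * iterations)]
    upper = scores[int((1 - alpha / 2) * iterations) - 1]
    return lower, upper


# Synthetic example: 200 fraud messages (190 caught) and 200 legitimate ones.
labels = [1] * 200 + [0] * 200
preds = [1] * 190 + [0] * 10 + [0] * 200
low, high = bootstrap_ci(labels, preds, recall)
print(f"recall={recall(labels, preds):.3f}, 95% CI=({low:.3f}, {high:.3f})")
```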

Safety Principles

🔒 Privacy by Design

Detection runs locally on your device. No chat content, contacts, call logs, or OTPs are uploaded. Optional aggregated statistics (if enabled) contain no message text.

🎯 High Precision Focus

Tuned to reduce false alarms on legitimate bank OTPs, bill reminders, and delivery notifications. We prioritize avoiding incorrect flags over catching every possible threat.

🔄 Rapid Updates

Rules and models are updated offline and shipped via opt-in app updates. There is no real-time learning on your device, which prevents model poisoning.

⚖️ Bias Mitigation

Regular audits for linguistic and regional bias. Balanced training across urban/rural usage patterns and socioeconomic contexts.

Known Limitations

⚠️ Current Limitations

Unicode Evasion: Advanced tricks (ZWSP, RLO/U+202E, homoglyphs) combined with truncated notifications may evade detection
Media-Only Scams: Pure image/audio scams (voice notes, stickers) are detected only when metadata contains text
New Patterns: Fresh bank brands and newly registered phishing domains have a cold-start detection lag (24-48 hours)
Ambiguous Messages: Legitimate KYC reminders, mandate notices, and e-commerce fees may trigger cautionary flags
Regional Coverage: Lower accuracy for pure regional languages (Assamese, Odia, etc.) without Romanization
Device Limits: Cannot protect against device-level malware or screen-overlay attacks

Technical Details

Model Architecture

Conv1D Network: 3-layer CNN with max pooling, dropout (0.3), batch normalization
TinyBERT: 4-layer, 312-hidden, 12-heads, 14.5M parameters, distilled from BERT-base
Tokenizer: WordPiece with 10K vocab, special tokens for Indian financial terms
Cascade System: Confidence gating at 0.35/0.65 thresholds for efficiency
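
One plausible reading of the cascade's confidence gating is sketched below. It assumes scores are fraud probabilities in [0, 1], that the Conv1D score is accepted when it falls outside the 0.35-0.65 band, and that only uncertain messages are escalated to TinyBERT. The function names and stub models are hypothetical; the real gating logic may differ.

```python
def cascade_score(text, fast_model, slow_model, low=0.35, high=0.65):
    """Return (score, model_used) under an assumed two-stage gating scheme.

    fast_model / slow_model are callables returning a fraud probability;
    they stand in for the Conv1D network and TinyBERT respectively.
    """
    fast = fast_model(text)
    if fast < low or fast > high:
        # Confident either way: skip the slower model entirely.
        return fast, "conv1d"
    # Uncertain band: pay the extra latency for contextual analysis.
    return slow_model(text), "tinybert"


# Stub models with fixed scores, for illustration only.
def fake_fast(text):
    return {"hi mom": 0.05, "win prize now": 0.92}.get(text, 0.5)


def fake_slow(text):
    return 0.8


print(cascade_score("hi mom", fake_fast, fake_slow))
print(cascade_score("kyc update pending", fake_fast, fake_slow))
```

Under this reading, the efficiency gain comes from most messages never reaching the second model, which would be consistent with the cascade's P95 latency sitting between the two individual models.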

Evaluation Metrics

Metric	Conv1D	TinyBERT	Cascade
Precision	94.8% (±2.2%)	96.5% (±1.8%)	96.2% (±1.9%)
Recall	93.2% (±2.5%)	94.2% (±2.1%)	95.1% (±2.0%)
F1-Score	94.0% (±2.3%)	95.3% (±1.9%)	95.6% (±1.8%)
Latency (P95)	28ms	48ms	35ms
PR-AUC	0.947	0.968	0.971
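
For readers who want to check how the F-scores relate to the precision and recall rows above, the standard F-beta formula can be applied directly. The sketch below uses the table's point estimates; the `f_beta` helper is illustrative, and the F2 values it prints are derived here rather than reported figures.

```python
def f_beta(precision, recall, beta=1.0):
    """Standard F-beta: (1 + b^2) * P * R / (b^2 * P + R)."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


# (precision, recall) point estimates copied from the table above.
table = {
    "Conv1D": (0.948, 0.932),
    "TinyBERT": (0.965, 0.942),
    "Cascade": (0.962, 0.951),
}

for name, (p, r) in table.items():
    print(f"{name}: F1={f_beta(p, r):.3f}  F2={f_beta(p, r, beta=2):.3f}")
```

The F1 values computed this way round to 94.0%, 95.3%, and 95.6%, matching the F1-Score row. Setting beta=2 gives the recall-weighted score used to choose the operating point.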

Red Team Test Results

Attack Vector	Success Rate (Pre-mitigation)	Success Rate (Current)
Homoglyph Substitution	45%	< 5%
Hidden Unicode (ZWSP)	30%	< 2%
Adversarial Noise	15%	< 8%
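
The mitigations behind these numbers are not described here. As a rough illustration of how hidden-Unicode and homoglyph attacks can be neutralized before classification, the sketch below normalizes text with NFKC, strips a small set of invisible and direction-override characters, and folds a few look-alike letters to Latin. The character set and the tiny homoglyph map are assumptions chosen for the example, not RakshaLink's actual lists.

```python
import unicodedata

# Invisible or direction-changing characters to drop (assumed list):
# ZWSP, bidi embeddings/overrides (incl. RLO U+202E), isolates, BOM.
INVISIBLE = "\u200b\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069\ufeff"

# Deliberately tiny, hypothetical map of Cyrillic look-alikes to Latin.
HOMOGLYPHS = {
    "\u0430": "a", "\u0435": "e", "\u043e": "o",
    "\u0440": "p", "\u0441": "c", "\u0456": "i", "\u0455": "s",
}

_TABLE = {ord(ch): None for ch in INVISIBLE}
_TABLE.update({ord(src): dst for src, dst in HOMOGLYPHS.items()})


def normalize_message(text):
    """Fold compatibility forms, remove hidden characters, map homoglyphs."""
    text = unicodedata.normalize("NFKC", text)  # e.g. fullwidth -> ASCII
    return text.translate(_TABLE)


print(normalize_message("Your K\u200bYC is \u0440ending"))  # prints "Your KYC is pending"
```

The sketch strips an explicit list rather than every format character because ZWJ and ZWNJ (U+200D, U+200C) are legitimate in Devanagari and Bengali text, so removing them wholesale could damage genuine Hindi or Bengali messages.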

User Control

All AI features can be disabled at any time
Detection thresholds are adjustable in developer settings (unlocked by tapping the version number 7 times)
Model updates are optional with clear changelog
Explanations show which signals triggered detection
One-tap feedback for false positives/negatives

Transparency

Model versions and last update date shown in app
Detection confidence scores (0-100) for each alert
Monthly aggregated accuracy reports (opt-in)
Source patterns shown for rule-based detections
No hidden data collection or behavioral profiling

Security

🔐 Security Measures

Models stored in app sandbox with integrity checks
SHA-256 verification on model load
Kill-switch for emergency model disable (auto-recovery after 24h)
No network required for core detection
Play Integrity API for app attestation (where available)
Android Keystore for sensitive keys (hardware-backed when supported)
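
The SHA-256 check on model load can be sketched as below. This is written in Python for illustration only; on-device code would be Kotlin or Java, and the function names and the idea of a pinned expected digest shipped with the app are assumptions.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Hash the file in chunks so large models need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model(path, expected_hex):
    """Return True only if the file's SHA-256 matches the pinned digest."""
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())
```

A loader built on this would refuse to initialize a model when `verify_model` returns False and fall back to the rule-based detector or a previously verified model.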

Feedback & Improvement

Your feedback improves detection while preserving privacy. Here's how you can help:

Report False Positives

Help reduce incorrect flags on legitimate bank messages and OTPs

Report Missed Scams

Share new fraud patterns we haven't seen (text only, no personal info)

Anonymous Metrics

Opt-in telemetry shares detection counts, not message content

📧 Contact Us

For technical inquiries about our AI models: support@rakshalink.in

Report security issues: support@rakshalink.in

Footnotes

¹ Device testing methodology: 7-day continuous monitoring using Android Battery Historian and Perfetto. Devices represent 68% of Indian smartphone market share (IDC Q3 2024). Memory measured via Android Studio Profiler during active notification processing. Battery impact calculated as delta from baseline idle consumption.

² Detection rates represent recall at our chosen operating point, which optimizes the F2 score (weighting recall twice as heavily as precision). Confidence intervals are calculated using the Wilson score method on the holdout test set. False positive rates are measured on a curated corpus of legitimate messages (N=12,000) that includes bank OTPs, delivery notifications, and appointment reminders.
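
For reference, the Wilson score interval named in footnote ² can be computed as in the sketch below. The 95% level (z = 1.96) and the example counts are assumptions for illustration; they are not taken from the evaluation itself.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 ~ 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical counts: 399 of 420 fraud messages detected.
low, high = wilson_interval(399, 420)
print(f"recall={399 / 420:.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

Unlike the simple normal approximation, the Wilson interval stays within [0, 1] and behaves sensibly for small categories and for rates close to 100%.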