How RIC Works

Technical methodology behind Research Integrity Checker. Transparent by design — you should know exactly how your manuscript is being analyzed.

01

AI Writing Detection

Rule-based linguistic analysis, not a black-box classifier

Why rule-based?

Most AI detectors use machine learning classifiers trained on AI vs. human text. The problem: they produce false positives for non-native English writers and are easily fooled by paraphrasing. RIC uses transparent, interpretable linguistic metrics instead. You can see exactly which patterns triggered and why.

What we measure

RIC analyzes 9 linguistic dimensions across your text. Each metric is compared against empirically observed ranges in academic writing:

  • Burstiness (0.45–0.65 typical): measures variation in sentence complexity. Human writing naturally "bursts" between simple and complex sentences. AI tends to be uniform.
  • Type-Token Ratio / TTR (0.50–0.70 typical): vocabulary diversity. AI often reuses the same words in predictable patterns.
  • Coefficient of Variation / CV (0.40–0.65 typical): sentence length variation. AI-generated text tends to produce sentences of similar length.
  • Sentence uniformity: detects repetitive sentence structures (same opening words, similar syntactic patterns).
  • Hedging language: academic writing uses hedging ("may", "suggests", "appears to") naturally. AI often overuses or underuses these.
  • Transition patterns: checks for formulaic transitions ("Furthermore", "Moreover", "Additionally") that AI tends to chain.
  • Passive voice ratio: measured against discipline-specific norms.
  • Buzzword density: detects overuse of vague intensifiers common in AI output ("cutting-edge", "transformative", "holistic").
  • Paragraph structure: analyzes opening/closing patterns across paragraphs.
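
As an illustration, three of these metrics can be sketched in a few lines of Python. The sentence splitter and formulas below are simplified stand-ins, not RIC's production code:

```python
import re
import statistics

def sentences(text: str) -> list[str]:
    # Naive sentence splitter; a production tokenizer is more robust.
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

def type_token_ratio(text: str) -> float:
    # Vocabulary diversity: unique words / total words.
    words = re.findall(r"[a-zA-Z']+", text.lower())
    return len(set(words)) / len(words) if words else 0.0

def length_cv(text: str) -> float:
    # Coefficient of variation of sentence lengths (in words).
    lengths = [len(s.split()) for s in sentences(text)]
    if len(lengths) < 2:
        return 0.0
    return statistics.stdev(lengths) / statistics.mean(lengths)
```

Uniform, AI-like text yields a low CV; human academic prose, with its mix of short and long sentences, yields a higher one.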

Score interpretation

The overall score represents the convergence of multiple weak signals — no single metric is conclusive. Typical ranges: 10–30% for human-written academic text, 40–70% for AI-edited drafts, 70%+ for fully AI-generated text. The score is a screening signal, not a verdict.
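
Conceptually, combining weak signals looks like a weighted average. The metric names and equal weights below are illustrative assumptions, not RIC's actual weighting:

```python
def convergence_score(signals: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-metric signals (each normalized to 0..1, where 1 is
    AI-like) into a single 0-100 score. Weights are hypothetical."""
    total = sum(weights.values())
    return 100 * sum(signals[k] * weights[k] for k in weights) / total

# Hypothetical example: three metrics, each scored 0 (human-like) to 1 (AI-like).
score = convergence_score(
    {"burstiness": 0.2, "ttr": 0.3, "cv": 0.1},
    {"burstiness": 1.0, "ttr": 1.0, "cv": 1.0},
)
```

No single input dominates; a high score requires several metrics to drift toward AI-typical ranges at once.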

02

Citation Verification

Cross-referencing against 4 academic databases

How it works

RIC extracts individual references from your text using pattern matching (numbered lists, APA/Vancouver/Harvard formats, bracketed references). Each extracted reference is then verified against academic databases using fuzzy title matching.
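
A simplified sketch of extraction for numbered reference lists. The patterns below are illustrative; the real extractor also handles APA, Vancouver, and Harvard formats:

```python
import re

# Matches "[1] ..." and "1. ..." style reference lines (illustrative only).
NUMBERED = re.compile(r'^\s*(?:\[(\d+)\]|(\d+)\.)\s+(.*)$')

def extract_references(reference_section: str) -> list[str]:
    refs = []
    for line in reference_section.splitlines():
        m = NUMBERED.match(line)
        if m:
            refs.append(m.group(3).strip())
    return refs
```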

Databases used

  • CrossRef (free tier + Pro): the largest DOI registration agency. Covers most peer-reviewed journals. We use their public API with polite rate limiting.
  • OpenAlex (free tier + Pro): open catalog of 250M+ scholarly works. Good coverage of open access and non-English publications.
  • PubMed (Pro only): NLM's database of biomedical literature. Essential for clinical and life science papers.
  • Semantic Scholar (Pro only): AI-powered academic search by Allen Institute. Strong for computer science and interdisciplinary work.
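
For illustration, a title lookup against CrossRef's public REST API might be constructed like this. The endpoint and the `mailto` "polite pool" parameter are part of CrossRef's documented API; the exact query RIC sends is an assumption:

```python
from urllib.parse import urlencode

def crossref_query_url(title: str, mailto: str = "you@example.org") -> str:
    # CrossRef's public REST API; the `mailto` parameter opts into the
    # "polite" pool, which gets more predictable rate limits.
    params = urlencode({
        "query.bibliographic": title,
        "rows": 3,
        "mailto": mailto,
    })
    return f"https://api.crossref.org/works?{params}"
```

The top few results are then compared against the extracted reference using fuzzy title matching, as described below.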

What "Not Found" means

"Not Found" means the reference could not be matched in any checked database above a 40% title similarity threshold. This does NOT mean the citation is fabricated. Common causes: book chapters, conference proceedings, government reports, non-English publications, very recent preprints, or references with typos. Always verify "Not Found" references manually.

Similarity matching

We use fuzzy string matching (Levenshtein-based) to compare your reference title against database records. A match above 80% similarity is marked "Verified." Matches between 40% and 80% are marked "Suspicious" — the closest match is shown so you can check whether it's the right paper under a slightly different title.
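
The thresholds above can be sketched as follows. Python's stdlib difflib ratio is used here as a stand-in for a Levenshtein-based score:

```python
from difflib import SequenceMatcher

def classify_match(reference_title: str, candidate_title: str) -> str:
    # difflib's ratio is a stand-in; RIC uses a Levenshtein-based score.
    similarity = SequenceMatcher(
        None, reference_title.lower(), candidate_title.lower()
    ).ratio()
    if similarity >= 0.80:
        return "Verified"
    if similarity >= 0.40:
        return "Suspicious"
    return "Not Found"
```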

03

Plagiarism Scanning

Sentence-level web search with citation-aware filtering

Approach

RIC splits your text into individual sentences, then samples evenly across the document to check against web sources. This sampling approach provides representative coverage while keeping scan times reasonable (typically under 30 seconds).
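
Even sampling can be sketched in a few lines (a simplified stand-in for RIC's sampler):

```python
def sample_evenly(sentences: list[str], budget: int) -> list[str]:
    # Pick `budget` sentences spread evenly across the whole document,
    # so every section gets some coverage.
    if len(sentences) <= budget:
        return list(sentences)
    step = len(sentences) / budget
    return [sentences[int(i * step)] for i in range(budget)]
```

For a 200-sentence manuscript with a 20-sentence budget, this checks roughly every 10th sentence from start to finish, rather than only the opening pages.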

Citation-aware filtering

A key differentiator: RIC detects when a flagged sentence is a legitimate citation or direct quote. Sentences containing inline citations (Author et al., 2024), bracketed references [1-3], or quoted text are marked as "Citation match" (blue) instead of "Flagged" (red). This dramatically reduces false positives compared to generic plagiarism checkers.
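
A simplified sketch of citation-context detection. The patterns below are illustrative assumptions, not RIC's production rules:

```python
import re

# Illustrative rules: inline author-year citations, bracketed numeric
# references, and quoted spans.
INLINE_CITATION = re.compile(r'\([A-Z][A-Za-z-]+(?:\s+et al\.)?,\s*\d{4}\)')
BRACKET_REF = re.compile(r'\[\d+(?:\s*[-,]\s*\d+)*\]')
QUOTED = re.compile(r'"[^"]{10,}"')

def is_citation_context(sentence: str) -> bool:
    return any(p.search(sentence) for p in (INLINE_CITATION, BRACKET_REF, QUOTED))
```

A flagged sentence that passes this check is shown as a "Citation match" rather than a plagiarism hit.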

Search engine

We use the Tavily search API for web-scale sentence matching. Each sentence is searched as a quoted phrase, and results are compared using similarity scoring. The similarity threshold for flagging is deliberately conservative to minimize false positives.

Coverage

Free tier scans 20 sentences (sampled evenly). Pro scans up to 50 sentences. The coverage percentage is always displayed (e.g., "20/200 sentences scanned — 10%"). This is a screening tool for high-similarity passages, not a comprehensive database comparison like Turnitin.

04

RIC Peer Review

LLM-powered editorial feedback with article-type awareness

How it works

RIC combines rule-based structure analysis with a large language model fine-tuned for academic review. The LLM receives your manuscript with article-type-specific guidelines, current-year context for citation dating, and structured instructions to separate Issues from actionable Suggestions.

Article-type awareness

RIC automatically detects whether your manuscript is an original research article, case report, review, viewpoint/commentary, editorial, or short communication. Each type gets different review criteria:

  • Original research: evaluates hypothesis, methodology, statistical reporting, and IMRaD structure.
  • Case report: focuses on clinical significance, patient consent, literature context, and learning points.
  • Review articles: checks systematic approach, search strategy, synthesis quality, and bias assessment.
  • Viewpoint/commentary: evaluates argument structure and evidence support, while acknowledging that a strong authorial voice is intentional for this format.
  • Editorial: lightweight review focused on timeliness and clarity, not methodology.
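
Article-type detection could be sketched as a keyword heuristic. The marker lists below are hypothetical; RIC's actual detector is not specified here:

```python
# Hypothetical keyword markers per article type (illustrative only).
TYPE_MARKERS = {
    "case report": ["case report", "patient presented", "case presentation"],
    "review": ["systematic review", "search strategy", "we searched"],
    "editorial": ["editorial"],
    "viewpoint": ["viewpoint", "commentary", "we argue"],
}

def detect_article_type(text: str) -> str:
    lowered = text.lower()
    scores = {
        t: sum(lowered.count(kw) for kw in kws)
        for t, kws in TYPE_MARKERS.items()
    }
    best = max(scores, key=scores.get)
    # Default to original research when no marker fires.
    return best if scores[best] > 0 else "original research"
```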

Structure checks

Before the LLM review, RIC runs rule-based structure checks: IMRaD section detection, abstract length, keyword consistency, reference count, passive voice ratio, and abstract-conclusion overlap (calibrated by article type — higher overlap is expected in viewpoints).
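
One of these checks, IMRaD section detection, can be sketched as a heading heuristic (an assumption for illustration, not RIC's production check):

```python
IMRAD = ["introduction", "methods", "results", "discussion"]

def missing_imrad_sections(text: str) -> list[str]:
    # Treat short lines as candidate section headings (simplified heuristic).
    headings = {
        line.strip().lower().rstrip(":")
        for line in text.splitlines()
        if line.strip() and len(line.strip().split()) <= 4
    }
    return [
        s for s in IMRAD
        if not any(s in h for h in headings)
    ]
```

The substring check lets variants like "Materials and Methods" still count as a Methods heading.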

Output format

The review is structured as: Summary → Strengths → Weaknesses (Major/Minor, each as Issue + Suggestion) → Novelty & Significance → Recommendation. Free tier shows Summary + Recommendation with locked section headers showing issue counts. Pro shows the full review.

Technical Details

Infrastructure

  • Frontend: Next.js + Tailwind CSS (Vercel)
  • Backend: FastAPI + Python (Render)
  • Usage tracking: Supabase (PostgreSQL)
  • Payment: Gumroad

APIs & Models

  • Citation: CrossRef, OpenAlex, PubMed, Semantic Scholar
  • Plagiarism: Tavily Search API
  • Peer Review: LLM + rule-based structure checks
  • AI Detection: rule-based (no external API)

Privacy

  • Your text is never stored on our servers
  • No user accounts required
  • Processing happens per-request and is discarded

Open Source

  • Citation engine: github.com/tuyentran-md/cite_checker
  • AI detection: rule-based, auditable metrics
  • No black-box classifiers
Try RIC — free, no signup

Paste your manuscript and check in under 2 minutes.