XRef converts published Word documents and PDFs into validated JATS XML and PubMed Citation XML — the structured formats that power scholarly indexing, archiving, and full-text discovery.
Faithful to the published record. Every element labeled, every citation linked, every reference resolvable.
XML (Extensible Markup Language) is the structural language of modern scholarly publishing. When a published article exists as XML, every element of the record is individually labeled and machine-readable: the title, each author's name and affiliation, the abstract, every section, every table cell, every reference, every figure caption, every inline citation.
A Word document or PDF is designed for human reading. A search engine, an indexing database, a citation tracker, or a full-text reader cannot reliably tell where the abstract ends and the introduction begins — or who the corresponding author is — without structured markup. XML closes that gap.
Structured markup turns a document that looks like a published article into one that behaves like a published article on every platform that consumes it: PubMed, PMC, institutional repositories, reference managers, and the journal's own full-text reader.
For journals that want to be indexed, discoverable, and preserved for the long term, XML is the format that makes everything else possible. XRef produces it without changing a word of the published record.
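To make the difference concrete, here is a minimal Python sketch (standard library only) that reads the front matter of a small, hand-written JATS fragment. The fragment, the sample names, and the `summarize_front` helper are illustrative assumptions, not XRef output. Real JATS files also carry `<journal-meta>`, affiliations, and namespaces that this sketch ignores, and the corresponding author is marked here only with the `corresp="yes"` attribute, although many files use an `<xref ref-type="corresp">` instead. The point is that title, authors, corresponding author, and abstract come straight from labeled elements rather than from guesses about a PDF's layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written JATS fragment (not real XRef output).
SAMPLE_JATS = """<article>
  <front>
    <article-meta>
      <title-group><article-title>An Example Article</article-title></title-group>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes">
          <name><surname>Doe</surname><given-names>Jane</given-names></name>
        </contrib>
        <contrib contrib-type="author">
          <name><surname>Roe</surname><given-names>Richard</given-names></name>
        </contrib>
      </contrib-group>
      <abstract><p>A short example abstract.</p></abstract>
    </article-meta>
  </front>
</article>"""


def full_name(contrib: ET.Element) -> str:
    """Join given names and surname from a JATS <contrib>/<name>."""
    return f'{contrib.findtext("name/given-names")} {contrib.findtext("name/surname")}'


def summarize_front(xml_text: str) -> dict:
    """Pull labeled front-matter fields out of a JATS document."""
    meta = ET.fromstring(xml_text).find("front/article-meta")
    authors = meta.findall("contrib-group/contrib[@contrib-type='author']")
    return {
        "title": meta.findtext("title-group/article-title"),
        "authors": [full_name(c) for c in authors],
        # This sketch only recognizes corresp="yes" on <contrib>.
        "corresponding": [full_name(c) for c in authors if c.get("corresp") == "yes"],
        "abstract": "".join(meta.find("abstract").itertext()).strip(),
    }


if __name__ == "__main__":
    print(summarize_front(SAMPLE_JATS))
```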
Citations in PubMed/MEDLINE, all structured as XML records
Full-text articles in PubMed Central, deposited as JATS XML
NISO standard defining JATS XML, first published in 2012
Structural assertions verified per article before every XRef delivery
JATS XML and PubMed Citation XML are the two standards that govern scholarly publishing infrastructure. Together they cover every platform that indexes, archives, and delivers journal articles to readers worldwide.
The Journal Article Tag Suite (JATS) encodes a complete scholarly article: front matter, full body, tables, figures, equations, and a structured reference list with linked DOIs.
The PubMed Publisher DTD feeds the 36-million-record PubMed database. Citation metadata and abstract only, not the article body.
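For contrast, the sketch below builds a bare-bones PubMed ArticleSet record from a metadata dictionary. There is no place for the article body: the record stops at bibliographic fields and the abstract. The dictionary keys, the placeholder values (including the `10.5555` test-style DOI and the `0000-0000` ISSN), the `PubStatus="epublish"` choice, and the `build_article_set` helper are assumptions for illustration. The element names follow the PubMed Publisher DTD as commonly used, but the sketch is simplified (no `History`, affiliations, or `VernacularTitle`) and has not been validated against the DTD, so check element order and required fields against NLM's current documentation.

```python
import xml.etree.ElementTree as ET


def build_article_set(meta: dict) -> ET.Element:
    """Build a simplified PubMed ArticleSet record (citation + abstract only)."""
    article_set = ET.Element("ArticleSet")
    article = ET.SubElement(article_set, "Article")

    journal = ET.SubElement(article, "Journal")
    for tag, key in [("PublisherName", "publisher"), ("JournalTitle", "journal"),
                     ("Issn", "issn"), ("Volume", "volume"), ("Issue", "issue")]:
        ET.SubElement(journal, tag).text = meta[key]
    pub_date = ET.SubElement(journal, "PubDate", PubStatus="epublish")
    ET.SubElement(pub_date, "Year").text = meta["year"]

    ET.SubElement(article, "ArticleTitle").text = meta["title"]
    ET.SubElement(article, "FirstPage").text = meta["first_page"]
    ET.SubElement(article, "LastPage").text = meta["last_page"]
    ET.SubElement(article, "Language").text = "EN"

    author_list = ET.SubElement(article, "AuthorList")
    for first, last in meta["authors"]:
        author = ET.SubElement(author_list, "Author")
        ET.SubElement(author, "FirstName").text = first
        ET.SubElement(author, "LastName").text = last

    ids = ET.SubElement(article, "ArticleIdList")
    ET.SubElement(ids, "ArticleId", IdType="doi").text = meta["doi"]

    # The abstract is the only prose carried over; there is no body element.
    ET.SubElement(article, "Abstract").text = meta["abstract"]
    return article_set


if __name__ == "__main__":
    record = build_article_set({
        "publisher": "Example Press", "journal": "Example Journal",
        "issn": "0000-0000", "volume": "12", "issue": "3", "year": "2024",
        "title": "An Example Article", "first_page": "1", "last_page": "10",
        "authors": [("Jane", "Doe")], "doi": "10.5555/example.2024.001",
        "abstract": "A short example abstract.",
    })
    print(ET.tostring(record, encoding="unicode"))
```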
The published article already exists. Structured XML does not add to it or alter it. It makes what is already there legible to every system that needs to read it.
Indexing databases read structured XML fields directly. Articles without XML depend on PDF text extraction, which routinely misses authors, affiliations, and structured references.
PubMed citation records are structured XML submissions from the publisher. Accurate author names, correct DOIs, and complete pagination all depend on the quality of the XML submitted.
XML is format-independent. A JATS file created today can be validated, rendered, and re-published decades later, regardless of the software that produced it.
Inline citations linked to structured reference entries let Crossref Cited-by, Scite, and reference managers track which works cite which, and where in the text each citation appears.
Screen readers, reflow layouts, and alternative-text rendering depend on document structure. JATS XML with proper alt-text and semantic markup makes articles accessible in ways PDF cannot.
One validated JATS XML file can power a reading platform, a mobile app, a print-on-demand service, and a repository deposit simultaneously.
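The citation linking described above can be sketched in a few lines of Python, assuming a numeric citation style. The code turns bracket markers such as `[1]` or `[2, 4–5]` into JATS `<xref ref-type="bibr">` elements, keeps the marker text exactly as printed, and refuses to link any number that has no matching reference entry. The `B{n}` id scheme, the regular expression, and the `link_citations` helper are hypothetical. Author-year styles, superscript markers, and bracketed numbers that are not citations all need handling this sketch does not attempt.

```python
from __future__ import annotations

import re
from xml.sax.saxutils import escape

# One number or range ("4-5" or "4\u20135"), optionally repeated with commas.
_PART = r"\d+(?:\s*[-\u2013]\s*\d+)?"
MARKER_RE = re.compile(rf"\[({_PART}(?:\s*,\s*{_PART})*)\]")


def expand(numbers: str) -> list[int]:
    """Expand '2, 4-5' into [2, 4, 5]."""
    result = []
    for part in re.split(r"\s*,\s*", numbers):
        bounds = [int(b) for b in re.split(r"\s*[-\u2013]\s*", part)]
        result.extend(range(bounds[0], bounds[-1] + 1))
    return result


def link_citations(text: str, known_ids: set[str]) -> tuple[str, list[str]]:
    """Wrap resolvable markers in <xref>; return (xml, unresolved ids)."""
    out, unresolved, pos = [], [], 0
    for m in MARKER_RE.finditer(text):
        rids = [f"B{n}" for n in expand(m.group(1))]  # hypothetical id scheme
        missing = [r for r in rids if r not in known_ids]
        out.append(escape(text[pos:m.start()]))
        if missing:
            # Leave the marker as plain text and report it instead of guessing.
            unresolved.extend(missing)
            out.append(escape(m.group(0)))
        else:
            rid_attr = " ".join(rids)
            out.append(f'<xref ref-type="bibr" rid="{rid_attr}">{escape(m.group(0))}</xref>')
        pos = m.end()
    out.append(escape(text[pos:]))
    return "".join(out), unresolved


if __name__ == "__main__":
    print(link_citations("Prior work [1] and reviews [2, 4\u20135] agree.",
                         {"B1", "B2", "B4", "B5"}))
```

Wrapping the whole marker in one `<xref>` with a space-separated `rid` keeps the printed text untouched; splitting it into one `<xref>` per number is an equally valid JATS choice.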
Every conversion preserves the published article verbatim. No text is rewritten, summarized, corrected, or deleted. XRef converts structure. It does not touch content.
The complete published Word article converted to a single validated JATS 1.3 file. Every section, table, figure, equation, and reference — structured and linked.
A structured ArticleSet XML file confirmed against the published article landing page — journal identifiers, DOI, authors, abstract, and keywords.
Full-text JATS from the published PDF with layout-aware extraction. Suitable for back-catalog articles where the original Word file is no longer available.
PubMed-ready citation XML with all ArticleSet fields confirmed against the journal's article page — suitable for back-catalog indexing submissions.
XRef follows a documented four-stage process for every article. Each stage produces a verifiable artifact. No stage is skipped.
Provide the DOCX or PDF, the published article URL, and optionally the PDF link. XRef pre-checks the file before conversion begins.
The published article page is fetched and cross-checked. The publisher page is authoritative for DOI, ISSN, page range, and publication dates.
The full JATS or PubMed XML is built section by section. Every inline citation is linked. Every URL in the reference list is wrapped in <ext-link>.
27 pre-flight assertions run against the produced XML, and the file must pass DTD validation. The ZIP package is then assembled with a full conversion audit report.
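The stage 3 rule that every reference-list URL becomes an `<ext-link>` can be illustrated with a small Python sketch. It escapes the reference text for XML, wraps each `http`/`https` URL in `<ext-link ext-link-type="uri" xlink:href="URL">`, and leaves everything else, including trailing punctuation, exactly as it was. The regular expression, the `TRAILING` set, and the `wrap_urls` name are assumptions. In particular, stripping a trailing `)` breaks URLs that legitimately end in one, so a production converter would need a smarter rule.

```python
import re
from xml.sax.saxutils import escape, quoteattr

URL_RE = re.compile(r'https?://[^\s<>"]+')
# Punctuation that usually ends the sentence rather than the URL (an assumption).
TRAILING = ".,;:)]"


def wrap_urls(ref_text: str) -> str:
    """Return XML-escaped reference text with each URL wrapped in <ext-link>."""
    out, pos = [], 0
    for m in URL_RE.finditer(ref_text):
        url = m.group(0).rstrip(TRAILING)
        out.append(escape(ref_text[pos:m.start()]))
        out.append(
            f'<ext-link ext-link-type="uri" xlink:href={quoteattr(url)}>'
            f"{escape(url)}</ext-link>"
        )
        # Any stripped punctuation is emitted with the following plain text.
        pos = m.start() + len(url)
    out.append(escape(ref_text[pos:]))
    return "".join(out)


if __name__ == "__main__":
    print(wrap_urls("Data available at https://example.org/data?id=1&v=2."))
```

The output uses the `xlink:` prefix, so the `xmlns:xlink` declaration must already be present on an ancestor element, normally `<article>`.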
XRef does not deliver XML that looks correct. It delivers XML that passes a documented set of assertions before leaving the system — and attaches the audit report so you can verify each one.
| Assertion | What it checks | Why it matters |
|---|---|---|
| A1 / A3 | Every inline citation marker is linked to a reference entry via <xref> | Plain-text citation markers break reference links in every downstream reader |
| A4 / A5 | Every <table-wrap> has a label and caption; labels match source numbering | Mislabeled tables cannot be cross-referenced or cited |
| A10 | Every rid attribute resolves to a matching id | Broken cross-references fail silently in validators but visibly in readers |
| A24 | Every URL in the reference list is an <ext-link>, not plain text | Plain-text URLs cannot be resolved by reference managers or DOI resolvers |
| A27 | At least one ISSN is present in <journal-meta> | Missing ISSN blocks PubMed and Crossref from resolving the journal record |
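Three of the assertions in the table (A10, A24, A27) are simple enough to sketch with Python's standard library. The check functions, the failure-message wording, and the sample document below are illustrative assumptions, not XRef's assertion code, and they assume un-namespaced JATS element names as used by the JATS DTD. A real implementation would also need to handle nested reference lists, `<ext-link>` elements with child markup, and the remaining assertions.

```python
import re
import xml.etree.ElementTree as ET

URL_RE = re.compile(r"https?://\S+")


def check_a10(root: ET.Element) -> list:
    """A10: every id named in a rid attribute exists somewhere in the file."""
    ids = {el.get("id") for el in root.iter() if el.get("id")}
    return [
        f"A10: <{el.tag} rid='{ref}'> has no matching id"
        for el in root.iter() if el.get("rid")
        for ref in el.get("rid").split()  # rid may hold several ids
        if ref not in ids
    ]


def check_a24(root: ET.Element) -> list:
    """A24: no bare URLs in reference-list text outside <ext-link>."""
    failures = []
    for ref_list in root.iter("ref-list"):
        for el in ref_list.iter():
            chunks = [] if el.tag == "ext-link" else [el.text or ""]
            if el is not ref_list:
                chunks.append(el.tail or "")  # text after el, still inside the list
            for chunk in chunks:
                failures += [f"A24: plain-text URL {u}" for u in URL_RE.findall(chunk)]
    return failures


def check_a27(root: ET.Element) -> list:
    """A27: at least one non-empty <issn> inside <journal-meta>."""
    for jm in root.iter("journal-meta"):
        if any((issn.text or "").strip() for issn in jm.iter("issn")):
            return []
    return ["A27: no <issn> in <journal-meta>"]


def run_checks(xml_text: str) -> list:
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    return check_a10(root) + check_a24(root) + check_a27(root)


SAMPLE = """<article>
  <front><journal-meta><issn pub-type="epub">0000-0000</issn></journal-meta></front>
  <body><p>Shown in <xref ref-type="bibr" rid="B1">[1]</xref>
    and <xref ref-type="bibr" rid="B9">[9]</xref>.</p></body>
  <back><ref-list>
    <ref id="B1"><mixed-citation>See https://example.org/plain</mixed-citation></ref>
  </ref-list></back>
</article>"""

if __name__ == "__main__":
    for failure in run_checks(SAMPLE):
        print(failure)
```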
XRef runs as a standalone product on its own subdomain; login, user management, and administration remain centralized in CheckRef.
Validate references and DOI metadata before publishing.
Audit published metadata quality and record health.
Assess bibliography relevance and manuscript-topic support.
Convert published articles to validated JATS and PubMed XML.
Author visibility and journal outreach after publication.
No. XRef converts structure, not substance. The text of the published article — including every comma and source imperfection — is preserved verbatim. Source quirks are logged in the conversion audit report but never silently corrected.
JATS XML is full-text: body, tables, figures, references, and all back matter. You deposit it in PMC or render it in a full-text viewer. PubMed XML (ArticleSet) is citation-only: the bibliographic record and abstract, but not the article body. You submit it to NLM for PubMed indexing.
The published article page is authoritative for all metadata — DOI, ISSN, page range, publication date, ORCIDs. The source file is authoritative for body content. Every reconciliation is logged in the conversion report.
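The precedence rule in this answer (the publisher page wins for the listed metadata fields, the source file supplies everything else, and every difference is logged) can be sketched as a small merge. The field names, the `PAGE_AUTHORITATIVE` tuple, and the log format are hypothetical. In practice the page values would come from parsing the article landing page, which this sketch does not attempt.

```python
# Hypothetical field names for the metadata the publisher page controls.
PAGE_AUTHORITATIVE = ("doi", "issn", "page_range", "pub_date", "orcids")


def reconcile(source_meta: dict, page_meta: dict) -> tuple:
    """Merge metadata, letting the publisher page win for PAGE_AUTHORITATIVE fields.

    Returns (merged_metadata, log_lines). Fields not listed are taken from
    the source unchanged.
    """
    merged = dict(source_meta)
    log = []
    for field in PAGE_AUTHORITATIVE:
        page_value = page_meta.get(field)
        if page_value is None:
            continue  # page is silent; keep whatever the source had
        if source_meta.get(field) != page_value:
            log.append(f"{field}: source={source_meta.get(field)!r} -> page={page_value!r}")
        merged[field] = page_value
    return merged, log


if __name__ == "__main__":
    merged, log = reconcile(
        {"doi": "10.5555/old", "issn": "0000-0000", "title": "T"},
        {"doi": "10.5555/new", "issn": "0000-0000", "page_range": "1-10"},
    )
    print(merged)
    print("\n".join(log))
```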
Yes. JATS supports translated titles (<trans-title> inside <trans-title-group>) and translated abstracts (<trans-abstract>) for bilingual content. Arabic content is handled as Unicode and encoded in UTF-8 throughout.
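A minimal sketch of that bilingual structure, using Python's ElementTree: the English title and abstract are ordinary `<article-title>` and `<abstract>` elements, while the Arabic versions go in `<trans-title-group>` and `<trans-abstract>`, each marked with `xml:lang="ar"`. The `bilingual_meta` helper and the sample strings are illustrative. A full JATS file would also set `xml:lang` on `<article>` and include the rest of `<article-meta>`.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this Clark-notation name as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def bilingual_meta(title_en, abstract_en, title_ar, abstract_ar) -> ET.Element:
    """Build an <article-meta> fragment with English originals and Arabic translations."""
    meta = ET.Element("article-meta")
    titles = ET.SubElement(meta, "title-group")
    ET.SubElement(titles, "article-title").text = title_en
    trans = ET.SubElement(titles, "trans-title-group", {XML_LANG: "ar"})
    ET.SubElement(trans, "trans-title").text = title_ar
    ET.SubElement(ET.SubElement(meta, "abstract"), "p").text = abstract_en
    trans_abs = ET.SubElement(meta, "trans-abstract", {XML_LANG: "ar"})
    ET.SubElement(trans_abs, "p").text = abstract_ar
    return meta


if __name__ == "__main__":
    meta = bilingual_meta("Sample title", "Sample abstract.", "عنوان تجريبي", "ملخص تجريبي.")
    # UTF-8 output keeps the Arabic as literal characters, not numeric references.
    print(ET.tostring(meta, encoding="utf-8").decode("utf-8"))
```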
Every JATS XML file is checked for well-formedness, validated against the JATS DTD, and then run through 27 internal structural assertions, the JATS4R Schematron validator, and the PMC Style Checker.
XRef delivers a ZIP package containing the main XML file, all figures in a figures/ subfolder, and a Markdown audit report. Upload it to the CheckRef JATS viewer at checkref.org/xml/viewer for visual review, or submit it to PMC, Crossref, or your repository.
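The package layout described here can be sketched with Python's `zipfile` module. The file names (`article.xml`, `article-audit.md`) and the `build_package` helper are hypothetical; only the overall shape (main XML at the top level, figures under `figures/`, a Markdown audit report) comes from the description above.

```python
import io
import zipfile


def build_package(xml_bytes: bytes, figures: dict, report_md: str,
                  stem: str = "article") -> bytes:
    """Return a ZIP (as bytes) with the XML, a figures/ folder, and the audit report."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(f"{stem}.xml", xml_bytes)
        for name, data in figures.items():
            zf.writestr(f"figures/{name}", data)
        zf.writestr(f"{stem}-audit.md", report_md)  # str is stored as UTF-8
    return buf.getvalue()


if __name__ == "__main__":
    pkg = build_package(b"<article/>", {"fig1.png": b"placeholder bytes"}, "# Audit\n")
    print(zipfile.ZipFile(io.BytesIO(pkg)).namelist())
```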
XRef uses the normal CheckRef login at checkref.org. Administrators grant access through the CheckRef admin panel; there is no separate XRef account.
Validated JATS and PubMed Citation XML, delivered with a full audit report. The published record — faithfully structured.