Part of CheckRef · JATS 1.3 & PubMed Citation XML

The published article, fully structured

XRef converts published Word documents and PDFs into validated JATS XML and PubMed Citation XML — the structured formats that power scholarly indexing, archiving, and full-text discovery.

Faithful to the published record. Every element labeled, every citation linked, every reference resolvable.

https://xml.checkref.org/
Structured XML Conversion Report
Sources: Publisher page · JATS DTD · JATS4R Schematron · PMC Style Checker
27/27 assertions passed
References: 29
Citations linked: 29
Tables & figures: 8
Validation: PASS

Structural assertions

A1 · Every inline citation linked via <xref> · Pass
A10 · Every rid resolves to a matching id · Pass
A24 · Reference URLs wrapped in <ext-link> · Pass
A27 · ISSN present in <journal-meta> · Pass

JATS XML specimen

article.xml · JATS 1.3 · NISO Z39.96
<article article-type="research-article" dtd-version="1.3">
  <article-id pub-id-type="doi">10.32649/ajas.2025.188982</article-id>
  <article-title>Estimating Heavy Metal Concentrations…</article-title>
  <contrib contrib-type="author">Al-Amidi</contrib>
  <ref id="ref1"> … ext-link DOI … </ref>
</article>

Foundation

Why XML is the language of scholarly publishing

XML — eXtensible Markup Language — is the structural language of modern scholarly publishing. When a published article exists as XML, every element of the record is individually labeled and machine-readable: the title, each author's name and affiliation, the abstract, every section, every table cell, every reference, every figure caption, every inline citation.

A Word document or PDF is designed for human reading. A search engine, an indexing database, a citation tracker, or a full-text reader cannot reliably tell where the abstract ends and the introduction begins — or who the corresponding author is — without structured markup. XML closes that gap.

It converts a document that looks like a published article into one that behaves like a published article across every platform that consumes it: PubMed, PMC, institutional repositories, reference managers, and the journal's own full-text reader.

For journals that want to be indexed, discoverable, and preserved for the long term, XML is the format that makes everything else possible. XRef produces it without changing a word of the published record.
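To make "individually labeled and machine-readable" concrete, here is a minimal sketch of how a consuming system might read front matter out of a JATS file using only Python's standard library. The sample fragment, the DOI, the author names, and the function name `read_front_matter` are all hypothetical; real JATS front matter carries far more metadata than this.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed JATS fragment for illustration only.
SAMPLE = """<article article-type="research-article" dtd-version="1.3">
  <front>
    <article-meta>
      <article-id pub-id-type="doi">10.1234/example.2025.001</article-id>
      <title-group>
        <article-title>An Example Article</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes">
          <name><surname>Doe</surname><given-names>Jane</given-names></name>
        </contrib>
        <contrib contrib-type="author">
          <name><surname>Roe</surname><given-names>Richard</given-names></name>
        </contrib>
      </contrib-group>
      <abstract><p>A one-sentence abstract.</p></abstract>
    </article-meta>
  </front>
</article>"""


def read_front_matter(xml_text: str) -> dict:
    """Pull the DOI, title, abstract, and authors out of JATS front matter."""
    root = ET.fromstring(xml_text)
    meta = root.find("front/article-meta")
    authors = []
    for contrib in meta.findall("contrib-group/contrib[@contrib-type='author']"):
        authors.append({
            "surname": contrib.findtext("name/surname"),
            "given": contrib.findtext("name/given-names"),
            # corresp="yes" marks the corresponding author in this sample.
            "corresponding": contrib.get("corresp") == "yes",
        })
    return {
        "doi": meta.findtext("article-id[@pub-id-type='doi']"),
        "title": meta.findtext("title-group/article-title"),
        "abstract": "".join(meta.find("abstract").itertext()).strip(),
        "authors": authors,
    }


record = read_front_matter(SAMPLE)
print(record["title"])
print([a["surname"] for a in record["authors"] if a["corresponding"]])
```

No text extraction or layout guessing is involved: each field is read from the element that names it.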

36M+

Citations in PubMed/MEDLINE, all structured as XML records

4.5M+

Full-text articles in PubMed Central, deposited as JATS XML

NISO Z39.96

ANSI/NISO standard defining JATS XML, first published in 2012

27

Structural assertions verified per article before every XRef delivery

Formats

Two formats. Every scholarly destination.

JATS XML and PubMed Citation XML are the two standards that govern scholarly publishing infrastructure. Together they cover every platform that indexes, archives, and delivers journal articles to readers worldwide.

JATS 1.3 · NISO Z39.96

Full-Text XML for Archives and Readers

The Journal Article Tag Set encodes a complete scholarly article — front matter, full body, tables, figures, equations, and a structured reference list with linked DOIs.

Required by: PubMed Central for full-text deposit
Validated by: JATS DTD · PMC Style Checker · JATS4R
Figures: SVG from OOXML chart data · PNG fallback
Math: MathML in <disp-formula>
Use for: PMC deposit · journal website · full-text archiving · XML readers

PubMed Publisher DTD · ArticleSet

Citation XML for Indexing and Discovery

XML that conforms to the PubMed Publisher DTD feeds the PubMed database of more than 36 million records. It carries citation metadata and the abstract only, not the article body.

Required by: NLM for MEDLINE/PubMed citation submission
Root element: <ArticleSet>
Delivery: SFTP to NLM PubMed citation loader
Verified against: Published article landing page
Use for: PubMed indexing · MEDLINE submission · citation registration

Additional formats on request: Crossref Deposit XML · DataCite XML · Dublin Core / OAI-PMH · ONIX for Journals · TEI XML
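As a rough illustration of the citation-only format, the sketch below builds a skeletal `<ArticleSet>` record with Python's standard library. It is not XRef's implementation. The element names reflect one reading of the PubMed Publisher DTD and should be checked against the current DTD before use. The function name, dictionary keys, and sample values are hypothetical. A real submission needs more fields (publication date, pagination, language, and others) plus the DOCTYPE declaration.

```python
import xml.etree.ElementTree as ET


def build_article_set(meta: dict) -> ET.Element:
    """Build a skeletal ArticleSet record: citation metadata and abstract only."""
    article_set = ET.Element("ArticleSet")
    article = ET.SubElement(article_set, "Article")

    journal = ET.SubElement(article, "Journal")
    ET.SubElement(journal, "PublisherName").text = meta["publisher"]
    ET.SubElement(journal, "JournalTitle").text = meta["journal_title"]
    ET.SubElement(journal, "Issn").text = meta["issn"]

    ET.SubElement(article, "ArticleTitle").text = meta["title"]

    author_list = ET.SubElement(article, "AuthorList")
    for given, family in meta["authors"]:
        author = ET.SubElement(author_list, "Author")
        ET.SubElement(author, "FirstName").text = given
        ET.SubElement(author, "LastName").text = family

    id_list = ET.SubElement(article, "ArticleIdList")
    ET.SubElement(id_list, "ArticleId", IdType="doi").text = meta["doi"]

    # The abstract is included; the article body never is.
    ET.SubElement(article, "Abstract").text = meta["abstract"]
    return article_set


# Hypothetical sample values, for illustration only.
sample = {
    "publisher": "Example University Press",
    "journal_title": "Example Journal of Science",
    "issn": "0000-0000",
    "title": "An Example Article",
    "authors": [("Jane", "Doe"), ("Richard", "Roe")],
    "doi": "10.1234/example.2025.001",
    "abstract": "A one-sentence abstract.",
}

root = build_article_set(sample)
print(ET.tostring(root, encoding="unicode"))
```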
Importance

What structured XML makes possible

The published article already exists. Structured XML does not add to it or alter it. It makes what is already there legible to every system that needs to read it.

🔍

Discoverability

Indexing databases read structured XML fields directly. Articles without XML depend on PDF text extraction, which routinely misses authors, affiliations, and structured references.

📚

PubMed and MEDLINE indexing

PubMed citation records are structured XML submissions from the publisher. Accurate author names, correct DOIs, and complete pagination all depend on the quality of the XML submitted.

🗄️

Long-term preservation

XML is format-independent. A JATS file created today can be validated, rendered, and re-published decades later, regardless of the software that produced it.

🔗

Reference linking

Inline citations linked to structured reference entries enable Crossref Cited-by, Scite, and reference managers to track citation relationships at the sentence level.

Accessibility

Screen readers, reflow layouts, and alternative-text rendering depend on document structure. JATS XML with proper alt-text and semantic markup makes articles accessible in ways PDF cannot.

🌐

Platform independence

One validated JATS XML file can power a reading platform, a mobile app, a print-on-demand service, and a repository deposit simultaneously.

Services

What XRef converts

Every conversion preserves the published article verbatim. No text is rewritten, summarized, corrected, or deleted. XRef converts structure. It does not touch content.

01

Word → JATS Full-Text XML

The complete published Word article converted to a single validated JATS 1.3 file. Every section, table, figure, equation, and reference — structured and linked.

02

Word → PubMed Citation XML

A structured ArticleSet XML file confirmed against the published article landing page — journal identifiers, DOI, authors, abstract, and keywords.

03

PDF → JATS Full-Text XML

Full-text JATS from the published PDF with layout-aware extraction. Suitable for back-catalog articles where the original Word file is no longer available.

04

PDF → PubMed Citation XML

PubMed-ready citation XML with all ArticleSet fields confirmed against the journal's article page — suitable for back-catalog indexing submissions.

Process

From published file to validated XML

XRef follows a documented four-stage process for every article. Each stage produces a verifiable artifact. No stage is skipped.

1

Upload the article

Provide the DOCX or PDF, the published article URL, and optionally the PDF link. XRef pre-checks the file before conversion begins.

2

Verify metadata

The published article page is fetched and cross-checked. The publisher page is authoritative for DOI, ISSN, page range, and publication dates.

3

Convert and structure

The full JATS or PubMed XML is built section by section. Every inline citation is linked. Every URL in the reference list is wrapped in <ext-link>.

4

Validate and deliver

27 pre-flight assertions run against the produced XML. DTD validation passes. The ZIP package is assembled with a full conversion audit report.
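The citation-linking part of stage 3 can be sketched as follows. This is an illustrative approach under stated assumptions, not XRef's converter. It assumes numeric markers of the form `[n]`, reference ids of the form `refN` (as in the specimen's `ref1`), and a paragraph that arrives as plain text. The function name `link_citations` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: inline citations appear as bracketed numbers such as [1].
MARKER = re.compile(r"\[(\d+)\]")


def link_citations(paragraph_text: str) -> ET.Element:
    """Build a <p> in which each [n] marker becomes an <xref> pointing at refN."""
    p = ET.Element("p")
    last = None  # most recently created <xref>, if any
    pos = 0
    for match in MARKER.finditer(paragraph_text):
        chunk = paragraph_text[pos:match.start()]
        if last is None:
            p.text = chunk      # text before the first marker
        else:
            last.tail = chunk   # text between two markers
        last = ET.SubElement(
            p, "xref", {"ref-type": "bibr", "rid": f"ref{match.group(1)}"}
        )
        last.text = match.group(0)  # keep the visible marker unchanged
        pos = match.end()
    rest = paragraph_text[pos:]
    if last is None:
        p.text = rest
    else:
        last.tail = rest
    return p


source = "Lead accumulates in soil [1] and water [2]."
p = link_citations(source)
print(ET.tostring(p, encoding="unicode"))

# The visible text is untouched: only structure was added.
print("".join(p.itertext()) == source)
```

The final comparison mirrors the verbatim rule: linking adds markup around the marker but leaves every character of the sentence in place.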

Quality

Every article passes 27 structural assertions

XRef does not deliver XML that looks correct. It delivers XML that passes a documented set of assertions before leaving the system — and attaches the audit report so you can verify each one.

Assertion | What it checks | Why it matters
A1 / A3 | Every inline citation marker is linked to a reference entry via <xref> | Plain-text citation markers break reference links in every downstream reader
A4 / A5 | Every <table-wrap> has a label and caption; labels match source numbering | Mislabeled tables cannot be cross-referenced or cited
A10 | Every rid attribute resolves to a matching id | Broken cross-references pass a well-formedness check but fail visibly in readers
A24 | Every URL in the reference list is an <ext-link>, not plain text | Plain-text URLs cannot be resolved by reference managers or DOI resolvers
A27 | At least one ISSN is present in <journal-meta> | Missing ISSN blocks PubMed and Crossref from resolving the journal record
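Three of the assertions above are simple enough to sketch in a few lines. The code below is an assumption about how such checks could be written with Python's standard library, not XRef's own assertion suite. The function names and the sample document are hypothetical, and the sample deliberately contains one broken `rid` and one plain-text URL.

```python
import re
import xml.etree.ElementTree as ET

URL = re.compile(r"https?://\S+")


def unresolved_rids(root: ET.Element) -> list:
    """A10: return every rid value that has no matching id in the document."""
    ids = {el.get("id") for el in root.iter() if el.get("id")}
    missing = []
    for el in root.iter():
        for rid in (el.get("rid") or "").split():  # rid may list several ids
            if rid not in ids:
                missing.append(rid)
    return missing


def loose_urls(el: ET.Element) -> list:
    """A24 helper: URLs in text under el that sit outside any <ext-link>."""
    found = []
    if el.tag != "ext-link":
        found += URL.findall(el.text or "")
        for child in el:
            found += loose_urls(child)
            found += URL.findall(child.tail or "")  # tail text belongs to el
    return found


def has_issn(root: ET.Element) -> bool:
    """A27: at least one non-empty <issn> inside <journal-meta>."""
    return any((i.text or "").strip() for i in root.iterfind(".//journal-meta/issn"))


SAMPLE = """<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front><journal-meta><issn pub-type="epub">0000-0000</issn></journal-meta></front>
  <body><p>Soil <xref ref-type="bibr" rid="ref1">[1]</xref> and water
  <xref ref-type="bibr" rid="ref9">[9]</xref>.</p></body>
  <back><ref-list>
    <ref id="ref1"><mixed-citation>Doe J. Title.
      <ext-link ext-link-type="uri" xlink:href="https://example.org/a">https://example.org/a</ext-link>
    </mixed-citation></ref>
    <ref id="ref2"><mixed-citation>Roe R. Title. https://example.org/b</mixed-citation></ref>
  </ref-list></back>
</article>"""

root = ET.fromstring(SAMPLE)
report = {
    "A10": unresolved_rids(root),
    "A24": [url for ref in root.iter("ref") for url in loose_urls(ref)],
    "A27": has_issn(root),
}
print(report)
```

Returning the offending values, rather than a bare pass or fail, is what lets an audit report show exactly which citation or URL needs attention.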
The CheckRef platform

One scholarly quality platform, separate product surfaces

XRef runs as a standalone product on its own subdomain, while login, user accounts, and administration remain centralized in CheckRef.

CheckRef

Validate references and DOI metadata before publishing.

MetaRef

Audit published metadata quality and record health.

RefLens

Assess bibliography relevance and manuscript-topic support.

XRef

Convert published articles to validated JATS and PubMed XML.

ScholaRef

Author visibility and journal outreach after publication.

Questions

What publishers ask first

Does XRef change the article content?

No. XRef converts structure, not substance. The text of the published article — including every comma and source imperfection — is preserved verbatim. Source quirks are logged in the conversion audit report but never silently corrected.

What is the difference between JATS XML and PubMed Citation XML?

JATS XML is full-text: body, tables, figures, references, and all back matter. You deposit it at PMC or render it in a full-text viewer. PubMed XML (ArticleSet) is citation-only: bibliographic record and abstract but not the article body. You submit it to NLM for PubMed indexing.

What if the source file and the published article page disagree on metadata?

The published article page is authoritative for all metadata — DOI, ISSN, page range, publication date, ORCIDs. The source file is authoritative for body content. Every reconciliation is logged in the conversion report.
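The precedence rule can be sketched as a small merge function: page values win for metadata fields, and every disagreement is written to a log. This is a minimal illustration, not XRef's reconciliation code. The function name, field names, and sample values are hypothetical.

```python
def reconcile(source_meta: dict, page_meta: dict):
    """Merge metadata, letting the published page override the source file.

    Returns the merged metadata and a log of every field where the two disagreed.
    """
    merged = dict(source_meta)
    log = []
    for field, page_value in page_meta.items():
        source_value = source_meta.get(field)
        if source_value != page_value:
            log.append(f"{field}: source={source_value!r} -> page={page_value!r}")
        merged[field] = page_value  # the page is authoritative for metadata
    return merged, log


# Hypothetical values: the source file has a stale page range.
source_meta = {"doi": "10.1234/example.2025.001", "pages": "12-18", "issn": "0000-0000"}
page_meta = {"doi": "10.1234/example.2025.001", "pages": "12-19"}

merged, log = reconcile(source_meta, page_meta)
print(merged["pages"])
for entry in log:
    print(entry)
```

Fields that appear only in the source file, such as `issn` here, are carried through unchanged, and body content never passes through this function at all.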

Does XRef work for bilingual or Arabic-language journals?

Yes. JATS supports <trans-title> and <trans-abstract> for bilingual content. Arabic text is encoded as UTF-8 throughout.
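A minimal sketch of a bilingual title group, assuming an Arabic-language article with an English translated title. The function name and the Arabic sample title are illustrative only, and the exact placement of language attributes should be checked against the JATS tag library. The round trip at the end shows that the Arabic text survives UTF-8 serialization unchanged.

```python
import xml.etree.ElementTree as ET

# ElementTree writes this qualified name out as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_title_group(title: str, trans_title: str, trans_lang: str) -> ET.Element:
    """Build a <title-group> holding the original title and one translated title."""
    group = ET.Element("title-group")
    ET.SubElement(group, "article-title").text = title
    trans_group = ET.SubElement(group, "trans-title-group", {XML_LANG: trans_lang})
    ET.SubElement(trans_group, "trans-title").text = trans_title
    return group


arabic_title = "تقدير تراكيز المعادن الثقيلة"  # illustrative sample title
group = build_title_group(arabic_title, "Estimating Heavy Metal Concentrations", "en")

data = ET.tostring(group, encoding="utf-8")  # bytes, with an XML declaration
round_trip = ET.fromstring(data)
print(round_trip.findtext("article-title") == arabic_title)
```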

Which external validators does the output pass?

Every JATS XML file is validated against the JATS DTD, then checked against 27 internal structural assertions, the JATS4R Schematron validator, and the PMC Style Checker.

Where do the output files go?

XRef delivers a ZIP package containing the main XML file, all figures in a figures/ subfolder, and a markdown audit report. Upload to the CheckRef JATS viewer at checkref.org/xml/viewer for visual review, or submit to PMC, Crossref, or your repository.
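If you want to confirm a package's layout before passing it on, a check along these lines would do. It is a sketch under assumptions: that the XML file and the markdown report sit at the top level of the archive and that figures live under figures/. The file names and both function names are hypothetical.

```python
import io
import zipfile


def check_package(data: bytes) -> list:
    """Return a list of layout problems found in a delivery ZIP (empty if none)."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = [n for n in zf.namelist() if not n.endswith("/")]  # skip dir entries
    top_level = [n for n in names if "/" not in n]
    problems = []
    if sum(n.endswith(".xml") for n in top_level) != 1:
        problems.append("expected exactly one top-level XML file")
    if not any(n.endswith(".md") for n in top_level):
        problems.append("missing markdown audit report")
    stray = [n for n in names if "/" in n and not n.startswith("figures/")]
    if stray:
        problems.append(f"unexpected paths: {stray}")
    return problems


def make_zip(files: dict) -> bytes:
    """Build an in-memory ZIP from a {name: content} mapping, for demonstration."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return buf.getvalue()


good = make_zip({
    "article.xml": "<article/>",
    "figures/fig1.png": b"placeholder bytes",
    "audit-report.md": "# Conversion audit report",
})
incomplete = make_zip({"article.xml": "<article/>"})

print(check_package(good))
print(check_package(incomplete))
```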

How does login work?

XRef uses the normal CheckRef login at checkref.org. Administrators grant access through the CheckRef admin panel; there is no separate XRef account.

Convert your first article today

Validated JATS and PubMed Citation XML, delivered with a full audit report. The published record — faithfully structured.