
How we grade evidence.

A, B, C, D, and the list we publish but never recommend.

Published 19 May 2026 · Sebastian Stallard

You probably have a bottle of magnesium on a shelf at home. There is a one-in-three chance the form on the label is magnesium oxide, which absorbs at well under 4% in head-to-head studies. There is a smaller chance it is magnesium glycinate, which absorbs at around 40%. Both bottles say "magnesium" on the front. In Distil's evidence rubric, they sit at opposite ends of the scale.

The grade is the missing column on the back of every supplement bottle in the UK.

Publish the rejection list.

Distil grades every compound in the database against four published levels: A, B, C, and D. Grades A and B make it into reports when a profile calls for them. Grade C compounds appear with an explicit "emerging evidence" flag in the card. Grade D never appears in a report. But the list of Grade D compounds is published, and it is the part of this post that should be read most carefully.
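A minimal sketch of that inclusion rule, written as one function. The names (`Grade`, `report_treatment`) are illustrative and not taken from Distil's codebase, and the sketch assumes that Grade C, like A and B, only appears when the profile calls for it.

```python
from enum import Enum


class Grade(Enum):
    A = "A"
    B = "B"
    C = "C"
    D = "D"


def report_treatment(grade: Grade, profile_calls_for_it: bool) -> str:
    """Return how a compound is handled in a report, per the rule in the text."""
    if grade is Grade.D:
        # Grade D never appears in a report, whatever the profile says.
        return "excluded"
    if not profile_calls_for_it:
        # Assumption: nothing is included unless the profile calls for it.
        return "omitted"
    if grade is Grade.C:
        # Grade C may appear, but only with the explicit flag on the card.
        return "included with 'emerging evidence' flag"
    return "included"


for g in Grade:
    print(g.value, "->", report_treatment(g, profile_calls_for_it=True))
```

Checking Grade D first means no profile input can bring a rejected compound back into a report.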

A grading rubric without a public rejection list is theatre. A grade is meaningful in proportion to what it rules out. So Distil publishes the rejection list; three entries on it are worth a closer look.

Kava. Kava is a traditional Pacific plant historically used for anxiety. There is a defensible clinical case for its anxiolytic effect. There are also documented cases of severe liver injury in some users, including liver failure requiring transplant. Multiple national regulators including Germany's BfArM, France's ANSM, and the UK's MHRA have at various points withdrawn or restricted kava-containing products in response to case clusters. The mechanism is not fully understood but appears to involve kavalactone metabolism in the liver and to vary by extraction method (ethanolic and acetonic extracts behave differently from traditional aqueous preparations). Passionflower and lemon balm both have anxiolytic evidence at smaller effect sizes, with cleaner safety profiles. The honest position is that the kava trade-off is unfavourable for an unsupervised over-the-counter supplement, and the safer alternatives sit at acceptable risk.

St John's Wort. St John's Wort has reasonable depression evidence at the mild-to-moderate range. The grading problem is not whether it works. It is what it does to the rest of the medicine cabinet. St John's Wort is one of the strongest natural inducers of the CYP3A4 enzyme in the liver, and CYP3A4 metabolises roughly half of all prescription medicines. The specific interactions that put St John's Wort on the rejection list as an over-the-counter supplement: reduced warfarin levels and unpredictable INR; reduced contraceptive levels and reports of contraceptive failure; reduced ciclosporin levels and post-transplant rejection risk; reduced antiretroviral levels and HIV treatment failure. None of these are theoretical. The Markowitz 2003 JAMA pharmacokinetic study quantified the underlying mechanism: 14 days of St John's Wort doubled the clearance of alprazolam (a CYP3A4 probe substrate) and halved its plasma half-life from 12.4 to 6.0 hours, with the implication generalising to the ~50% of marketed medicines metabolised by the same enzyme. Case reports for each of the named clinical interactions are extensive in the literature. A medicine cabinet that contains St John's Wort and any of those prescriptions needs a clinician review, not a personal supplement report. (Our free interactions checker carries the St John's Wort pairs, and the supplements and SSRIs guide covers the serotonergic side.)
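The link between doubled clearance and a halved half-life can be checked with a few lines of arithmetic. This is a sketch under a simplifying assumption that the study itself does not state: a one-compartment, first-order model in which half-life equals ln(2) times volume of distribution divided by clearance, with the volume unchanged by St John's Wort. The 70-litre volume is a placeholder and cancels out of the comparison.

```python
import math


def half_life_hours(volume_l: float, clearance_l_per_h: float) -> float:
    """Elimination half-life under a one-compartment, first-order model."""
    return math.log(2) * volume_l / clearance_l_per_h


# Placeholder volume; the clearance is back-calculated so that the
# baseline half-life matches the 12.4 h figure quoted above.
volume_l = 70.0
baseline_clearance = math.log(2) * volume_l / 12.4

baseline = half_life_hours(volume_l, baseline_clearance)
induced = half_life_hours(volume_l, 2 * baseline_clearance)

print(f"baseline half-life: {baseline:.1f} h")        # 12.4 h
print(f"after doubling clearance: {induced:.1f} h")   # 6.2 h
```

The model predicts 6.2 hours against the 6.0 hours measured, which is about as close as a one-line model can be expected to land.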

High-dose beta-carotene in smokers. This is the cleanest demonstration of why population context matters in grading. Beta-carotene is a vitamin A precursor; in non-smokers, supplementation at typical doses has a defensible safety profile. In smokers, two large randomised trials both produced harm. ATBC (29,133 Finnish male smokers, 1985 to 1993) found an 18% relative increase in lung cancer incidence in the beta-carotene arm. CARET (18,314 smokers and asbestos-exposed workers, 1985 to 1996) was stopped early after a 28% relative increase in lung cancer incidence and a 17% increase in all-cause mortality in the supplemented arm. Both trials were powered to detect benefit and detected harm. The grading consequence: high-dose beta-carotene is plausibly Grade A in non-smokers and Grade D in current or recent smokers. The grade depends on the population.
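One way to make "the grade depends on the population" concrete is to make the population a required input. The sketch below is illustrative only: the function name and the three status labels are assumptions, and the grades are the ones stated in the paragraph above.

```python
def high_dose_beta_carotene_grade(smoking_status: str) -> str:
    """Grade for high-dose beta-carotene, keyed on smoking status."""
    grades = {
        "non_smoker": "A",       # "plausibly Grade A" in the text
        "recent_smoker": "D",
        "current_smoker": "D",
    }
    try:
        return grades[smoking_status]
    except KeyError:
        # An unknown status must not silently inherit the favourable grade.
        raise ValueError(f"unknown smoking status: {smoking_status!r}") from None


print(high_dose_beta_carotene_grade("non_smoker"))      # A
print(high_dose_beta_carotene_grade("current_smoker"))  # D
```

Raising an error on a missing or unrecognised status is a deliberate choice here: the failure mode to avoid is grading a smoker as if the population were unknown and therefore safe.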

The full Grade D list at distil.health/about/methodology also covers high-dose isolated B6 above 50mg, proprietary blends, generic testosterone boosters, fat-burner and thermogenic stacks, kratom, yohimbine, comfrey, pennyroyal, chaparral, and DHEA precursors. Each entry has its own reason; each is on the list because the trade-off is unfavourable.

The list of compounds considered and rejected is, in many reports, longer than the recommendation list. That asymmetry is the product.

The rubric, in full.

Now the apparatus that produced the rejection list above. The rules are short, and they are the same for everyone.

Grade A. Multiple high-quality randomised controlled trials. At least one meta-analysis. Consistent direction of effect across studies. Safety profile established at the recommended dose.

Magnesium glycinate at 300 to 400mg elemental for sleep and blood pressure sits here, anchored on the Kass 2012 meta-analysis (22 trials, dose-dependent reductions in systolic and diastolic blood pressure). So does omega-3 EPA at 2 to 4g for cardiovascular outcomes, off the back of the REDUCE-IT trial and the Mozaffarian meta-analysis. So does vitamin D3 plus K2 for bone, immune, and respiratory outcomes, anchored on the Martineau 2017 BMJ individual-patient-data meta-analysis of 25 RCTs.

Grade B. Randomised controlled trials with consistent results, but a limited or absent meta-analysis. Acceptable safety profile. The mechanism is well characterised. Standardised ashwagandha extracts for cortisol modulation sit here. So do creatine monohydrate for muscle and cognitive outcomes, and absorption-enhanced curcumin preparations with proper RCT support.

Grade C. Mechanistic evidence is strong, but the clinical evidence is small, inconsistent, or extrapolated from related forms. Every Grade C compound in a Distil report is flagged in the compound card as emerging evidence. The reader is told, in plain English, that the mechanism is plausible but the trials are not yet at scale. The compound may be in the report. The uncertainty is not hidden.

Grade D is the list above.
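The four levels can be read as an ordered set of checks. What follows is a rough sketch, not the rubric's actual implementation: the `Evidence` fields, the numeric thresholds (two or more trials for "multiple", one or more for Grade B), and the `None` result for compounds that match no level are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Evidence:
    high_quality_rcts: int
    meta_analyses: int
    consistent_direction: bool
    safety_established: bool
    mechanism_characterised: bool
    unfavourable_trade_off: bool = False  # i.e. on the rejection list


def grade(e: Evidence) -> Optional[str]:
    """Apply the rubric from strictest rejection to weakest support."""
    if e.unfavourable_trade_off:
        return "D"
    if (e.high_quality_rcts >= 2 and e.meta_analyses >= 1
            and e.consistent_direction and e.safety_established):
        return "A"
    if (e.high_quality_rcts >= 1 and e.consistent_direction
            and e.safety_established and e.mechanism_characterised):
        return "B"
    if e.mechanism_characterised:
        # Strong mechanism, thin or inconsistent trials: emerging evidence.
        return "C"
    return None  # assumption: no grade rather than a default grade


# Invented inputs; they do not describe any real compound.
print(grade(Evidence(5, 1, True, True, True)))   # A
print(grade(Evidence(3, 0, True, True, True)))   # B
print(grade(Evidence(1, 0, False, True, True)))  # C
```

The order of the checks matters: the rejection test runs first, so a compound with strong efficacy trials and an unfavourable trade-off still lands on Grade D.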

The grade has to be per-form, not per-name.

Here is where most published "grading" goes wrong. A compound gets a single grade and the grade does not survive the abstraction.

Take magnesium, the example the bottle on the shelf gave us in the opening. The Grade A evidence is for specific forms: glycinate for sleep, malate for fatigue, citrate for general use, threonate for cognition. The Grade A evidence is not for "magnesium" as a label. Magnesium oxide, which dominates the supermarket aisle because it is cheap and shelf-stable, absorbs at under 4% in the small intestine. The clinical evidence for magnesium glycinate at 300mg elemental does not transfer to magnesium oxide at the same label dose.
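The gap is easier to see as arithmetic. The sketch below uses the absorption figures quoted in this essay, treating 4% as an upper bound for oxide and 40% as the approximate figure for glycinate, and assumes the same 300mg elemental label dose for both. The function name is illustrative.

```python
def absorbed_mg(label_dose_mg: float, fractional_absorption: float) -> float:
    """Elemental magnesium absorbed from a given label dose."""
    if not 0.0 <= fractional_absorption <= 1.0:
        raise ValueError("fractional_absorption must be between 0 and 1")
    return label_dose_mg * fractional_absorption


label_dose_mg = 300.0
oxide = absorbed_mg(label_dose_mg, 0.04)      # upper bound: "under 4%"
glycinate = absorbed_mg(label_dose_mg, 0.40)  # "around 40%"

print(f"oxide: at most {oxide:.0f} mg absorbed")    # at most 12 mg
print(f"glycinate: about {glycinate:.0f} mg absorbed")  # about 120 mg
print(f"ratio: at least {glycinate / oxide:.0f}x")  # at least 10x
```

Same number on the label, at least a tenfold difference in what reaches the bloodstream, on these figures.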

The same problem appears in vitamin D. The Grade A respiratory and immune evidence anchored on Martineau 2017 is for vitamin D3 (cholecalciferol). Vitamin D2 (ergocalciferol) raises 25-hydroxyvitamin D levels less efficiently at the same dose in head-to-head trials. The label says "vitamin D." The bioavailable form is doing different work. (We unpacked this one in full in "vitamin D2 versus D3".)

And the same problem appears in folate, with higher stakes. Folic acid is the synthetic form on most supermarket B-complexes and on many prenatal multivitamins. Methylfolate (L-5-MTHF, sold as Quatrefolic or Metafolin) is the methylated active form that enters the methylation cycle without needing enzymatic conversion. An estimated 40% of the UK population carries an MTHFR enzyme variant that meaningfully reduces conversion of folic acid to the active form. For an MTHFR-variant carrier trying to conceive, the difference between folic acid and methylfolate on the front of the bottle is not academic.

In a Distil report, the grade is per-form. The card recommends magnesium glycinate, or omega-3 EPA-dominant from an IFOS-certified source, or vitamin K2 in the MK-7 form rather than MK-4, or methylfolate rather than folic acid where the profile warrants it. The form is what carries the grade. The label is not. We wrote a whole companion essay on this, "the form is the molecule", and a plain-English guide on how to read a supplement label.
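In data terms, "per-form, not per-name" means the lookup key is the pair of compound and form, never the compound alone. A minimal sketch, with a hypothetical table that holds only two grades stated in this essay:

```python
from typing import Optional

# Hypothetical table keyed on (compound, form). Only grades stated in
# this essay are filled in; everything else is deliberately absent.
FORM_GRADES: dict[tuple[str, str], str] = {
    ("magnesium", "glycinate"): "A",
    ("vitamin d", "d3"): "A",
}


def grade_for(compound: str, form: Optional[str]) -> Optional[str]:
    """Look up a grade by compound and form; a bare name is not enough."""
    if form is None:
        raise ValueError("a grade needs a form, not just a compound name")
    return FORM_GRADES.get((compound.lower(), form.lower()))


print(grade_for("Magnesium", "glycinate"))  # A
print(grade_for("Magnesium", "oxide"))      # None: the grade does not transfer
```

Returning `None` for an unlisted form, rather than falling back to the best grade held by any form of the same compound, is the design choice that matters.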

What "evidence-based" has to mean.

A grading system that is not maintained becomes a fiction. Calibration is what the word "evidence-based" has to mean if it is to mean anything at all. The compound list, the grading rubric, the Grade D rejections, the citations behind each claim, and the schedule on which the work is rechecked: all of it has to be published, otherwise the phrase is a brand asset, not a methodology.

The April 2026 cycle is the most recent example of what the maintenance actually looks like. The database was put through seventeen parallel evidence audits, sixteen covering six compounds each and one running a missing-compound search. Each audit agent went compound by compound through PubMed: verifying that every cited PMID points to the paper we claim it does; checking that journal name, year, and first author match; resolving previously-unverified citations; integrating any 2024 to 2026 RCT or meta-analysis the database had not yet caught up with.
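The metadata check in that list can be sketched as a comparison between what the database claims and what a lookup returns. This is an illustration, not the audit tooling itself: the `Citation` type, the `check_citation` function, and the in-memory index standing in for PubMed are all assumptions, and the records are made up.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Citation:
    pmid: str
    first_author: str
    journal: str
    year: int


def _normalise(value: object) -> str:
    """Compare loosely: ignore case and surrounding whitespace."""
    return str(value).strip().casefold()


def check_citation(claimed: Citation,
                   lookup: Callable[[str], Optional[Citation]]) -> list[str]:
    """Return the problems found for one citation; empty means it checks out."""
    record = lookup(claimed.pmid)
    if record is None:
        return ["pmid_not_found"]
    return [
        field for field in ("first_author", "journal", "year")
        if _normalise(getattr(claimed, field)) != _normalise(getattr(record, field))
    ]


# Made-up record standing in for a real PubMed lookup.
FAKE_INDEX = {
    "00000001": Citation("00000001", "Example", "Journal of Examples", 2020),
}

claimed = Citation("00000001", "example ", "Journal of Examples", 2021)
print(check_citation(claimed, FAKE_INDEX.get))  # ['year']
```

Passing the lookup in as a function keeps the comparison logic testable without a network call, so a real client can be swapped in later.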

The pass changed forty-six citations. Twenty were PMID corrections, where the cited PMID pointed to a different paper from the one we had claimed. Eight resolved previously-unverified residuals. Eleven phantom citations were dropped. Seven new RCTs and meta-analyses from 2024 to 2026 were integrated.

The full log is open at distil.health/about/methodology. Every change is named, dated, and traceable. The page carries a freshness stamp, and the report viewer surfaces a warning if the database has not been recalibrated in the past 100 days. If you want the plain-English version of what a grade means before you read the log, start with "what evidence-graded means".
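The freshness rule reduces to one date comparison. A minimal sketch, assuming the warning fires once more than 100 days have passed since the last calibration; the exact boundary, the function name, and the example date are assumptions.

```python
from datetime import date, timedelta

MAX_AGE = timedelta(days=100)


def is_stale(last_calibrated: date, today: date) -> bool:
    """True when the database is overdue for recalibration."""
    return today - last_calibrated > MAX_AGE


# Hypothetical calibration date, for illustration only.
last_calibrated = date(2026, 4, 1)

print(is_stale(last_calibrated, last_calibrated + timedelta(days=100)))  # False
print(is_stale(last_calibrated, last_calibrated + timedelta(days=101)))  # True
```

Taking `today` as a parameter, rather than reading the clock inside the function, keeps the check deterministic and easy to test.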

The closing line.

The grade stays hidden until the label is read carefully. The grade is per-form, not per-name. The list of things we reject is sometimes longer than the list of things we recommend, because that is what an honest rubric produces.

Publish the rejections, name the forms, log the citations, recheck the work. Everything else is decoration.

Sebastian
Founder · Distil

Keep reading

/about/methodology: the live grading rubric, the full Grade D list, and the dated calibration log.

/journal/the-form-is-the-molecule: why the grade has to be per-form, worked through five compounds.

/guides/what-evidence-graded-means: the plain-English version of how we rate a supplement.

/tools/interactions-checker: the free tool that flags supplement-medication interactions, every pair cited.

Sources

Specific studies behind the clinical claims in this essay, verified against PubMed. For the rules behind every recommendation and exclusion in a Distil report, including the full bibliography per compound and the live Grade D list, see distil.health/about/methodology.