App 2 — DFT vs ML Cost Calculator

Post 2 companion · Adjust parameters to see how the combined DFT+ML pipeline saves time

Parameters

Total compounds to screen: 10,000

DFT time per structure: 4 h

CPU cores per DFT job: 16

Training set size (% of total): 10%

ML validation set (top candidates): 50

Pure DFT cost: – core-hours

DFT+ML cost: – core-hours

Speed-up: – × faster with DFT + ML

Savings: – core-hours saved

Core-hours comparison

Pure DFT (all structures): – (100%)

Training set DFT: –

DFT validation (top candidates): –
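The figures this calculator reports can be approximated with a few lines of arithmetic. The sketch below is not the app's own code: the class and function names are hypothetical, it assumes one DFT job costs (hours × cores) core-hours, and it neglects the ML training and inference cost, which is assumed to be small compared with the DFT cost.

```python
from dataclasses import dataclass


@dataclass
class ScreeningParams:
    n_compounds: int = 10_000     # total compounds to screen
    dft_hours: float = 4.0        # wall-clock hours per DFT job
    cores_per_job: int = 16       # CPU cores per DFT job
    train_fraction: float = 0.10  # share of compounds labelled with DFT
    n_validation: int = 50        # top ML candidates re-checked with DFT


def cost_summary(p: ScreeningParams) -> dict:
    """Return core-hour costs for pure DFT and for the DFT+ML pipeline."""
    # Assumption: core-hours per structure = wall-clock hours x cores.
    per_structure = p.dft_hours * p.cores_per_job
    pure_dft = p.n_compounds * per_structure
    training = round(p.n_compounds * p.train_fraction) * per_structure
    validation = p.n_validation * per_structure
    # Assumption: ML training and inference cost is neglected here.
    combined = training + validation
    return {
        "pure_dft": pure_dft,
        "training": training,
        "validation": validation,
        "combined": combined,
        "speedup": pure_dft / combined,
        "saved": pure_dft - combined,
    }


summary = cost_summary(ScreeningParams())
print(f"Pure DFT: {summary['pure_dft']:,.0f} core-hours")
print(f"DFT+ML:   {summary['combined']:,.0f} core-hours")
print(f"{summary['speedup']:.1f}x faster, {summary['saved']:,.0f} core-hours saved")
```

With the default parameters listed above, this gives 640,000 core-hours for pure DFT against 67,200 for the combined pipeline, roughly a 9.5× reduction under these assumptions.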

The DFT + ML Pipeline

🗄️

Step 1 — Collect structures

Download candidate structures from databases such as Materials Project, ICSD, or AFLOW, or generate them computationally.

Cost: free / minimal

🧮

Step 2 — Generate DFT training data

Run DFT (e.g. VASP, WIEN2k, or Quantum ESPRESSO) on a representative subset. Calculate the target properties, such as band gap or formation energy. These values become the labels for the ML model.

Cost: ~64,000 core-hours (1,000 structures, 10% of the database, at 4 h × 16 cores each)
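One simple way to choose the subset in Step 2 is uniform random sampling with a fixed seed, sketched below. This is an assumption for illustration only: the function name and the structure IDs are hypothetical, and a truly representative subset may call for stratifying by chemistry or structure type instead.

```python
import random


def pick_training_subset(structure_ids, fraction, seed=0):
    """Split IDs into a DFT training subset and the remaining pool."""
    n_train = round(len(structure_ids) * fraction)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train = rng.sample(structure_ids, n_train)
    train_set = set(train)
    # Everything not sent to DFT is left for ML screening later.
    remaining = [sid for sid in structure_ids if sid not in train_set]
    return train, remaining


# Hypothetical IDs standing in for database entries.
all_ids = [f"struct-{i:05d}" for i in range(10_000)]
train_ids, screen_ids = pick_training_subset(all_ids, fraction=0.10)
print(len(train_ids), len(screen_ids))
```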

🧠

Step 3 — Train the ML model

Train a model (e.g. ALIGNN, random forest, neural network) on the DFT-labelled data. Validate on a held-out test set. Tune hyperparameters.

Cost: GPU hours — much cheaper than DFT

⚡

Step 4 — Screen all candidates with ML

Apply the trained model to all remaining structures. Predict the target property for each in milliseconds. Rank by predicted value and filter.

Cost: minutes on a laptop
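The rank-and-filter part of Step 4 might look like the sketch below. The function name, the property window, and the prediction values are all made up for illustration; in practice the predictions would come from the trained model.

```python
import heapq


def top_candidates(predictions, k, lower, upper):
    """Keep predictions inside [lower, upper], then return the k largest."""
    in_window = [
        (sid, value) for sid, value in predictions.items()
        if lower <= value <= upper
    ]
    # nlargest returns the items sorted from highest to lowest value.
    return heapq.nlargest(k, in_window, key=lambda item: item[1])


# Hypothetical ML-predicted band gaps in eV.
predicted_gap = {"A": 0.2, "B": 1.4, "C": 1.9, "D": 3.1, "E": 1.1}
shortlist = top_candidates(predicted_gap, k=2, lower=1.0, upper=2.0)
print(shortlist)  # [('C', 1.9), ('B', 1.4)]
```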

✅

Step 5 — Validate top candidates with DFT

Take the top 50–100 candidates from ML screening. Run full DFT calculations to verify. These are your final, reliable predictions.

Cost: ~3,200 core-hours (50 candidates at 4 h × 16 cores each)
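Once the DFT results for the shortlist are in, a quick comparison against the ML predictions shows how far the model can be trusted. The sketch below uses hypothetical names, invented values, and an arbitrary tolerance; it reports the mean absolute error and the candidates whose prediction falls within the tolerance of the DFT value.

```python
def validate(ml_pred, dft_values, tolerance):
    """Compare ML predictions with DFT results for the validated candidates."""
    errors = {
        sid: abs(ml_pred[sid] - dft_values[sid]) for sid in dft_values
    }
    mae = sum(errors.values()) / len(errors)  # mean absolute error
    confirmed = [sid for sid, err in errors.items() if err <= tolerance]
    return mae, confirmed


# Hypothetical band gaps in eV: ML prediction vs. DFT result.
ml = {"C": 1.9, "B": 1.4, "E": 1.1}
dft = {"C": 1.7, "B": 1.45, "E": 1.6}
mae, confirmed = validate(ml, dft, tolerance=0.25)
print(f"MAE = {mae:.2f} eV, confirmed: {confirmed}")
```

A large error on the shortlist is a signal to add those structures to the training set and retrain before trusting further predictions.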
