⚡ DFT vs ML: Cost Explorer

Post 2 companion · Adjust the parameters to see how much time the combined DFT+ML pipeline saves

Results

The explorer reports four figures: the pure DFT cost (core-hours), the DFT+ML cost (core-hours), the speed-up (how many times faster DFT + ML is), and the core-hours saved.

[Chart: Core-hours comparison. Bars: Pure DFT (all structures, 100%), Training set DFT, and DFT validation (top candidates).]
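The figures the explorer reports follow from simple arithmetic. This is a minimal sketch under two assumptions the tool does not state: every DFT calculation costs the same number of core-hours, and ML training and screening add nothing to the core-hour total. The parameter names are hypothetical; the defaults (100,000 structures at 4 core-hours each, a 10% training set, 50 validation runs) are chosen to match the per-step costs quoted in the pipeline.

```python
from dataclasses import dataclass


@dataclass
class PipelineParams:
    # Hypothetical parameter names; defaults chosen to match the quoted step costs.
    n_structures: int = 100_000        # candidates in the database
    core_hours_per_dft: float = 4.0    # assumed constant cost of one DFT calculation
    train_fraction: float = 0.10       # fraction of the database labelled with DFT
    n_validate: int = 50               # top ML candidates re-checked with DFT


def pipeline_costs(p: PipelineParams) -> dict:
    """Core-hour costs of pure DFT vs. DFT+ML (ML compute assumed to be ~0 core-hours)."""
    pure_dft = p.n_structures * p.core_hours_per_dft
    n_train = round(p.train_fraction * p.n_structures)
    training_dft = n_train * p.core_hours_per_dft
    validation_dft = p.n_validate * p.core_hours_per_dft
    dft_ml = training_dft + validation_dft
    return {
        "pure_dft": pure_dft,
        "training_dft": training_dft,
        "validation_dft": validation_dft,
        "dft_ml": dft_ml,
        "speedup": pure_dft / dft_ml,
        "saved": pure_dft - dft_ml,
    }


if __name__ == "__main__":
    costs = pipeline_costs(PipelineParams())
    for name, value in costs.items():
        print(f"{name:>15}: {value:,.2f}")
```

With these defaults the sketch gives 400,000 core-hours for pure DFT against 40,200 for DFT+ML, a speed-up of about 9.95× and 359,800 core-hours saved. Because the training set dominates the DFT+ML cost, the speed-up stays close to 1 / train_fraction whenever the number of validation runs is small.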

The DFT + ML Pipeline

🗄️ Step 1: Collect structures

Download candidate structures from databases such as the Materials Project, ICSD, or AFLOW, or generate them computationally.

Cost: free / minimal
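One common way to generate candidates computationally is combinatorial element substitution into a known structure type. The sketch below assumes an ABX3 (perovskite-type) prototype and a single fixed oxidation state per element, which is a simplification; it enumerates formulas and keeps only the charge-neutral ones. The element lists are illustrative choices, not drawn from any database, and turning formulas into actual crystal structures would need a structure library that this sketch does not use.

```python
from itertools import product

# Illustrative single oxidation states (a simplification: many elements have several).
OXIDATION_STATE = {
    "Cs": +1, "K": +1, "Ba": +2, "Sr": +2,   # A-site candidates
    "Pb": +2, "Sn": +2, "Ti": +4, "Zr": +4,  # B-site candidates
    "I": -1, "Br": -1, "O": -2,              # X-site candidates
}


def enumerate_abx3(a_sites, b_sites, x_sites):
    """Yield charge-neutral ABX3 formulas built by combinatorial substitution."""
    for a, b, x in product(a_sites, b_sites, x_sites):
        total_charge = OXIDATION_STATE[a] + OXIDATION_STATE[b] + 3 * OXIDATION_STATE[x]
        if total_charge == 0:
            yield f"{a}{b}{x}3"


if __name__ == "__main__":
    candidates = list(enumerate_abx3(["Cs", "K", "Ba", "Sr"],
                                     ["Pb", "Sn", "Ti", "Zr"],
                                     ["I", "Br", "O"]))
    print(len(candidates), candidates[:4])
```

Of the 48 combinations, the charge filter keeps 12 (for example CsPbI3 and BaTiO3), which shows how even a crude chemical rule prunes the search space before any calculation is run.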
🧮 Step 2: Generate DFT training data

Run DFT (VASP, WIEN2k, or Quantum ESPRESSO) on a representative subset of the structures to calculate the target properties, such as band gap or formation energy. These values become the labels the ML model learns from.

Cost: ~40,000 core-hours (10% of the database)
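What counts as "representative" depends on the project. One simple interpretation is stratified random sampling, where every chemical family contributes to the training set in proportion to its size. The sketch assumes each structure carries a family label; the `family` key, the family names, and the 4 core-hours per calculation are hypothetical values chosen for illustration.

```python
import random
from collections import defaultdict


def stratified_subset(structures, key, fraction, seed=0):
    """Sample about `fraction` of every group so each family appears in the training set."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for s in structures:
        groups[key(s)].append(s)
    subset = []
    for members in groups.values():
        n = max(1, round(fraction * len(members)))  # keep at least one per group
        subset.extend(rng.sample(members, n))
    return subset


if __name__ == "__main__":
    # Hypothetical database: 1,000 structures spread evenly over four made-up families.
    families = ["oxide", "halide", "nitride", "sulfide"]
    db = [{"id": i, "family": families[i % 4]} for i in range(1000)]
    train = stratified_subset(db, key=lambda s: s["family"], fraction=0.10)
    core_hours_per_dft = 4.0  # assumption, consistent with ~200 core-hours for 50 runs
    print(len(train), "structures ->", len(train) * core_hours_per_dft, "core-hours")
```

Plain random sampling can under-represent small families entirely; forcing at least one sample per group is a cheap guard against that.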
🧠 Step 3: Train the ML model

Train a model (e.g. a random forest, or a graph neural network such as ALIGNN) on the DFT-labelled data. Tune its hyperparameters on a validation split, then measure its accuracy on a held-out test set.

Cost: GPU hours, much cheaper than DFT
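The train / tune / test discipline in this step does not depend on the model family. To stay dependency-free, the sketch swaps in a simple k-nearest-neighbours regressor (not ALIGNN or a random forest) and uses a synthetic toy target in place of DFT labels; the 70/15/15 split and the candidate values of k are arbitrary choices. Hyperparameters are chosen on the validation split only, and the held-out test set is used once at the end.

```python
import math
import random


def knn_predict(train, x, k):
    """Predict a property as the mean target of the k training points nearest to x."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mean_abs_error(train, eval_set, k):
    """Mean absolute error of k-NN predictions on eval_set, using train as the model's data."""
    return sum(abs(knn_predict(train, x, k) - y) for x, y in eval_set) / len(eval_set)


def fit_and_evaluate(data, ks=(1, 3, 5, 9), seed=0):
    """70/15/15 train/validation/test split; tune k on validation, report test MAE once."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    n = len(rows)
    train = rows[: int(0.70 * n)]
    val = rows[int(0.70 * n): int(0.85 * n)]
    test = rows[int(0.85 * n):]
    # Hyperparameter tuning only ever looks at the validation split.
    best_k = min(ks, key=lambda k: mean_abs_error(train, val, k))
    # Refit on train + validation, then touch the held-out test set exactly once.
    test_mae = mean_abs_error(train + val, test, best_k)
    return best_k, test_mae


if __name__ == "__main__":
    # Synthetic stand-in for DFT labels: a smooth function of two made-up descriptors.
    toy = [((a / 10, b / 10), 0.1 * a + 0.05 * b) for a in range(20) for b in range(20)]
    k, err = fit_and_evaluate(toy)
    print(f"chosen k = {k}, test MAE = {err:.3f}")
```

Keeping the test set out of tuning is what makes the final MAE an honest estimate of how the model will behave on the unlabelled structures in Step 4.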
⚡ Step 4: Screen all candidates with ML

Apply the trained model to all remaining structures; it predicts the target property for each one in milliseconds. Rank the structures by predicted value and filter out those that miss the target.

Cost: minutes on a laptop
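Once every structure has a prediction, ranking and filtering is cheap bookkeeping. The sketch assumes predictions are stored in a dict from a hypothetical structure ID to a predicted property value, and that the goal is to keep candidates inside a target window and rank them from highest to lowest predicted value; the IDs, values, window, and `top_n` are illustrative.

```python
import heapq


def screen_candidates(predictions, lower, upper, top_n):
    """Keep candidates whose predicted value lies in [lower, upper]; return the top_n highest."""
    in_window = ((value, sid) for sid, value in predictions.items() if lower <= value <= upper)
    return [(sid, value) for value, sid in heapq.nlargest(top_n, in_window)]


if __name__ == "__main__":
    # Made-up predictions for illustration only.
    preds = {"mat-1": 0.2, "mat-2": 1.4, "mat-3": 3.1,
             "mat-4": 1.7, "mat-5": 1.1, "mat-6": 2.0}
    print(screen_candidates(preds, lower=1.0, upper=2.0, top_n=3))
```

`heapq.nlargest` avoids sorting the whole list when only the top few dozen out of ~100,000 predictions are needed.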
✅ Step 5: Validate top candidates with DFT

Take the top 50–100 candidates from the ML screen and run full DFT calculations on them to verify the predictions. These DFT-verified results are your final, reliable predictions.

Cost: ~200 core-hours (50 candidates)
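Validation does two jobs: it confirms which shortlisted candidates actually meet the target, and it measures how well the ML model did on exactly the structures it ranked highest. A minimal sketch, assuming the DFT results for the shortlist are already available; the IDs, values, and target window are made-up toy numbers, not real calculations.

```python
def validate_shortlist(ml_pred, dft_value, lower, upper):
    """Return the ML-vs-DFT mean absolute error and the IDs whose DFT value meets the target."""
    errors = [abs(ml_pred[sid] - dft_value[sid]) for sid in dft_value]
    mae = sum(errors) / len(errors)
    confirmed = sorted(sid for sid, v in dft_value.items() if lower <= v <= upper)
    return mae, confirmed


if __name__ == "__main__":
    # Toy numbers for illustration only; not results of real calculations.
    ml = {"mat-6": 2.0, "mat-4": 1.7, "mat-2": 1.4}
    dft = {"mat-6": 2.3, "mat-4": 1.6, "mat-2": 1.5}
    mae, confirmed = validate_shortlist(ml, dft, lower=1.0, upper=2.0)
    print(f"MAE on shortlist = {mae:.2f}; confirmed by DFT: {confirmed}")
```

In this toy case mat-6 passes the ML screen but falls outside the window once DFT is run, which is the kind of false positive this step exists to catch. The error measured on top-ranked candidates can also exceed the model's test-set error, because selecting the highest predictions tends to favour over-estimates.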
1. What is the main role of DFT in the DFT+ML pipeline?
2. Why is ML prediction much faster than DFT for the same property?
3. After screening 100,000 compounds with ML and finding 50 promising candidates, what should you do next?