Vision Transformers for Accurate Overlay Metrology
Performing fast and accurate metrology parameter extraction is critical for semiconductor manufacturing and has a direct impact on the yield of the final product. Dark-field Digital Holographic Microscopy (df-DHM) offers a promising method for extracting such parameters, but its effectiveness is often hindered by optical aberrations and coherent imaging effects. This thesis explores data-driven approaches that directly infer a metrology parameter of interest from df-DHM measurements affected by aberrations, without the need for any phase measurements. In particular, we investigate the application of Vision Transformers (ViTs) and compare their performance to other well-established architectures such as Convolutional Neural Networks (CNNs) and Multilayer Perceptrons (MLPs). We utilize simulated df-DHM datasets that incorporate a variety of aberrations and coherent imaging effects, and perform extensive experiments to compare the models in terms of accuracy, robustness to aberrations, and data efficiency. We report that our models are capable of making fast and accurate metrology measurements from df-DHM images with fixed aberrations, achieving performance similar to that on aberration-free df-DHM images. Furthermore, we find that ViTs consistently achieve low prediction error and excel in limited-data regimes compared to our baselines. These findings highlight the potential of ViTs for robust and scalable optical metrology, especially in real-world semiconductor pipelines where obtaining large, high-quality labeled datasets can be time-consuming and expensive.