Numerical weather prediction models play an important role in wind energy, for example in power forecasting, resource assessment, wind farm (wake) simulations, and load assessment. Continuous evaluation of their performance is crucial both for successful operations and for a better understanding of meteorology for wind energy purposes. However, extensive offshore observations are rarely available. In this paper, we use unique met mast and Lidar observations up to 315 m height from met mast “IJmuiden,” located in the North Sea 85 km off the Dutch coast, to evaluate how three mainstream meteorological models (ECMWF‐IFS, HARMONIE‐AROME, and WRF‐ARW) represent wind and other relevant variables over a wide range of weather conditions. Overall performance for hub‐height wind speed is comparable between the models, with a systematic wind speed bias below 0.5 m/s and random wind speed errors (centered RMSE) below 2 m/s. However, model performance differs considerably between cases: it is better for strong wind regimes and for well‐mixed wind and potential temperature profiles. Conditions with moderate wind speeds combined with stable stratification, which typically produce substantial wind shear and power fluctuations, lead to the largest errors in all models.
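The split between systematic and random error can be illustrated with a minimal sketch. It assumes the conventional definitions, which are not spelled out here: bias is the mean of the model series minus the mean of the observations, and centered RMSE is the RMSE computed after removing each series' own mean. Under these definitions the full RMSE decomposes as RMSE² = bias² + CRMSE². The function name `bias_and_crmse` and the sample values are hypothetical and are not taken from the study's data.

```python
import math


def bias_and_crmse(model, obs):
    """Return (bias, centered RMSE) for paired model/observation series.

    Hypothetical helper assuming the conventional definitions:
    bias  = mean(model) - mean(obs)
    CRMSE = RMSE of the anomalies after removing each series' mean,
            so the systematic bias does not contribute.
    """
    if len(model) != len(obs) or not model:
        raise ValueError("model and obs must be non-empty and of equal length")
    n = len(model)
    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    bias = mean_m - mean_o
    crmse = math.sqrt(
        sum(((m - mean_m) - (o - mean_o)) ** 2 for m, o in zip(model, obs)) / n
    )
    return bias, crmse


# Illustrative (made-up) hub-height wind speeds in m/s.
model = [8.0, 10.0, 12.0]
obs = [7.5, 9.0, 12.0]
bias, crmse = bias_and_crmse(model, obs)
rmse = math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))
print(bias, crmse, rmse)
```

Because the centered RMSE removes the mean offset, a model can have a small bias yet a large centered RMSE (or the reverse). This is why the two numbers are reported separately.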