The purpose of this study is to identify items that exhibit differential item functioning (DIF) in the TIMSS 2011 mathematics subtest using three item response theory (IRT)-based DIF methods and to compare the results of these methods. To this end, DIF values obtained with Lord's Chi-Square, Raju's Area, and Likelihood-Ratio Test methods were compared with respect to gender (males as the reference group, females as the focal group) to test whether the procedures yielded similar results. In addition, item purification was performed for each method, and the results were compared to determine the effect of item purification. These comparisons can provide evidence for determining the best models for detecting DIF items. Results indicated that the 2PL IRT model fitted the data best for both Lord's Chi-Square method and Raju's Signed Area method. Although the number of items flagged as DIF differed across methods, 2 of the 22 dichotomous items in the test were flagged consistently by all methods; both were more likely to be answered correctly by males after controlling for overall ability.
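As an illustration of the quantity behind Raju's Signed Area method, the following minimal sketch integrates the difference between the reference-group and focal-group item characteristic curves under a 2PL model. The item parameters, the scaling constant D = 1.7, and the function names are assumptions chosen for illustration and are not estimates from this study. The sketch covers only the area measure itself, not the significance test, which additionally requires the standard errors of the parameter estimates.

```python
import math

def icc_2pl(theta, a, b, D=1.7):
    """2PL item characteristic curve: P(correct | theta).
    D = 1.7 is an assumed scaling constant, not taken from the study."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

def signed_area(a_ref, b_ref, a_foc, b_foc, lo=-30.0, hi=30.0, n=60000):
    """Trapezoidal approximation of the integral of
    P_ref(theta) - P_foc(theta) over [lo, hi].
    For the 2PL model the exact value over the whole real line
    is b_foc - b_ref."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        theta = lo + i * h
        diff = icc_2pl(theta, a_ref, b_ref) - icc_2pl(theta, a_foc, b_foc)
        # End points get half weight in the trapezoidal rule.
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * diff
    return total * h

# Hypothetical item: harder for the focal group (b_foc > b_ref).
area = signed_area(a_ref=1.0, b_ref=0.0, a_foc=1.0, b_foc=0.3)
print(f"signed area = {area:.4f}")
```

Under this sign convention, a positive signed area means the reference group (here, males) has the higher probability of a correct response at matched ability, which corresponds to the direction of DIF reported for the two consistently flagged items.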