Cryptology ePrint Archive: Report 2018/1063

In this work, we analyze a meta-data rich data leak from a Middle Eastern bank with a demographically-diverse user base. We provide an analysis of passwords created by groups of people of different cultural backgrounds, some of which are under-represented in existing data leaks, e.g., Arab, Filipino, Indian, and Pakistani.

The contributions provided by this work are many-fold. First, our results contribute to the existing body of knowledge regarding how users include personal information in their passwords. Second, we illustrate the differences that exist in how users from different cultural/linguistic backgrounds create passwords. Finally, we study the (empirical and theoretical) guessability of the dataset based on two attacker models, and show that a state of the art password strength estimator inflates the strength of passwords created by users from non-English speaking backgrounds. We improve its estimations by training it with contextually relevant information.