So I'm… a year or so into my first real job. I've been doing data science things for KeyMe. Built some really cool shit, made some really big mistakes. Here are some more lessons, live from the trap:

The most important thing a data scientist has to do is get buy-in from management. It doesn't matter how good your models or predictions are if nobody uses them. Nothing. Is. More. Important.

NaN is a float in Python. Make sure your test cases know that.

The next most important thing is to never stop learning. Never stop being curious. Never stop asking why, because the moment you do, someone else will surpass you.

Most people at your company will be hot garbage with data. They will slice it in weird ways and create misleading results, and the worst part is that they will usually be doing it unintentionally. If you don't teach them better, they'll never stop. If you slice data misleadingly on purpose to make yourself look better, because you know nobody has the math to call you on it, you are a Bad Person.

When solving a new problem, figure out what other people have done, what works, what doesn't, and how best to fit your problem to their solutions. I cannot overemphasize this. Whatever problem you're working on, someone has probably solved a variation of it already, and you will get better results by fitting your work to theirs. And if you're out here doing truly novel work, I hope you're not getting your answers from a Medium article.

Knowing which corners you can cut is crucial to startup culture. You're going to make a lot of trade-offs in terms of time, resources, priorities, etc., and you're going to fuck a lot of them up. It's your job to learn from them and try to do better next time. You're going to build a lot of shit under time constraints and decide 'yeah, this is good enough', and you're going to find out 2 years later it wasn't good enough. Reflect, reflect, reflect. Try your best not to fuck up anything structural so badly that it can't be rebuilt.

All data vendors suck, but some are useful.
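To make the "NaN is a float" point concrete, here's a minimal sketch of the trap in a test. The `average` function is hypothetical, made up purely for illustration; the behavior of `float("nan")` and `math.isnan` is standard Python.

```python
import math

def average(values):
    """Hypothetical function under test: returns NaN when given no values."""
    return sum(values) / len(values) if values else float("nan")

result = average([])

# NaN is a float, so a type check alone won't catch it.
print(isinstance(result, float))   # True

# NaN is not equal to anything, not even itself, so == against NaN is always False.
print(result == float("nan"))      # False

# Test for NaN explicitly instead.
print(math.isnan(result))          # True
```

The same thing bites in pandas: a single missing value in an integer column (e.g. `pd.Series([1, 2, None])`) upcasts the whole column to `float64`, and `pd.isna` is the pandas-side way to check for it.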
It's my humble (and mostly uneducated) opinion that any good company using an out-of-the-box solution will at some point, whether this year or 5 years down the line, run into the limitations of that solution. (If you don't, your company isn't ambitious or innovative enough.) However, THERE IS ABSOLUTELY NOTHING WRONG WITH THAT. If your out-of-the-box solution gets your company through 5 years, that's 4 years longer than most startups last! If using Tableau or whatever works well enough and is cost-effective for you, that's great! But you have to be willing to reevaluate.

Read this, it's so good: https://towardsdatascience.com/data-science-for-startups-data-pipelines-786f6746a59a

A data scientist is only as good as the data they have. Understand where gaps and flaws in your data might exist, but understand that if you can't account for these errors, there's no point in bringing them up. You can always say "we need more data", but you can never say "the data's wrong" without a plan to fix it.

In the immortal words of Sam Hinkie, you can be right for the wrong reasons. You can be wrong for the right reasons. The company will only see the end result. They will never see the rationale, or the alternate realities where you were right. It is your job to divorce process from results and swing on the things you believe and predict to be true. Sometimes that will not be enough. Hinkie was fired, after all.

You can find an entry-level data scientist. You can find an entry-level software engineer and make them a data engineer. You can set off a flare in the wilderness and three entry-level data analysts will come running. You know how fucking hard it is to find an entry-level data architect?

The vast majority of real-world models are not Kaggle competitions where you compete on a fixed train/validate/test split. On a stable Kaggle dataset, it's fine to stack a dozen neural networks on top of your GBM and lose all visibility into how your features interact.
But in the real world, if you can't understand how your model changes when a feature does, you're fired. Leadership will always take a 5% hit in performance for a 50% increase in clarity.

http://strftime.org has every format code for pandas/Python datetime things. Use it, abuse it. I don't understand why this was so hard to Google; searching "pandas datetime format" or whatever usually doesn't pull up the Python documentation. Similarly, this site is pretty good at covering datetime properties: http://predictablynoisy.com/date-time-python.html

You would be surprised at how little the people who control the purse strings actually understand about data. If you don't believe me, go to any marketing or leadership conference where the hot topic is AI. All the leaders, all the C-suite, they've got a veneer of buzzwords that they use to gloss over the nitty-gritty of what's actually happening. And that's fine. It's not their job to understand it; it's their job to take what you claim to be able to do and figure out how to monetize it. Just keep that in mind.

This would be funnier if XKCD made it (and it would have better-labeled axes, I know, I'm a shame to the profession), but you get the point.
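Since I keep pointing people at strftime.org, here's a quick sketch of those format codes in action with the standard library `datetime` module. The timestamp is arbitrary. pandas accepts the same codes in `pd.to_datetime(..., format=...)` and `Series.dt.strftime(...)`, so this carries over.

```python
from datetime import datetime

# An arbitrary fixed timestamp so the output is reproducible.
ts = datetime(2019, 3, 7, 14, 5, 9)

# Formatting: datetime -> string, using codes from the strftime.org table.
iso_like = ts.strftime("%Y-%m-%d %H:%M:%S")   # '2019-03-07 14:05:09'
readable = ts.strftime("%A, %B %d, %Y")       # 'Thursday, March 07, 2019' (default C locale)
day_of_year = ts.strftime("%j")               # '066' (zero-padded day of the year)

# Parsing: string -> datetime, using the same codes in reverse.
parsed = datetime.strptime("07/03/2019 14:05", "%d/%m/%Y %H:%M")

print(iso_like)
print(readable)
print(day_of_year)
print(parsed)  # 2019-03-07 14:05:00
```

Note the day-first `%d/%m/%Y` in the parse: spelling the format out explicitly is what saves you from `07/03` silently becoming July 3rd.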