5 Things I Learned as a Data Science Intern at LinkedIn
1. Spend time cleaning your data. Data is your starting point, so make it as clean as you can. Clean data makes every later step simpler and your results more reliable.
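As a rough illustration of what "cleaning" can mean in practice, here is a minimal sketch using only the Python standard library. The function name `clean_records`, the field names `member_id` and `country`, and the sample rows are all hypothetical; they are not taken from any LinkedIn dataset or tooling. The sketch trims whitespace, treats empty strings as missing, drops rows that lack a required field, and removes exact duplicates.

```python
from __future__ import annotations

from typing import Iterable, Sequence


def clean_records(records: Iterable[dict], required: Sequence[str]) -> list[dict]:
    """Trim strings, drop rows missing required fields, remove exact duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        # Normalize: strip whitespace from strings; empty strings count as missing.
        row = {}
        for key, value in rec.items():
            if isinstance(value, str):
                value = value.strip()
                if value == "":
                    value = None
            row[key] = value
        # Drop rows that lack any required field.
        if any(row.get(field) is None for field in required):
            continue
        # Remove exact duplicates (after normalization), keeping the first one seen.
        fingerprint = tuple(sorted(row.items(), key=lambda kv: kv[0]))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(row)
    return cleaned


# Hypothetical sample rows, invented for illustration only.
raw = [
    {"member_id": " 101 ", "country": "US"},
    {"member_id": "101", "country": "US "},   # duplicate once whitespace is trimmed
    {"member_id": "", "country": "DE"},       # missing required field
    {"member_id": "102", "country": None},    # optional field missing, row is kept
]
result = clean_records(raw, required=("member_id",))
print(result)
```

Which fields count as required, and whether a near-duplicate should be merged or dropped, depends on the project; the point is only that these decisions are made once, up front, rather than rediscovered in every later step.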
2. Start simple and start with a vision. As a data scientist at LinkedIn, you have access to Petabytes of data (1 Petabyte is about as much data as is transferred when viewing HDTV for about 13.5 years). It can be overwhelming if you try to make sense of it all at once. I have found it really useful to start out simple, get initial results, and then iteratively improve my models. It is also essential to define what it is that you are trying to accomplish with your project when you start. This helps you direct your efforts and better evaluate tradeoffs between accuracy and computation time/cost.
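The HDTV comparison can be sanity-checked with a little arithmetic. The sketch below assumes a decimal petabyte (10^15 bytes) and a 365.25-day year, neither of which is stated in the original, and computes the video bitrate that the "13.5 years" figure implies.

```python
# Assumptions (not stated in the original): decimal petabyte, 365.25-day year.
PETABYTE_BYTES = 10**15
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def implied_bitrate_mbps(total_bytes: float, years: float) -> float:
    """Average bitrate, in megabits per second, needed to move total_bytes in the given years."""
    seconds = years * SECONDS_PER_YEAR
    return total_bytes * 8 / seconds / 1e6


rate = implied_bitrate_mbps(PETABYTE_BYTES, 13.5)
print(f"{rate:.1f} Mbit/s")
```

Under these assumptions the figure works out to a little under 19 Mbit/s, which is in the range of a broadcast HDTV stream, so the comparison is at least internally consistent.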
3. Getting results is just the beginning. As a data scientist, an equally important part of your work is to interpret those results, understand what they mean, and explain them to others on the team.
4. Handling every step from parsing data to interpreting the final results requires a very broad set of skills. Data Science is really a blend of Computer Science, Statistics, Machine Learning, and some domain expertise that depends on the specific application (Sociology, Economics, Physics, and the like). Know your strengths and don’t be afraid to ask for help when you need it. But try to learn something new every time you ask for help.
5. Have fun along the way. Very often, as a data scientist, you work on problems that haven’t been solved before. Wrong decisions and missed opportunities are all part of the process as you try out new methods to solve them. Learn quickly and don’t be afraid to take the path less trodden. That’s where the treasure often lies.