ten things to try in 2017 --- just for self-motivation
2017-02-22 14:43
393 views
[reprinted article]
Rebecca Bilbro
Ten Things to Try in 2017
New Years Resolutions for the Intermediate Data Scientist
Rebecca Bilbro
2016 marked a zenith in the data science renaissance. In the wake of a series of articles and editorials declaiming the shortage
of data analysts, the internet responded in force, exploding with blog posts, tutorials, and listicles aimed at launching the beginner into the world of data science. And yet, in spite of all the claims that this language or that library
make up the essential know-how of a "real" data scientist, if 2016 has taught us anything it's that the only essential skill is a willingness to keep learning.
U.S. Chief Data Scientist DJ Patil famously referred to data science as “a
team sport” — and within an organization, data science does work best when practiced collaboratively. But the emerging field of data science is more organic and mutable than it is systematic and coordinated. Data scientists must continue learning
new domains, languages, techniques, and applications so as to move forward as the field continues to evolve. In the age of the
data product, a more apt analogy for data science is the amoeba — an organism continually in motion, altering its shape, spreading and changing. For this reason, it's probably more difficult to stay a data scientist than it is to become one.
For those of us who spent 2016 reading all those articles ("50 essential things every data scientist MUST know" and "How to spot a FAKE data scientist"), taking Coursera and Codecademy courses,
following Analytics Vidhya tutorials, competing in Kaggle competitions,
and trolling Kirk Borne on Twitter, now is a good time to think about what comes next.
So now you're a data scientist. Congrats! Where do you go from here?
Okay, you're a data scientist, now what?
In the spirit of New Year's resolutions, here's a list of 10 (technical) things for the intermediate data scientist to try in 2017 — things you can do to push yourself forward, keep your edge, set yourself apart, and be a better data scientist by 2018.
1. Adopt repeatable, systematic processes
Let's assume you're an old hand at data analytics, but how systematic is your process? If the answer is "not very," you might want to consider establishing a more structured path to discovery. Not only can a more systematic approach make you more efficient, but it can also ensure that you consistently consider a broader range of techniques for each problem, including things like graph analytics and time dynamics.
2. Explore a new data type
Love CSV? So do we! CSVs are great — simple, compact, distributable, and gotta love those header rows. But getting overly comfortable with wrangling one particular type of data can be limiting. This coming year, why not expand your analytical range by trying out a new serialization format like JSON or XML? Work mainly with categorical data? Try experimenting with
a time series analysis. Mostly use relational data? Try your hand at unstructured text or geospatial
data like rasters.
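If JSON is the format you pick, the standard library already covers the round trip. A minimal sketch (the city and rainfall records here are invented for illustration):

```python
import csv
import io
import json

# The same two records, first as CSV text...
csv_text = "city,year,rainfall_mm\nSeattle,2016,950\nPhoenix,2016,204\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# ...then serialized as a JSON array of objects. Note that DictReader
# yields strings, so numeric fields need an explicit conversion.
for row in rows:
    row["rainfall_mm"] = int(row["rainfall_mm"])
json_text = json.dumps(rows, indent=2)
records = json.loads(json_text)
print(records[0]["city"], records[0]["rainfall_mm"])  # -> Seattle 950
```

Unlike CSV, the JSON side preserves the distinction between numbers and strings, which is exactly the kind of detail that working with a new format forces you to notice.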
3. Break out of your machine learning rut
Have you fallen into an algorithmic comfort zone? People like to argue about which machine learning model is the best, and everyone seems to have their favorite! Sure, picking a good model is important, but it's debatable whether a model can actually be "good" devoid of the context of the domain, the hypothesis, the shape of the data, and the intended application. Fortunately, high-level Python libraries like Scikit-Learn (also TensorFlow, Theano, NLTK, Gensim, and spaCy) provide APIs that make it easy to test and compare a host of models without additional data wrangling. In 2017, build breadth by exploring new models — at this point, there's really no excuse not to!
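Scikit-Learn's shared estimator API is what makes that breadth cheap: every model exposes the same fit/predict interface, so swapping one family for another costs a single line. A minimal sketch (the dataset and the three model families are just illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Three very different model families, one shared fit/predict API.
models = [LogisticRegression(max_iter=1000), DecisionTreeClassifier(), GaussianNB()]
results = {type(m).__name__: cross_val_score(m, X, y, cv=5).mean() for m in models}

for name, score in results.items():
    print(f"{name}: {score:.3f}")
```

Because nothing in the loop is model-specific, trying a fourth candidate is one more entry in the list, not another round of data wrangling.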
4. Learn how your favorite models actually work
We're living in an age where any data scientist with a little Python know-how can use a library like Scikit-Learn to predict the future, but few can describe what's actually happening under the hood. Guess what? Clients and customers are becoming more discerning and demanding more interpretability. For the many self-taught machine learning practitioners out there, now's the time to learn how that algorithm you love so much actually works. For Scikit-Learn users, check out the documentation to
find a link to the paper used in the implementation for each algorithm. You can also check out some of our previous posts to learn how things like PCA, distributed
representations, skip-gram, and parameter
tuning work in theory as well as practice.
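To make "under the hood" concrete, here's a sketch of PCA done by hand: centering the data and taking an SVD reproduces what a library call does for you. The synthetic data and the choice of two components are purely illustrative:

```python
import numpy as np

# Synthetic data with one dominant direction of variance
# (the first axis is stretched, the third nearly flat).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.1])

# PCA by hand: center, SVD, then project onto the leading components.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / np.sum(S**2)   # fraction of variance per component
X2 = Xc @ Vt[:2].T                # the data in the top-two-component space

print(explained.round(3))
```

Once you've written this, the `explained_variance_ratio_` attribute a library reports stops being magic: it's just the squared singular values, normalized.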
5. Start using pipelines
The machine learning process often combines a series of transformers on raw data, transforming the data set each step of the way until it is passed to the fit method of a final estimator. A pipeline is a mechanism for sanely combining these steps — a step-by-step set of transformers that takes input data and transforms it, until finally passing it to an estimator at the end. Pipelines can be constructed using a named declarative syntax so that they're easy
to modify and develop. If you're just getting started with pipelines, check out Zack Stewart's excellent post
on the topic.
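A minimal sketch of that named, declarative syntax in Scikit-Learn (the particular steps chosen here are illustrative, not prescriptive):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Each named step transforms the data and hands it to the next;
# the final estimator receives the fully transformed matrix.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("reduce", PCA(n_components=2)),
    ("model", LogisticRegression(max_iter=1000)),
])
pipe.fit(X, y)
acc = pipe.score(X, y)
print(f"training accuracy: {acc:.3f}")
```

The step names are what make pipelines easy to modify: `pipe.set_params(reduce__n_components=3)` changes one step without touching the rest.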
6. Build up your software engineering chops
Data scientists tend to leave a massive amount of technical debt in their wake. But what do you think will happen to those data scientists when all the good software engineers figure out how to do logistic regressions? In 2017, boost your software engineering skills by pushing yourself to develop higher quality code, to build object-oriented,
reusable methods, and to practice good habits like writing documentation and using exception
handling to facilitate better communication with the team (and with future you!).
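As a small, hypothetical illustration of those habits together: a documented function that raises a precise exception instead of misbehaving silently, so the next reader (including future you) knows exactly what went wrong and why:

```python
def conversion_rate(conversions, visits):
    """Return conversions / visits as a float in [0, 1].

    Raises:
        ValueError: if visits is not positive, or conversions is out of
            range, so a bad upstream count fails loudly instead of
            producing a nonsense rate further down the analysis.
    """
    if visits <= 0:
        raise ValueError(f"visits must be positive, got {visits!r}")
    if conversions < 0 or conversions > visits:
        raise ValueError(f"conversions must be in [0, {visits}], got {conversions!r}")
    return conversions / visits


# Callers get an explanation, not a surprise ZeroDivisionError.
try:
    conversion_rate(5, 0)
except ValueError as exc:
    print(exc)
```

The docstring and the error message do the same job from different directions: both communicate intent to the team without anyone having to read the implementation.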
7. Learn a new programming language
Move over imposter syndrome and meet your know-it-all sibling, contempt culture! Now that you're a data scientist and have accumulated enough confidence to override the natural impulse toward self-doubt, there's a tendency to get a bit cocky. Don't! Stay humble by pushing yourself to learn a new programming language. Know Python? Try teaching yourself JavaScript or CSS. Know R? Branch out to learn Julia or
master SQL.
8. Consider data security
What's the biggest security risk for a modern business? Hiring a data scientist! Why? It's because data scientists often unknowingly expose their companies to massive security vulnerabilities. Attackers are interested in all kinds of data, and as Will Voorhees says in Eat Your Vegetables, as data scientists we often mistakenly think we can rely on the magic information security elves to protect our precious data. In 2017, make an effort to learn about encryption, account separation, and temporary credentials — and for the love of Hilary Mason, stop committing your access tokens to GitHub.
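One concrete habit, sketched below: read secrets from the environment at runtime instead of writing them into source files. The variable name here is invented for illustration; use whatever your service expects:

```python
import os

def get_secret(name="ANALYTICS_API_TOKEN"):
    """Fetch a secret from the environment, failing loudly if it's absent.

    The token never appears in source control; each analyst or machine
    sets it locally (or via a secrets manager) before running the script.
    """
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"{name} is not set; export it before running.")
    return value
```

Pair this with a `.gitignore` entry for any local config file that holds real values, and an accidentally pushed notebook no longer hands out your credentials.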
9. Make your code go faster
Still have scripts running all night in the hopes you'll wake up to results to enjoy with your morning coffee? You're out of excuses; it's time to get on the MapReduce bandwagon and teach yourself Hadoop and Spark. Know what else will speed things up? More efficient code! One low-hanging fruit is mutable data structures. Sure, Pandas data frames are great, but did you ever wonder what makes those lookups, joins, and aggregations so easy? It's holding a bunch of data
in memory all at the same time. In 2017, try switching to NumPy arrays and see what you think.
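A tiny sketch of what that switch looks like: the same filter-and-aggregate you'd reach for a data frame to do, done directly on contiguous NumPy arrays (the sales figures are invented):

```python
import numpy as np

# Two parallel columns held as plain NumPy arrays rather than a DataFrame.
sales = np.array([120.0, 95.5, 230.0, 180.25])
regions = np.array(["east", "west", "east", "west"])

# Vectorized filter + aggregation: a boolean mask, no Python-level loop
# and none of the index machinery a DataFrame carries around.
east_total = sales[regions == "east"].sum()
print(east_total)  # -> 350.0
```

You give up labeled axes and convenient joins, which is exactly the trade: less bookkeeping in memory, faster tight loops.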
10. Contribute to an open source project
You may not realize it, but as a data scientist, you are already significantly involved in the open source community. Nearly every single one of the tools we use — Linux, Git, Python, R, Julia, Java, D3, Hadoop, Spark, PostgreSQL, MongoDB — is open source. We look to StackOverflow and StackExchange to find answers to our programming questions, grab code from blog posts, and pip install like there's no tomorrow. In 2017, consider giving back by making your own contribution to an open source project — it's not just for data science karma, it's also a way to build up your GitHub cred! A lot of the senior data scientists I know don't even look at candidates' resumes anymore; someone's GitHub portfolio and commit history often tell volumes more.
Conclusion
In 2016, the world of analytics, machine learning, multiprocessing, and programming got a lot bigger. The result of the data science eruption has been a broader and more diverse community of colleagues, people who will meaningfully augment not only the quantity but the quality of the next generation of data products.
And yet, this expansion has also meant that the field of data science began to lose some of its mysticism and cachet. As new practitioners flood the market, data scientist salaries have started to drop off, from highs
in the $200K-range to ones topping out closer to $150K; as Barb Darrow signaled in her 2015
Fortune article, "Supply, meet demand. And bye-bye perks."
So how can you distinguish yourself in a landscape which may once have felt impenetrable, but has now started to feel routine? Whether you use ours or set your own, pick ten things you can do over the next year to keep your mind sharp and your skills current,
and remember — when it comes to data science, nothing endures but change!