Getting started with data science

Recently, a few of my friends have shown interest in what I do and the skill set required for my job. For those who don’t know, I am a market data developer. This means that I work with time-series databases to capture and store both real-time and historical data. I am also responsible for writing queries to help my users (i.e. researchers) analyze this data efficiently. Most of the time, when people say they work with big data, they are exaggerating. But I can promise you, market data is big data. In case you don’t believe me, our system captures around 4 billion rows daily.

Once I explain this to my friends, they are interested in finding out more. How do I capture so much data? What tools do I use to analyze this data? How can they get into data analysis? Where should they begin?

Learning data analysis can be tricky because it’s such a vague term. In its truest sense, you analyze data. But what kind of data are you analyzing? Which tools are you using to analyze it? Out of the hundreds of tools available, which is the most efficient one for analyzing a certain type of data? What information are you trying to extract from the data?

Learn Python and Pandas

In my case, I deal with tick-by-tick market data, such as every single tick for a stock like Apple. I capture this data and use a time-series database to analyze it. If I am dealing with a handful of rows, I use Python and pandas to analyze the data. If you are new to data science, then I strongly recommend picking up Python and pandas. Out of all the tools out there, Python and pandas are probably the best combination to start with. Both are widely used, easy to use, free and heavily documented. I am not saying that other tools are not as good as Python/pandas, but I don’t recommend starting with them.

To learn Python and pandas, I recommend:

  1. learning Python basics by doing the course on Codecademy
  2. reading a beginner’s Python book and familiarizing yourself with basic syntax. Learning Python is a good book.
  3. reading Python for Data Analysis to learn pandas. It’s written by Wes McKinney, who created pandas.
  4. watching Python-related videos. There are a bunch of good Python videos on YouTube. I recommend starting with old PyCon videos and videos by user sentdex.

Download Python and IPython Notebook

While you are learning Python, you should install it so it’s available for you to practice. I would recommend downloading it via Anaconda. Anaconda is a Python distribution that comes with several popular packages pre-installed. It also comes with IPython Notebook, which is a great tool for running and presenting Python code.
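
Once everything is installed, a quick sanity check along these lines can confirm that Python and pandas are available. This is only a minimal sketch; it assumes pandas was installed as part of your distribution, and the versions printed will depend on your setup.

```python
import sys

import pandas as pd

# Collect the interpreter and pandas versions to confirm the install works.
python_version = ".".join(str(part) for part in sys.version_info[:3])
pandas_version = pd.__version__

print("Python:", python_version)
print("pandas:", pandas_version)
```

If both lines print without an import error, you are ready to start practicing in a notebook or a plain script.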

Start Practicing

Once you have some basic knowledge of Python and pandas, I would suggest getting some sample data and analyzing it. The analysis doesn’t have to be ground-breaking; it simply needs to be something you can use to practice. Take some sample data and learn different ways to manipulate it as well as present it to your audience.

I recommend practicing the basics such as:

  • reading/writing data from/to a csv file
  • cleaning the data by filtering, renaming columns, resetting indexes
  • manipulating the data by performing calculations, adding new columns
  • visualizing the data by plotting

For example, I practiced by analyzing NYC’s traffic data and writing a script for analyzing my portfolio.
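
To make the basics listed above concrete, here is a small sketch that touches each of them with pandas. The data, column names and helper function names are all invented for illustration (they are not from a real feed), and an in-memory string stands in for a CSV file so the example is self-contained.

```python
import io

import pandas as pd

# Made-up sample data standing in for a CSV file on disk (values are invented).
RAW_CSV = """Date,Sym,Px,Qty
2020-01-02,AAPL,100.0,10
2020-01-02,MSFT,50.0,5
2020-01-03,AAPL,101.5,20
2020-01-03,MSFT,,7
"""


def load_and_clean(source):
    """Read a CSV, rename columns, drop bad rows, filter, and reset the index."""
    df = pd.read_csv(source, parse_dates=["Date"])
    df = df.rename(
        columns={"Date": "date", "Sym": "symbol", "Px": "price", "Qty": "quantity"}
    )
    df = df.dropna(subset=["price"])      # remove rows with a missing price
    df = df[df["symbol"] == "AAPL"]       # keep a single symbol
    return df.reset_index(drop=True)      # renumber rows after filtering


def add_notional(df):
    """Add a calculated column: price multiplied by quantity."""
    out = df.copy()
    out["notional"] = out["price"] * out["quantity"]
    return out


def plot_prices(df):
    """Plot price over time; requires matplotlib to be installed."""
    return df.plot(x="date", y="price", title="Price over time")


trades = add_notional(load_and_clean(io.StringIO(RAW_CSV)))

# Write the result back out as CSV (to a buffer here; a file path also works).
buffer = io.StringIO()
trades.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

With a real file, you would pass a path such as "trades.csv" (a hypothetical name) to `load_and_clean` and `to_csv` instead of the in-memory buffers. Calling `plot_prices(trades)` in a notebook should draw a simple line chart, provided matplotlib is available.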

Bootcamps

I find data analysis easy to pick up on your own without having to spend thousands of dollars on a degree. However, these days there are several bootcamps available that will teach you data analysis. They can be expensive but effective. They can also help you find a job after your course is finished.

Final advice

Being a good data scientist requires a genuine interest in taking some messy data, cleaning it and extracting some useful knowledge out of it. If this does not fascinate you, then data science is not for you. Take your time learning the basic skills and familiarizing yourself with the art of data analysis. I hope you will enjoy it!

 
