Diolinux

5 steps to start your journey in Data Science

You may have heard about Data Science, Big Data and many other terms related to Artificial Intelligence. As these topics gain attention, more and more people are interested in becoming professionals in these areas. But how do you start?

Start with the basics

If you don’t know anything about Data Science you will probably be confused by so many new terms and concepts.

The ideal is to start by learning about the subject so you know which way to go. Preferably, look for content in Portuguese first, as this will flatten the learning curve. One good option for finding content in our language is the blog Mining Data. There you will find technical articles on Machine Learning, data manipulation, data analysis, core concepts and the everyday tasks of a Data Scientist. It is a good fit for those who are starting from scratch.

Of course, you will also find a lot of material in English. This should not be a hindrance.

Fundamental disciplines

Anything new takes study to do well, so it is essential to study the disciplines that underpin these technologies.

For example, knowing the basics of Mathematics and Statistics will be very important when it comes to understanding how the algorithms work. Then research the most used programming languages, the tools professionals rely on and, above all, how to install them on your favorite Linux distribution, since most projects that need to scale run on Linux.
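As a small illustration of why basic Statistics matters, the sketch below compares the mean and the median of a sample that contains one extreme value. The numbers are hypothetical and exist only for illustration. It uses Python's standard `statistics` module.

```python
import statistics

# Hypothetical sample: monthly rents, with one extreme value at the end
rents = [950, 1100, 1200, 1250, 1300, 1400, 5200]

mean_rent = statistics.mean(rents)      # pulled upward by the extreme value
median_rent = statistics.median(rents)  # middle value, robust to the extreme
spread = statistics.stdev(rents)        # sample standard deviation

print(f"mean:   {mean_rent:.1f}")
print(f"median: {median_rent:.1f}")
print(f"stdev:  {spread:.1f}")
```

The mean ends up well above the median because a single large value drags it up. Noticing this kind of effect is part of understanding what an algorithm is doing with your data.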

Python or R languages

Research which programming languages and platforms are most used in projects in this area.

For example, if you like Python you are already a step ahead, as a large portion of Data Science projects use this language as their main one.

Besides being well known among developers, Python is also very well accepted in the academic community. It is robust and friendly, and it has several libraries ready for Data Science work. A good way to see the power of the language is to look at an example of data manipulation using one of these libraries.
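To give an idea of what data manipulation looks like in Python, here is a minimal sketch. The library the original example relied on is not named in this text, so this version uses only the standard library, and the records are hypothetical. It filters rows and then computes an average per group, two operations that show up in almost every analysis.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records; in practice these would come from a file or a database
sales = [
    {"city": "Recife", "product": "notebook", "price": 3500.0},
    {"city": "Recife", "product": "mouse", "price": 80.0},
    {"city": "Curitiba", "product": "notebook", "price": 3300.0},
    {"city": "Curitiba", "product": "monitor", "price": 900.0},
    {"city": "Curitiba", "product": "keyboard", "price": 200.0},
]

# Filter: keep only the rows at or above a price threshold
expensive = [row for row in sales if row["price"] >= 1000.0]

# Group by city, then average the prices inside each group
prices_by_city = defaultdict(list)
for row in sales:
    prices_by_city[row["city"]].append(row["price"])

average_by_city = {city: mean(prices) for city, prices in prices_by_city.items()}

print("expensive items:", [row["product"] for row in expensive])
for city, average in sorted(average_by_city.items()):
    print(f"{city}: {average:.2f}")
```

Dedicated data libraries express the same filter and group-by steps in a line or two each, which is a large part of their appeal.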

Being so widespread, the language has become a success in this area, both for its capabilities and for its simplicity.

Another very important language in this area is R. R is a statistical language widely used in the academic community, and it comes with many ready-to-use libraries and packages for mathematical calculations, data visualization and data processing, among others.

Because it is a very simple language, it has become widely used in Data Science projects. This has taken R beyond being a language used only in academia.

So which one to choose? Python or R?

I suggest that you choose the one that suits you best. It will really depend on the project. Don’t try to study both languages at once, or you will end up lost with so much information.

I strongly recommend that you choose the language that interests you most and study it.

Learn Machine Learning

You can’t talk about Data Science without talking about Machine Learning.

Machine Learning is an area that has been growing a lot and is increasingly present in our daily lives. An interesting example is:

How does Google manage to classify spam emails for thousands of email accounts? And let’s agree that it rarely gets it wrong, right?
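The text does not say which technique Google uses, and the sketch below makes no claim about that. It shows Naive Bayes, a classic textbook approach to text classification, trained on a tiny hypothetical set of messages. The function names `train` and `classify` are made up for this illustration.

```python
import math
from collections import Counter, defaultdict

# Tiny hypothetical training set (illustrative only)
training = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("win a free prize now", "spam"),
    ("meeting schedule for monday", "ham"),
    ("project report attached", "ham"),
    ("lunch on monday", "ham"),
]

def train(examples):
    """Count how often each word appears under each label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocabulary

def classify(text, word_counts, label_counts, vocabulary):
    """Return the label with the highest Naive Bayes log-score."""
    total_examples = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # Start from the log of the prior probability of the label
        score = math.log(label_counts[label] / total_examples)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            # Add-one (Laplace) smoothing avoids log(0) for unseen words
            smoothed = word_counts[label][word] + 1
            score += math.log(smoothed / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

model = train(training)
print(classify("win free money", *model))
print(classify("monday project meeting", *model))
```

The model learns only from word frequencies in labeled examples. Nobody writes a rule saying that "money" is suspicious, and that is the core idea behind Machine Learning.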

Knowing about Machine Learning is essential for any Data Scientist, but you don’t need to understand and be able to use every algorithm right away.

To start, choose some of the most used algorithms and try to learn how they work. That is already plenty.

Below are examples of tasks that use Machine Learning. For all of them there are articles, tutorials and books available for free that teach the algorithms involved:

– Sentiment Analysis: This task applies Machine Learning to texts, where the algorithm learns to classify text data as positive, negative or neutral.

– Prediction of Values: This task typically uses Machine Learning algorithms that use regression to learn patterns and predict values. A well-known example would be to predict property prices in a region.

– Data Grouping: These algorithms use Machine Learning to find similarities in the data, which makes it possible to form groups of similar items. One application is a bank that wants to group customers into categories without having to define strict criteria in advance.

– Image recognition: This is a task Facebook uses extensively. The social network uses Machine Learning algorithms to identify people’s faces in photos.
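The value prediction task from the list above can be sketched with the simplest regression there is: fitting a straight line by least squares. The areas and prices below are hypothetical and deliberately follow an exact line so the result is easy to check by hand. The helper `fit_line` is a name made up for this example.

```python
# Hypothetical data: property area (square meters) and price (thousands)
areas = [50.0, 70.0, 90.0, 110.0]
prices = [150.0, 210.0, 270.0, 330.0]

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(areas, prices)

# Predict the price of a property the model has never seen
predicted = slope * 100.0 + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, predicted={predicted:.2f}")
```

Real property data is noisy and depends on far more than area, but the principle is the same: learn the pattern from known examples, then apply it to new ones.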

The examples mentioned above are widely used today. They show that Machine Learning is here to stay, and the trend is that we will see more and more solutions using this technology.
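The data grouping task described in the list above can also be sketched in a few lines. The example uses k-means, a common clustering algorithm, on hypothetical account balances. The starting centers are an assumption chosen by hand, and `k_means_1d` is a name made up for this illustration.

```python
def k_means_1d(values, centers, iterations=10):
    """Group one-dimensional values around the given starting centers."""
    groups = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each value joins the group of its nearest center
        groups = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            groups[nearest].append(value)
        # Update step: each center moves to the mean of its group
        centers = [
            sum(group) / len(group) if group else center
            for group, center in zip(groups, centers)
        ]
    return centers, groups

# Hypothetical account balances of six bank customers
balances = [100.0, 120.0, 150.0, 5000.0, 5200.0, 5500.0]
centers, groups = k_means_1d(balances, centers=[0.0, 1000.0])

print("centers:", [round(center, 2) for center in centers])
print("groups:", groups)
```

Nobody told the algorithm where the boundary between the two groups of customers should be. It found the split from the data alone.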

Hands on!

The best way to learn something is to put it into practice. But how do you start?

Well, a great way to start learning about these technologies is to look for free datasets and practice on small projects. A very interesting site is Kaggle, where you can download several datasets for free and start playing with the data.
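Many downloadable datasets come as CSV files, so a first practice session often starts like the sketch below. To keep the example self-contained, a small in-memory text stands in for a downloaded file. The column names and values are hypothetical.

```python
import csv
import io

# Stand-in for a downloaded file. With a real dataset you would use
# open("your_file.csv", newline="") instead (the file name is hypothetical).
raw = io.StringIO(
    "name,age,city\n"
    "Ana,34,Recife\n"
    "Bruno,,Curitiba\n"
    "Carla,29,\n"
)

rows = list(csv.DictReader(raw))
columns = list(rows[0].keys())

# Count empty cells per column, a common first check on any dataset
missing = {column: sum(1 for row in rows if row[column] == "") for column in columns}

print("rows:", len(rows))
print("columns:", columns)
print("missing values:", missing)
```

Knowing how many rows you have, which columns exist and where values are missing is usually the first step before any analysis or model.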

As I mentioned at the beginning, Mining Data is a good place to find material in Portuguese, along with datasets to download and free code.

In addition to the blog I mentioned above, here are other study sources for you to check out. Enjoy without moderation!

  1. Data Science Blogs:
    1. KDnuggets
    2. Data Science Central
  2. Python: Python Brazil
  3. Machine Learning: Coursera (Machine Learning course)
  4. Statistics: Statitics.org

I hope this article has helped you on your journey to start in Data Science!

I thank Rodrigo Santana Ferreira for his collaboration with the text.

See you next time!