10 Minutes To Pandas



Pandas in 10 Minutes. Translator's note: this material is a Korean translation of 10 Minutes to Pandas (see the link to the original at the bottom). The translation was carried out jointly by all trainees of the 데잇걸즈2 program.

From the above result, you can see that both results are equal.
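A comparison of this kind can be sketched as follows. This is only an illustration: the DataFrame contents and the helper name `encode_gender` are assumptions, not the data used in the original tutorial. It maps the same column once with a dictionary and once with a function, then checks that the two results match.

```python
import pandas as pd

# Hypothetical data; the names and values are illustrative only.
df = pd.DataFrame({"Name": ["Alice", "Bob", "Cara"], "Gender": ["F", "M", "F"]})

# Mapping with a dictionary: each value is looked up as a key.
by_dict = df["Gender"].map({"M": 0, "F": 1})

# Mapping with a function: the function is called once per value.
def encode_gender(value):
    return 0 if value == "M" else 1

by_func = df["Gender"].map(encode_gender)

# Compare the two results value by value.
print(by_dict.tolist() == by_func.tolist())
```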

The above process of mapping using a function can be visualised through the following animated video,

Of the argument types that the map function accepts, let’s use the “Indexed Series” type in this section. The people in our DataFrame are ready to provide their nicknames to us. Assume that the nicknames are provided in a Series object. We would like to map the “Name” column of the DataFrame to the nicknames. The condition is:

  • The index of the nicknames (called) Series should be equal to the “Name” (caller) column values.

Let’s construct the nicknames column below with the above condition,

Let’s map the above created Series to the “Name” column of the DataFrame;


The code for it is,
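A minimal sketch of this step, assuming a small DataFrame with a “Name” column. The names and nicknames below are made up for illustration and are not the tutorial's data.

```python
import pandas as pd

# Hypothetical DataFrame with a "Name" column.
df = pd.DataFrame({"Name": ["Alice", "Bob", "Cara"], "Gender": ["F", "M", "F"]})

# The index of the nicknames Series holds the values found in df["Name"].
nicknames = pd.Series(["Ali", "Bobby", "Caz"], index=["Alice", "Bob", "Cara"])

# Each name is looked up in the index of `nicknames`,
# and the matching nickname becomes the new value.
nick_series = df["Name"].map(nicknames)

print(nick_series)
# The result keeps the index of the caller, not the index of `nicknames`.
print(nick_series.index.equals(df.index))
```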

  • The major point to observe when applying the map function is that the index of the resultant Series is equal to the caller index. This is important because it lets us add the resultant Series to the DataFrame as a column.

Let’s add the resultant Series as a “nick_Name” column to the DataFrame.
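Adding the mapped result as a column could look like the sketch below. The data is hypothetical, and one nickname is deliberately left out to show that a name with no match in the index of the nicknames Series ends up as NaN.

```python
import pandas as pd

# Hypothetical data; "Cara" has no nickname on purpose.
df = pd.DataFrame({"Name": ["Alice", "Bob", "Cara"]})
nicknames = pd.Series(["Ali", "Bobby"], index=["Alice", "Bob"])

# The mapped Series shares the index of df, so it can be assigned directly.
df["nick_Name"] = df["Name"].map(nicknames)

print(df)
```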

The above process of mapping using an indexed Series can be visualised through the following animated video,

Every single column in a DataFrame is a Series, and map is a Series method. So far, we have only mapped a single column with the Pandas map function. But there are hacks in Pandas to make the map function work for multiple columns. Multiple columns combined together form a DataFrame. There is a process called stacking in Pandas. “Stacking” turns a DataFrame into a single Series: the values of all the columns are placed one after another, and each value is labelled by both its original row label and its column name.
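To make the idea of stacking concrete, here is a small sketch with made-up data. It shows that `stack()` returns a Series whose index has two levels (row label and column name), and that `unstack()` turns it back into a DataFrame.

```python
import pandas as pd

# Hypothetical two-column DataFrame.
df = pd.DataFrame(
    {"Male": ["No", "Yes"], "Female": ["Yes", "No"]},
    index=["Alice", "Bob"],
)

# stack() moves the column labels into an inner index level.
stacked = df.stack()
print(type(stacked).__name__)   # the result is a Series
print(stacked.index.nlevels)    # indexed by (row label, column name)
print(stacked)

# unstack() reverses the operation and gives back a DataFrame.
restored = stacked.unstack()
print(restored)
```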

We have encoded the “M” and “F” values to 0 and 1 in the previous section. When building Machine Learning models, a model may interpret 1 as greater than 0 in its calculations. Here, however, they are two distinct categories with no order between them.

So, let’s store the data in a different way in our DataFrame. Let’s dedicate separate columns for male (“M”) and female (“F”), and fill in “Yes” or “No” for each person based upon their gender. This introduces some redundancy in the data but solves the problem discussed above.

This can be done with the following code,
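One way to build the two columns is sketched below. It assumes a “Gender” column holding “M” and “F”; the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical data with a "Gender" column of "M" / "F" values.
df = pd.DataFrame({"Name": ["Alice", "Bob", "Cara"], "Gender": ["F", "M", "F"]})

# One dictionary per new column: "Yes" where the gender matches, "No" otherwise.
df["Male"] = df["Gender"].map({"M": "Yes", "F": "No"})
df["Female"] = df["Gender"].map({"M": "No", "F": "Yes"})

print(df)
```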

Now, we shall map the two columns “Male” and “Female” to numerical values. To do so, we first take the subset of the DataFrame that contains only these two columns.

You can observe that we have a DataFrame of two columns above. The main point to note is both of the columns have the same set of possible values.

Thereafter, we will use the stacking hack and map two columns to the numerical values. This can be implemented using the following code,
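The stack, map, unstack sequence can be sketched like this. The data and the choice of 1 for “Yes” and 0 for “No” are assumptions made for the example.

```python
import pandas as pd

# Hypothetical DataFrame with "Yes" / "No" columns.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Cara"],
    "Male": ["No", "Yes", "No"],
    "Female": ["Yes", "No", "Yes"],
})

# Take only the columns that share the same set of possible values.
subset = df[["Male", "Female"]]

# Assumed encoding for this sketch.
mapping = {"Yes": 1, "No": 0}

# stack -> one Series; map -> replace the values; unstack -> back to a DataFrame.
encoded = subset.stack().map(mapping).unstack()

# Write the encoded columns back, selecting them by name to fix the order.
df[["Male", "Female"]] = encoded[["Male", "Female"]]

print(df)
```

Stacking is only a convenience here; mapping each column separately with the same dictionary would give the same values.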

If you observe the above code and results, the DataFrame is first stacked to form a Series. Then the map method is applied to the stacked Series. Finally, unstacking it results in a DataFrame in which the original values are replaced by numerical values.


In Machine Learning, there are routines to convert a categorical variable column to multiple discrete numerical columns. This encoding process is called One-Hot Encoding.
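As an illustration of such a routine, pandas provides `pd.get_dummies`. The sketch below uses hypothetical data; the tutorial itself does not say which routine it has in mind. The explicit `astype(int)` is there because the default dtype of the result differs between pandas versions.

```python
import pandas as pd

# Hypothetical data with a categorical "Gender" column.
df = pd.DataFrame({"Name": ["Alice", "Bob", "Cara"], "Gender": ["F", "M", "F"]})

# One indicator column per category; cast to int for a consistent 0/1 result.
one_hot = pd.get_dummies(df["Gender"], prefix="Gender").astype(int)

print(one_hot)
```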

We have discussed the Pandas apply function in detail in another tutorial. The map and apply functions have some major differences between them. They are:


  • The first difference is:
    • map is only a Series method.
    • apply is both a Series and a DataFrame method.
  • The second difference is:
    • map takes a dict, a Series, or a function as an argument.
    • apply takes only a function as an argument.
  • The third difference is:
    • map is an element-wise operation on a Series.
    • apply is used for complex element-wise operations on a Series or a DataFrame.
  • The fourth difference is:
    • map is mainly used to map values using a dictionary.
    • apply is used for applying functions that are not available as vectorized aggregation routines on DataFrames.
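The differences listed above can be seen side by side in a short sketch. The data, the 170 cm threshold, and the lambda functions are all made up for illustration.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    "Gender": ["F", "M", "F"],
    "Height": [160, 175, 168],
    "Weight": [55, 80, 60],
})

# map on a Series: a dictionary is accepted directly.
gender_code = df["Gender"].map({"M": 0, "F": 1})

# apply on a Series: a function is called for each element.
height_label = df["Height"].apply(lambda cm: "tall" if cm > 170 else "short")

# apply on a DataFrame: the function receives each column as a Series.
ranges = df[["Height", "Weight"]].apply(lambda col: col.max() - col.min())

print(gender_code.tolist())
print(height_label.tolist())
print(ranges)
```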

The map function is mainly used to map the values of a Series using a dictionary. Whenever you come across categorical data, you can think of the map method to convert it to numerical values. If you liked this tutorial on the map() function and enjoy quiz-based learning, please consider reading our Coffee Break Pandas book.
