Pandas is an open-source, BSD-licensed Python library that provides high-performance, easy-to-use data structures and data analysis tools. It is used in a wide range of academic and commercial domains, including finance, economics, statistics, and analytics.

**Pandas provides high-performance data manipulation and analysis tools built on its powerful data structures.** The name Pandas is derived from **panel data**, an econometrics term for multidimensional data sets.

In 2008, Wes McKinney started developing Pandas because he needed a high-performance, flexible tool for data analysis.

Prior to Pandas, Python was used mainly for data munging and preparation and contributed very little to data analysis itself. Pandas solved this problem. **Using Pandas, we can accomplish five typical steps in the processing and analysis of data, regardless of the origin of the data: load, prepare, manipulate, model, and analyze.**

## Key Features of Pandas

- Fast and efficient DataFrame object with default and customized indexing.
- Tools for loading data into in-memory data objects from different file formats.
- Data alignment and integrated handling of missing data.
- Reshaping and pivoting of data sets.
- Label-based slicing, indexing, and subsetting of large data sets.
- Columns can be deleted from or inserted into a data structure.
- Grouping of data for aggregation and transformation.
- High-performance merging and joining of data.
- Time Series functionality.
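
A few of the features listed above can be sketched in a short script. The table contents, column names, and manager names below are invented purely for illustration; `fillna`, `groupby`, and `merge` are standard Pandas methods.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data, invented for this illustration
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "units": [10.0, np.nan, 30.0, 20.0],
})

# Integrated handling of missing data: replace NaN with 0
sales["units"] = sales["units"].fillna(0)

# Group by a column and aggregate
totals = sales.groupby("region")["units"].sum()
print(totals)

# Merge with a second (hypothetical) table on the shared key
managers = pd.DataFrame({
    "region": ["north", "south"],
    "manager": ["Ann", "Bob"],
})
merged = sales.merge(managers, on="region")
print(merged)
```

Here the missing value is filled before aggregating so that the group totals are computed over complete data; dropping the row with `dropna` would be an equally valid choice depending on the analysis.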

The standard Python distribution does not come bundled with the Pandas module. The lightweight option is to install Pandas with the popular Python package installer, **pip**:

```shell
pip install pandas
```
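
After installation, a quick way to confirm that the package is importable is to print its version. This is only a minimal check, and the version string you see depends on your environment.

```python
import pandas as pd

# Print the installed version to confirm the import works
print(pd.__version__)
```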

**If you install the Anaconda Python distribution, Pandas is installed by default.** The sections below list distributions and packages that provide the SciPy stack on each platform.

## Windows

- **Anaconda** (from https://www.continuum.io) is a free Python distribution for the SciPy stack. It is also available for Linux and Mac.
- **Canopy** (https://www.enthought.com/products/canopy/) is available as a free as well as a commercial distribution with the full SciPy stack for Windows, Linux, and Mac.
- **Python(x,y)** is a free Python distribution with the SciPy stack and the Spyder IDE for Windows. (Downloadable from http://python-xy.github.io/)

## Linux

The package manager of each Linux distribution can be used to install one or more packages of the SciPy stack.

**For Ubuntu Users**

```shell
sudo apt-get install python-numpy python-scipy python-matplotlib ipython ipython-notebook python-pandas python-sympy python-nose
```

**For Fedora Users**

```shell
sudo yum install numpy scipy python-matplotlib ipython python-pandas sympy python-nose atlas-devel
```

Pandas deals with the following three data structures −

- Series
- DataFrame
- Panel

These data structures are built on top of NumPy arrays, which means they are fast.

## Dimension & Description

**The best way to think of these data structures is that the higher dimensional data structure is a container of its lower dimensional data structure.** For example, DataFrame is a container of Series, Panel is a container of DataFrame.

Data Structure | Dimensions | Description |
---|---|---|
Series | 1 | 1D labeled homogeneous array, size-immutable. |
DataFrame | 2 | General 2D labeled, size-mutable tabular structure with potentially heterogeneously typed columns. |
Panel | 3 | General 3D labeled, size-mutable array. |
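
The "container" idea can be checked directly: selecting a single column from a DataFrame returns a Series. The column names `a` and `b` below are arbitrary placeholders chosen for this sketch.

```python
import pandas as pd

# A small 2D structure with two columns of different types
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# Selecting one column yields the lower-dimensional structure
col = df["a"]
print(type(col))

# ndim reports the number of dimensions of each object
print(col.ndim, df.ndim)
```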

Building and handling arrays of two or more dimensions is tedious, because the burden is on the user to consider the orientation of the data set when writing functions. Pandas data structures reduce this mental effort.

For example, with tabular data (DataFrame) it is more semantically helpful to think of the **index** (the rows) and the **columns** rather than axis 0 and axis 1.
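
A minimal sketch of this naming, using made-up student scores: the `axis` argument of many DataFrame methods accepts the names `"index"` and `"columns"` as well as the numbers 0 and 1.

```python
import pandas as pd

# Hypothetical scores; row labels form the index, keys form the columns
df = pd.DataFrame(
    {"math": [90, 80], "art": [70, 60]},
    index=["alice", "bob"],
)

print(df.index)    # row labels
print(df.columns)  # column labels

# axis="index" (axis 0) collapses the rows: one total per column
per_column = df.sum(axis="index")

# axis="columns" (axis 1) collapses the columns: one total per row
per_row = df.sum(axis="columns")

print(per_column)
print(per_row)
```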

### Mutability

**All Pandas data structures are value-mutable (their contents can be changed), and all except Series are size-mutable.** Series is size-immutable.
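
As a rough illustration of the two kinds of mutability, the sketch below changes a value inside a Series and then grows a DataFrame by inserting a column. The names and numbers are placeholders.

```python
import pandas as pd

# Value mutability: overwrite one element of a Series in place
s = pd.Series([10, 23, 56])
s.iloc[1] = 99
print(s)

# Size mutability: a DataFrame can gain a new column after creation
df = pd.DataFrame({"name": ["Steve", "Lia"]})
df["age"] = [32, 28]
print(df)
```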

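**Note** − DataFrame is widely used and is one of the most important data structures. Panel is used much less; it was deprecated in pandas 0.20 and removed in pandas 0.25, so the Panel material here applies only to older releases.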

## Series (C Array)

**Series is a one-dimensional array-like structure with homogeneous data.** For example, the following series is a collection of integers 10, 23, 56, …

10 | 23 | 56 | 17 | 52 | 61 | 73 | 90 | 26 | 72 |

### Key Points

- Homogeneous data
- Size Immutable
- Values of Data Mutable
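
The homogeneity point can be seen from the `dtype` attribute: a Series carries one data type for all of its values. The sketch below reuses the integers from the example above; the second Series is an extra illustration showing that mixing integers and floats produces a single float type.

```python
import pandas as pd

# All values are integers, so the Series has one integer dtype
s = pd.Series([10, 23, 56, 17, 52, 61, 73, 90, 26, 72])
print(s.dtype)
print(len(s))

# Mixing an int and a float still gives one common dtype
mixed = pd.Series([1, 2.5])
print(mixed.dtype)
```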

## DataFrame

DataFrame is a two-dimensional array with heterogeneous data. For example,

Name | Age | Gender | Rating |
---|---|---|---|
Steve | 32 | Male | 3.45 |
Lia | 28 | Female | 4.6 |
Vin | 45 | Male | 3.9 |
Katie | 38 | Female | 2.78 |

The table represents the data of a sales team of an organization with their overall performance rating. The data is represented in rows and columns. Each column represents an attribute and each row represents a person.

## Data Type of Columns

The data types of the four columns are as follows −

Column | Type |
---|---|
Name | String |
Age | Integer |
Gender | String |
Rating | Float |
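
The sales-team table can be rebuilt as a DataFrame to inspect the column types. This is a sketch using a dictionary of lists as input; note that string columns are typically reported as `object` (or as a dedicated string type in newer releases) rather than literally "String".

```python
import pandas as pd

# Rebuild the sales-team table from the text, one list per column
df = pd.DataFrame({
    "Name": ["Steve", "Lia", "Vin", "Katie"],
    "Age": [32, 28, 45, 38],
    "Gender": ["Male", "Female", "Male", "Female"],
    "Rating": [3.45, 4.6, 3.9, 2.78],
})

# One dtype per column: the columns are heterogeneous across the table
print(df.dtypes)
```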

### Key Points

- Heterogeneous data
- Size Mutable
- Data Mutable

## Panel

Panel is a three-dimensional data structure with heterogeneous data. It is hard to represent a panel graphically, but it can be illustrated as a container of DataFrames.

### Key Points

- Heterogeneous data
- Size Mutable
- Data Mutable
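
Panel is not available in current pandas releases (it was removed in version 0.25). There, the "container of DataFrames" idea is commonly expressed with a single DataFrame that has a hierarchical index. The sketch below is one possible way to do this, not a drop-in replacement for Panel. The yearly tables and their labels are hypothetical.

```python
import pandas as pd

# Two hypothetical 2D tables, one per year
y2020 = pd.DataFrame({"sales": [1, 2], "cost": [3, 4]}, index=["a", "b"])
y2021 = pd.DataFrame({"sales": [5, 6], "cost": [7, 8]}, index=["a", "b"])

# Stack them; the dict keys become the outer level of the row index
stacked = pd.concat({"2020": y2020, "2021": y2021}, names=["year", "item"])
print(stacked)

# Select one "slice" along the third dimension
print(stacked.loc["2021"])
```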

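## NumPy Array

```python
import numpy as np

# Build a 1D float array from a Python list
a = [1, 2, 3, 4]
print(a)
b = np.array(a, float)
print(b)

# Indexing and slicing
print(b[2])    # third element
print(b[-2])   # second element from the end
print(b[1:4])  # elements at positions 1 through 3

# Basic attributes
print(b.size)   # total number of elements
print(b.shape)  # length along each dimension
print(type(b))

# Arrays pre-filled with zeros or ones
b = np.zeros(5)
print(b)
b = np.ones(8)
print(b)

# Build a 2D integer array from nested lists
b = np.array([[1, 2, 3, 4], [1, 2, 3, 4]], int)
print(b)
print(b[1, 1])    # row 1, column 1
print(b[-1, -1])  # last row, last column
print(b.size)
print(b.shape)
print(type(b))

# 2D arrays of zeros or ones take a shape tuple
b = np.zeros((2, 4))
print(b)
b = np.ones((2, 4))
print(b)
```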

## Graphs

```python
import matplotlib.pyplot as plt

# Line plot of y against x
x = [1, 2, 3]
y = [6, 7, 4]
plt.plot(x, y)
plt.show()
```

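## Bar Chart

```python
import matplotlib.pyplot as plt

# Bar positions on the x-axis and the height of each bar
positions = [1, 2, 3, 4, 5]
heights = [10, 24, 36, 40, 5]

# The two colors are cycled across the five bars
plt.bar(positions, heights, color=['red', 'blue'])
plt.show()
```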

## Pie Chart

```python
import matplotlib.pyplot as plt

# Slice labels, slice sizes, and one color per slice
activities = ['eat', 'sleep', 'work', 'play']
sizes = [3, 7, 8, 6]
colors = ['r', 'm', 'g', 'b']

plt.pie(sizes, labels=activities, colors=colors)
plt.show()
```

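## Series Examples

```python
import pandas as pd
import numpy as np

# Empty Series; an explicit dtype avoids a warning on some pandas versions
s = pd.Series(dtype="object")
print("s=", s)

# Series from a NumPy array, with the default integer index
l = ['a', 'b', 'c', 'd']
a = np.array(l)
s = pd.Series(a)
print(a)
print("s=", s)

# Labelled index
s = pd.Series(a, index=["x", "y", "z", "w"])
print(a)
print("s=", s)

# Access by position (iloc) and by a list of labels
print(s.iloc[0])
print(s[["y", "z"]])
```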

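## DataFrame Examples

```python
import pandas as pd

# Empty DataFrame
df = pd.DataFrame()
print(df)

# DataFrame from a flat list: one unnamed column
data = [1, 2, 3, 4, 5]
df = pd.DataFrame(data)
print(df)

# DataFrame from a list of rows, first without and then with column names
data = [['abc', 10], ['def', 34], ['ghi', 23]]
df = pd.DataFrame(data)
print(df)
df = pd.DataFrame(data, columns=['name', 'age'])
print(df)

# DataFrame from a dict of lists: keys become column names
data = {'name': ['acd', 'asdc', 'edff'], 'age': [23, 32, 34]}
df = pd.DataFrame(data)
print(df)

# Same data with custom row labels
df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3'])
print(df)
```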