Python – read csv, tsv, excel files with pandas

Python – read csv, tsv, excel files with pandas

I will focus on data input and output using pandas here, but there are many other libraries that can help you read and write data in various file formats.

Inputs and outputs generally fall into several broad categories.

Patterns for reading text files or more efficiently formatted files from disk, patterns for reading data from databases, and patterns for reading data from network sources such as Web APIs.

Reading and writing data in text format

A feature of pandas is that it has many functions that read tabular data as DataFrame objects. Probably the most commonly used ones are read_csv and read_table.

FunctionExplanation
read_csvReads delimited data from files, URLs, and other file-based objects. The default delimiter is a comma.
read_tableReads delimited data from files, URLs, and other file-based objects. The default delimiter is tab (‘t’).
read_excelImport tabular data from Excel XLS and XLSX files.

0. Get Iris Dataframe from sklearn

First, get the iris data from sklearn.datasets as a preparation. Create a CSV file.

from sklearn.datasets import load_iris
df_iris = pd.DataFrame(iris.data, columns = iris.feature_names)
df_iris.head()

1. Output Dataframe as CSV file

Now, output the acquired iris Dataframe as a CSV file.

# Output CSV file from df
df_iris.to_csv('iris_out.csv', encoding='utf8', index=False)

2. Output Dataframe to TSV file

Since it will be used in subsequent work, it will also be output as a TSV file this time.

# Output TSV file from df
df_iris.to_csv('iris_out.tsv', encoding='utf8', index=False, sep='t')

3. Read CSV file with pandas

Now, read the CSV file (comma delimited). You can read Data frame (data frame) as a return value with read_csv of pandas. Also, check the top 5 results of Dataframe after loading.

# Read CSV file
df = pd.read_csv('iris_out.csv')
df.head()

4. Read TSV file with pandas

Try reading a TSV file (tab delimited). Read_csv of pandas can be used to read TSV files as well as CSV files. When read with read_csv, you can get a Data frame as a return value.

# Read TSV file
df_tsv = pd.read_csv('iris_out.tsv', delimiter='t')

It is also possible to read using read_table. In that case, a delimiter must be specified.

df_1 = pd.read_table('iris_out.csv', sep=',')
df_1.head()

Below are some of the options commonly given when using pandas.read_csv and pandas.read_table.

ParametersExplanation
pathA string representing a filesystem location, URL, or other file-based object.
sep or delimiterA string or regular expression used to split each line into fields.
headerRow number to use as column names. 0 (first row) by default. Specify None if there is no header row.
index_colThe number or name of the column used as the row index in the returned object. A single name/number, or a list of names/numbers for hierarchical indexes.
namesA list of column names for the returned object. use with header=None
skiprowsNumber of lines to ignore at the beginning of the file, or line numbers to skip (first line is 0).
na_valuesA set of values to replace with missing values.
commentSeparate the character or character string specified in this argument from each line as a comment.
parse_datesAttempts to read the data as a datetime. False by default, if True try to read on all columns. Specify the number or name of the column to read if you don’t want it to apply to all columns. If the elements of the list are tuples or lists, combine their multiple columns and read them as dates (e.g. used when date and time are separated into two columns)
date_parserThe function used to read the date.
nrowsThe number of lines to read at the beginning of the file.
iteratorReturns a TextFileReader object for partially reading the file.
encodingCharacter code to use as Unicode (for example, specify ‘utf-8’ for UTF-8 encoded text).
squeezeReturns a Series if the loaded data contains only one column.
thousandsThousands separator (eg ‘,’ or ‘.’).

Export data in text format

Data can also be exported in delimited format. Consider, for example, one of the CSV files we just read.

df = pd.read_csv('iris_out.csv')
df.head()

You can use the dataframe’s to_csv method to write the data to a comma-separated file.

df.to_csv('out.csv')

Importing Microsoft Excel files

Pandas also supports reading table data saved in Excel 2003 (and newer) files.

Reading is done using either the ExcelFile class or the pandas.read_excel function. These tools internally use add-on packages called xlrd for reading Excel files, and xlwt (for XLS files) and openpyxl (for XLSX files) for writing, so use pip or conda when using These add-on packages must be installed separately.

To use ExcelFile, create an instance by passing the path of the xls file or xlsx file as an argument as follows.

xlsx = pd.ExcelFile('ex1.xlsx')

This time, if the sheet name is specified as “Sheet1” as the second argument, the data in the sheet can be read as a data frame by the read_excel function.

# Case of the first row is index
df_xlsx = pd.read_excel(xlsx, 'Sheet1', index_col=0)

# Case of using the index
df_xlsx = pd.read_excel(xlsx, 'Sheet1')

# You can also read by directly specifying the file path
df_xlsx = pd.read_excel('ex1.xlsx', 'Sheet1', index_col=0)

df_xlsx.head()

To write out pandas data in Excel format, first create an ExcelWriter and then write the data using the to_excel method of the pandas object.

writer = pd.ExcelWriter('ex2.xlsx')
df_xlsx.to_excel(writer, 'Sheet1')
writer.save()

You can also write the file path to to_excel without using ExcelWriter.

df_xlsx.to_excel('ex2.xlsx')

Conclusion

This time, it was about reading CSV, TSV, Excel files with Python – pandas. Loading data is often the first step in the process of data analysis. In addition to the contents described this time, I think that there may be cases where detailed parameter settings are necessary, but I think that the basics are like this.

Please also refer to this article for steps such as filtering and aggregation after loading csv, tsv, and xlsx into dataframe with pandas.

Please also refer to here for joins between dataframes.

(Visited 77 times, 6 visits today)