Commit 661eea5

[Edit] Python: Pandas: .to_datetime() (#7037)
* [Edit] SQL: DATEDIFF()
* Update datediff.md
* [Edit] Python: Pandas: .to_datetime()
* Update content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md
* Update content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md
* Update content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md
* Update content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md
* Update content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md
* Update content/pandas/concepts/built-in-functions/terms/to-datetime/to-datetime.md
---------
1 parent 902dbce commit 661eea5

File tree

1 file changed

+175
-27
lines changed
  • content/pandas/concepts/built-in-functions/terms/to-datetime

@@ -1,56 +1,204 @@
11
---
22
Title: '.to_datetime()'
3-
Description: 'Returns a pandas datetime object for a given object, such as a Series or DataFrame'
3+
Description: 'Converts various date and time representations into standardized pandas datetime objects for time series analysis.'
44
Subjects:
5-
- 'Data Science'
65
- 'Computer Science'
6+
- 'Data Science'
77
Tags:
8-
- 'Date'
9-
- 'Display'
8+
- 'Data Types'
9+
- 'Functions'
10+
- 'Time'
1011
- 'Pandas'
1112
CatalogContent:
1213
- 'learn-python-3'
1314
- 'paths/data-science'
1415
---
1516

16-
The **`.to_datetime()`** function returns a pandas datetime object for a given object, often an array or dictionary-like type such as a Series or DataFrame.
17+
The **`.to_datetime()`** function in Pandas transforms various date and time representations into standardized pandas datetime objects. It serves as the primary mechanism for converting strings, integers, lists, Series, or [DataFrames](https://www.codecademy.com/resources/docs/pandas/dataframe) containing date-like information into `datetime64` objects that can be used for time series analysis and date arithmetic operations.
18+
19+
This function is essential in data preprocessing workflows where raw data contains dates in multiple formats, which makes temporal analysis difficult. Common use cases include converting CSV date columns from strings to datetime objects, standardizing mixed date formats within a dataset, handling Unix timestamps from APIs, parsing dates with different regional formats, and creating time series indexes for financial or scientific data. The function also offers configurable error handling and format inference, which helps in real-world data cleaning.
1720

1821
## Syntax
1922

20-
This function returns a value in datetime format. Various input arguments can be used as described below.
23+
```pseudo
24+
pandas.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False,
25+
utc=False, format=None, exact=<no_default>, unit=None,
26+
infer_datetime_format=<no_default>, origin='unix', cache=True)
27+
```
28+
29+
**Parameters:**
30+
31+
- `arg`: The object to convert to `datetime`. Can be scalar, array-like, Series, or `DataFrame`/dict-like
32+
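- `errors`: How to handle parsing errors: `'raise'` (default) raises an exception, `'coerce'` sets invalid values to `NaT`, and `'ignore'` returns the input unchanged (deprecated since pandas 2.2)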
33+
- `dayfirst`: Boolean, if True parses dates with day first (e.g., "31/12/2023" as Dec 31)
34+
- `yearfirst`: Boolean, if True parses dates with year first when ambiguous
35+
- `utc`: Boolean, if True returns timezone-aware results converted to UTC
36+
- `format`: String format to parse the datetime (e.g., '%Y-%m-%d')
37+
- `exact`: Boolean, if True requires exact format match
38+
- `unit`: Unit for numeric timestamps ('D', 's', 'ms', 'us', 'ns')
39+
- `infer_datetime_format`: Boolean, attempts to infer format for faster parsing (deprecated since pandas 2.0, where format inference is the default behavior)
40+
- `origin`: Reference date for numeric values, default 'unix' (1970-01-01)
41+
- `cache`: Boolean, use cache for improved performance with duplicate values
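
A minimal sketch of how a few of these parameters change parsing. The input strings and variable names are made up for illustration and are not part of the original entry:

```py
import pandas as pd

# dayfirst=True reads the ambiguous "05/04/2023" as 5 April rather than May 4
day_first = pd.to_datetime("05/04/2023", dayfirst=True)

# An explicit format skips inference and documents the expected layout
with_format = pd.to_datetime("2023-12-31 14:30", format="%Y-%m-%d %H:%M")

# errors='coerce' turns unparseable values into NaT instead of raising
coerced = pd.to_datetime(pd.Series(["2023-01-15", "not a date"]), errors="coerce")

# utc=True returns a timezone-aware value in UTC
aware = pd.to_datetime("2023-01-01 12:00", utc=True)

print(day_first, with_format, aware, sep="\n")
print(coerced)
```

Passing `format` explicitly is usually the safer choice when the layout of the input is known, since it does not depend on inference.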
42+
43+
**Return value:**
44+
45+
The function returns datetime-like objects depending on input type:
46+
47+
- **Scalar input**: Returns pandas Timestamp
48+
- **Array-like input**: Returns `DatetimeIndex`
49+
- **Series input**: Returns Series with `datetime64[ns]` dtype
50+
- **`DataFrame` input**: Returns Series with `datetime64[ns]` dtype from assembled columns
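
The mapping from input type to return type can be checked directly. This is a small illustrative sketch with made-up values:

```py
import pandas as pd

scalar_result = pd.to_datetime("2023-01-15")                  # Timestamp
list_result = pd.to_datetime(["2023-01-15", "2023-02-20"])    # DatetimeIndex
series_result = pd.to_datetime(pd.Series(["2023-01-15", "2023-02-20"]))  # Series

# DataFrame input: year/month/day columns are assembled into one Series
frame_result = pd.to_datetime(pd.DataFrame({"year": [2023], "month": [1], "day": [15]}))

for result in (scalar_result, list_result, series_result, frame_result):
    print(type(result).__name__)
```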
51+
52+
## Example 1: Basic String Conversion Using `.to_datetime()`
53+
54+
The following example demonstrates the fundamental usage of `.to_datetime()` for converting date strings into pandas `datetime` objects:
2155

2256
```py
23-
pandas.to_datetime(arg, format=None, errors='raise', dayfirst=False, yearfirst=False, utc=None, box=True, infer_datetime_format=False, origin='unix', cache=True)
57+
import pandas as pd
58+
59+
# Create a Series of ISO-formatted (YYYY-MM-DD) date strings
60+
date_strings = pd.Series(['2023-01-15', '2023-02-20', '2023-03-25', '2023-04-30'])
61+
62+
# Convert string dates to datetime objects
63+
converted_dates = pd.to_datetime(date_strings)
64+
65+
print("Original strings:")
66+
print(date_strings)
67+
print("\nConverted to datetime:")
68+
print(converted_dates)
69+
print(f"\nData type: {converted_dates.dtype}")
70+
```
71+
72+
The output produced by this code will be:
73+
74+
```shell
75+
Original strings:
76+
0 2023-01-15
77+
1 2023-02-20
78+
2 2023-03-25
79+
3 2023-04-30
80+
dtype: object
81+
82+
Converted to datetime:
83+
0 2023-01-15
84+
1 2023-02-20
85+
2 2023-03-25
86+
3 2023-04-30
87+
dtype: datetime64[ns]
88+
89+
Data type: datetime64[ns]
2490
```
2591

26-
| Parameter Name | Data Type | Usage |
27-
| ----------------------- | ------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------- |
28-
| `arg` | int, float, str, datetime, list, tuple, 1-d array, Series, DateFrame/dict-like | Converts given data into a datetime |
29-
| `errors` | 'ignore', 'raise', 'coerce' | The given keyword determines the handling of errors |
30-
| `dayfirst` | bool (default `False`) | Specifies that the str or list-like object begins with a day |
31-
| `yearfirst` | bool (default `True`) | Specifies that the str or list-like object begins with a year |
32-
| `utc` | bool (default `None`) | When `True`, output is converted to UTC time zone |
33-
| `format` | str (default `None`) | Pass a strftime to specify the format of the datetime conversion |
34-
| `exact` | bool (default `True`) | Determines how the format parameter is applied |
35-
| `unit` | str (default 'ns') | Specifies the units of the passed object |
36-
| `infer_datetime_format` | bool (default `False`) | When `True`, and no format has been passed, the datetime string will be based on the first non-`NaN` element within the object |
37-
| `origin` | scalar (default unix) | Sets the reference date |
38-
| `cache` | bool (default `True`) | Allows the use of a unique set of converted dates to apply the conversion (only applied when object contains at least 50 values) |
92+
This example shows how `.to_datetime()` automatically recognizes standard date formats and converts them to pandas datetime objects. The resulting Series has `datetime64[ns]` dtype, enabling time-based operations and analysis.
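
As a brief sketch of the time-based operations this enables, the `.dt` accessor and date arithmetic work on the converted Series. The dates below are illustrative:

```py
import pandas as pd

dates = pd.to_datetime(pd.Series(["2023-01-15", "2023-02-20", "2023-03-25"]))

months = dates.dt.month                  # month number of each date
weekdays = dates.dt.day_name()           # weekday name of each date
gaps = dates.diff()                      # Timedelta between consecutive dates
shifted = dates + pd.Timedelta(days=7)   # every date moved one week forward

print(months.tolist())
print(weekdays.tolist())
print(gaps)
print(shifted)
```

None of these operations are available on the original string Series, which is why the conversion step comes first.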
3993

40-
## Example
94+
## Example 2: Financial Data Processing
4195

42-
The code below demonstrates the conversion of a string to a datetime object with the `.to_datetime()` function.
96+
The following example shows how to process financial data with mixed date formats and handle missing values, a common scenario in real-world datasets:
4397

4498
```py
4599
import pandas as pd
46-
my_list = ['11/09/30']
100+
import numpy as np
101+
102+
# Create a DataFrame simulating financial data with mixed date formats
103+
financial_data = pd.DataFrame({
104+
'trade_date': ['2023-01-15', '15/02/2023', '03-25-2023', 'invalid_date', '2023-04-30'],
105+
'stock_price': [150.25, 155.80, 148.90, 152.10, 159.75],
106+
'volume': [1000000, 1200000, 950000, 1100000, 1300000]
107+
})
108+
109+
# Convert dates with error handling for invalid entries
110+
financial_data['trade_date'] = pd.to_datetime(
111+
financial_data['trade_date'],
112+
errors='coerce', # Convert invalid dates to NaT
113+
dayfirst=False # Assume month comes first in ambiguous dates
114+
)
115+
116+
# Display the processed data
117+
print("Financial data with processed dates:")
118+
print(financial_data)
119+
120+
# Check for any missing dates after conversion
121+
missing_dates = financial_data['trade_date'].isna().sum()
122+
print(f"\nNumber of invalid dates converted to NaT: {missing_dates}")
47123

48-
xyz = pd.to_datetime(my_list, dayfirst=True)
49-
print(xyz)
124+
# Filter out rows with invalid dates for analysis
125+
clean_data = financial_data.dropna(subset=['trade_date'])
126+
print(f"\nClean data shape: {clean_data.shape}")
50127
```
51128

52-
This example results in the following output::
129+
The output of this code is:
53130

54131
```shell
55-
DatetimeIndex(['2030-11-09'], dtype='datetime64[ns]', freq=None)
132+
Financial data with processed dates:
133+
trade_date stock_price volume
134+
0 2023-01-15 150.25 1000000
135+
1 2023-02-15 155.80 1200000
136+
2 2023-03-25 148.90 950000
137+
3 NaT 152.10 1100000
138+
4 2023-04-30 159.75 1300000
139+
140+
Number of invalid dates converted to NaT: 1
141+
142+
Clean data shape: (4, 3)
143+
```
144+
145+
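This example demonstrates handling real-world financial data where dates might be in different formats or contain invalid entries. Using `errors='coerce'` converts unparseable dates to NaT (Not a Time), allowing the analysis to continue with valid data. Note that the output above reflects lenient element-by-element parsing: since pandas 2.0, the format is inferred from the first value, so rows in other formats are also coerced to NaT unless `format='mixed'` or explicit formats are supplied.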
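
One version-independent way to parse a column with several known formats is to try each format in turn and keep the first successful result. This is a hedged sketch, not the approach used in the example above; the list of candidate formats is an assumption about the data:

```py
import pandas as pd

raw = pd.Series(["2023-01-15", "15/02/2023", "03-25-2023", "invalid_date"])

# Assumed candidate layouts, tried in order of priority
candidate_formats = ["%Y-%m-%d", "%d/%m/%Y", "%m-%d-%Y"]

# Start with the first format; rows that do not match become NaT
parsed = pd.to_datetime(raw, format=candidate_formats[0], errors="coerce")

# Fill the remaining NaT rows using each fallback format
for fmt in candidate_formats[1:]:
    parsed = parsed.fillna(pd.to_datetime(raw, format=fmt, errors="coerce"))

print(parsed)
print(f"Unparsed rows: {parsed.isna().sum()}")
```

Listing the formats explicitly avoids guessing whether an ambiguous value such as `03-04-2023` is day-first or month-first, at the cost of maintaining the list.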
146+
147+
## Codebyte Example: Sensor Data Time Series Analysis
148+
149+
The following example processes sensor data with Unix timestamps and demonstrates creating a time series index for scientific data analysis:
150+
151+
```codebyte/python
152+
import pandas as pd
153+
import numpy as np
154+
155+
# Create sensor data with Unix timestamps (seconds since 1970-01-01)
156+
sensor_timestamps = [1672531200, 1672534800, 1672538400, 1672542000, 1672545600] # Hourly readings
157+
temperature_readings = [23.5, 24.1, 23.8, 24.3, 24.7]
158+
humidity_readings = [45.2, 46.8, 44.9, 47.1, 48.3]
159+
160+
# Create DataFrame with sensor data
161+
sensor_data = pd.DataFrame({
162+
'timestamp': sensor_timestamps,
163+
'temperature_c': temperature_readings,
164+
'humidity_percent': humidity_readings
165+
})
166+
167+
# Convert Unix timestamps to datetime objects
168+
sensor_data['datetime'] = pd.to_datetime(
169+
sensor_data['timestamp'],
170+
unit='s' # Specify that timestamps are in seconds
171+
)
172+
173+
# Set datetime as index for time series analysis
174+
sensor_data.set_index('datetime', inplace=True)
175+
176+
# Drop the original timestamp column
177+
sensor_data.drop('timestamp', axis=1, inplace=True)
178+
179+
print("Processed sensor data with datetime index:")
180+
print(sensor_data)
181+
182+
# Demonstrate time series capabilities
183+
print(f"\nData collection period: {sensor_data.index[0]} to {sensor_data.index[-1]}")
184+
print(f"Average temperature: {sensor_data['temperature_c'].mean():.1f}°C")
185+
186+
# Inspect the index frequency (not set automatically, so this prints None)
187+
print(f"\nTime series index frequency: {sensor_data.index.freq}")
56188
```
189+
190+
This example shows how to process sensor data with Unix timestamps, which is common in IoT applications and scientific data collection. Converting timestamps to datetime objects and using them as an index enables powerful time series analysis capabilities in pandas.
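
As a sketch of what the datetime index makes possible, the readings can be resampled into wider bins or sliced by date strings. The two-hour bin width is an arbitrary choice for illustration:

```py
import pandas as pd

readings = pd.DataFrame({
    "timestamp": [1672531200, 1672534800, 1672538400, 1672542000, 1672545600],
    "temperature_c": [23.5, 24.1, 23.8, 24.3, 24.7],
})

# Convert Unix seconds to datetimes and use them as the index
readings["datetime"] = pd.to_datetime(readings["timestamp"], unit="s")
readings = readings.set_index("datetime").drop(columns="timestamp")

# Average the hourly readings into two-hour bins
two_hourly = readings.resample(pd.Timedelta(hours=2)).mean()

# Select a time window using partial string slicing (both ends inclusive)
first_hours = readings.loc["2023-01-01 00:00":"2023-01-01 01:00"]

print(two_hourly)
print(first_hours)
```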
191+
192+
## Frequently Asked Questions
193+
194+
### 1. Can I convert multiple date columns at once?
195+
196+
Yes, you can apply `to_datetime()` to multiple columns using `apply()` or process each column individually. For DataFrames with separate year, month, day columns, pass the DataFrame directly to `to_datetime()` and it will automatically assemble the datetime from the columns.
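
Both approaches can be sketched as follows. The DataFrames and column names are hypothetical:

```py
import pandas as pd

# Several string columns converted in one step with apply()
orders = pd.DataFrame({
    "ordered": ["2023-01-15", "2023-02-20"],
    "shipped": ["2023-01-18", "2023-02-22"],
})
date_columns = ["ordered", "shipped"]
orders[date_columns] = orders[date_columns].apply(pd.to_datetime)

# Separate year, month, and day columns assembled into a single Series
parts = pd.DataFrame({"year": [2023, 2024], "month": [1, 2], "day": [15, 29]})
assembled = pd.to_datetime(parts)

print(orders.dtypes)
print(assembled)
```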
197+
198+
### 2. How do I handle dates before 1677 or after 2262?
199+
200+
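Nanosecond-resolution `datetime64[ns]` values can only represent dates from roughly 1677-09-21 to 2262-04-11 (`pd.Timestamp.min` to `pd.Timestamp.max`). In pandas versions where nanoseconds are the default resolution, converting a date outside this range raises `OutOfBoundsDatetime` under the default `errors='raise'`, while `errors='coerce'` replaces it with `NaT`. Only the deprecated `errors='ignore'` hands back the original input, which can leave plain Python datetime objects with reduced functionality for time series operations.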
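
A short sketch that prints the nanosecond bounds and shows one possible workaround, representing an out-of-range day as a `pd.Period`. Whether periods fit depends on the analysis, so treat this as an illustration rather than a recommendation:

```py
import pandas as pd

# Bounds of nanosecond-resolution timestamps
earliest = pd.Timestamp.min
latest = pd.Timestamp.max
print(earliest)
print(latest)

# A daily Period can represent dates outside those bounds
old_date = pd.Period("1300-01-01", freq="D")
next_day = old_date + 1

print(old_date, next_day)
```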
201+
202+
### 3. Can I specify custom origins for Unix timestamps?
203+
204+
Yes, use the `origin` parameter to set a custom reference date. For example, `origin='2000-01-01'` will interpret numeric values as time units from that date instead of the Unix epoch.
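
A minimal sketch of the `origin` parameter, using made-up day counts:

```py
import pandas as pd

days_elapsed = [0, 1, 366]

# Interpret the numbers as days since 2000-01-01
from_2000 = pd.to_datetime(days_elapsed, unit="D", origin="2000-01-01")

# For comparison, the same numbers measured from the Unix epoch (the default)
from_unix = pd.to_datetime(days_elapsed, unit="D")

print(from_2000)
print(from_unix)
```

Because 2000 is a leap year, 366 days after `2000-01-01` lands on `2001-01-01`.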
