Data Preprocessing

The jasmine.preprocessing module provides data preprocessing utilities.

Classes

StandardScaler([epsilon])

StandardScaler standardizes features by removing the mean and scaling to unit variance.

StandardScaler

class jasmine.preprocessing.StandardScaler(epsilon: float = 1e-08)[source]

Bases: object

StandardScaler standardizes features by removing the mean and scaling to unit variance.
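As a rough illustration of what "removing the mean and scaling to unit variance" means, the arithmetic can be sketched in plain Python. This is not jasmine's implementation: the helper names `column_stats` and `standardize` are hypothetical, the population standard deviation is an assumption, and adding `epsilon` to the standard deviation is only one plausible way the parameter might be used.

```python
import math


def column_stats(rows, epsilon=1e-8):
    """Return per-column (means, scales) for a list of equal-length rows."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    # Population standard deviation (divide by n): an assumption of this sketch.
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / n)
        for col, m in zip(columns, means)
    ]
    # Assumed role of epsilon: added to the std so a constant column
    # never produces a zero divisor.
    scales = [s + epsilon for s in stds]
    return means, scales


def standardize(rows, means, scales):
    """Subtract each column's mean and divide by its scale."""
    return [[(v - m) / s for v, m, s in zip(row, means, scales)] for row in rows]


X = [[1.0, 100.0, 10000.0],
     [2.0, 200.0, 20000.0],
     [3.0, 300.0, 30000.0]]

means, scales = column_stats(X)
X_scaled = standardize(X, means, scales)
```

After this step every column has a mean of roughly zero and a standard deviation of roughly one, regardless of its original scale.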

copy

If True, a copy of X will be created; otherwise, it will be modified in place.

Type:: bool

with_mean

If True, center the data before scaling.

Type:: bool

with_std

If True, scale the data to unit variance.

Type:: bool

epsilon

Small value to avoid division by zero.

Type:: float

Methods

`__init__`([epsilon])
`fit`(X)	Fit the scaler to the data.
`transform`(X)	Transform the data using the fitted parameters.
`fit_transform`(X)	Fit the scaler and transform the data in one step.
`inverse_transform`(X)	Inverse transform the standardized data back to original scale.

Properties

is_fitted

Check if the scaler has been fitted.

__init__(epsilon: float = 1e-08)[source]

Parameters:: epsilon (float) – Small value to avoid division by zero.

property is_fitted: bool

Check if the scaler has been fitted.

Returns:: True if fitted, False otherwise.
Return type:: bool
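One way such a property is commonly used is as a guard before transforming. The sketch below uses a hypothetical stand-in class, `TinyScaler`, rather than jasmine's StandardScaler; the idea that "fitted" means "the params dictionary is non-empty" is an assumption made for illustration.

```python
class TinyScaler:
    """Hypothetical stand-in used only to illustrate an is_fitted guard."""

    def __init__(self):
        # Assumed convention: params stays empty until fit() is called.
        self.params = {}

    @property
    def is_fitted(self):
        return bool(self.params)

    def fit(self, values):
        self.params = {"mean": sum(values) / len(values)}
        return self

    def transform(self, values):
        if not self.is_fitted:
            raise RuntimeError("call fit() before transform()")
        return [v - self.params["mean"] for v in values]


scaler = TinyScaler()
print(scaler.is_fitted)  # False
scaler.fit([1.0, 2.0, 3.0])
print(scaler.is_fitted)  # True
```

Checking the property first lets calling code fail early with a clear message instead of transforming with missing parameters.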

fit(X: Array)[source]

Fit the scaler to the data.

Parameters:: X (jnp.ndarray) – Input features of shape (n_samples, n_features).

Returns:: Fitted scaler instance.
Return type:: StandardScaler

transform(X: Array) → Array[source]

Transform the data using the fitted parameters.

Parameters:: X (jnp.ndarray) – Input features of shape (n_samples, n_features).

Returns:: Transformed features.
Return type:: jnp.ndarray

fit_transform(X: Array) → Array[source]

Fit the scaler and transform the data in one step.

Parameters:: X (jnp.ndarray) – Input features of shape (n_samples, n_features).
Returns:: Transformed features.
Return type:: jnp.ndarray

inverse_transform(X: Array) → Array[source]

Inverse transform the standardized data back to original scale.

Parameters:: X (jnp.ndarray) – Standardized features of shape (n_samples, n_features).
Returns:: Original scale features.
Return type:: jnp.ndarray
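The relationship between the forward and inverse transforms can be sketched for a single feature in plain Python. The function names are hypothetical and the way `epsilon` enters the scale is an assumption; the point is only that the inverse multiplies by the same scale and adds back the same mean that the forward transform used.

```python
import math


def fit_stats(values, epsilon=1e-8):
    """Return (mean, scale) for one feature; scale = std + epsilon (assumed)."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return mean, std + epsilon


def transform(values, mean, scale):
    """Forward step: z = (x - mean) / scale."""
    return [(v - mean) / scale for v in values]


def inverse_transform(z_values, mean, scale):
    """Inverse step: x = z * scale + mean."""
    return [z * scale + mean for z in z_values]


x = [100.0, 200.0, 300.0]
mean, scale = fit_stats(x)
z = transform(x, mean, scale)
x_back = inverse_transform(z, mean, scale)

# Largest absolute difference between the original and recovered values.
max_error = max(abs(a - b) for a, b in zip(x, x_back))
```

Because both directions use the same stored mean and scale, the round trip is exact up to floating-point rounding.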

Examples

Basic Feature Scaling

from jasmine.preprocessing import StandardScaler
import jax.numpy as jnp

# Create sample data with different scales
X = jnp.array([[1, 100, 10000],
               [2, 200, 20000],
               [3, 300, 30000]])

# Fit and transform
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print("Original data shape:", X.shape)
print("Scaled data mean:", jnp.mean(X_scaled, axis=0))
print("Scaled data std:", jnp.std(X_scaled, axis=0))

Preprocessing Pipeline

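from jasmine.preprocessing import StandardScaler
from jasmine.linear_model import LinearRegression
from jasmine.model_selection import train_test_split
import jax
import jax.numpy as jnp

# Synthetic data for illustration only: 100 samples, 3 features on different scales
key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (100, 3)) * jnp.array([1.0, 100.0, 10000.0])
y = X @ jnp.array([2.0, 0.03, 0.0004]) + 1.0

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Fit scaler on training data only, then reuse its parameters for the test data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train model on scaled data
model = LinearRegression()
model.train(X_train_scaled, y_train)

# Evaluate
score = model.evaluate(X_test_scaled, y_test)
print(f"R² Score: {score:.4f}")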
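
The reason for fitting on the training split only can be sketched with one feature in plain Python. The numbers are made up for illustration and the arithmetic is a stand-in for what a scaler computes, not jasmine's code. The test values are scaled with the training mean and standard deviation, so no information from the test set leaks into the preprocessing step.

```python
import math

train = [1.0, 2.0, 3.0, 4.0]
test = [5.0, 6.0]

# Statistics come from the training split only.
mean = sum(train) / len(train)
std = math.sqrt(sum((v - mean) ** 2 for v in train) / len(train))

train_scaled = [(v - mean) / std for v in train]
# The test split reuses the training statistics unchanged.
test_scaled = [(v - mean) / std for v in test]

print(train_scaled)
print(test_scaled)
```

The scaled test values need not have zero mean or unit variance. That is expected, since they are expressed relative to the training distribution.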

Inverse Transformation

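from jasmine.preprocessing import StandardScaler
import jax.numpy as jnp

X = jnp.array([[1.0, 100.0, 10000.0],
               [2.0, 200.0, 20000.0],
               [3.0, 300.0, 30000.0]])

# Transform data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Recover original data
X_recovered = scaler.inverse_transform(X_scaled)

# Verify recovery (expect a value near zero, up to floating-point error)
recovery_error = jnp.mean(jnp.abs(X - X_recovered))
print(f"Recovery error: {recovery_error:.2e}")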

Performance Notes

StandardScaler uses JIT compilation for fast transforms.
The epsilon parameter prevents division by zero for constant features.
Scaling parameters are stored in the params dictionary.
Use the is_fitted property to check whether the scaler has been fitted.
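
The note about constant features can be illustrated with a short plain-Python sketch. It assumes that epsilon is added to the standard deviation before dividing, which may differ from how jasmine applies it.

```python
epsilon = 1e-8
constant_column = [5.0, 5.0, 5.0]

mean = sum(constant_column) / len(constant_column)
# Every value equals the mean, so the standard deviation is exactly zero.
std = (sum((v - mean) ** 2 for v in constant_column) / len(constant_column)) ** 0.5

# Assumed use of epsilon: it keeps the divisor strictly positive.
scaled = [(v - mean) / (std + epsilon) for v in constant_column]
print(scaled)  # [0.0, 0.0, 0.0]
```

Without the epsilon term the divisor would be zero: plain Python raises ZeroDivisionError, and array libraries that follow IEEE 754 arithmetic produce NaN for 0/0.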