Data Preprocessing

The jasmine.preprocessing module provides data preprocessing utilities.

Classes

StandardScaler([epsilon])

StandardScaler standardizes features by removing the mean and scaling to unit variance.

StandardScaler

class jasmine.preprocessing.StandardScaler(epsilon: float = 1e-08)[source]

Bases: object

StandardScaler standardizes features by removing the mean and scaling to unit variance.

copy

If True, a copy of X will be created; otherwise, it will be modified in place.

Type:

bool

with_mean

If True, center the data before scaling.

Type:

bool

with_std

If True, scale the data to unit variance.

Type:

bool

epsilon

Small value to avoid division by zero.

Type:

float

Methods

__init__([epsilon])

fit(X)

Fit the scaler to the data.

transform(X)

Transform the data using the fitted parameters.

fit_transform(X)

Fit the scaler and transform the data in one step.

inverse_transform(X)

Transform standardized data back to the original scale.

Properties

is_fitted

Check if the scaler has been fitted.

__init__(epsilon: float = 1e-08)[source]

property is_fitted: bool

Check if the scaler has been fitted.

Returns:

True if fitted, False otherwise.

Return type:

bool
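As a rough illustration of how such a property is commonly used, the sketch below guards transform() behind is_fitted. TinyScaler is a hypothetical stand-in written for this example, not jasmine's implementation; the empty-params-until-fit convention and the RuntimeError are assumptions, since this page does not say what StandardScaler does when it is used before fitting.

```python
import jax.numpy as jnp


class TinyScaler:
    """Hypothetical stand-in for illustration; not jasmine's StandardScaler."""

    def __init__(self, epsilon=1e-8):
        self.epsilon = epsilon
        self.params = {}  # assumption: stays empty until fit() is called

    @property
    def is_fitted(self):
        # A non-empty params dict means fit() has run.
        return bool(self.params)

    def fit(self, X):
        self.params = {"mean": jnp.mean(X, axis=0), "std": jnp.std(X, axis=0)}
        return self  # returning self allows call chaining

    def transform(self, X):
        if not self.is_fitted:
            raise RuntimeError("call fit() before transform()")
        return (X - self.params["mean"]) / (self.params["std"] + self.epsilon)


scaler = TinyScaler()
before = scaler.is_fitted  # nothing has been fitted yet
after = scaler.fit(jnp.array([[1.0, 10.0], [3.0, 30.0]])).is_fitted
```

Checking is_fitted before calling transform() lets calling code fail early with a clear message instead of relying on whatever happens inside an unfitted scaler.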

fit(X: Array)[source]

Fit the scaler to the data.

Parameters:

X (jnp.ndarray) – Input features of shape (n_samples, n_features).

Returns:

Fitted scaler instance.

Return type:

self

transform(X: Array) → Array[source]

Transform the data using the fitted parameters.

Parameters:

X (jnp.ndarray) – Input features of shape (n_samples, n_features).

Returns:

Transformed features.

Return type:

jnp.ndarray

fit_transform(X: Array) → Array[source]

Fit the scaler and transform the data in one step.

Parameters:

X (jnp.ndarray) – Input features of shape (n_samples, n_features).

Returns:

Transformed features.

Return type:

jnp.ndarray

inverse_transform(X: Array) → Array[source]

Transform standardized data back to the original scale.

Parameters:

X (jnp.ndarray) – Standardized features of shape (n_samples, n_features).

Returns:

Features on the original scale.

Return type:

jnp.ndarray
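The arithmetic behind fit, transform, and inverse_transform can be pictured with a minimal functional sketch in plain JAX. The helper names below (fit_params, transform, inverse_transform as free functions) and the dictionary keys are hypothetical, and the sketch assumes epsilon is added to the per-feature standard deviation; this page does not state exactly where jasmine applies it.

```python
import jax.numpy as jnp

EPSILON = 1e-8  # matches the documented default value


def fit_params(X, epsilon=EPSILON):
    """Compute per-feature mean and scale (hypothetical helper, not jasmine API)."""
    mean = jnp.mean(X, axis=0)
    # Assumption: epsilon is added to the standard deviation before dividing.
    scale = jnp.std(X, axis=0) + epsilon
    return {"mean": mean, "scale": scale}


def transform(X, params):
    """Center each feature and divide by its scale."""
    return (X - params["mean"]) / params["scale"]


def inverse_transform(Z, params):
    """Undo transform(): multiply by the scale, then add the mean back."""
    return Z * params["scale"] + params["mean"]


X = jnp.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
params = fit_params(X)
Z = transform(X, params)               # each column has mean ~0 and std ~1
X_back = inverse_transform(Z, params)  # approximately equal to X
```

Because inverse_transform reuses the same mean and scale, the round trip recovers X up to floating-point error.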

Examples

Basic Feature Scaling

from jasmine.preprocessing import StandardScaler
import jax.numpy as jnp

# Create sample data with different scales
X = jnp.array([[1, 100, 10000],
               [2, 200, 20000],
               [3, 300, 30000]])

# Fit and transform
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print("Original data shape:", X.shape)
print("Scaled data mean:", jnp.mean(X_scaled, axis=0))
print("Scaled data std:", jnp.std(X_scaled, axis=0))

Preprocessing Pipeline

from jasmine.preprocessing import StandardScaler
from jasmine.linear_model import LinearRegression
from jasmine.model_selection import train_test_split

# Split data (X and y are assumed to be existing feature and target arrays)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Fit scaler on training data only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train model on scaled data
model = LinearRegression()
model.train(X_train_scaled, y_train)

# Evaluate
score = model.evaluate(X_test_scaled, y_test)
print(f"R² Score: {score:.4f}")

Inverse Transformation

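from jasmine.preprocessing import StandardScaler
import jax.numpy as jnp

# Sample data with features on different scales
X = jnp.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])

# Fit the scaler and transform the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)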

# Recover original data
X_recovered = scaler.inverse_transform(X_scaled)

# Verify recovery
recovery_error = jnp.mean(jnp.abs(X - X_recovered))
print(f"Recovery error: {recovery_error:.2e}")

Performance Notes

  • StandardScaler uses JIT compilation for fast transforms

  • The epsilon parameter prevents division by zero for constant (zero-variance) features

  • Scaling parameters are stored in the params dictionary

  • Use the is_fitted property to check whether the scaler has been fitted
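The notes on JIT compilation and epsilon can be illustrated with a small sketch in plain JAX. The standardize function is a hypothetical helper, not part of jasmine, and it assumes epsilon is added to the standard deviation before dividing, which this page does not confirm.

```python
import jax
import jax.numpy as jnp

EPSILON = 1e-8  # documented default value


@jax.jit  # compiled on first call, reused for inputs of the same shape and dtype
def standardize(X):
    mean = jnp.mean(X, axis=0)
    std = jnp.std(X, axis=0)
    # Assumption: epsilon is added to std so a zero-variance column divides safely.
    return (X - mean) / (std + EPSILON)


# The second column is constant, so its standard deviation is exactly zero.
X = jnp.array([[1.0, 4.0], [2.0, 4.0], [3.0, 4.0], [4.0, 4.0]])

Z = standardize(X)

# Without epsilon the constant column becomes 0 / 0, which is NaN.
naive = (X - jnp.mean(X, axis=0)) / jnp.std(X, axis=0)
```

In this sketch the constant column maps to zeros when epsilon is present, while the naive version fills it with NaN values that would propagate into any downstream model.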