Data Preprocessing
The jasmine.preprocessing module provides data preprocessing utilities.
Classes
StandardScaler standardizes features by removing the mean and scaling to unit variance.
StandardScaler
- class jasmine.preprocessing.StandardScaler(epsilon: float = 1e-08)
Bases: object
StandardScaler standardizes features by removing the mean and scaling to unit variance.
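In formula terms, standardization of feature j is usually written as below. This is the textbook definition, not a statement of jasmine's internals: this page does not say whether the library uses the population standard deviation or exactly where it adds epsilon, so both choices shown here are assumptions.

```latex
\mu_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad
\sigma_j = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(x_{ij} - \mu_j\right)^2}, \qquad
z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j + \epsilon}
```

Here n is the number of samples, and the inverse transformation is x_{ij} = z_{ij}(\sigma_j + \epsilon) + \mu_j under the same assumptions.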
Methods
__init__([epsilon])
fit(X): Fit the scaler to the data.
transform(X): Transform the data using the fitted parameters.
fit_transform: Fit the scaler and transform the data in one step.
inverse_transform: Inverse transform the standardized data back to the original scale.
Properties
is_fitted: Check if the scaler has been fitted.
- property is_fitted: bool
Check if the scaler has been fitted.
- Returns:
True if fitted, False otherwise.
- Return type:
bool
- fit(X: Array)
Fit the scaler to the data.
- Parameters:
X (jnp.ndarray): Input features of shape (n_samples, n_features).
- Returns:
Fitted scaler instance.
- Return type:
self
- transform(X: Array) -> Array
Transform the data using the fitted parameters.
- Parameters:
X (jnp.ndarray): Input features of shape (n_samples, n_features).
- Returns:
Transformed features.
- Return type:
jnp.ndarray
Examples
Basic Feature Scaling
from jasmine.preprocessing import StandardScaler
import jax.numpy as jnp
# Create sample data with different scales
X = jnp.array([[1, 100, 10000],
               [2, 200, 20000],
               [3, 300, 30000]])
# Fit and transform
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
print("Original data shape:", X.shape)
print("Scaled data mean:", jnp.mean(X_scaled, axis=0))
print("Scaled data std:", jnp.std(X_scaled, axis=0))
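To see what the scaled output should look like, the same standardization can be worked by hand with plain jax.numpy. This sketch does not call jasmine. It assumes the population standard deviation (the `jnp.std` default, `ddof=0`) and leaves epsilon out, so the library's values may differ in the last digits.

```python
import jax.numpy as jnp

# Same sample data as in the example above, written as floats
X = jnp.array([[1.0, 100.0, 10000.0],
               [2.0, 200.0, 20000.0],
               [3.0, 300.0, 30000.0]])

mean = jnp.mean(X, axis=0)   # per-feature mean
std = jnp.std(X, axis=0)     # per-feature population std (ddof=0)
X_manual = (X - mean) / std  # standardize each column independently

print(X_manual)
```

Each column is a multiple of [1, 2, 3], so all three columns standardize to the same values, roughly -1.2247, 0 and 1.2247. That is the point of scaling: features on very different scales end up comparable.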
Preprocessing Pipeline
from jasmine.preprocessing import StandardScaler
from jasmine.linear_model import LinearRegression
from jasmine.model_selection import train_test_split
# Split data (X and y are assumed to be existing feature and target arrays)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Fit scaler on training data only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Train model on scaled data
model = LinearRegression()
model.train(X_train_scaled, y_train)
# Evaluate
score = model.evaluate(X_test_scaled, y_test)
print(f"R² Score: {score:.4f}")
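The reason for fitting on the training split only can be illustrated without jasmine. The sketch below uses made-up random data and plain jax.numpy rather than the library's classes. The test rows are scaled with the training statistics, so they generally do not come out with exactly zero mean, which is expected.

```python
import jax
import jax.numpy as jnp

# Hypothetical data: 100 samples, 3 features with a non-zero offset
key = jax.random.PRNGKey(0)
X = 5.0 + 2.0 * jax.random.normal(key, (100, 3))
X_train, X_test = X[:80], X[80:]

# Statistics come from the training rows only
train_mean = jnp.mean(X_train, axis=0)
train_std = jnp.std(X_train, axis=0)

X_train_scaled = (X_train - train_mean) / train_std
X_test_scaled = (X_test - train_mean) / train_std  # reuse training statistics

print("Train mean after scaling:", jnp.mean(X_train_scaled, axis=0))
print("Test mean after scaling: ", jnp.mean(X_test_scaled, axis=0))
```

Computing the mean and standard deviation from the full dataset instead would let information about the test rows influence the preprocessing, which tends to make evaluation scores look better than they should.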
Inverse Transformation
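from jasmine.preprocessing import StandardScaler
import jax.numpy as jnp
# Transform data (X is the sample array from Basic Feature Scaling)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)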
# Recover original data
X_recovered = scaler.inverse_transform(X_scaled)
# Verify recovery
recovery_error = jnp.mean(jnp.abs(X - X_recovered))
print(f"Recovery error: {recovery_error:.2e}")
Performance Notes
- StandardScaler uses JIT compilation for fast transforms.
- The epsilon parameter prevents division by zero for constant features.
- Scaling parameters are stored in the params dictionary.
- Use the is_fitted property to check whether the scaler has been fitted.
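The notes above can be tied together in a small stand-in class. This is a sketch for illustration only, not jasmine's implementation: the class name `SketchScaler`, the `"mean"` and `"scale"` keys in `params`, and adding epsilon directly to the standard deviation are all assumptions.

```python
import jax
import jax.numpy as jnp


@jax.jit
def _standardize(X, mean, scale):
    # Compiled on first call for a given input shape/dtype, then reused
    return (X - mean) / scale


class SketchScaler:
    """Hypothetical stand-in for illustration; not jasmine's StandardScaler."""

    def __init__(self, epsilon=1e-8):
        self.epsilon = epsilon
        self.params = {}  # filled by fit()

    @property
    def is_fitted(self):
        return bool(self.params)

    def fit(self, X):
        self.params = {
            "mean": jnp.mean(X, axis=0),
            # epsilon keeps the scale non-zero for constant features
            # (assumed placement)
            "scale": jnp.std(X, axis=0) + self.epsilon,
        }
        return self

    def transform(self, X):
        if not self.is_fitted:
            raise ValueError("Call fit() before transform().")
        return _standardize(X, self.params["mean"], self.params["scale"])

    def inverse_transform(self, X_scaled):
        return X_scaled * self.params["scale"] + self.params["mean"]


# The second feature is constant, so its standard deviation is zero
X = jnp.array([[1.0, 7.0], [2.0, 7.0], [3.0, 7.0]])
scaler = SketchScaler()
print(scaler.is_fitted)                 # False before fit
X_scaled = scaler.fit(X).transform(X)
print(scaler.is_fitted)                 # True after fit
print(bool(jnp.all(jnp.isfinite(X_scaled))))
```

Without the epsilon term, the constant column would be divided by zero and produce NaN values. With it, the column maps to finite values and `inverse_transform` can still recover the original data.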