Analysis Model Development
AI Analysis Model Selection: GRU
Recurrent Neural Networks (RNNs) and LSTMs
Recurrent Neural Networks (RNNs) are highly effective for sequential data processing due to their ability to capture temporal dependencies. However, traditional RNNs face limitations in retaining long-term dependencies, often suffering from the vanishing gradient problem.
To address this, Long Short-Term Memory (LSTM) networks were developed, introducing memory cells and gates that effectively manage long-term dependencies. LSTMs have since become a standard for sequential data tasks due to their superior performance in handling complex time-series problems.
Introduction to GRU
Gated Recurrent Units (GRUs), introduced in 2014 by Cho et al., are a simplified variant of LSTMs. Unlike LSTMs, GRUs use only two gates (reset and update) instead of three and maintain a single hidden state without a separate cell state, reducing computational complexity while delivering comparable, and in some scenarios superior, performance.
The structure of a GRU allows it to:
- Handle sequential dependencies without requiring excessive computational resources.
- Achieve faster training convergence compared to LSTMs.
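To make the efficiency argument concrete, the snippet below is a minimal sketch (written in Keras, not taken from the original project code) that compares the parameter counts of a single GRU layer and a single LSTM layer of equal width on this project's input shape; the exact numbers vary slightly with the Keras version and the GRU's reset_after setting.

```python
from tensorflow import keras
from tensorflow.keras import layers

def recurrent_param_count(layer):
    # Wrap a single recurrent layer in a tiny model so Keras builds its weights.
    inputs = keras.Input(shape=(10, 15))   # 10 time steps, 15 features per step
    outputs = layer(inputs)
    return keras.Model(inputs, outputs).count_params()

gru_params = recurrent_param_count(layers.GRU(64))
lstm_params = recurrent_param_count(layers.LSTM(64))

# A GRU has 3 gated transformations versus the LSTM's 4, so it needs roughly
# three quarters of the LSTM's weights for the same number of units.
print(f"GRU(64):  {gru_params:,} parameters")
print(f"LSTM(64): {lstm_params:,} parameters")
```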
Why GRU for This Project?
Several factors influenced the choice of GRU as the preferred model:
- Cyclic Nature of the Data:
- The melting process exhibits cyclic patterns (e.g., temperature and stirring speed fluctuating approximately every 10 intervals). GRUs effectively capture these medium-term dependencies without needing the extended memory capabilities of LSTMs.
- Simplified Architecture:
- GRUs are computationally lighter than LSTMs, making them more efficient for a dataset of this size (835,200 observations) while maintaining comparable accuracy.
- Empirical Performance:
- Both GRU and LSTM models were tested during the study. The GRU demonstrated slightly better performance or similar outcomes compared to the LSTM, aligning well with the dataset's characteristics.
- A three-layer GRU outperformed both shallower and deeper GRU and LSTM networks.
- Applicability to Manufacturing Data:
- Prior studies suggest GRUs outperform LSTMs in scenarios with limited features or relatively simple sequential patterns, both of which apply to this problem.
Model Architecture
The architecture of the GRU model was designed to balance complexity and efficiency, ensuring the model could effectively learn patterns from the sequential data while minimizing computational overhead. Each component of the architecture serves a specific purpose, detailed below:
- Input Layer
- Input Shape: The input to the model is a 3D tensor with the shape (number of samples, 10, 15), where:
- number of samples is the total number of sliding windows created from the dataset.
- 10 represents the window size (i.e., the number of past time steps used to predict the current quality label).
- 15 is the number of features per time step (e.g., temperature, stirring speed, etc.).
- This input format allows the GRU to process temporal dependencies and relationships across the 10 most recent observations for each sample.
- GRU Layers
- The model uses three stacked GRU layers to capture temporal dependencies and hierarchical features:
- First GRU Layer:
- Contains 64 hidden units to capture broad temporal patterns.
- The return_sequences=True parameter ensures that outputs from all time steps are passed to the next layer, preserving the full temporal resolution.
- Second GRU Layer:
- Contains 32 hidden units, gradually reducing the dimensionality to focus on more refined temporal features.
- Again, return_sequences=True is used to pass outputs from all time steps to the subsequent layer.
- Third GRU Layer:
- Contains 16 hidden units to extract the most specific and high-level patterns from the sequential data.
- Here, return_sequences=False is used, meaning only the output of the final time step (representing the entire sequence's context) is passed to the next layer.
- Each GRU layer uses the hyperbolic tangent (tanh) activation function, which maps input values to the range [-1, 1]. The advantages of using tanh include:
- Better gradient flow compared to sigmoid activation, particularly in recurrent models.
- The ability to handle both positive and negative values, enabling the GRU to learn richer and more nuanced representations of temporal patterns.
- Dense Output Layer
- Purpose:
- The dense layer maps the final GRU output (a single vector from the last GRU layer) to a probability score between 0 and 1.
- This score represents the likelihood of the product being "OK" (label 1) or "NG" (label 0).
- Activation Function:
- A sigmoid activation function is used in the dense layer, as it is well-suited for binary classification tasks.
- The sigmoid function outputs values in the range [0, 1], directly interpretable as probabilities.
- Regularization and Callbacks
- Early Stopping:
- Early stopping monitors the validation loss during training and halts the process when performance stops improving. This prevents overfitting, especially given the relatively small validation dataset (21% of the original data).
- Model Checkpoint:
- The ModelCheckpoint callback saves the model weights whenever validation performance improves, ensuring the best-performing model is retained for evaluation.
- Optimizer
- Choice of Optimizer:
- The Adam optimizer was selected for its adaptive learning rates, which dynamically adjust step sizes based on gradient updates, helping the model converge faster and avoid poor local minima.
- Default parameters for Adam (learning rate = 0.001) were used, as they performed well during preliminary tests.
- Summary of Final Architecture
- The architecture can be summarized as follows:

| Layer | Details | Output Shape |
|---|---|---|
| Input Layer | Input shape (10, 15) | (None, 10, 15) |
| GRU Layer 1 | 64 units, tanh, return_sequences=True | (None, 10, 64) |
| GRU Layer 2 | 32 units, tanh, return_sequences=True | (None, 10, 32) |
| GRU Layer 3 | 16 units, tanh, return_sequences=False | (None, 16) |
| Dense Layer (Output) | Sigmoid activation | (None, 1) |
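For reference, the architecture in the table can be written out in Keras roughly as follows. This is a reconstruction from the description above rather than the project's original code, and the make_windows helper (and its variable names) is an illustrative assumption about how the (samples, 10, 15) sliding windows might be built.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

WINDOW_SIZE = 10   # past time steps per sample
N_FEATURES = 15    # features per time step (temperature, stirring speed, ...)

def make_windows(features: np.ndarray, labels: np.ndarray, window: int = WINDOW_SIZE):
    """Illustrative helper: slice a (T, 15) feature matrix into overlapping
    (samples, 10, 15) windows, pairing each window with the label of its last step."""
    X = np.stack([features[i:i + window] for i in range(len(features) - window + 1)])
    y = labels[window - 1:]
    return X, y

model = keras.Sequential([
    keras.Input(shape=(WINDOW_SIZE, N_FEATURES)),               # (None, 10, 15)
    layers.GRU(64, activation="tanh", return_sequences=True),   # (None, 10, 64)
    layers.GRU(32, activation="tanh", return_sequences=True),   # (None, 10, 32)
    layers.GRU(16, activation="tanh", return_sequences=False),  # (None, 16)
    layers.Dense(1, activation="sigmoid"),                      # (None, 1), P(label = "OK")
])
model.summary()
```

The decreasing unit counts (64 → 32 → 16) mirror the funnel described above, and only the last GRU layer drops return_sequences so that a single 16-dimensional summary vector reaches the sigmoid output.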
Why This Architecture Was Effective
- Depth and Complexity:
- The three-layer GRU design balances depth with computational efficiency, ensuring the model can capture both short-term and medium-term dependencies in the data.
- Temporal Focus:
- The decreasing number of hidden units per layer ensures that the model progressively filters out irrelevant patterns, focusing on the most significant features.
- Binary Classification:
- The final dense layer with a sigmoid activation function aligns perfectly with the binary nature of the target variable, providing interpretable probability scores.
- Effective Use of Activation Functions:
- The tanh activation in GRU layers enhances the model's ability to process complex temporal dependencies, while the sigmoid activation in the output layer ensures accurate probability mapping for the binary classification task.
- Regularization:
- Early stopping and checkpoints prevent overfitting, ensuring the model generalizes well to unseen data.
Key Takeaway:
The final architecture leverages GRU's strengths in sequential modeling, combined with carefully chosen activation functions and regularization techniques, to deliver a powerful predictive model tailored to the melting process.
Model Training
- Training Process:
- The GRU model was trained on the preprocessed and normalized data (X_train_over and y_train_over) using a batch size of 128 and a maximum of 200 epochs.
- Early stopping typically concluded training after approximately 20 epochs, as the validation loss stabilized.
- Performance Tracking:
- During training, metrics such as loss (binary cross-entropy) and accuracy were monitored for both training and validation datasets.
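The compile-and-fit step implied by this description could look roughly like the sketch below. X_train_over and y_train_over come from the preprocessing stage mentioned above, while the validation variable names, the early-stopping patience, and the checkpoint file name are assumptions, since they are not stated explicitly.

```python
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

model.compile(
    optimizer="adam",                # Adam with its default learning rate of 0.001
    loss="binary_crossentropy",      # binary target: OK (1) vs. NG (0)
    metrics=["accuracy"],
)

callbacks = [
    # Stop once validation loss stops improving; the patience value is assumed.
    EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True),
    # Save the weights of the best epoch so far; the file name is illustrative.
    ModelCheckpoint("gru_best.keras", monitor="val_loss", save_best_only=True),
]

history = model.fit(
    X_train_over, y_train_over,
    validation_data=(X_val, y_val),  # assumed names for the validation split
    batch_size=128,
    epochs=200,                      # early stopping typically ends training near epoch 20
    callbacks=callbacks,
)
```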
Key Takeaway:
The GRU-based architecture effectively captured the sequential and cyclical patterns of the melting process while minimizing computational complexity. Its layered structure ensured robust feature extraction, leading to accurate predictions of product quality.