
Here is the first batch of full answers: Questions 1–5 from the CSE3036 – Predictive Analytics question bank.

 

Q1. Explain the concept of analytics, its importance in

today's business environment, and discuss how it aids in strategic

decision-making with relevant examples.

 

Answer:

Analytics refers to the systematic analysis of data to uncover

meaningful patterns and insights that support decision-making. It plays a

central role in helping organizations understand past performance, predict

future outcomes, and make informed decisions. There are three primary types of

analytics: descriptive (what happened), predictive (what will happen), and

prescriptive (what to do next).

 

Importance:

In the digital age, companies generate vast amounts of data

through customer interactions, operational systems, and third-party sources.

Analytics enables organizations to derive value from this data. For example:

 

* Retailers use customer data to forecast demand and

personalize marketing.

* Financial institutions detect fraud through anomaly

detection.

* Healthcare providers predict patient readmission risks.

 

Strategic Decision-Making:

Analytics supports strategy by revealing trends, identifying

risks and opportunities, and testing hypotheses. For example, Starbucks uses

location analytics to determine store placement, and Netflix uses predictive

analytics to recommend content, enhancing user satisfaction.

 

Q2. Analyze the major challenges faced by organizations in

implementing analytics solutions and suggest approaches to overcome these

challenges.

 

Answer:

Challenges include:

 

1. Data Quality Issues: Inaccurate, inconsistent, or

incomplete data reduces model effectiveness.

2. Skills Gap: Lack of data science professionals and domain

experts.

3. Organizational Resistance: Employees may resist

data-driven changes.

4. Integration with Legacy Systems: Old IT infrastructure

often lacks compatibility with modern tools.

5. Cost and ROI Uncertainty: High upfront costs and unclear

long-term returns.

6. Ethical and Privacy Concerns: Compliance with regulations

like GDPR.

 

Approaches:

 

* Establish robust data governance and data quality

programs.

* Invest in employee training and attract skilled

professionals.

* Promote a culture of data-driven decision-making.

* Use cloud-based, scalable platforms to reduce

infrastructure burden.

* Start with small, high-impact projects to demonstrate ROI.

* Maintain transparency and follow ethical guidelines when

handling data.

 

Q3. Discuss the various applications of predictive analytics

across different industry sectors. Provide detailed case studies to illustrate

successful implementations.

 

Answer:

Applications:

 

* Retail: Demand forecasting, customer segmentation.

* Finance: Credit scoring, fraud detection.

* Healthcare: Disease prediction, hospital readmission.

* Manufacturing: Predictive maintenance, quality assurance.

* Telecom: Customer churn prediction.

 

Case Studies:

 

1. Target Corporation: Used predictive analytics to identify

pregnant customers based on purchase behavior, enabling personalized marketing

and early customer retention.

2. General Electric (GE): Integrated sensors on jet engines

for predictive maintenance, reducing downtime and saving millions.

3. Netflix: Uses viewing patterns to recommend shows,

improving viewer retention and engagement.

 

Q4. Demonstrate the evolution of analytics as a field and

explain why it has gained such popularity among businesses in the past decade.

Include industry experts' perspectives in your answer.

 

Answer:

Evolution:

 

* 1990s: Descriptive analytics using spreadsheets and basic

BI tools.

* 2000s: Diagnostic analytics with OLAP and data mining.

* 2010s: Predictive analytics using machine learning.

* 2020s: Prescriptive analytics with AI and automation.

 

Reasons for Growth:

 

* Explosion of big data and IoT.

* Affordable cloud computing.

* Advances in ML/AI algorithms.

* Competitive pressure and demand for innovation.

 

Industry Perspectives:

 

* Thomas Davenport emphasized "competing on analytics" as a

business imperative.

* Gartner identified data and analytics as top technology

priorities for CEOs.

 

Examples:

 

* Uber uses real-time predictive analytics for dynamic

pricing and routing.

* Amazon leverages data to optimize supply chain and product

recommendations.

 

Q5. Explain with a detailed analysis of a business case

where predictive analytics transformed operations. Discuss the problems and

approach.

 

Answer:

Case Study: UPS (United Parcel Service)

 

Problem:

UPS faced issues with inefficient routes, high fuel

consumption, and unexpected vehicle breakdowns.

 

Approach:

 

* Implemented predictive analytics using telematics sensors

in trucks.

* Built models to forecast mechanical issues (predictive

maintenance).

* Optimized delivery routes using ORION (On-Road Integrated

Optimization and Navigation).

 

Results:

 

* Saved millions of gallons of fuel.

* Reduced CO2 emissions and improved delivery punctuality.

* Enhanced operational efficiency and customer satisfaction.

 

Predictive analytics empowered UPS to transition from

reactive to proactive operations.

 


 

Here is the next batch of full answers: Questions 6–10 from

the CSE3036 – Predictive Analytics question bank.

 

Q6. Explain the ethical considerations and privacy concerns

related to the use of predictive analytics. How can organizations address these

issues?

 

Answer:

Ethical Considerations:

 

* Bias and Discrimination: Algorithms may inherit historical

bias in data, leading to unfair treatment of certain groups. For example,

predictive hiring tools may unintentionally discriminate against minorities.

* Lack of Transparency: Complex models like deep learning

may be difficult to interpret, making it unclear why decisions are made.

* Manipulation: Predictive models can be used to nudge users

unethically (e.g., manipulating purchase decisions).

* Consent and Autonomy: Users may not be aware that their

data is being used to predict or influence their behavior.

 

Privacy Concerns:

 

* Data Collection: Massive data collection from mobile apps,

sensors, and social platforms can infringe on individual privacy.

* Data Sharing: Sharing customer data with third parties can

lead to misuse or unauthorized access.

* Data Breaches: Predictive systems are vulnerable to

cyberattacks, risking exposure of sensitive information.

* Re-identification: Even anonymized datasets can sometimes

be reverse-engineered to identify individuals.

 

Solutions:

 

* Implement strong data governance and compliance with

privacy regulations like GDPR and HIPAA.

* Adopt transparent models and use Explainable AI (XAI) to

improve understanding.

* Limit data collection to only what is necessary (data

minimization).

* Regularly audit models for fairness and bias.

* Clearly inform users about data usage and obtain consent.

 

Q7. Compare and contrast the different types of analytics

with examples of how each contributes to business value.

 

Answer:

There are four main types of analytics:

 

1. Descriptive Analytics:

 

* What happened?

* Uses historical data to summarize outcomes.

* Example: Monthly sales report.

* Business Value: Offers situational awareness and

reporting.

 

2. Diagnostic Analytics:

 

* Why did it happen?

* Investigates the root cause of past outcomes.

* Techniques: Drill-down, correlation analysis.

* Example: Analyzing why sales dropped in a region.

* Business Value: Helps understand underlying problems.

 

3. Predictive Analytics:

 

* What is likely to happen?

* Uses statistical models and ML to forecast future events.

* Example: Predicting customer churn.

* Business Value: Enables proactive decision-making.

 

4. Prescriptive Analytics:

 

* What should be done?

* Recommends actions using optimization and simulation.

* Example: Suggesting the best combination of products for

cross-selling.

* Business Value: Guides strategic actions for maximum

impact.

 

Comparison:

 

| Type | Timeframe | Key Question | Example |
| --- | --- | --- | --- |
| Descriptive | Past | What happened? | Revenue report |
| Diagnostic | Past | Why did it happen? | Regional performance drop |
| Predictive | Future | What will happen? | Sales forecast |
| Prescriptive | Future | What should be done? | Price optimization |

 

Q8. Discuss the role of predictive analytics in customer

relationship management. Illustrate with relevant case studies.

 

Answer:

Predictive analytics enhances Customer Relationship

Management (CRM) by forecasting customer behavior and enabling personalized

interactions.

 

Applications:

 

* Customer Segmentation: Groups customers by behavior or

demographics.

* Churn Prediction: Identifies customers likely to leave.

* Personalized Recommendations: Suggests products tailored

to preferences.

* Customer Lifetime Value (CLV): Estimates long-term value

of a customer.

* Upselling/Cross-selling: Identifies opportunities to sell

related products.

 

Case Study 1: Amazon

Amazon uses predictive models to recommend products based on

browsing and purchase history, increasing engagement and repeat purchases.

 

Case Study 2: Salesforce Einstein

Einstein AI in Salesforce predicts which leads are more

likely to convert and recommends actions to improve success.

 

Case Study 3: Vodafone

Implemented a churn prediction model that accurately

identified high-risk customers and offered retention deals, reducing churn by

20%.

 

Q9. Evaluate the current state of predictive analytics

adoption across industries. What factors contribute to successful

implementation?

 

Answer:

Current State:

 

* Widespread adoption across industries such as retail,

banking, healthcare, telecom, and logistics.

* Growth in investment in AI and analytics platforms.

* Organizations moving from descriptive to predictive and

prescriptive models.

 

Success Factors:

 

1. Clear Business Objective: Predictive models aligned with

strategic goals perform better.

2. High-Quality Data: Reliable, clean, and integrated data

is essential.

3. Skilled Workforce: Data scientists, analysts, and domain

experts are key.

4. Organizational Culture: Support for data-driven

decision-making from leadership.

5. Scalable Infrastructure: Cloud computing and big data

tools support real-time analytics.

6. Robust Governance: Ensures data security, ethics, and

compliance.

 

Example: Netflix's success with personalized recommendations

and content investment decisions is due to their data-first culture and

analytics capabilities.

 

Q10. Analyze how predictive analytics can provide

competitive advantage in today's data-driven market environment with

appropriate examples.

 

Answer:

Predictive analytics offers a competitive edge by enabling

organizations to:

 

* Anticipate customer needs and improve personalization.

* Reduce costs through optimized operations.

* Detect and prevent risks like fraud or churn.

* Innovate by identifying new trends early.

 

Examples:

 

1. Netflix: Uses predictive models for content

recommendation and greenlighting new shows, increasing viewer satisfaction and

retention.

2. Uber: Predictive analytics informs surge pricing, driver

allocation, and estimated time of arrival.

3. Walmart: Forecasts demand to optimize inventory, reducing

waste and increasing shelf availability.

4. American Express: Predicts customer churn and offers

timely retention offers.

 

By turning data into foresight, companies outperform those

relying only on intuition or historical reporting.

 

 


Here are the full answers for Questions 11 to 20 from the

CSE3036 – Predictive Analytics question bank.

 

Q11. Explain the concept of analytics, its importance in

today's business environment, and discuss how it aids in strategic

decision-making with relevant examples.

 

Answer:

This question is a repetition of Q1. As already explained,

analytics helps in transforming data into actionable insights through

descriptive, predictive, and prescriptive techniques. Its importance lies in

improving operational efficiency, enhancing customer satisfaction, and

fostering innovation.

 

Example: Domino's Pizza uses analytics to forecast order demand and plan staff scheduling, improving delivery speed and reducing food waste.

 

Q12. Demonstrate the evolution of analytics as a field and

explain why it has gained such popularity among businesses in the past decade.

 

Answer:

Analytics has evolved through four stages:

 

* Descriptive Analytics: Summarizes historical data.

* Diagnostic Analytics: Explores reasons behind outcomes.

* Predictive Analytics: Forecasts future trends.

* Prescriptive Analytics: Recommends optimal actions.

 

In recent years, analytics has surged in popularity due to:

 

* Availability of big data.

* Advancements in computing power.

* Machine learning breakthroughs.

* Demand for competitive advantage.

 

Example: Spotify's use of analytics to recommend music and

personalize playlists has transformed user engagement.

 

Q13. Explain the concept of propensity models in detail,

discussing their types, applications, and methodology of development with

appropriate examples.

 

Answer:

Propensity models estimate the likelihood of a customer

performing a certain action, such as buying, leaving, or subscribing.

 

Types:

 

* Propensity to Buy: Who is likely to purchase.

* Propensity to Churn: Who is likely to leave.

* Propensity to Upsell: Who is likely to buy more.

 

Development Methodology:

 

1. Collect customer data.

2. Select target behavior.

3. Preprocess data and engineer features.

4. Train a classification model (e.g., logistic regression).

5. Validate and interpret scores.

 

Example: An insurance company uses a propensity model to

predict which customers are likely to renew their policies and targets

high-risk individuals with incentives.
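
A minimal sketch of this development methodology, assuming synthetic data and a scikit-learn logistic regression as the classifier (all feature names and values below are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical customer features: tenure, monthly spend, support calls
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
# Hypothetical target behavior: 1 = renewed policy, 0 = did not renew
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# Propensity scores: probability that each customer performs the target action
scores = model.predict_proba(X_test)[:, 1]
print(scores[:5])
```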

 

Q14. Analyze the working principles of collaborative

filtering systems. Compare and contrast user-based and item-based collaborative

filtering approaches with examples.

 

Answer:

Collaborative filtering (CF) recommends products based on

similarities in user behavior or item consumption.

 

1. User-based CF:

 

* Recommends items liked by similar users.

* Example: If User A and User B have similar preferences,

recommend what B liked to A.

 

2. Item-based CF:

 

* Recommends items similar to those the user liked.

* Example: If A liked Item X and Item Y is similar,

recommend Y.

 

Comparison:

 

| Aspect | User-Based CF | Item-Based CF |
| --- | --- | --- |
| Scalability | Lower | Higher |
| Stability | Less stable (user behavior changes) | More stable (items don't change much) |
| Example | MovieLens system | Amazon "also bought" |

 

Both techniques suffer from the cold start problem for new

users or items.
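
A small illustrative sketch of item-based CF using cosine similarity on a hypothetical user–item ratings matrix (all ratings are made up):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Rows = users, columns = items; 0 means "not rated" (hypothetical ratings)
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
])

# Item-based CF: similarity between item columns
item_sim = cosine_similarity(ratings.T)

# Score items for user 0 as a similarity-weighted sum of their ratings
user = ratings[0]
scores = item_sim @ user
scores[user > 0] = -np.inf          # do not re-recommend items already rated
print("Recommend item:", int(np.argmax(scores)))
```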

 

Q15. Discuss the complete process of cluster modeling,

including different algorithms, evaluation methods, and business applications.

Illustrate with a case study.

 

Answer:

Clustering is an unsupervised learning technique that groups

similar data points.

 

Process:

 

1. Data Preprocessing: Normalize and clean data.

2. Select Algorithm:

 

 * K-Means: Partitions data into k clusters.
 * Hierarchical: Builds nested clusters.
 * DBSCAN: Identifies arbitrary-shaped clusters.

3. Evaluate:

 

 * Silhouette Score.
 * Davies-Bouldin Index.

4. Interpret Results:

 

 * Assign business meaning to each cluster.

 

Applications:

 

* Customer segmentation.

* Fraud detection.

* Market basket analysis.

 

Case Study: A telecom company uses K-means to segment users

into budget users, premium users, and business users, enabling personalized

plans and reducing churn.
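
A brief sketch of the K-means and silhouette-evaluation steps above, assuming synthetic telecom usage features (all values hypothetical):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Hypothetical usage features: monthly minutes, data usage (GB), monthly bill
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal([200, 2, 20], 10, size=(100, 3)),    # budget users
    rng.normal([600, 10, 60], 10, size=(100, 3)),   # premium users
    rng.normal([900, 30, 120], 10, size=(100, 3)),  # business users
])

# Normalize, cluster into k = 3 segments, and evaluate with the silhouette score
X_scaled = StandardScaler().fit_transform(X)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)
print("Silhouette score:", silhouette_score(X_scaled, kmeans.labels_))
```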

 

Q16. Explain univariate and multivariate statistical

analysis techniques. Compare their methodologies, applications, and limitations

in the context of predictive analytics.

 

Answer:

Univariate Analysis:

 

* Involves one variable.

* Describes distribution, central tendency, dispersion.

* Example: Analyzing average income.

 

Multivariate Analysis:

 

* Involves multiple variables.

* Explores relationships (e.g., correlation, regression).

* Example: Predicting income based on age, education, and

experience.

 

Comparison:

 

| Aspect | Univariate | Multivariate |
| --- | --- | --- |
| Variables | One | Two or more |
| Output | Descriptive | Predictive/Explanatory |
| Techniques | Histograms, mean, SD | Regression, factor analysis, MANOVA |
| Limitation | No interaction insights | Assumes independence, risk of multicollinearity |

 

In predictive analytics, multivariate models provide deeper

insights but require careful validation.

 

Q17. Critically evaluate the limitations of predictive

modeling and discuss strategies to overcome these limitations in practical

business scenarios.

 

Answer:

Limitations:

 

* Overfitting: Model performs well on training data but

poorly on new data.

* Underfitting: Model too simple to capture data patterns.

* Data Quality: Garbage in, garbage out.

* Model Bias: May reflect historical discrimination.

* Lack of Explainability: Complex models like neural

networks are black boxes.

 

Strategies:

 

* Cross-validation to detect overfitting.

* Regularization (Lasso, Ridge) to simplify models.

* Clean and enrich data sources.

* Use explainable AI tools like SHAP and LIME.

* Monitor and update models regularly.

 

Example: A retail chain retrains its demand forecasting

model quarterly to adjust for seasonal shifts and market dynamics.

 

Q18. Describe the statistical foundations necessary for

effective predictive analytics. Include discussions on probability,

distributions, and hypothesis testing.

 

Answer:

 

1. Probability:

 

* Foundation for classification models (e.g., Naïve Bayes).

* Concepts include conditional, joint, and marginal

probabilities.

 

2. Statistical Distributions:

 

* Normal distribution: Common in regression and control

charts.

* Poisson: Used in queuing and event count modeling.

* Binomial: Applicable to binary outcomes.

 

3. Hypothesis Testing:

 

* Tests claims using data evidence.

* Common tests: t-test, chi-square, ANOVA.

* Example: A/B testing for campaign effectiveness.

 

These tools support model selection, validation, and

understanding of relationships between variables.
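
As an illustration of hypothesis testing in an A/B setting, a minimal sketch using SciPy's two-sample t-test on hypothetical conversion data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Hypothetical daily conversion rates for two campaign variants
variant_a = rng.normal(0.10, 0.02, size=30)
variant_b = rng.normal(0.12, 0.02, size=30)

t_stat, p_value = stats.ttest_ind(variant_a, variant_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A small p-value suggests the difference in means is unlikely under H0
```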

 

Q19. Analyze how predictive modeling techniques are applied

in customer segmentation. Discuss the methodology, variable selection, and

business implications with examples.

 

Answer:

Methodology:

 

1. Data Collection: Demographics, behavior, transactions.

2. Feature Selection: RFM (Recency, Frequency, Monetary),

geography, product use.

3. Choose Model: K-means, DBSCAN for clustering.

4. Evaluate and Label Clusters.

5. Use segments for targeting.

 

Business Implications:

 

* Personalized marketing.

* Improved resource allocation.

* Better customer retention.

 

Example: An online retailer segments customers into VIPs,

discount-seekers, and window shoppers. Tailored emails to each group increase

conversion rates.
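
A small sketch of this methodology, assuming a hypothetical transaction log from which RFM features are derived and then clustered with K-means:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical transaction log: customer id, days since purchase, amount
tx = pd.DataFrame({
    "customer": [1, 1, 2, 2, 3, 3, 3],
    "days_ago": [5, 40, 200, 250, 2, 10, 15],
    "amount":   [120, 80, 20, 15, 300, 250, 400],
})

# RFM features: Recency (min days ago), Frequency (count), Monetary (sum)
rfm = tx.groupby("customer").agg(
    recency=("days_ago", "min"),
    frequency=("days_ago", "count"),
    monetary=("amount", "sum"),
)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(rfm)
)
print(rfm.assign(segment=labels))
```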

 

Q20. Explain the mathematical principles behind various

statistical techniques used in predictive analytics and their practical

implementations.

 

Answer:

 

1. Linear Regression:

 

* Models relationship as: Y = β₀ + β₁X + ε

* Minimizes sum of squared residuals.

 

2. Logistic Regression:

 

* Models probability: P(Y=1) = 1 / (1 + e^-(β₀ + β₁X))

 

3. Decision Trees:

 

* Splits data using metrics like entropy or the Gini index.

 

4. K-Means:

 

* Minimizes within-cluster sum of squares.

 

5. PCA (Principal Component Analysis):

 

* Reduces dimensionality by transforming variables into orthogonal components.

 

Practical Use:

 

* Regression in forecasting.

* Logistic models for classification.

* Decision trees in credit scoring.

 

These techniques are implemented in Python libraries

(scikit-learn, statsmodels) and used across industries for predictive tasks.
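
A compact sketch, on synthetic data, of how the linear and logistic formulations above are typically fit with scikit-learn:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 1))

# Linear regression: y = b0 + b1*x + noise, fit by least squares
y_cont = 2.0 + 3.0 * X[:, 0] + rng.normal(scale=0.5, size=200)
lin = LinearRegression().fit(X, y_cont)
print("Estimated b0, b1:", lin.intercept_, lin.coef_[0])

# Logistic regression: P(y=1) = 1 / (1 + exp(-(b0 + b1*x)))
y_bin = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)
log = LogisticRegression().fit(X, y_bin)
print("Predicted probabilities:", log.predict_proba(X[:3])[:, 1])
```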

 

 


 

Here are the full answers for Questions 21 to 35 from the

CSE3036 – Predictive Analytics question bank:

 

Q21. Discuss the importance of feature engineering and

variable selection in multivariate analysis. How do these processes impact

model performance?

 

Answer:

Feature engineering transforms raw data into meaningful

input variables for modeling. It includes:

 

* Encoding categorical variables.

* Creating interaction features (e.g., age × income).

* Handling missing values.

* Scaling and normalizing features.

 

Variable selection helps reduce dimensionality and improves

model accuracy and interpretability.

 

Techniques:

 

* Filter: Correlation, mutual information.

* Wrapper: Recursive feature elimination.

* Embedded: LASSO, Ridge.

 

Impact:

 

* Reduces overfitting.

* Speeds up training.

* Enhances interpretability.

* Improves predictive performance.

 

Example: In a loan default model, "debt-to-income ratio" may

be a more informative feature than income alone.
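
A minimal sketch of common feature-engineering steps (ratio feature, one-hot encoding, scaling), using a hypothetical loan-applicant table:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical loan applicants
df = pd.DataFrame({
    "income": [40000, 85000, 60000],
    "debt": [10000, 30000, 5000],
    "employment": ["salaried", "self-employed", "salaried"],
})

# Engineered feature: debt-to-income ratio (often more informative than income alone)
df["debt_to_income"] = df["debt"] / df["income"]

# Encode the categorical variable and scale the numeric features
df = pd.get_dummies(df, columns=["employment"], drop_first=True)
num_cols = ["income", "debt", "debt_to_income"]
df[num_cols] = StandardScaler().fit_transform(df[num_cols])
print(df)
```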

 

Q22. Critically evaluate the limitations of predictive

modeling and discuss strategies to overcome these limitations in practical

business scenarios.

 

Answer:

(Already covered in Q17. Refer back for detailed

explanation.)

 

Q23. Explain the mathematical principles behind various

statistical techniques used in predictive analytics and their practical

implementations.

 

Answer:

(Already covered in Q20. See above.)

 

Q24. Critically evaluate linear regression and its variants

(ridge, lasso, elastic net). Discuss their mathematical foundations and

applications with examples.

 

Answer:

Linear Regression:

Y = β₀ + β₁X₁ + … + βnXn + ε. It minimizes squared errors.

Assumes linearity and homoscedasticity.

 

Ridge Regression (L2):

Penalizes sum of squares of coefficients: λΣβ². Shrinks

coefficients to prevent overfitting.

 

Lasso Regression (L1):

Penalizes sum of absolute coefficients: λΣ|β|. Performs

feature selection.

 

Elastic Net:

Combination of L1 and L2. Useful when predictors are

correlated.

 

Applications:

 

* Ridge: Multicollinear data.

* Lasso: Sparse feature sets.

* Elastic Net: High-dimensional data (e.g., genomics).
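
A short sketch comparing the three penalized variants on synthetic data where only a few features are truly informative (the setup is hypothetical):

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 10))
# Only the first three features actually matter in this synthetic setup
y = 3 * X[:, 0] + 2 * X[:, 1] - X[:, 2] + rng.normal(scale=0.5, size=300)

for name, model in [
    ("Ridge (L2)", Ridge(alpha=1.0)),
    ("Lasso (L1)", Lasso(alpha=0.1)),
    ("ElasticNet", ElasticNet(alpha=0.1, l1_ratio=0.5)),
]:
    model.fit(X, y)
    # Lasso and Elastic Net shrink irrelevant coefficients toward exactly zero
    print(name, np.round(model.coef_, 2))
```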

 

Q25. Demonstrate a comprehensive analysis of non-linear

regression models including polynomial regression.

 

Answer:

Non-linear models capture curved relationships.

 

Polynomial Regression:

Y = β₀ + β₁X + β₂X² + … + βnXⁿ + ε.

 

* Allows modeling of U-shaped curves.

* Risk: Overfitting with high-degree polynomials.

 

Other Non-linear Forms:

 

* Exponential: Y = ae^(bX)

* Logarithmic: Y = a + b ln(X)

* Power: Y = aX^b

 

Applications:

 

* Polynomial: Pricing models.

* Exponential: Growth modeling.

* Logarithmic: Learning curves.
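
A minimal polynomial-regression sketch on synthetic data with a U-shaped relationship, using a degree-2 scikit-learn pipeline:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(5)
X = rng.uniform(-3, 3, size=(200, 1))
# Hypothetical U-shaped relationship with noise
y = 1 + 0.5 * X[:, 0] + 2 * X[:, 0] ** 2 + rng.normal(scale=1.0, size=200)

# Degree-2 polynomial regression; higher degrees risk overfitting
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict([[0.0], [2.0]]))
```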

 

Q26. Discuss classification performance metrics beyond

accuracy. Explain their significance in imbalanced datasets with appropriate

examples.

 

Answer:

Accuracy is misleading when class distribution is skewed.

 

Better metrics:

 

* Precision = TP / (TP + FP): High precision means few false positives.

* Recall = TP / (TP + FN): High recall means few false negatives.

* F1 Score = 2PR / (P + R): Balance of precision and recall.

* AUC-ROC: Measures classifier's ability to rank

predictions.

 

Example: In spam detection, high recall ensures most spam is

caught, while high precision avoids misclassifying legitimate emails.
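
A small sketch computing these metrics with scikit-learn on a hypothetical imbalanced label set:

```python
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score

# Hypothetical imbalanced ground truth (mostly "not spam") and predictions
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 0, 0, 1, 0]
y_prob = [0.1, 0.2, 0.1, 0.3, 0.2, 0.6, 0.1, 0.2, 0.9, 0.4]

print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1:       ", f1_score(y_true, y_pred))
print("AUC-ROC:  ", roc_auc_score(y_true, y_prob))
```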

 

Q27. Compare and contrast different classification

algorithms including logistic regression and decision trees. Analyze their

strengths and weaknesses.

 

Answer:

Logistic Regression:

 

* Linear decision boundary.

* Simple, interpretable.

* Poor with complex data.

 

Decision Trees:

 

* Non-linear splits.

* Easy to visualize.

* Overfits on small data.

 

Others:

 

* SVM: Great in high dimensions.

* k-NN: Simple, needs scaling.

* Naïve Bayes: Good with text; assumes feature independence.

 

Comparison Table:

 

| Algorithm | Interpretability | Performance | Complexity |
| --- | --- | --- | --- |
| Logistic | High | Medium | Low |
| Decision Tree | High | High risk of overfit | Medium |
| SVM | Low | High | High |

 

Q28. Compare and contrast supervised and unsupervised

learning methods in detail, providing examples of algorithms from each category

and their appropriate applications.

 

Answer:

 

| Aspect | Supervised | Unsupervised |
| --- | --- | --- |
| Input | Features + Labels | Features only |
| Goal | Predict target | Discover patterns |
| Algorithms | Regression, SVM, Trees | K-Means, PCA, DBSCAN |
| Use Cases | Churn prediction, sales | Customer segmentation, anomaly detection |

 

Examples:

 

* Supervised: Predicting if a customer will default.

* Unsupervised: Clustering users into marketing segments.

 

Q29. Discuss the importance of cross-validation in model

selection. Explain different cross-validation techniques and how they help in

building robust predictive models.

 

Answer:

Cross-validation evaluates model generalizability.

 

Techniques:

 

* Holdout: Simple split.

* k-Fold: Train on k-1 folds, test on 1; repeat.

* Stratified k-Fold: Maintains class distribution.

* LOOCV: Leave-one-out; high variance.

* Time Series CV: Maintains temporal order.

 

Importance:

 

* Reduces overfitting.

* Supports hyperparameter tuning.

* Provides reliable performance estimates.
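
A minimal stratified k-fold sketch with scikit-learn on a synthetic classification dataset:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Stratified 5-fold CV keeps the class ratio roughly constant in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print("Fold accuracies:", scores, "mean:", scores.mean())
```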

 

Q30. Analyze the bias-variance trade-off in predictive

modeling. How does this concept influence model complexity and what strategies

can be employed to find the optimal balance?

 

Answer:

Bias: Error from overly simplistic assumptions. High bias

underfits.

 

Variance: Error from model sensitivity to data fluctuations.

High variance overfits.

 

Trade-off:

 

* Simple model: High bias, low variance.

* Complex model: Low bias, high variance.

 

Strategies:

 

* Use cross-validation to monitor performance.

* Apply regularization (L1/L2).

* Prune decision trees.

* Use ensemble methods (bagging reduces variance, boosting

reduces bias).

 

Q31. Explain the methodology of selecting the most

appropriate model for a given business problem. Include discussions on business

constraints, data characteristics, and model complexity.

 

Answer:

Methodology:

 

1. Understand business objective.

2. Analyze data type, size, structure.

3. Test baseline models (e.g., logistic regression, decision

trees).

4. Evaluate with metrics aligned to goal (e.g., F1, AUC,

RMSE).

5. Consider constraints:

 

 * Interpretability.

 * Inference speed.

 * Data availability.

 

Example: For a healthcare diagnostic tool, logistic

regression may be preferred over deep learning for explainability.

 

Q32. Discuss techniques for handling imbalanced datasets in

classification problems.

 

Answer:

Techniques:

 

* Resampling:
 * Oversample minority (SMOTE).
 * Undersample majority.
* Algorithmic:
 * Adjust class weights.
 * Use cost-sensitive learning.
* Ensemble:
 * Balanced Random Forest.
* Evaluation:
 * Use precision, recall, F1 instead of accuracy.

 

Example: In fraud detection (1% fraud), SMOTE increases

minority class representation, improving recall.
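
A brief sketch of the class-weighting approach on a synthetic imbalanced dataset (SMOTE itself lives in the separate imbalanced-learn package and is not shown here):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic dataset with roughly 5% positives (a stand-in for rare fraud cases)
X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" upweights errors on the minority class
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_tr, y_tr)
print(classification_report(y_te, model.predict(X_te)))
```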

 

Q33. Analyze techniques for handling non-stationary time

series data. Include discussions on transformations, differencing, and testing

methods with examples.

 

Answer:

Non-stationary data has changing mean/variance.

 

Techniques:

 

* Differencing (ΔYₜ = Yₜ - Yₜ₋₁).

* Log or Box-Cox transformations for variance.

* Detrending via regression.

* Seasonal Differencing (ΔYₜ = Yₜ - Yₜ₋ₛ).

 

Testing:

 

* Augmented Dickey-Fuller (ADF): H₀ = non-stationary.

* KPSS: H₀ = stationary.

 

Example: Air passenger data requires log + seasonal

differencing to stabilize before ARIMA modeling.
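
A minimal sketch of the ADF test before and after first differencing, on a synthetic trending series using statsmodels:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(6)
# Hypothetical non-stationary series: upward drift plus noise
y = np.cumsum(rng.normal(0.5, 1.0, size=200))

# ADF test: H0 = series is non-stationary (has a unit root)
p_before = adfuller(y)[1]
p_after = adfuller(np.diff(y))[1]   # first differencing
print(f"p-value before differencing: {p_before:.3f}, after: {p_after:.3f}")
```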

 

Q34. Evaluate the application of time series analysis in

demand forecasting. Discuss methodology, variable selection, and evaluation of

forecast accuracy with a business case study.

 

Answer:

Methodology:

 

1. Data prep: Remove outliers, fill missing.

2. Model: ARIMA, Prophet, LSTM.

3. Feature selection: Lagged values, seasonality, holidays.

4. Evaluate: MAE, RMSE, MAPE.

 

Case Study: Walmart uses ARIMA + promotions/holiday

variables to forecast weekly store sales, reducing overstock and improving

customer satisfaction.
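
A small illustrative sketch fitting an ARIMA model to a hypothetical weekly sales series with statsmodels and forecasting four steps ahead:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(7)
# Hypothetical weekly sales with a mild trend and noise
sales = pd.Series(100 + 0.5 * np.arange(104) + rng.normal(0, 5, size=104))

# ARIMA(1,1,1): one AR term, first differencing, one MA term
model = ARIMA(sales, order=(1, 1, 1)).fit()
forecast = model.forecast(steps=4)   # next four weeks
print(forecast)
```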

 

Q35. Explain the various performance metrics used for

evaluating regression models. Discuss their calculations, interpretations, and

situations where each would be most appropriate.

 

Answer:

 

* MAE = mean(|yᵢ - ŷᵢ|): Intuitive; robust to outliers.

* MSE = mean((yᵢ - ŷᵢ)²): Penalizes large errors.

* RMSE = √MSE: Same units as output; sensitive to outliers.

* R² = 1 – SS_res/SS_tot (residual over total sum of squares): Proportion of variance explained.

* Adjusted R²: Adjusts for number of predictors.

 

Use MAE when outliers are not critical, RMSE when large

errors matter, and R² for overall model fit.
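
A minimal sketch computing these regression metrics with scikit-learn on hypothetical actual vs. predicted values:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual vs. predicted values
y_true = np.array([10.0, 12.0, 15.0, 20.0, 25.0])
y_pred = np.array([11.0, 11.5, 16.0, 18.0, 27.0])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)                 # same units as the target
r2 = r2_score(y_true, y_pred)
print(f"MAE={mae:.2f}, MSE={mse:.2f}, RMSE={rmse:.2f}, R²={r2:.3f}")
```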

 

 

The final set of answers (Questions 36–47) continues in the next section.

 

 
