Topic 1 Question 92
A company wants to predict the sale prices of houses based on available historical sales data. The target variable in the company's dataset is the sale price. The features include parameters such as the lot size, living area measurements, non-living area measurements, number of bedrooms, number of bathrooms, year built, and postal code. The company wants to use multi-variable linear regression to predict house sale prices. Which step should a machine learning specialist take to remove features that are irrelevant for the analysis and reduce the model's complexity?
A. Plot a histogram of the features and compute their standard deviation. Remove features with high variance.
B. Plot a histogram of the features and compute their standard deviation. Remove features with low variance.
C. Build a heatmap showing the correlation of the dataset against itself. Remove features with low mutual correlation scores.
D. Run a correlation check of all features against the target variable. Remove features with low target variable correlation scores.
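The correlation check against the target variable described in the last option can be sketched roughly as below. This is only an illustration under stated assumptions: the toy data, the column names (`living_area`, `listing_parity`, `sale_price`), the helper `low_correlation_features`, and the 0.1 cutoff are all hypothetical and do not come from the question.

```python
import numpy as np
import pandas as pd


def low_correlation_features(df, target, threshold=0.1):
    """Return numeric features whose absolute Pearson correlation
    with the target column is below the (hypothetical) threshold."""
    features = df.select_dtypes(include="number").drop(columns=[target])
    corr = features.corrwith(df[target])
    return corr[corr.abs() < threshold].index.tolist()


# Hypothetical toy data: price depends linearly on living area,
# while listing_parity just alternates +1/-1 and carries no signal.
n = 200
living_area = np.linspace(800.0, 3000.0, n)
listing_parity = np.tile([1.0, -1.0], n // 2)
sale_price = 120.0 * living_area + 20000.0

houses = pd.DataFrame({
    "living_area": living_area,
    "listing_parity": listing_parity,
    "sale_price": sale_price,
})

to_drop = low_correlation_features(houses, "sale_price")
reduced = houses.drop(columns=to_drop)
print(to_drop)
```

Pearson correlation only captures linear relationships between numeric columns, so a feature such as postal code, which is a category rather than a quantity, would need encoding before a check like this says anything useful about it.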
Comments (17)
D should be the more comprehensive answer. If it's not correlated, you can't make use of it in a linear regression. A lot of others say B, but low variance can also be due to the nature or typical magnitude of the variable itself.
👍 28 puffpuff 2021/10/26
Answer B. It is not the best solution; other analyses can be used beforehand. https://community.dataquest.io/t/feature-selection-features-with-low-variance/2418 If the variance is low or close to zero, then a feature is approximately constant and will not improve the performance of the model. In that case, it should be removed. Or if only a handful of observations differ from a constant value, the variance will also be very low.
👍 16 ahquiceno 2021/09/20
- 👍 5 MikkyO 2021/11/02
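To make the two points in this thread concrete, here is a small sketch with made-up numbers (the column names and values are hypothetical). It shows that a zero-variance column is easy to flag, but also that raw variance depends on the units a feature is measured in, so "low variance" alone does not say whether a feature is informative.

```python
import pandas as pd

# Hypothetical toy data, not taken from the question's dataset.
houses = pd.DataFrame({
    "lot_size_sqft": [4000.0, 6500.0, 9000.0, 12000.0],
    "bathrooms": [1.0, 2.0, 2.0, 3.0],
    "has_roof": [1.0, 1.0, 1.0, 1.0],  # constant for every house
})

variances = houses.var()

# A constant column has zero variance and cannot help a regression.
constant_features = variances[variances == 0].index.tolist()

# The same lot sizes in acres (1 acre = 43,560 sq ft) have a far
# smaller variance, although the information content is identical.
lot_size_acres = houses["lot_size_sqft"] / 43560.0
var_sqft = variances["lot_size_sqft"]
var_acres = lot_size_acres.var()
var_bathrooms = variances["bathrooms"]

print(constant_features)
print(var_sqft, var_bathrooms, var_acres)
```

In this toy data, lot size has a larger variance than the bathroom count when measured in square feet and a smaller one when measured in acres, so a variance cutoff would rank the same feature differently depending only on its units.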