From Data Science

Defining a new analysis: help defining the feature space

I am considering an informal analysis of innovation and its effect on economic performance.

So far, I have pulled the following data. From a preliminary look, most datasets appear to have a large share of non-null values. I am thinking of fitting an OLS linear regression. The data is grouped by country and would be analyzed per capita.

Independent variables:

- New patent applications (discrete)

- Average work hours per week (continuous)

- Government type (categorical)

- Social progress score (continuous)

Dependent variable:

- GDP (continuous)
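As a rough illustration of how the variables above could enter one OLS design matrix, here is a minimal sketch using NumPy. All numbers are invented toy values, the three government labels are placeholders, and names such as `one_hot_drop_first` are hypothetical helpers, not part of any dataset listed in this post. The categorical variable is dummy-coded with one level dropped as the baseline so the columns are not collinear with the intercept.

```python
import numpy as np

# Hypothetical toy rows, one per country. Every value here is invented.
patents_per_capita = np.array([0.8, 1.5, 0.3, 2.1, 1.1, 0.6, 1.8, 0.9])
work_hours = np.array([38.0, 35.5, 44.0, 34.0, 40.0, 42.5, 36.0, 39.0])
social_progress = np.array([78.0, 85.0, 60.0, 90.0, 74.0, 65.0, 88.0, 72.0])
government = ["republic", "monarchy", "republic", "monarchy",
              "other", "republic", "other", "monarchy"]
gdp_per_capita = np.array([32.0, 48.0, 12.0, 61.0, 30.0, 18.0, 52.0, 29.0])


def one_hot_drop_first(labels):
    """Dummy-code a categorical variable, dropping the first sorted level as baseline."""
    levels = sorted(set(labels))
    kept = levels[1:]
    columns = [[1.0 if lab == lev else 0.0 for lab in labels] for lev in kept]
    return np.array(columns).T, kept


dummies, dummy_names = one_hot_drop_first(government)

# Design matrix: intercept, three numeric predictors, then the dummy columns.
X = np.column_stack([
    np.ones(len(gdp_per_capita)),
    patents_per_capita,
    work_hours,
    social_progress,
    dummies,
])
names = (["intercept", "patents_per_capita", "work_hours", "social_progress"]
         + [f"gov_{name}" for name in dummy_names])

# Least-squares fit of the OLS coefficients.
coef, _, rank, _ = np.linalg.lstsq(X, gdp_per_capita, rcond=None)

fitted = X @ coef
ss_res = float(np.sum((gdp_per_capita - fitted) ** 2))
ss_tot = float(np.sum((gdp_per_capita - gdp_per_capita.mean()) ** 2))
r_squared = 1.0 - ss_res / ss_tot

for name, value in zip(names, coef):
    print(f"{name:>20s}: {value: .3f}")
print(f"R^2 = {r_squared:.3f}")
```

With only eight toy rows and six coefficients the fit is heavily overparameterized, so the printed numbers mean nothing. The point is only the shape of the feature space: numeric columns go in directly, and a categorical column with k levels contributes k - 1 dummy columns.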

However, I have two concerns. First, I would like to have more variables as inputs, as what I have so far seems to be a weak proxy for “innovation”. One option is to add in confounders (addressed below), normalize for these, and create an “innovation composite score”.

Second, if I build an innovation composite score, I am unclear on exactly how to normalize the input variables with respect to the confounding variables. If I do not build a composite score, I am also at a loss for how to add these features to the feature space. Categorical binning of a “developed” score? Am I overthinking it?
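One common way to build a composite from indicators on different scales is to standardize each one to a z-score and average them. The sketch below assumes equal weights and assumes every indicator points in the same direction (higher means "more innovative"). Both are modelling choices rather than facts, and the values are invented. Note that this only puts the inputs on a common scale; it does not adjust for confounders.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize a list to mean 0 and sample standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical indicator values for five countries (invented numbers).
indicators = {
    "patents_per_capita": [0.8, 1.5, 0.3, 2.1, 1.1],
    "work_hours": [38.0, 35.5, 44.0, 34.0, 40.0],
    "social_progress": [78.0, 85.0, 60.0, 90.0, 74.0],
}

standardized = {name: z_scores(vals) for name, vals in indicators.items()}
n_countries = len(next(iter(indicators.values())))

# Equal-weight average of the standardized indicators, one score per country.
composite = [
    mean([standardized[name][i] for name in standardized])
    for i in range(n_countries)
]

print([round(score, 3) for score in composite])
```

If an indicator is believed to run the other way (for example, if longer work hours were taken as a sign of less innovation), its z-score would be negated before averaging. Collapsing several variables into one score also discards information the regression could otherwise use, so keeping them as separate columns is a reasonable alternative.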

Potential confounders

- Education score (continuous)

- Income (DON’T HAVE - need to find)

- Poverty (proxied through “number of calories per day”, continuous)

- Infrastructure score (continuous)
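On accounting for confounders in a linear model: the standard approach is to include them as additional columns in the regression, and by the Frisch-Waugh-Lovell theorem this gives the same coefficient as first residualizing both the predictor and the outcome on the confounder. The sketch below demonstrates this on synthetic data where the true effect is known. The variable names mirror the post, but the data-generating process and its coefficients are assumptions made up for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic data: education drives both patents and GDP (a confounder).
education = rng.normal(size=n)
patents = 0.8 * education + rng.normal(size=n)
gdp = 1.0 * patents + 2.0 * education + rng.normal(size=n)  # true patents effect: 1.0


def ols(columns, y):
    """Return OLS coefficients with an intercept prepended to the given columns."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def residualize(v, on):
    """Return the part of v not linearly explained by `on` (plus an intercept)."""
    X = np.column_stack([np.ones(len(v)), on])
    coef, *_ = np.linalg.lstsq(X, v, rcond=None)
    return v - X @ coef


# 1) Ignore the confounder: the patents coefficient absorbs education's effect.
naive = ols([patents], gdp)[1]

# 2) Add the confounder as a covariate.
adjusted = ols([patents, education], gdp)[1]

# 3) Residualize both sides on the confounder, then regress the residuals.
partialled = ols([residualize(patents, education)], residualize(gdp, education))[1]

print(f"naive:      {naive:.3f}")
print(f"adjusted:   {adjusted:.3f}")
print(f"partialled: {partialled:.3f}")
```

Approaches 2 and 3 agree up to floating-point error, which suggests there is no need for a separate "normalize by confounders" step before an OLS fit: adding the confounders as covariates does the same job. This only handles confounders that are measured and enter linearly.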

In summary, I am looking to further define my feature space, including accounting for confounders. Thank you for your thoughts!

Sources:

New patents by country (2023, 2024)

- https://worldpopulationreview.com/country-rankings/patents-by-country

Education levels by country (2023)

- https://worldpopulationreview.com/country-rankings/education-rankings-by-country

Average hours in a work week by country (2023)

- https://worldpopulationreview.com/country-rankings/average-work-week-by-country

Poverty, proxied through daily supply of calories per person (2023)

- https://ourworldindata.org/grapher/daily-per-capita-caloric-supply?time=2022..latest&country=~USA

Infrastructure (various factors) (2023)

- https://worldpopulationreview.com/country-rankings/infrastructure-by-country

Government type

- https://worldpopulationreview.com/country-rankings/government-system-by-country

World Happiness Report (various factors) (2023, 2024)

- https://www.worldhappiness.report/data-sharing/

Social progress by country (2023)

- https://worldpopulationreview.com/country-rankings/social-progress-index-by-country

Population (2023)

- https://data.worldbank.org/indicator/SP.POP.TOTL?end=2024&start=2022

Output: GDP change % YoY (per capita)

- https://data.worldbank.org/indicator/NY.GDP.MKTP.KD?end=2024&start=2021

submitted by /u/SingerEast1469