Data Preparation and Management
September 23, 2024
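Nominal Data: Categorical data where the categories have no meaningful order or ranking.
No Order: Categories are labels that can be counted and compared for equality, but not ranked.
Examples:
ID | Animal |
---|---|
1 | Dog |
2 | Cat |
3 | Bird |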
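The animal labels carry no order, so the meaningful operations are equality checks and counting. A common preparation step for such data is one-hot encoding, which turns each label into its own 0/1 indicator column. The following is a minimal plain-Python sketch using the animals from the table. The `one_hot` helper is an illustrative name and does not come from any library.

```python
from collections import Counter

# Nominal values taken from the example table (IDs 1-3).
animals = ["Dog", "Cat", "Bird"]

def one_hot(values):
    """Encode nominal labels as 0/1 indicator columns (illustrative helper).

    Categories are sorted alphabetically only to get a stable column order;
    that order carries no meaning for nominal data.
    """
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

columns, rows = one_hot(animals)
print(columns)  # ['Bird', 'Cat', 'Dog']
for animal, row in zip(animals, rows):
    print(animal, row)

# Counting (and therefore the mode) is meaningful for nominal data;
# ranking labels or averaging them is not.
print(Counter(animals).most_common())
```

Sorting the column names is only a convenience for reproducible output. Any fixed column order would encode the same information.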
Ordinal Data: Categorical data where the categories have a meaningful order or ranking.
Order Matters: Categories can be ranked or ordered, but the differences between categories are not necessarily uniform.
Examples:
ID | Education Level |
---|---|
1 | Bachelor’s |
2 | Master’s |
3 | PhD |
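Ordinal values can be mapped to integer ranks so that sorting and order-based statistics such as the median work. The gaps between ranks, however, should not be read as equal distances. This minimal sketch assumes that the three education levels in the table form the complete scale, ordered from lowest to highest. The names `EDUCATION_ORDER`, `RANK`, and `rank` are illustrative.

```python
import statistics

# Assumed full ordering of the education levels, lowest first.
EDUCATION_ORDER = ["Bachelor's", "Master's", "PhD"]
RANK = {level: i for i, level in enumerate(EDUCATION_ORDER)}

def rank(level):
    """Return the position of an education level on the ordinal scale."""
    return RANK[level]

responses = ["PhD", "Bachelor's", "Master's"]

# Sorting by rank respects the meaningful order of the categories.
ordered = sorted(responses, key=rank)
print(ordered)

# The median relies only on order, so it is a valid summary for ordinal data.
# median_low always returns an actual rank rather than a midpoint between two.
median_level = EDUCATION_ORDER[statistics.median_low(rank(r) for r in responses)]
print(median_level)  # Master's
```

Averaging the ranks instead would assume that the step from Bachelor's to Master's equals the step from Master's to PhD. Ordinal data does not guarantee that.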
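Interval Data: Numeric data where the differences between values are meaningful, but there is no true zero point.
Meaningful Intervals: Equal differences represent equal amounts anywhere on the scale; the step from 70°F to 80°F is the same size as the step from 80°F to 90°F.
No True Zero: Zero does not indicate the absence of the quantity (0°F is not "no temperature"), so ratios such as "twice as hot" are not meaningful.
Examples:
ID | Temperature (°F) |
---|---|
1 | 70 |
2 | 80 |
3 | 90 |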
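Converting the readings in the temperature table from Fahrenheit to Celsius shows why interval data supports differences but not ratios. The conversion shifts the zero point, so the ratio of two readings changes with the unit, while the ratio of two differences does not. This minimal sketch uses the standard conversion C = (F − 32) × 5/9.

```python
def f_to_c(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

temps_f = [70, 80, 90]  # readings from the example table
temps_c = [f_to_c(t) for t in temps_f]

# Ratios of raw values depend on where the scale puts zero, so they are not meaningful.
ratio_f = temps_f[1] / temps_f[0]
ratio_c = temps_c[1] / temps_c[0]
print(f"80°F / 70°F = {ratio_f:.3f}; same readings in °C: {ratio_c:.3f}")

# Ratios of differences survive the change of scale, because equal intervals stay equal.
diff_ratio_f = (temps_f[2] - temps_f[1]) / (temps_f[1] - temps_f[0])
diff_ratio_c = (temps_c[2] - temps_c[1]) / (temps_c[1] - temps_c[0])
print(f"difference ratio in °F: {diff_ratio_f:.3f}; in °C: {diff_ratio_c:.3f}")
```

Run as written, the raw-value ratio is about 1.143 in °F but about 1.263 in °C. The difference ratio is 1.000 on both scales.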
Ratio Data: Numeric data with a true zero point, which makes all arithmetic operations, including multiplication and division, meaningful.
Meaningful Ratios: Comparisons such as "twice as much" or "half as much" are valid.
True Zero: Zero indicates the absence of the quantity (0 kg means no weight).
Examples:
ID | Height (cm) | Weight (kg) |
---|---|---|
1 | 160 | 55 |
2 | 175 | 70 |
3 | 170 | 65 |
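A true zero means that ratios of measurements stay the same when units change, which is what makes "twice as much" a valid statement. This minimal sketch uses the height and weight rows from the table and the exact conversion factors 1 in = 2.54 cm and 1 lb = 0.45359237 kg. It compares the same ratios in both unit systems and computes a mean, which is also meaningful for ratio data.

```python
CM_PER_INCH = 2.54
KG_PER_POUND = 0.45359237

# (height in cm, weight in kg) for IDs 1-3 in the example table
people = {1: (160, 55), 2: (175, 70), 3: (170, 65)}

h1, w1 = people[1]
h2, w2 = people[2]

# Weight of person 2 relative to person 1, in kilograms and in pounds.
ratio_kg = w2 / w1
ratio_lb = (w2 / KG_PER_POUND) / (w1 / KG_PER_POUND)

# Height of person 2 relative to person 1, in centimetres and in inches.
ratio_cm = h2 / h1
ratio_in = (h2 / CM_PER_INCH) / (h1 / CM_PER_INCH)

print(f"weight ratio: {ratio_kg:.4f} (kg) vs {ratio_lb:.4f} (lb)")
print(f"height ratio: {ratio_cm:.4f} (cm) vs {ratio_in:.4f} (in)")

# Sums and means are meaningful as well.
mean_weight = sum(w for _, w in people.values()) / len(people)
print(f"mean weight: {mean_weight:.1f} kg")
```

Unlike the Fahrenheit-to-Celsius conversion for interval data, these conversions only rescale the values and never move zero. That is why the ratios agree across units, up to floating-point rounding.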
Data Source | Description | Link |
---|---|---|
Bureau of Labor Statistics (BLS) | Provides access to data on inflation and prices, wages and benefits, employment, spending and time use, productivity, and workplace injuries | BLS |
FRED (Federal Reserve Economic Data) | Provides access to a vast collection of U.S. economic data, including interest rates, GDP, inflation, employment, and more | FRED |
Yahoo Finance | Provides comprehensive financial news, data, and analysis, including stock quotes, market data, and financial reports | Yahoo Finance |
IMF (International Monetary Fund) | Provides access to a range of economic data and reports on countries’ economies | IMF Data |
World Bank Open Data | Provides free and open access to global development data, including the World Development Indicators | World Bank Open Data |
OECD Data | Provides access to economic, environmental, and social data and indicators from OECD member countries | OECD Data |
Data Source | Description | Link |
---|---|---|
Data.gov | Portal providing access to over 186,000 government data sets related to topics such as agriculture, education, health, and public safety | Data.gov |
CIA World Factbook | Portal to information on the economy, government, history, infrastructure, military, and population of 267 countries and other world entities | CIA World Factbook |
U.S. Census Bureau | Portal to a huge variety of government statistics and data relating to the U.S. economy and its population | U.S. Census Bureau |
European Union Open Data Portal | Provides access to public data from EU institutions | EU Open Data Portal |
New York City Open Data | Provides access to datasets from New York City, covering a wide range of topics such as public safety, transportation, and health | NYC Open Data |
Los Angeles Open Data | Portal for accessing public data from the City of Los Angeles, including transportation, public safety, and city services | LA Open Data |
Chicago Data Portal | Offers access to datasets from the City of Chicago, including crime data, transportation, and health statistics | Chicago Data Portal |
Data Source | Description | Link |
---|---|---|
Healthdata.gov | Portal to 125 years of U.S. health care data, including national health care expenditures, claim-level Medicare data, and other topics | Healthdata.gov |
World Health Organization (WHO) | Portal to data and statistics on global health issues | WHO Data |
NOAA National Centers for Environmental Information (NCEI) | Portal for accessing a variety of climate and weather data sets | NCEI |
NOAA National Weather Service | Provides weather, water, and climate data, along with forecasts and warnings | NOAA NWS |
FAO (Food and Agriculture Organization) | Provides access to data on food and agriculture, including data on production, trade, food security, and sustainability | FAOSTAT |
Pew Research Center Internet & Technology | Portal to research on U.S. politics, media and news, social trends, religion, Internet and technology, science, Hispanic trends, and global topics | Pew Research |
Data for Good from Facebook | Provides access to anonymized data from Facebook to help non-profits and research communities with insights on crises, health, and well-being | Facebook Data for Good |
Data for Good from Canada | Provides open access to datasets that address pressing social challenges across Canada | Data for Good Canada |
Data Source | Description | Link |
---|---|---|
Amazon Web Services (AWS) public data sets | Portal to a huge repository of public data, including climate data, the Million Song Dataset, and data from the 1000 Genomes Project | AWS Datasets |
Gapminder | Portal to data from the World Health Organization and World Bank on economic, medical, and social issues | Gapminder |
Google Dataset Search | Helps find datasets stored across the web | Google Dataset Search |
Kaggle Datasets | A community-driven platform with datasets from various fields, useful for machine learning and data science projects | Kaggle Datasets |
UCI Machine Learning Repository | A collection of databases, domain theories, and datasets used for machine learning research | UCI ML Repository |
United Nations Data | Provides access to global statistical data compiled by the United Nations | UN Data |
Humanitarian Data Exchange (HDX) | Provides humanitarian data from the United Nations, NGOs, and other organizations | HDX |
Democratizing Data from data.org | A platform providing access to high-impact datasets, tools, and resources aimed at solving critical global challenges | Democratizing Data |
Justia Federal District Court Opinions and Orders database | A free searchable database of full-text opinions and orders from civil cases heard in U.S. Federal District Courts | Justia |