Data Preparation and Management
September 25, 2024

Key characteristics of a data warehouse:

Characteristic | Description |
---|---|
Large | Holds billions of records and petabytes of data |
Multiple Sources | Data comes from many internal and external sources via the extract, transform, and load (ETL) process |
Historical | Typically contains data spanning 5 years or more |
Cross-Organizational Access and Analysis | Data is accessed and analyzed by users across the organization to support multiple business processes and decision-making |
Supports Various Analyses and Reporting | Enables drill-down analysis, metric development, and trend identification |
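
The ETL process in the table above can be illustrated with a small, self-contained sketch. It assumes two hypothetical sources (an internal CSV export and an external partner feed with different field names and units) and uses an in-memory SQLite database as a stand-in for the warehouse. The table name `sales` and every field name are illustrative only.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV export from an internal sales system.
SALES_CSV = """order_id,region,amount
1,north,120.50
2,south,80.00
3,North,45.25
"""

# Hypothetical source 2: records from an external partner feed,
# using different field names and storing money in cents.
PARTNER_RECORDS = [
    {"id": "P-7", "area": "SOUTH", "total_cents": 9900},
]


def extract():
    """Extract: read raw rows from each source without changing them."""
    internal = list(csv.DictReader(io.StringIO(SALES_CSV)))
    return internal, PARTNER_RECORDS


def transform(internal, external):
    """Transform: map both sources onto one schema with consistent values and units."""
    rows = []
    for r in internal:
        rows.append(("internal", r["order_id"], r["region"].strip().lower(), float(r["amount"])))
    for r in external:
        rows.append(("partner", r["id"], r["area"].strip().lower(), r["total_cents"] / 100))
    return rows


def load(rows, conn):
    """Load: write the cleaned rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(source TEXT, source_id TEXT, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(*extract()), conn)

# Once loaded, data from both sources can be analyzed together.
for region, total in conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
):
    print(region, round(total, 2))
```

The transform step is where inconsistencies between sources (here, mixed-case region names and cents versus dollars) are resolved, so that queries against the loaded table can treat all rows uniformly.
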

Feature | Data Mart | Data Warehouse |
---|---|---|
Scope | Specific department or business area | Entire enterprise |
Data Volume | Smaller | Larger |
Complexity | Less complex | More complex |
Implementation Time | Shorter | Longer |
Cost | Lower | Higher |
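
The difference in scope can be sketched by deriving a department-level mart from an enterprise-wide table. This shows one common pattern, a dependent data mart built from the warehouse. Independent data marts fed directly from source systems also exist. The table names, departments, and amounts below are hypothetical, and SQLite stands in for both stores.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical enterprise-wide warehouse table covering every department.
conn.execute("CREATE TABLE warehouse_sales (department TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("marketing", "ads", 500.0),
        ("marketing", "events", 250.0),
        ("finance", "audit", 900.0),
        ("operations", "shipping", 300.0),
    ],
)

# Dependent data mart for one department: same data, narrower scope.
conn.execute(
    "CREATE TABLE marketing_mart AS "
    "SELECT product, amount FROM warehouse_sales WHERE department = 'marketing'"
)

warehouse_rows = conn.execute("SELECT COUNT(*) FROM warehouse_sales").fetchone()[0]
mart_rows = conn.execute("SELECT COUNT(*) FROM marketing_mart").fetchone()[0]
print(f"warehouse rows: {warehouse_rows}, marketing mart rows: {mart_rows}")
```

Copying the rows into a separate table, rather than defining a view, is one design choice among several. It keeps departmental queries off the main warehouse table, at the cost of refreshing the mart when the warehouse changes.
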

Feature | Data Lake | Data Warehouse |
---|---|---|
Data Processing | Schema-on-read (processed when accessed) | Schema-on-write (processed before storage) |
Data State | Raw, unprocessed | Cleaned, transformed |
Data Types | Structured, semi-structured, and unstructured | Primarily structured |
Flexibility | High | Moderate |
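
The schema-on-read versus schema-on-write distinction is easiest to see side by side. The sketch below is plain Python with in-memory lists standing in for storage. `WAREHOUSE_SCHEMA`, the field names, and the sample events are hypothetical. The warehouse path validates and converts each record before storing it. The lake path stores every raw string unchanged and applies a schema only when a reader needs one.

```python
import json

# Schema-on-write (warehouse style): validate and shape each record
# before it is stored; records that do not fit are rejected up front.
WAREHOUSE_SCHEMA = {"customer_id": int, "amount": float}


def write_to_warehouse(store, record):
    row = {}
    for field, field_type in WAREHOUSE_SCHEMA.items():
        if field not in record:
            raise ValueError(f"missing field: {field}")
        row[field] = field_type(record[field])
    store.append(row)


# Schema-on-read (lake style): keep the raw text as-is.
def write_to_lake(store, raw_text):
    store.append(raw_text)


# A schema is applied only at read time, by this particular reader.
def read_amounts_from_lake(store):
    amounts = []
    for raw in store:
        try:
            record = json.loads(raw)
            amounts.append(float(record["amount"]))
        except (ValueError, KeyError, TypeError):
            continue  # this reader skips records it cannot interpret
    return amounts


warehouse, lake = [], []
raw_events = [
    '{"customer_id": "7", "amount": "19.99"}',
    '{"note": "free-form text"}',
    "not json at all",
]

for raw in raw_events:
    write_to_lake(lake, raw)  # every raw event is kept
    try:
        write_to_warehouse(warehouse, json.loads(raw))
    except (ValueError, KeyError, TypeError):
        pass  # rejected at write time

print(len(warehouse), len(lake), read_amounts_from_lake(lake))
```

After the loop, the warehouse holds one cleaned row while the lake keeps all three raw events. A different reader could later apply a different schema to the same raw strings, which is the flexibility the table attributes to data lakes.
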

For example, one data.frame might hold county-level data while another data.frame holds geographic information, such as longitude and latitude. The two data.frames can be combined into one based on common data values in those data.frames.
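
Combining two tables on a shared column is a join. In R this is typically done with `merge()`, whose default keeps only rows whose key values appear in both data.frames (an inner join). The sketch below illustrates the same idea in plain Python, with lists of dictionaries standing in for data.frames. The key column `county_id`, the table names `counties` and `geo`, and all values are hypothetical.

```python
# Hypothetical county-level table: one row per county, keyed by county_id.
counties = [
    {"county_id": "C1", "name": "County A", "population": 52000},
    {"county_id": "C2", "name": "County B", "population": 18000},
    {"county_id": "C3", "name": "County C", "population": 7400},
]

# Hypothetical geographic table sharing the same key column.
geo = [
    {"county_id": "C1", "longitude": -86.6, "latitude": 32.5},
    {"county_id": "C2", "longitude": -87.7, "latitude": 30.7},
    {"county_id": "C9", "longitude": -85.4, "latitude": 31.9},
]


def inner_merge(left, right, key):
    """Combine rows whose `key` values match in both tables (an inner join)."""
    # Index the right table by key so each lookup is a dictionary access.
    right_by_key = {}
    for row in right:
        right_by_key.setdefault(row[key], []).append(row)
    merged = []
    for lrow in left:
        # Rows with no match on the other side are dropped.
        for rrow in right_by_key.get(lrow[key], []):
            merged.append({**lrow, **rrow})
    return merged


merged = inner_merge(counties, geo, "county_id")
for row in merged:
    print(row)
```

Only counties `C1` and `C2` appear in the result, because `C3` has no geographic row and `C9` has no county-level row. An outer join would instead keep unmatched rows and leave their missing fields empty.
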