Detailed scoring criteria per feature
Feature tree | Score | |||||
Data ingestion and integration | → | Data consolidation | → | Connectivity to N data sources | 5 | |
→ | Data extraction, transformation and loading (ETL) and ETL support | 5 | ||||
→ | Data modelling | 5 | ||||
→ | Data propagation | → | Data flow orchestration, enterprise application integration, exchange of messages and transactions | 5 | ||
→ | Enterprise data replication, transfer large amounts of data between databases | 5 | ||||
→ | Versioning and file management | 5 | ||||
→ | Data virtualisation | → | Data access | 5 | ||
→ | Data federation | → | Enterprise information integration | 5 | ||
| | Total | 40 | ||||
Data preparation and cleaning | → | Parsing and standardisation | → | Tagging data with keywords, descriptions or categories | 5 | |
→ | (Data scrubbing)/cleansing/handling blank values/reformatting values/threshold checking | 5 | ||||
→ | Data enhancement/enrichment/curation | 5 | ||||
→ | Natural language processing | 5 | ||||
→ | Address validation/geocoding | 5 | ||||
→ | Master data management | 5 | ||||
→ | Data masking | 5 | ||||
→ | Identity resolution, linkage, merging and consolidation | → | Data deduping | 5 | ||
→ | Machine learning/training a statistical model | 5 | ||||
→ | Data aggregation | 5 | ||||
→ | Data binning | 5 | ||||
→ | Grouping similar data/clustering | 5 | ||||
→ | Outlier detection and removal | 5 | ||||
→ | Master reference data management | → | ‘Hub’ infrastructure to source and distribute master/reference data | 5 | ||
→ | Master data versioning based on data history and timelines | 5 | ||||
→ | Workflow integrations to steward and publish the master/reference data | 5 | ||||
→ | Graph data stores to define relationships for creating a flexible knowledge graph | 5 | ||||
→ | Accessible API (Application Programming Interface) for real-time access to shared reference data | 5 | ||||
| | Total | 90 | ||||
Data profiling, exploration/ pattern detection | → | Relationship discovery | → | Cross-table redundancy analysis | 5 | |
→ | Performing data quality assessment, risk of performing joins on the data | 5 | ||||
→ | Identifying distributions, key candidates, foreign-key candidates, functional dependencies, embedded value dependencies and performing intertable analysis. | 5 | ||||
→ | Content discovery | → | Data pattern discovery | 5 | ||
→ | Domain analysis | 5 | ||||
→ | Discovering metadata and assessing its accuracy | 5 | ||||
→ | Structure discovery | → | Column value frequency analysis and statistics, collecting descriptive statistics like min, max, count and sum. | 5 | ||
→ | Table structure analysis, collecting data types, length and recurring patterns. | 5 | ||||
→ | Drill-through analysis | 5 | ||||
| | Total | 45 | ||||
Data monitoring | → | Monitoring and alerting | → | Time series data identified and collection by metric name and key/value pairs | 5 | |
→ | Flexible query language to leverage this dimensionality | 5 | ||||
→ | Graphing and dashboarding support | 5 | ||||
| | Total | 15 | ||||
Data use | → | Metadata management | → | Concept identification and naming | 5 | |
→ | Data categorisation | 5 | ||||
→ | Lineage | 5 | ||||
→ | Relationship with other metadata | 5 | ||||
→ | Comments and remarks | 5 | ||||
→ | Data statistics (profiles) | 5 | ||||
→ | Knowledge graph | 5 | ||||
→ | Privacy and security | → | Data anonymisation | 5 | ||
→ | Role based access control | 5 | ||||
→ | Secure environment setup and deployment | 5 | ||||
→ | Container-based deployment | 5 | ||||
→ | Data mining | → | Interactive data visualisation | 5 | ||
→ | Visual programming and analysis | 5 | ||||
→ | Visual illustrations and training documentation | 5 | ||||
→ | Sample data/generate fake data | 5 | ||||
→ | Add-ons and extension functionality | 5 |