Data & Evidence Model
The epistemic foundation of Entrestate — how raw market signals become institutional-grade truths through the 5-Layer Evidence Stack, strict exclusion policies, and deterministic data governance. In our framework, a number is never just a "fact" — it is a "belief under pressure."
Core Philosophy: Sensors, Not Judges
External platforms (Property Finder, DLD, RERA) are regarded as sensors that detect market movement, but the Entrestate Adjudication Engine is the final judge. We do not simply aggregate data — we adjudicate it. We move beyond showing raw facts to exposing confidence levels.
Intent Collapse Risk
Treating a high-risk speculative play as a safe yield opportunity because underlying data wasn't weighted for reliability.
Signal Distortion Risk
Allowing noise — distressed sales, internal transfers, duplicates — to skew ROI calculations and market averages.
Accountability Risk
Being unable to defend a multi-million AED decision because the source of "truth" is untraceable or unverified.
The 5-Layer Evidence Stack
Data integrity increases as the layer number decreases: L1 is the most reliable layer and L5 the least. The Decision Tunnel uses only L1-L3 data for final recommendations. L4 and L5 data is never presented as truth; it awaits adjudication.
L1 Canonical (Audited Static Truths)
Highest reliability. The single source of truth. Includes normalized developer identities (481 canonical developers), verified AED prices, confirmed handover dates, and geospatial coordinates. Static Truth Finalization requires five fields: Location, Developer, Prices (From/To), Dates (Launch/Handover), and Status.
Examples: Emaar Properties (normalized), AED 3,510,000 (PF verified), 25.1234/55.5678 (confirmed coordinates)
L2 Derived (Calculated Truths)
High reliability. Metrics calculated mathematically from L1 data: Investment Scores (0-100), Stress Grades (A-F), and verified rental yield percentages. This layer powers the objective 65% Market Score component used in the Decision Tunnel's Judgment stage.
Examples: Investment Score: 85, Stress Grade: B+, Verified Yield: 12.3%
L3 Dynamic (Living States)
Medium reliability. Real-time signals reflecting the current market state. Projects are treated as states moving through a lifecycle: Market Context, Developer Execution, Financial Logic, Delivery Risk, Exit Reality. This layer adjusts Data Confidence based on market events.
Examples: BUY/HOLD timing signals, price momentum indicators, inventory level changes
L4 External (Market Sensors)
Low-Medium reliability. Raw sensor data from external sources. In our framework, external sources are treated as sensors, not judges. Data remains in 'external belief' status until it is verified through the 10-Phase Pipeline and reaches L1 Canonical status.
Examples: DLD transaction history, RERA registration data, Property Finder listings, Bayut feeds
L5 Raw (Unprocessed Extraction)
Lowest reliability. Unprocessed information from initial ingestion. Contains regex artifacts, location fragments, and raw HTML/JSON noise. This layer is the entry point for Static Truth Recovery. It is never used for direct decisioning: exposing L5 data to decision-makers introduces unacceptable risk.
Examples: Raw HTML snippets, regex artifacts (e.g., developer: 's'), PDF brochure extracts
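The layer gate described above (only L1-L3 data reaches final recommendations) can be illustrated with a minimal sketch. The names `EvidenceLayer`, `DataPoint`, and `decision_ready` are hypothetical and chosen for illustration; they are not the platform's actual API.

```python
from dataclasses import dataclass
from enum import IntEnum


class EvidenceLayer(IntEnum):
    """Hypothetical enum: a lower number means higher reliability."""
    CANONICAL = 1
    DERIVED = 2
    DYNAMIC = 3
    EXTERNAL = 4
    RAW = 5


@dataclass(frozen=True)
class DataPoint:
    """Hypothetical container tagging a value with its evidence layer."""
    name: str
    value: object
    layer: EvidenceLayer


def decision_ready(points):
    """Keep only L1-L3 values; L4/L5 values still await adjudication."""
    return [p for p in points if p.layer <= EvidenceLayer.DYNAMIC]


points = [
    DataPoint("developer", "Emaar Properties", EvidenceLayer.CANONICAL),
    DataPoint("investment_score", 85, EvidenceLayer.DERIVED),
    DataPoint("listing_price_aed", 3_510_000, EvidenceLayer.EXTERNAL),
    DataPoint("developer_raw", "s", EvidenceLayer.RAW),
]
ready = decision_ready(points)
print([p.name for p in ready])
```

In this sketch the external listing price and the raw regex artifact are held back, while the canonical and derived values pass through.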
Exclusion Policy
Maintaining data integrity requires a strict Exclusion Policy. The system automatically filters records that would corrupt the competitive landscape analysis or distort L1 pricing baselines.
| Category | Reason | Impact |
|---|---|---|
| Distressed Sales | Outliers that skew fair market value and corrupt L1 price integrity | Prevents false ROI projections from non-representative transactions |
| Internal Transfers | Non-market movements within organizations that don't reflect real demand | Protects supply-demand analysis from artificial volume inflation |
| Duplicates & Developer Noise | Regex artifacts, location fragments (e.g., 'At Aljada'), and identity collisions | Ensures canonical graph integrity and accurate developer track records |
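
As an illustration of how the three exclusion categories might be applied, here is a minimal sketch. It assumes each record already carries `is_distressed` and `is_internal_transfer` flags and a transaction id usable for duplicate detection; how those flags are derived is not specified here, and the `Transaction` type and `apply_exclusions` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    """Hypothetical record shape; the flags are assumed to be set upstream."""
    tx_id: str
    project: str
    price_aed: int
    is_distressed: bool = False
    is_internal_transfer: bool = False


def apply_exclusions(transactions):
    """Split records into kept ones and (record, category) exclusions."""
    kept, excluded = [], []
    seen_ids = set()
    for tx in transactions:
        if tx.is_distressed:
            excluded.append((tx, "Distressed Sales"))
        elif tx.is_internal_transfer:
            excluded.append((tx, "Internal Transfers"))
        elif tx.tx_id in seen_ids:
            # Simplification: only exact id repeats count as duplicates here.
            excluded.append((tx, "Duplicates & Developer Noise"))
        else:
            seen_ids.add(tx.tx_id)
            kept.append(tx)
    return kept, excluded


txs = [
    Transaction("t1", "Project A", 3_510_000),
    Transaction("t2", "Project A", 1_900_000, is_distressed=True),
    Transaction("t3", "Project A", 3_400_000, is_internal_transfer=True),
    Transaction("t1", "Project A", 3_510_000),  # repeat of t1
]
kept, excluded = apply_exclusions(txs)
print([tx.tx_id for tx in kept])
print([reason for _, reason in excluded])
```

Returning the excluded records together with their category keeps the filter auditable: a reviewer can see why each record was removed.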
Data Source Integration
All external sources enter the pipeline as L4 External sensors. They are never treated as truth until adjudicated through the 10-Phase Pipeline. Each source has a specific role in the evidence ecosystem.
Dubai Land Department (DLD)
Transaction authority and historical rental data
L4 External sensor adjudicated to L1 for yield calculation (Phase 5). Historical occupancy data used as circuit breaker (>85% threshold for Conservative profiles).
RERA
Regulatory compliance and project registration
L4 External sensor for project lifecycle validation. Registration status feeds into Stress Testing (Phase 6) and Developer Registry (Phase 3).
Property Finder
Market listing sensor and price verification
L4 External sensor providing price signals. Verified against canonical values during Price Verification (Phase 4). Scrape dates attached as provenance metadata.
Bayut
Secondary market sensor and availability tracking
L4 External sensor for inventory tracking and price momentum signals. Feeds L3 Dynamic layer for real-time market movement detection.
Static Truth Recovery
The L1 layer is currently 34.3% complete in raw form. To reach institutional grade, we employ Aggressive Field Extraction: a deterministic process that recovers missing truths from URL patterns, project briefs, and official brochures, with brochure parsing handled by Gemini 1.5.
| Field | Raw Coverage | Target | Recovery Method |
|---|---|---|---|
| Developer Identity | 18% | 100% | URL pattern extraction + brochure AI parsing |
| Area Normalization | 54% | 100% | Regex cleaning + geospatial coordinate verification |
| Price Verification | 34.3% | >90% | Cross-source adjudication (PF, DLD, RERA) |
| Handover Dates | ~40% | >85% | Developer registry + project timeline analysis |
| Coordinates | ~60% | >95% | Address parsing + satellite imagery confirmation |
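
Recovery targets the gaps that block Static Truth Finalization, which requires five fields: Location, Developer, Prices (From/To), Dates (Launch/Handover), and Status. A minimal sketch of such a gap check follows; the dictionary keys (`price_from`, `handover_date`, and so on), the sample record, and the `missing_fields` helper are assumptions made for illustration.

```python
# Required field groups, mapped to hypothetical record keys.
REQUIRED_FIELDS = {
    "Location": ("location",),
    "Developer": ("developer",),
    "Prices (From/To)": ("price_from", "price_to"),
    "Dates (Launch/Handover)": ("launch_date", "handover_date"),
    "Status": ("status",),
}


def missing_fields(record):
    """Return the required field groups that are absent or empty."""
    missing = []
    for label, keys in REQUIRED_FIELDS.items():
        if any(record.get(key) in (None, "") for key in keys):
            missing.append(label)
    return missing


# Illustrative record, not real project data.
record = {
    "location": "Dubai Marina",
    "developer": None,
    "price_from": 3_510_000,
    "price_to": None,
    "launch_date": "2024-01-15",
    "handover_date": "2027-06-30",
    "status": "Off-plan",
}
gaps = missing_fields(record)
print(gaps)
```

A record is ready for finalization only when this list is empty; otherwise each entry names a field group that still needs recovery.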
The Evidence Drawer
Every Decision Object includes an Evidence Drawer — the transparency layer that exposes the "why" behind every recommendation. This allows stakeholders to audit the decision readiness of any asset.
Auditability
Every number is footnoted to its specific L-layer, so its standing as a "static truth" can be verified.
Data Confidence
Score reflecting completeness, freshness, and source agreement of underlying data.
Strategic Rationale
Explicit listing of filters, weights, and profile lens used to resolve the query.
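To make the Data Confidence idea concrete, the sketch below combines completeness, freshness, and source agreement into a single 0-100 score. The actual formula is not given here, so everything in it is an assumption: the equal weighting, the 90-day freshness half-life, the 5% agreement tolerance, and all function names.

```python
from statistics import median


def completeness(record, required):
    """Share of required keys that hold a non-empty value."""
    filled = sum(1 for key in required if record.get(key) not in (None, ""))
    return filled / len(required)


def freshness(age_days, half_life_days=90.0):
    """Assumed exponential decay: the score halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def source_agreement(values, tolerance=0.05):
    """Share of source values within a relative tolerance of the median."""
    mid = median(values)
    agreeing = sum(1 for v in values if abs(v - mid) <= tolerance * mid)
    return agreeing / len(values)


def data_confidence(c, f, a):
    """Assumed equal-weight average of the three components, scaled to 0-100."""
    return round(100 * (c + f + a) / 3, 1)


record = {"location": "Dubai Marina", "developer": "Emaar Properties",
          "price_from": 3_510_000, "handover_date": None, "status": "Off-plan"}
c = completeness(record, list(record))
f = freshness(age_days=90)
a = source_agreement([3_510_000, 3_500_000, 4_200_000])
score = data_confidence(c, f, a)
print(c, f, round(a, 3), score)
```

Keeping the three components separate before combining them lets the Evidence Drawer show which factor is pulling a score down.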
Evidence Package Checklist
- Verified Footnotes: L-layer provenance for every value
- Source Provenance: Original sensors (DLD, RERA) with adjudication dates
- Decision Lens Transparency: Explicit 65/35 weights applied
- Stress Test Confirmation: L2 Stress Grade (A/B) proving volatility resistance
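
The checklist above can be expressed as a simple validation pass over an evidence package. This is a sketch under stated assumptions: the `EvidencePackage` shape, its field names, and the sample dates are hypothetical, and accepting any grade that starts with A or B (for example "B+") is our reading of the "A/B" requirement.

```python
from dataclasses import dataclass


@dataclass
class EvidencePackage:
    """Hypothetical shape of the data behind an Evidence Drawer."""
    values: dict        # value name -> value
    footnotes: dict     # value name -> L-layer number (1-5)
    sources: list       # (sensor name, adjudication date) pairs
    weights: tuple      # decision lens weights, e.g. (0.65, 0.35)
    stress_grade: str   # L2 Stress Grade, e.g. "A" or "B+"


def checklist(pkg):
    """Evaluate each checklist item and return a name -> passed mapping."""
    return {
        "Verified Footnotes": all(n in pkg.footnotes for n in pkg.values),
        "Source Provenance": bool(pkg.sources)
        and all(date for _, date in pkg.sources),
        "Decision Lens Transparency": pkg.weights == (0.65, 0.35),
        "Stress Test Confirmation": pkg.stress_grade[:1] in ("A", "B"),
    }


pkg = EvidencePackage(
    values={"price_aed": 3_510_000, "investment_score": 85},
    footnotes={"price_aed": 1},           # investment_score lacks a footnote
    sources=[("DLD", "2025-01-10")],      # illustrative adjudication date
    weights=(0.65, 0.35),
    stress_grade="B+",
)
result = checklist(pkg)
print(result)
```

Here the package fails only the Verified Footnotes item, because one value has no L-layer footnote.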