
Data & Evidence Model

The epistemic foundation of Entrestate — how raw market signals become institutional-grade truths through the 5-Layer Evidence Stack, strict exclusion policies, and deterministic data governance. In our framework, a number is never just a "fact" — it is a "belief under pressure."

Core Philosophy: Sensors, Not Judges

External sources (Property Finder, DLD, RERA) are treated as sensors that detect market movement; the Entrestate Adjudication Engine is the final judge. We do not simply aggregate data, we adjudicate it, and we expose confidence levels rather than raw facts alone.

Intent Collapse Risk

Treating a high-risk speculative play as a safe yield opportunity because the underlying data was not weighted for reliability.

Signal Distortion Risk

Allowing noise — distressed sales, internal transfers, duplicates — to skew ROI calculations and market averages.

Accountability Risk

Being unable to defend a multi-million AED decision because the source of "truth" is untraceable or unverified.

The 5-Layer Evidence Stack

Data integrity increases as the layer number decreases: L1 is the most reliable layer and L5 the least. The Decision Tunnel uses only L1-L3 data for final recommendations. L4 and L5 data is never presented as truth; it awaits adjudication.

L1 Canonical (Audited Static Truths)

Highest reliability

The single source of truth. Includes normalized developer identities (481 canonical developers), verified AED prices, confirmed handover dates, and geospatial coordinates. Static Truth Finalization requires five fields: Location, Developer, Prices (From/To), Dates (Launch/Handover), and Status.

Examples: Emaar Properties (normalized), AED 3,510,000 (PF verified), 25.1234/55.5678 (confirmed coordinates)
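The Static Truth Finalization rule can be read as a completeness gate: a record reaches L1 only when all five field groups are filled. The sketch below assumes a flat record with hypothetical field names (`price_from_aed`, `handover_date`, and so on); it is an illustration of the rule, not Entrestate's schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


# Hypothetical record shape covering the five required field groups:
# Location, Developer, Prices (From/To), Dates (Launch/Handover), Status.
@dataclass
class ProjectRecord:
    location: Optional[str] = None
    developer: Optional[str] = None
    price_from_aed: Optional[int] = None
    price_to_aed: Optional[int] = None
    launch_date: Optional[date] = None
    handover_date: Optional[date] = None
    status: Optional[str] = None


REQUIRED_FIELDS = (
    "location", "developer",
    "price_from_aed", "price_to_aed",
    "launch_date", "handover_date",
    "status",
)


def missing_static_truths(record: ProjectRecord) -> list[str]:
    """Return the required fields that are still empty, in a fixed order."""
    return [name for name in REQUIRED_FIELDS if getattr(record, name) in (None, "")]


def is_finalized(record: ProjectRecord) -> bool:
    """A record qualifies for L1 only when no required field is missing."""
    return not missing_static_truths(record)
```

For example, a record holding only a location, a developer, and a "from" price reports `price_to_aed`, `launch_date`, `handover_date`, and `status` as missing, so it stays out of L1 until recovery fills them.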

L2 Derived (Calculated Truths)

High reliability

Metrics calculated from L1 data: Investment Scores (0-100), Stress Grades (A-F), and verified rental yield percentages. This layer powers the objective Market Score, the component weighted at 65% in the Decision Tunnel's Judgment stage.

Examples: Investment Score: 85, Stress Grade: B+, Verified Yield: 12.3%
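The 65% Market Score share implies a weighted blend in the Judgment stage. A minimal sketch, assuming both inputs are on a 0-100 scale and that the remaining 35% comes from a single component; that component is not named in these docs, so `profile_fit` below is a hypothetical label.

```python
# The objective Market Score carries 65% of the Judgment score.
MARKET_WEIGHT = 0.65
# The remaining 35% component is unnamed in the docs; "profile_fit" is a
# hypothetical label used only for this sketch.
PROFILE_WEIGHT = 0.35


def judgment_score(market_score: float, profile_fit: float) -> float:
    """Blend two 0-100 scores into one 0-100 Judgment score (65/35)."""
    for name, value in (("market_score", market_score), ("profile_fit", profile_fit)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    return MARKET_WEIGHT * market_score + PROFILE_WEIGHT * profile_fit
```

With illustrative inputs of 85 and 60, the blend is about 76.25. Because the weights sum to 1, the output stays on the same 0-100 scale as the inputs.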

L3 Dynamic (Living States)

Medium reliability

Real-time signals reflecting the current market state. Projects are treated as states moving through a lifecycle: Market Context, Developer Execution, Financial Logic, Delivery Risk, Exit Reality. This layer adjusts Data Confidence as market events occur.

Examples: BUY/HOLD timing signals, price momentum indicators, inventory level changes
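One way to read "projects are treated as states moving through a lifecycle" is as an ordered state machine. The sketch below assumes a strictly linear, forward-only progression through the five listed stages; `LifecycleState` and `next_state` are hypothetical names, and the docs do not say what triggers a transition.

```python
from enum import IntEnum
from typing import Optional


class LifecycleState(IntEnum):
    # Order follows the lifecycle listed for the L3 Dynamic layer.
    MARKET_CONTEXT = 1
    DEVELOPER_EXECUTION = 2
    FINANCIAL_LOGIC = 3
    DELIVERY_RISK = 4
    EXIT_REALITY = 5


def next_state(state: LifecycleState) -> Optional[LifecycleState]:
    """Return the following lifecycle state, or None after Exit Reality.

    Assumes transitions only move forward one stage at a time.
    """
    if state is LifecycleState.EXIT_REALITY:
        return None
    return LifecycleState(state + 1)
```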

L4 External (Market Sensors)

Low-Medium reliability

Raw sensor data from external sources. It remains in 'external belief' status until it is verified through the 10-Phase Pipeline and promoted to L1 Canonical status.

Examples: DLD transaction history, RERA registration data, Property Finder listings, Bayut feeds

L5 Raw (Unprocessed Extraction)

Lowest reliability

Unprocessed information from initial ingestion. Contains regex artifacts, location fragments, and raw HTML/JSON noise. The entry point for Static Truth Recovery. Never used for direct decisioning — exposing L5 data to decision-makers introduces unacceptable risk.

Examples: Raw HTML snippets, regex artifacts (e.g., developer: 's'), PDF brochure extracts
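The stack's core rule (lower layer number means higher integrity, and only L1-L3 reaches the Decision Tunnel) can be expressed as a simple gate. This is a sketch; `EvidenceLayer`, `EvidenceValue`, and `decision_inputs` are hypothetical names chosen for illustration.

```python
from enum import IntEnum
from typing import Any, NamedTuple


class EvidenceLayer(IntEnum):
    """Lower number = higher integrity, as in the 5-Layer Evidence Stack."""
    CANONICAL = 1  # L1: audited static truths
    DERIVED = 2    # L2: calculated from L1
    DYNAMIC = 3    # L3: living market states
    EXTERNAL = 4   # L4: sensor data awaiting adjudication
    RAW = 5        # L5: unprocessed extraction


class EvidenceValue(NamedTuple):
    name: str
    value: Any
    layer: EvidenceLayer


# The Decision Tunnel accepts L1-L3 only.
DECISION_MAX_LAYER = EvidenceLayer.DYNAMIC


def usable_for_decision(item: EvidenceValue) -> bool:
    return item.layer <= DECISION_MAX_LAYER


def decision_inputs(items: list[EvidenceValue]) -> list[EvidenceValue]:
    """Drop L4/L5 values so they never reach a recommendation as truth."""
    return [item for item in items if usable_for_decision(item)]
```

Using an `IntEnum` keeps the "lower is more trusted" ordering explicit, so the gate is a single comparison rather than a list of allowed layers.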

Exclusion Policy

Maintaining data integrity requires a strict Exclusion Policy. The system automatically filters records that would corrupt the competitive landscape analysis or distort L1 pricing baselines.

Category	Reason	Impact
Distressed Sales	Outliers that skew fair market value and corrupt L1 price integrity	Prevents false ROI projections from non-representative transactions
Internal Transfers	Non-market movements within organizations that don't reflect real demand	Protects supply-demand analysis from artificial volume inflation
Duplicates & Developer Noise	Regex artifacts, location fragments (e.g., 'At Aljada'), and identity collisions	Ensures canonical graph integrity and accurate developer track records
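The three exclusion categories map naturally onto a filter that records why each record was dropped. A minimal sketch, assuming upstream flags for distressed sales and internal transfers (`is_distressed`, `is_internal_transfer` are hypothetical), a two-character heuristic for regex artifacts like `'s'`, and a leading "At " check for location fragments like 'At Aljada'. Identity collisions would need the canonical developer registry and are not covered here.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transaction:
    record_id: str                   # hypothetical source identifier
    developer: str
    price_aed: int
    is_distressed: bool = False      # hypothetical flag set upstream
    is_internal_transfer: bool = False


# Heuristic for regex artifacts such as developer: 's' (one or two
# characters, optionally quoted). The threshold is an assumption.
ARTIFACT = re.compile(r"^\W*\w{0,2}\W*$")


def exclusion_reason(tx: Transaction) -> Optional[str]:
    if tx.is_distressed:
        return "distressed_sale"
    if tx.is_internal_transfer:
        return "internal_transfer"
    developer = tx.developer.strip()
    # Regex artifact, or a location fragment such as 'At Aljada'.
    if ARTIFACT.match(developer) or developer.lower().startswith("at "):
        return "developer_noise"
    return None


def apply_exclusion_policy(txs):
    """Split records into (kept, [(record, reason), ...])."""
    kept, excluded, seen = [], [], set()
    for tx in txs:
        reason = exclusion_reason(tx)
        if reason is None and tx.record_id in seen:
            reason = "duplicate"
        if reason is None:
            seen.add(tx.record_id)
            kept.append(tx)
        else:
            excluded.append((tx, reason))
    return kept, excluded
```

Keeping the reason next to each excluded record means the filtering itself stays auditable, which matters when an analyst asks why a transaction is absent from a price baseline.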

Data Source Integration

All external sources enter the pipeline as L4 External sensors. They are never treated as truth until adjudicated through the 10-Phase Pipeline. Each source has a specific role in the evidence ecosystem.

Dubai Land Department (DLD)

Transaction authority and historical rental data

L4 External sensor adjudicated to L1 for yield calculation (Phase 5). Historical occupancy data used as circuit breaker (>85% threshold for Conservative profiles).
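The occupancy circuit breaker can be sketched as a yes/no check applied before an asset reaches a Conservative profile. This reads ">85% threshold" as "historical occupancy must exceed 85%" and lets every other profile pass; both readings are assumptions, as is the function name `occupancy_breaker_allows`.

```python
# ">85% threshold for Conservative profiles". Interpreting this as
# "occupancy must exceed 85%" is an assumption, as is letting every other
# profile pass this particular check.
CONSERVATIVE_MIN_OCCUPANCY = 0.85


def occupancy_breaker_allows(profile: str, historical_occupancy: float) -> bool:
    """Return False when the circuit breaker should block the asset."""
    if not 0.0 <= historical_occupancy <= 1.0:
        raise ValueError("historical_occupancy must be a fraction in [0, 1]")
    if profile.strip().lower() != "conservative":
        return True
    return historical_occupancy > CONSERVATIVE_MIN_OCCUPANCY
```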

RERA

Regulatory compliance and project registration

L4 External sensor for project lifecycle validation. Registration status feeds into Stress Testing (Phase 6) and Developer Registry (Phase 3).

Property Finder

Market listing sensor and price verification

L4 External sensor providing price signals. Verified against canonical values during Price Verification (Phase 4). Scrape dates attached as provenance metadata.
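Price Verification compares a listing signal with the canonical value and keeps the scrape date as provenance. The sketch below assumes a 5% relative tolerance, which is a hypothetical figure; a signal that falls outside it simply stays in 'external belief' status rather than overwriting the canonical price.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical tolerance: a listing agrees with the canonical price when it
# is within 5% of it. The actual Phase 4 rule is not specified here.
PRICE_TOLERANCE = 0.05


@dataclass(frozen=True)
class PriceSignal:
    source: str          # e.g. "Property Finder"
    price_aed: int
    scrape_date: date    # provenance metadata kept with the signal


@dataclass(frozen=True)
class PriceCheck:
    signal: PriceSignal
    canonical_aed: int
    relative_gap: float
    status: str          # "verified" or "external_belief"


def verify_price(signal: PriceSignal, canonical_aed: int) -> PriceCheck:
    if canonical_aed <= 0:
        raise ValueError("canonical price must be positive")
    gap = abs(signal.price_aed - canonical_aed) / canonical_aed
    status = "verified" if gap <= PRICE_TOLERANCE else "external_belief"
    return PriceCheck(signal, canonical_aed, gap, status)
```

Against a canonical AED 3,510,000, a listing at AED 3,600,000 is about 2.6% off and would be marked verified, while AED 4,000,000 (about 14% off) would remain an external belief.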

Bayut

Secondary market sensor and availability tracking

L4 External sensor for inventory tracking and price momentum signals. Feeds L3 Dynamic layer for real-time market movement detection.
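The docs do not define the price momentum signal. One common, simple definition is the percent change in median asking price between two listing snapshots; the sketch below uses that definition as an assumption, with `price_momentum` as a hypothetical name.

```python
from statistics import median


def price_momentum(previous_prices: list[float], current_prices: list[float]) -> float:
    """Percent change in median asking price between two listing snapshots.

    The median-over-snapshot definition is an assumption; the median is used
    so a few extreme listings do not dominate the signal.
    """
    if not previous_prices or not current_prices:
        raise ValueError("both snapshots need at least one listing")
    prev = median(previous_prices)
    if prev <= 0:
        raise ValueError("previous median must be positive")
    return (median(current_prices) - prev) / prev * 100
```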

Static Truth Recovery

The L1 layer is currently 34.3% complete in raw form. To reach institutional grade, we employ Aggressive Field Extraction: a deterministic recovery process that extracts missing truths from URL patterns, project briefs, and official brochures, with Gemini 1.5 parsing the unstructured brochure content.

Field	Raw Coverage	Target	Recovery Method
Developer Identity	18%	100%	URL pattern extraction + brochure AI parsing
Area Normalization	54%	100%	Regex cleaning + geospatial coordinate verification
Price Verification	34.3%	>90%	Cross-source adjudication (PF, DLD, RERA)
Handover Dates	~40%	>85%	Developer registry + project timeline analysis
Coordinates	~60%	>95%	Address parsing + satellite imagery confirmation
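The rule-based parts of recovery (URL pattern extraction for developer identity and regex cleaning for area names) can be sketched as follows. The URL layout `/new-projects/<developer-slug>/<project-slug>` and the two-entry developer lookup are hypothetical stand-ins for real source URLs and the canonical developer registry; the AI brochure parsing and coordinate verification steps are out of scope here.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Hypothetical URL layout: /new-projects/<developer-slug>/<project-slug>
DEVELOPER_IN_PATH = re.compile(r"^/new-projects/([a-z0-9-]+)/[a-z0-9-]+/?$")

# Tiny stand-in for the canonical developer registry (hypothetical entries).
CANONICAL_DEVELOPERS = {
    "emaar-properties": "Emaar Properties",
    "emaar": "Emaar Properties",
}


def developer_from_url(url: str) -> Optional[str]:
    """Map a developer slug in the URL path to its canonical name, if known."""
    match = DEVELOPER_IN_PATH.match(urlparse(url).path)
    if match is None:
        return None
    return CANONICAL_DEVELOPERS.get(match.group(1))


def normalize_area(raw: str) -> str:
    """Strip leading prepositions and stray punctuation, e.g. 'At Aljada,' -> 'Aljada'."""
    cleaned = re.sub(r"^\s*(at|in)\s+", "", raw.strip(), flags=re.IGNORECASE)
    cleaned = re.sub(r"[^\w\s'-]", "", cleaned)
    return re.sub(r"\s+", " ", cleaned).strip()
```

Returning `None` for unknown slugs, rather than guessing, keeps unrecognized developers out of L1 until they are resolved against the registry.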

The Evidence Drawer

Every Decision Object includes an Evidence Drawer — the transparency layer that exposes the "why" behind every recommendation. This allows stakeholders to audit the decision readiness of any asset.

Auditability

Every number is footnoted with its specific L-layer, so its status as a "static truth" can be verified.

Data Confidence

A score reflecting the completeness, freshness, and source agreement of the underlying data.

Strategic Rationale

Explicit listing of filters, weights, and profile lens used to resolve the query.
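Data Confidence combines completeness, freshness, and source agreement, but the docs do not give the formula. The sketch below assumes an equal-weight average of three 0-1 factors scaled to 0-100, a 90-day freshness half-life, and a 5% agreement band around the cross-source median; every one of these parameters is hypothetical.

```python
from datetime import date
from statistics import median


def completeness(filled: int, required: int) -> float:
    """Share of required fields that are populated."""
    if required <= 0:
        raise ValueError("required must be positive")
    return min(filled, required) / required


def freshness(last_verified: date, today: date, half_life_days: float = 90.0) -> float:
    """1.0 when verified today, halving every `half_life_days` (assumed decay)."""
    age_days = max((today - last_verified).days, 0)
    return 0.5 ** (age_days / half_life_days)


def source_agreement(values: list[float], tolerance: float = 0.05) -> float:
    """Share of sources within `tolerance` of the cross-source median."""
    if not values:
        raise ValueError("need at least one source value")
    mid = median(values)
    agreeing = sum(1 for v in values if abs(v - mid) <= tolerance * abs(mid))
    return agreeing / len(values)


def data_confidence(filled: int, required: int, last_verified: date,
                    today: date, source_values: list[float]) -> float:
    """Equal-weight average of the three factors, scaled to 0-100."""
    parts = (
        completeness(filled, required),
        freshness(last_verified, today),
        source_agreement(source_values),
    )
    return round(100 * sum(parts) / len(parts), 1)
```

For instance, a fully populated record last verified 90 days ago, where one of three sources disagrees, scores (1.0 + 0.5 + 2/3) / 3, or 72.2.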

Evidence Package Checklist

Verified Footnotes: L-layer provenance for every value
Source Provenance: Original sensors (DLD, RERA) with adjudication dates
Decision Lens Transparency: Explicit 65/35 weights applied
Stress Test Confirmation: L2 Stress Grade of A or B, indicating volatility resistance
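The Evidence Package Checklist lends itself to an automated pre-check that lists what is missing. A minimal sketch under stated assumptions: the class and field names are hypothetical, every value is expected to carry at least one source and an adjudication date, and any grade starting with A or B (including B+) is taken to satisfy "A/B".

```python
import math
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class FootnotedValue:
    name: str
    layer: Optional[int]                 # L-layer footnote, 1-5
    sources: tuple[str, ...] = ()        # original sensors, e.g. ("DLD",)
    adjudicated_on: Optional[date] = None


@dataclass
class EvidencePackage:
    values: list[FootnotedValue]
    weights: tuple[float, float]         # (Market Score share, other share)
    stress_grade: str                    # L2 Stress Grade, e.g. "B+"


def checklist_failures(pkg: EvidencePackage) -> list[str]:
    """Return human-readable failures; an empty list means the package passes."""
    failures = []
    for v in pkg.values:
        if v.layer not in (1, 2, 3, 4, 5):
            failures.append(f"{v.name}: missing L-layer footnote")
        if not v.sources or v.adjudicated_on is None:
            failures.append(f"{v.name}: missing source provenance")
    market, other = pkg.weights
    if not (math.isclose(market, 0.65) and math.isclose(other, 0.35)):
        failures.append("decision lens is not the 65/35 split")
    # Assumption: any A or B grade (including B+) satisfies "A/B".
    if pkg.stress_grade[:1] not in ("A", "B"):
        failures.append(f"stress grade {pkg.stress_grade!r} is not A/B")
    return failures
```

Returning a list of failures rather than a single boolean gives reviewers the same kind of "why" the Evidence Drawer is meant to expose.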