Picking a lobbying firm is guesswork for outsiders.
Services like Leadership Directories list firms, but they don't help with fit. Small businesses, nonprofits, and individuals face a steep learning curve when seeking representation, while large corporations have in-house expertise and established relationships. Matchmaking relies on social or political connections, not merit.
Information asymmetry favors those already in the system. Decades of public Lobbying Disclosure Act filings sit in government databases, but nobody has turned that data into a tool that evaluates which firm is right for a specific client's needs.
What I prioritized, what I cut, and why
Building on public disclosure data required careful scoping to ship a working prototype:
client_self_select classification
A single field in LD-1 filings reliably distinguishes lobbying firms from self-filing corporations. Relying on it prevented weeks of false starts with keyword-based approaches.
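As a rough illustration, the check reduces to reading one flag per filing. This is a minimal sketch assuming each filing is already parsed into a dict; the payload shape and the flag's polarity (truthy for self-filers) are my assumptions, not the LDA schema verbatim.

    # Minimal sketch: classify a registrant from one LD-1 filing.
    # Assumes client_self_select is truthy when the registrant files on
    # its own behalf (a self-filing corporation) -- an assumption.
    def is_lobbying_firm(filing: dict) -> bool:
        return not filing.get("client_self_select", False)

    def lobbying_firms(filings: list[dict]) -> list[dict]:
        # External firms lobby for clients other than themselves.
        return [f for f in filings if is_lobbying_firm(f)]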
Activity description parsing
Extracting freeform lobbying descriptions and matching bill numbers to user inputs. Parked because the structure and use case weren't clear yet.
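For context on what that parsing would involve, here is a hedged sketch: a regex that normalizes bill references like "H.R. 1234" or "S. 567" out of an activity description and intersects them with the user's bills. The pattern and matching rule are illustrative, not shipped logic.

    import re

    # Illustrative pattern; real LDA descriptions are messier than this.
    BILL_PATTERN = re.compile(r"\b(H\.?\s?R\.?|S\.?)\s?(\d{1,5})\b", re.IGNORECASE)

    def extract_bills(description: str) -> set[str]:
        # Normalize "H.R. 1234", "HR 1234", "S. 567" to "HR1234" / "S567".
        return {
            m.group(1).replace(".", "").replace(" ", "").upper() + m.group(2)
            for m in BILL_PATTERN.finditer(description)
        }

    def mentions_user_bills(description: str, user_bills: set[str]) -> bool:
        return bool(extract_bills(description) & user_bills)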
Percentile-based relative scoring
Early versions used absolute thresholds, so top firms all scored identically. Percentile ranking exposes why one firm wins on Experience despite having fewer filings.
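A minimal version of that ranking, assuming raw per-firm metrics (filing counts, former-official counts) are already computed; the function names are mine, not the app's:

    def percentile_scores(raw: dict[str, float]) -> dict[str, int]:
        # Score each firm against the field, not an absolute threshold,
        # so top firms stop collapsing to the same maxed-out number.
        values = sorted(raw.values())
        n = max(len(values) - 1, 1)

        def pct(v: float) -> int:
            return round(100 * sum(1 for x in values if x < v) / n)

        return {firm: pct(v) for firm, v in raw.items()}

    # e.g. percentile_scores({"A": 51, "B": 120, "C": 8})
    #      -> {"A": 50, "B": 100, "C": 0}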
Covered position text extraction
The dataset has the count of covered positions but not the position descriptions themselves (e.g., "Former Chief of Staff, Senate Finance"). Identified as a data gap and deferred rather than letting it block launch.
Two-phase architecture
Server-side pre-computation of match scores paired with AI-generated narratives. This decision drove the 10x performance improvement.
Full committee validation
The current logic infers committee relationships from issue codes rather than parsing contribution recipients. A proper implementation would cross-reference Congress member databases.
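That heuristic amounts to a static lookup from LDA issue codes to plausibly related committees. The sketch below is my guess at its shape; the mapping entries are illustrative, not the real table.

    # Illustrative issue-code -> committee lookup (not the real table).
    ISSUE_TO_COMMITTEES = {
        "TAX": {"Senate Finance", "House Ways and Means"},
        "HCR": {"Senate HELP", "House Energy and Commerce"},  # health issues
    }

    def inferred_committees(issue_codes: list[str]) -> set[str]:
        # Proper validation would parse contribution recipients and
        # cross-reference Congress member databases instead.
        return set().union(*(ISSUE_TO_COMMITTEES.get(c, set()) for c in issue_codes))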
Enter your issue area, get ranked firm recommendations with AI-generated rationale.
Users describe their organization and lobbying needs. The system matches against an enriched dataset of lobbying firms derived from public LDA filings, scoring each firm on experience, committee relationships, and issue relevance.
Results are displayed with component scores that reveal tradeoffs: one firm might win on Experience (95) because it has 51 former officials with strong committee relationships in the user's specific issue area, while another has higher filing volume but less targeted relationships.
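To make the tradeoff concrete, here is a sketch of how component scores could roll up into a ranking; the weights and dataclass fields are assumptions for illustration.

    from dataclasses import dataclass

    @dataclass
    class FirmScore:
        name: str
        experience: int    # percentile, 0-100
        committees: int    # percentile, 0-100
        relevance: int     # percentile, 0-100

        def composite(self) -> float:
            # Illustrative weights; the real blend is a product decision.
            return 0.4 * self.experience + 0.3 * self.committees + 0.3 * self.relevance

    def rank(firms: list[FirmScore]) -> list[FirmScore]:
        # Keeping components visible alongside the composite is what lets a
        # 51-former-official firm beat a higher-volume one on Experience.
        return sorted(firms, key=FirmScore.composite, reverse=True)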
Technical and product insights
Architecture decisions matter more than optimization
Initial response times were 45-76 seconds. The breakthrough came from rethinking the processing model (deterministic analytics + AI narrative), not from tweaking existing code. Result: 5-10 second responses.
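The shape of that split, with both phases stubbed (the names here are illustrative, not the app's API):

    def score_firms(query: str) -> list[dict]:
        # Phase 1: deterministic analytics over precomputed percentiles.
        # Stubbed here; the prototype reads the enriched firm dataset.
        return [{"name": "Example Firm", "experience": 95, "committees": 88}]

    def generate_rationale(scored: list[dict], query: str) -> str:
        # Phase 2: the model only writes the narrative; it never computes
        # or alters scores. A real version would call an LLM API here.
        return f"Top match for '{query}': {scored[0]['name']}."

    def recommend(query: str) -> dict:
        scored = score_firms(query)                    # fast, reproducible
        rationale = generate_rationale(scored, query)  # the only slow step
        return {"firms": scored, "rationale": rationale}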
Early exit logic transforms data pipelines
The classification script went from 24+ hours to 2-3 hours after adding a "stop counting after 10 clients" rule and a 2025-only activity cutoff. Large firms with 1,000+ clients no longer require paginating through every filing.
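A sketch of the early-exit loop, assuming filings arrive as an iterable of dicts; the cap and cutoff mirror the description above, but the field names are mine.

    MAX_CLIENTS = 10     # enough to tell a multi-client firm from a self-filer
    ACTIVITY_YEAR = 2025

    def count_distinct_clients(filings) -> int:
        clients: set[str] = set()
        for filing in filings:
            if filing.get("year") != ACTIVITY_YEAR:
                continue                   # 2025-only activity cutoff
            clients.add(filing["client_name"])
            if len(clients) >= MAX_CLIENTS:
                break                      # skip paging through 1,000+ clients
        return len(clients)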
Domain knowledge compounds
Understanding that client_self_select reliably classifies firms came from knowing how LD-1 forms work. This prevented weeks of false starts with keyword-based or name-matching approaches.
Honest scoring beats cosmetic scoring
Early versions used absolute thresholds where top firms all maxed out identically. Percentile ranking exposed underlying tradeoffs and made the recommendations genuinely useful.
What's next
If I continued development, these would be the natural extensions: