Import profile data from LinkedIn data export #97
Bootstrapping a Carol profile by hand is the obvious friction point — most users already have years of jobs / education / skills curated on LinkedIn. A one-shot import from a LinkedIn data export lets a new user get to a useful state in minutes instead of an afternoon of copy-paste.
Filing this brings LinkedIn import into scope; CLAUDE.md's "Out of scope (today)" list should be updated to reflect that when the first PR for this work lands.
Why the data-export path, not an API
LinkedIn's official API (Marketing / Talent / Sign In with LinkedIn) returns essentially nothing about a user's own profile data unless the calling app has gone through LinkedIn's partner approval — which is a non-starter for a self-hosted personal tool. Scraping is against LinkedIn's TOS and they actively block it.
The realistic path is LinkedIn's own "Get a copy of your data" export. Users request it from LinkedIn's privacy settings, wait 10 minutes to 24 hours, and get a .zip of CSV files covering profile, positions, education, skills, certifications, languages, projects, recommendations, connections, messages, etc. The CSV schema is stable enough that a parser written today will keep working for a few years.
This makes the import:
Scope (first pass)
Map what's in a typical LinkedIn data export onto Carol's existing entities:
Profile.csv
Positions.csv
Education.csv
Skills.csv
Certifications.csv
Languages.csv
Projects.csv
Out of scope for the first pass (file as follow-ups when needed):
Connections.csv → People / Organizations. Connections imply a network, not just a contact list; merging into existing People records, dedup by name+company, and avoiding spam-rate-of-creation needs design work that shouldn't block first-cut profile import.
Recommendations.csv (given + received). No matching entity today.
Messages.csv, Reactions.csv, Comments.csv, Likes.csv. Engagement data Carol doesn't model.
Email Addresses.csv of contacts. Privacy-sensitive bulk-PII; explicit second-pass design.
Acceptance criteria
.zip data export, and see a preview of what Carol parsed: how many positions, education entries, skills, projects.
Profile, Jobs, Education, Skills, and Projects entities. All writes scoped to the calling user_id per the per-user data isolation convention.
Jobs = (company, title, start_date); Skills = (normalized_name); etc.).
.zip is processed in-memory (or in a per-request tempdir wiped on completion). The data inside it is the user's; the .zip itself doesn't need to be retained.
Design questions to settle before implementation
/settings/import page is the obvious answer; alternatives include a one-shot first-run wizard, or attaching to the Profile edit page.
Certifications and Languages first-class entities or just typed skills? Calls for a quick look at how the Profile model is shaping up.
Out of scope today
Connections.csv import. Files a follow-up ticket once first-pass profile import lands.
Part of epic #2.
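Sketch for discussion (not a proposed implementation): the acceptance criteria above mention in-memory processing of the .zip, a parse preview, and keys such as Jobs = (company, title, start_date) and Skills = (normalized_name). The Python below shows roughly how those three pieces could fit together. Python is used only for illustration and may not match Carol's stack. The function names are hypothetical, the CSV column names ("Company Name", "Title", "Started On") are assumptions about the export format, and treating the tuples as de-duplication keys is also an assumption.

```python
import csv
import io
import zipfile

# File names from the first-pass scope list in this issue.
FIRST_PASS_FILES = (
    "Profile.csv", "Positions.csv", "Education.csv", "Skills.csv",
    "Certifications.csv", "Languages.csv", "Projects.csv",
)


def read_export(zip_bytes: bytes) -> dict[str, list[dict[str, str]]]:
    """Parse the first-pass CSVs from an export held entirely in memory."""
    tables: dict[str, list[dict[str, str]]] = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Index members by base name in case the files sit in a subfolder.
        members = {name.rsplit("/", 1)[-1]: name for name in archive.namelist()}
        for wanted in FIRST_PASS_FILES:
            if wanted not in members:
                continue  # tolerate exports that lack a given file
            with archive.open(members[wanted]) as raw:
                # utf-8-sig strips a byte-order mark if one is present.
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                tables[wanted] = list(csv.DictReader(text))
    return tables


def preview(tables: dict[str, list[dict[str, str]]]) -> dict[str, int]:
    """Row counts per parsed file, for the pre-import preview."""
    return {name: len(rows) for name, rows in tables.items()}


def normalize_skill(name: str) -> str:
    """Collapse whitespace and case-fold, as one possible normalized_name."""
    return " ".join(name.split()).casefold()


def job_key(row: dict[str, str]) -> tuple[str, str, str]:
    """(company, title, start_date) key; column names are assumed."""
    return (
        row.get("Company Name", "").strip(),
        row.get("Title", "").strip(),
        row.get("Started On", "").strip(),
    )


def new_jobs(rows, existing_keys):
    """Rows whose key is not already stored, so a re-import adds nothing."""
    seen = set(existing_keys)
    fresh = []
    for row in rows:
        key = job_key(row)
        if key not in seen:
            seen.add(key)
            fresh.append(row)
    return fresh
```

In this sketch nothing is written to disk: the upload handler would pass the request body to read_export, show preview(...) to the user, and only then write the rows returned by new_jobs (and the equivalent for other entities) scoped to the calling user_id.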