This document provides an overview of the PostgreSQL database schema used by the Q&A platform. It describes the tables, indexes, and denormalization decisions, as well as the caching strategy used to optimize performance and reduce redundant data fetching.
My database consists of six primary tables to support the Q&A functionality:
- `person`
  - Stores user information (identified by a `uuid`).
  - Contains minimal fields: `id`, `uuid`, and `name`.
- `course`
  - Represents a course, each having an `id`, a `created_at` timestamp, and a `name`.
- `question`
  - Stores individual questions.
  - References a `course_id` to indicate which course the question belongs to. The `question` text and `created_at` timestamp are also stored.
- `questionLike`
  - A join table (many-to-many relationship) that associates `question`s with `person`s who have liked them.
  - Primary key is the combination `(question_id, person_id)`.
- `answer`
  - Represents answers to questions.
  - Each `answer` references the `question_id` it answers and has its own `created_at` timestamp.
- `answerLike`
  - Another join table linking `answer`s to `person`s who have liked them.
  - Primary key is `(answer_id, person_id)`.
Schema Snapshot (`V1__initial_schema.sql`)
```sql
CREATE TABLE person (
    id SERIAL PRIMARY KEY,
    uuid TEXT NOT NULL,
    name VARCHAR(100) NOT NULL
);

CREATE TABLE course (
    id SERIAL PRIMARY KEY,
    created_at TIMESTAMP DEFAULT NOW(),
    name VARCHAR(100) NOT NULL
);

CREATE TABLE question (
    id SERIAL PRIMARY KEY,
    question TEXT NOT NULL,
    course_id INTEGER REFERENCES course(id),
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE questionLike (
    question_id INTEGER REFERENCES question(id),
    person_id INTEGER REFERENCES person(id),
    PRIMARY KEY (question_id, person_id)
);

CREATE TABLE answer (
    id SERIAL PRIMARY KEY,
    answer TEXT NOT NULL,
    question_id INTEGER REFERENCES question(id),
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE answerLike (
    answer_id INTEGER REFERENCES answer(id),
    person_id INTEGER REFERENCES person(id),
    PRIMARY KEY (answer_id, person_id)
);
```
I use a few key indexes to improve query performance:
- `idx_person_uuid` (unique index on `person(uuid)`)
  - Ensures fast lookups when fetching user details by `uuid`.
  - Also enforces uniqueness of `uuid`, preventing duplicates.
- `idx_question_course_id` (index on `question(course_id)`)
  - Speeds up queries that retrieve questions by course.
  - Important for listing all questions associated with a given course.
- `idx_answer_question_id` (index on `answer(question_id)`)
  - Used when fetching multiple answers for a single question.
  - Improves performance of queries like `SELECT * FROM answer WHERE question_id = ?`, especially under load.

In short, the reasoning behind each index:

- `person.uuid`: I often fetch or create a user by a unique UUID, so a unique index provides both fast lookups and a uniqueness constraint.
- `question.course_id`: Since I frequently group or filter questions by course, an index on `course_id` is crucial for quick access.
- `answer.question_id`: Answers are typically fetched by their `question_id`, so indexing that column reduces query time for read operations.
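These indexes are not part of the `V1` snapshot above; as a sketch, the corresponding DDL would look like the following (the migration that actually creates them is not shown in this document):

```sql
-- Sketch only: index definitions matching the names described above.
CREATE UNIQUE INDEX idx_person_uuid ON person (uuid);
CREATE INDEX idx_question_course_id ON question (course_id);
CREATE INDEX idx_answer_question_id ON answer (question_id);
```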
At this stage, the schema is largely normalized:

- `person`, `course`, `question`, and `answer` each live in their own tables, referencing each other via foreign keys (`course_id`, `question_id`).
- The "like" relationships (`questionLike` and `answerLike`) use join tables for the many-to-many relationships.
- By keeping the schema normalized, I avoid data anomalies when updating or deleting references.
- The queries remain straightforward. A typical Q&A site structure maps well onto these relations.
- Because I rely on indexes (and caching in some cases), the overhead of multiple tables is manageable. Typical Q&A workloads are read-heavy, so normalization with good indexes is efficient.
A potential denormalization would be to store a "number of likes" column directly on `question` or `answer` for immediate retrieval. In the current design, however, I calculate like counts from the join tables and rely on caching (see below) to keep this fast, rather than duplicating the same data in multiple places.
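For example, a like count can be computed on demand from the join table. The query below is illustrative, not necessarily the exact statement the application issues:

```sql
-- Illustrative: questions for one course together with their like counts.
SELECT q.id,
       q.question,
       COUNT(ql.person_id) AS like_count
FROM question q
LEFT JOIN questionLike ql ON ql.question_id = q.id
WHERE q.course_id = 1          -- e.g. all questions for course 1
GROUP BY q.id, q.question;
```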
I use Redis to cache selected database queries and to manage rate limiting, implemented as follows (a short code sketch follows the list):
- **Separate Redis Databases**
  - DB 0:
    - Purpose: Caches question and answer data (e.g., `findAllByCourseId`, `getAnswersByQuestionId`).
    - Usage: All caching-related operations for read-heavy endpoints are directed here.
  - DB 1:
    - Purpose: Manages rate-limiting keys to enforce user posting limits.
    - Usage: Stores keys like `post-limit:${userUuid}:question` and `post-limit:${userUuid}:answer` to restrict users to one question and one answer per minute.
- **Service-Specific Proxies**
  - `cachedCourseService`: Caches most read operations for courses (e.g., `findById`).
  - `cachedQuestionService`: Caches reads like `findAllByCourseId` and `findById`; invalidates the relevant cache entries in DB 0 when a new question is added (`addQuestion`).
  - `cachedAnswerService`: Caches read operations for answers (`getAnswersByQuestionId`); invalidates the relevant cache entries in DB 0 when a new answer is added (`addAnswer`).
- **Rate Limiting**
  - Implemented in DB 1, ensuring that rate-limiting keys are not affected by cache invalidation operations on DB 0.
  - Rate-limit keys: Structured as `post-limit:${userUuid}:${postType}`, where `postType` is either `question` or `answer`.
  - Expiry: Each rate-limit key is set with a 60-second expiration to enforce the one-post-per-minute rule.
- **When We Do Not Cache**
  - User-related lookups: We do not currently cache user calls (`userController` uses `userService` directly). These are typically less frequent and revolve around a unique `uuid`, so performance is already sufficient without caching.
  - Like operations: For `questionLike` and `answerLike`, we bypass caching because likes might change frequently (adding or removing likes). Maintaining a real-time count can be trickier with caching, so we prefer direct DB reads and writes for those.
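The pieces above fit together in a small amount of application code. The sketch below shows the two-database setup, a generic cached read with invalidation, and the rate-limit check. It assumes the ioredis client, and the helper names (`cachedRead`, `invalidateQuestionsForCourse`, `tryAcquirePostSlot`) are illustrative rather than the names used in the actual services:

```typescript
// Minimal sketch of the Redis usage described above (assumes ioredis; helper names are illustrative).
import Redis from "ioredis";

const cacheDb = new Redis({ db: 0 });     // DB 0: cached question/answer queries
const rateLimitDb = new Redis({ db: 1 }); // DB 1: rate-limiting keys only

// Generic cached read: check DB 0 first, otherwise call the underlying service and store the result.
async function cachedRead<T>(key: string, fetchFromDb: () => Promise<T>): Promise<T> {
  const hit = await cacheDb.get(key);
  if (hit !== null) {
    return JSON.parse(hit) as T;
  }
  const value = await fetchFromDb();
  await cacheDb.set(key, JSON.stringify(value));
  return value;
}

// Write-path invalidation: adding a question evicts the cached question list for its course.
async function invalidateQuestionsForCourse(courseId: number): Promise<void> {
  await cacheDb.del(`findAllByCourseId:${courseId}`);
}

// Rate limiting in DB 1: SET ... NX EX 60 succeeds only if the key does not already exist,
// so a second question or answer from the same user within 60 seconds is rejected.
async function tryAcquirePostSlot(userUuid: string, postType: "question" | "answer"): Promise<boolean> {
  const result = await rateLimitDb.set(`post-limit:${userUuid}:${postType}`, "1", "EX", 60, "NX");
  return result === "OK";
}
```

Keeping the two concerns in separate logical databases means a cache flush or key sweep on DB 0 can never wipe the rate-limit keys in DB 1.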
I optionally seed some initial data (`flyway/sql/V2__seed_data.sql`):
- Courses: e.g. “Introduction to Programming,” “Database Design”
- Questions: e.g. “What is a variable?”, “What is normalization?”
- Answers: e.g. “A variable is a storage location...”, “Normalization is organizing data...”
This provides a quick starting set of records to test basic functionality (listing courses, retrieving questions, etc.).
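As a sketch, the seed migration might contain inserts along these lines; the exact rows and the way they are linked to courses and questions are illustrative, not the real migration:

```sql
-- Illustrative seed data; values are taken from the examples above.
INSERT INTO course (name) VALUES
  ('Introduction to Programming'),
  ('Database Design');

INSERT INTO question (question, course_id) VALUES
  ('What is a variable?',     (SELECT id FROM course WHERE name = 'Introduction to Programming')),
  ('What is normalization?',  (SELECT id FROM course WHERE name = 'Database Design'));

INSERT INTO answer (answer, question_id) VALUES
  ('A variable is a storage location...',   (SELECT id FROM question WHERE question = 'What is a variable?')),
  ('Normalization is organizing data...',   (SELECT id FROM question WHERE question = 'What is normalization?'));
```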
Possible future improvements:

- Granular Invalidation: Instead of `flushdb()`, I might refine the caching keys so that only stale entries are removed on insert or update.
- Denormalization: If read load grows, storing a `like_count` on `question` or `answer` might reduce frequent join-table queries. This would require carefully updating `like_count` whenever a like is added or removed.
- Sharding or Partitioning: If the question/answer volume grows extremely large, I might explore database sharding or partitioned tables to split data by course or date range.
- Advanced Indexing: For complex queries (filtering or searching text), I might add a GIN or full-text index on the `question`/`answer` text fields (sketched below).
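As a sketch, such a full-text index and a matching query could look like this (illustrative only; not part of the current schema):

```sql
-- Illustrative: a possible full-text index on question text.
CREATE INDEX idx_question_fts
  ON question
  USING GIN (to_tsvector('english', question));

-- Example query that could use it:
SELECT id, question
FROM question
WHERE to_tsvector('english', question) @@ plainto_tsquery('english', 'normalization');
```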
My database design focuses on normalized tables for clarity and maintainability, leveraging indexes to handle typical read queries effectively. Caching (via Redis) accelerates common fetch operations, reducing direct DB load. As usage scales, further refinement of caching (granular invalidation) or limited denormalization (like storing aggregated “like” counts) may be introduced to keep performance robust under higher traffic.
Overall, this architecture balances simplicity with performance through strategic indexing and caching. If you have any questions or suggestions, feel free to raise them in the project’s issue tracker or contact the team.