SQL Interview Questions by Difficulty

SQL Interview Prep

Sign up for the latest in SQL interview resourcs from the Galaxy Team!
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
This Q&A compilation is best for job-seekers from entry-level to experienced who are preparing for SQL or database-related interviews – aspiring data analysts, software engineers, data scientists, DBAs, and anyone who needs a refresher on SQL concepts and best practices.

Preparing for an SQL interview can feel daunting, but practice makes perfect! Below is a comprehensive list of SQL interview questions from beginner to advanced, along with concise answers. These cover everything from basic definitions to tricky scenarios. Use this as a study guide to test your SQL knowledge and learn how to explain key concepts. (Pro tip: try answering each question in your own words, and run sample queries in Galaxy’s Playground to solidify your understanding.)

Basic SQL Questions

What is SQL?
SQL stands for Structured Query Language, a programming language used for managing and querying relational databases​datacamp.com. In simple terms, SQL lets you insert, retrieve, update, and delete data stored in tables. It’s the standard language for communicating with Relational Database Management Systems (RDBMS).

What are SQL dialects?
SQL dialects are the various implementations or “flavors” of SQL by different database systems​datacamp.com. All dialects share core SQL syntax with minor differences or extensions. Examples include MySQL, PostgreSQL, SQL Server (T-SQL), Oracle (PL/SQL), and SQLite. Knowing one SQL dialect makes it easier to learn others, since the fundamentals are similar.

What are the main applications of SQL?
SQL is used to create and modify database structures, and to manipulate and query data. Common applications include​datacamp.com:

  • Creating or deleting tables and defining schema (using DDL commands, see below).
  • Inserting new records and updating or deleting existing records (using DML commands).
  • Querying data with SELECT to retrieve information from one or multiple tables.
  • Summarizing data through aggregations (e.g. COUNT, SUM) and combining data via JOINs for analysis.
  • Managing user permissions and transactions to ensure data integrity.

What is an SQL statement? Give some examples.
An SQL statement (or command) is a text string that the database engine can execute to perform a specific task​datacamp.com. For example: SELECT (to retrieve data), CREATE (to create a new table or database object), UPDATE (to modify data), DELETE (to remove data), etc. Each statement belongs to a category of SQL commands (see next question). A valid SQL statement ends with a semicolon (;) in many systems.

What types of SQL commands do you know (DDL, DML, etc.)?
SQL commands are commonly grouped into subsetsdatacamp.com:

  • DDL (Data Definition Language) – Commands that define or alter the structure of the database (e.g. CREATE, ALTER, DROP table).
  • DML (Data Manipulation Language) – Commands for manipulating data within tables (e.g. INSERT, UPDATE, DELETE, and the query command SELECT).
  • DCL (Data Control Language) – Commands related to access control (e.g. GRANT or REVOKE permissions).
  • TCL (Transaction Control Language) – Commands for managing transactions (e.g. COMMIT to save a transaction, ROLLBACK to undo).
  • DQL (Data Query Language) – Sometimes SELECT is categorized separately as DQL for data querying.

These categories help describe the purpose of each SQL command. For example, SELECT is a DQL command to read data, while CREATE TABLE is a DDL command to define a new table.

Give some examples of common SQL commands in each category.

  • DDL: CREATE TABLE, ALTER TABLE, DROP TABLE, TRUNCATE (remove all rows from a table). These commands define or modify database objects. For instance, CREATE TABLE Employees (...) makes a new table, and ALTER TABLE Employees ADD COLUMN Email VARCHAR(255); adds a new column.
  • DML: INSERT, UPDATE, DELETE. For example, INSERT INTO Employees(name, role) VALUES ('Alice','Engineer'); adds a record. UPDATE Employees SET role='Manager' WHERE name='Alice'; modifies records that meet a condition.
  • DCL: GRANT and REVOKE (e.g., granting a user read-only access to a table).
  • TCL: COMMIT, ROLLBACK, SAVEPOINT. For example, after a series of INSERT statements, COMMIT; will permanently save the changes.

What is a database?
In simple terms, a database is an organized collection of data, typically stored in many related tables. A relational database arranges data into tables (rows and columns) and uses relationships to connect them​datacamp.com. The data is structured so that information can be easily accessed, managed, and updated. Think of it as an electronic filing system that SQL can query to find exactly what you need.

What is a table and a field in SQL?
A table is a structured set of data organized into rows and columns. Each table represents an entity (like Customers, Orders, etc.) in the database. A field is a single column within a table​simplilearn.com, representing one attribute of the entity. For example, in a Customers table, fields could be customer_id, name, and email. Each row (or record) in the table is a set of values for these fields representing one instance (one customer).

What is a primary key in SQL?
A primary key is a column (or combination of columns) that uniquely identifies each record in a table​simplilearn.com. Primary keys have two important properties: unique (no two rows can have the same primary key value) and not null (every row must have a value for the key). For example, a customer_id field often serves as a primary key in a Customers table. Primary keys enforce entity integrity by ensuring each row is distinct and can be referenced from other tables.

What is a foreign key?
A foreign key is a field in one table that references the primary key of another table​simplilearn.com. Foreign keys establish a relationship between tables, enforcing referential integrity. For instance, an Orders table might have a customer_id field that is a foreign key referencing Customers(customer_id). This ensures that every order is linked to a valid customer, and you cannot have an order with a customer_id that doesn’t exist in the Customers table.

What is the difference between DELETE and TRUNCATE commands?
Both DELETE and TRUNCATE remove data from a table, but they work differently​simplilearn.com:

  • DELETE removes rows one by one and can include a WHERE clause to specify which rows to remove. Each deletion is logged (which can be slower for large numbers of rows) and can be rolled back if inside a transaction.
  • TRUNCATE quickly deletes all rows from a table without logging individual row deletions. It cannot be used with a condition (no WHERE clause). TRUNCATE is faster and uses fewer system resources, but it’s essentially irreversible (in most systems it’s like dropping and re-creating the table). Use DELETE for selective removals and TRUNCATE for fast full-table wipes.

What is a JOIN in SQL, and what are its types?
A JOIN combines rows from two or more tables based on a related column between them​simplilearn.com. Joins allow you to query data across multiple tables as if they were one. The common types of SQL joins are:

  • INNER JOIN – Returns only the records with matching values in both tables (i.e., where the join condition is satisfied in all tables).
  • LEFT JOIN (or Left Outer Join) – Returns all records from the left table and the matching records from the right table. If there’s no match, the result will have NULLs for the right table’s columns​almabetter.com.
  • RIGHT JOIN (Right Outer Join) – Returns all records from the right table and the matching ones from the left. (Right join is less common; you can usually swap table order and use left join).
  • FULL JOIN (Full Outer Join) – Returns all records when there is a match in either table. Non-matching rows from each side will also be included, with NULLs for the missing values from the other table​almabetter.com.
  • CROSS JOIN – Returns the Cartesian product of the two tables (every combination of rows). Typically used rarely, when every pairing is needed or for generating combinations.

For example, an inner join between Orders and Customers on the customer_id will yield only those orders that have a corresponding customer. A left join would list all customers and include their orders if present (customers with no orders would still appear, with order fields NULL).

What do you mean by a NULL value in SQL?
NULL represents the absence of a value in a database field​simplilearn.com. It is not the same as zero or an empty string; NULL literally means “unknown” or “no value.” Any data type can have NULL instead of an actual value. Important things to remember: NULLs propagate through expressions (e.g., 5 + NULL yields NULL), and to check for NULL you must use IS NULL or IS NOT NULL (because = NULL doesn’t work as expected). In queries, you often have to explicitly handle NULLs to avoid them causing unexpected results.

What are aggregate functions? Can you name a few?
Aggregate functions perform a calculation on a set of values and return a single result​simplilearn.com. Common SQL aggregate functions include:

  • COUNT(*) – Counts the number of rows (or use COUNT(column) to count non-NULL values in a column).
  • SUM(column) – Calculates the sum of all values in a numeric column.
  • AVG(column) – Computes the average of numeric values in a column.
  • MIN(column) – Finds the minimum value in a column.
  • MAX(column) – Finds the maximum value in a column.

Aggregates are often used with the GROUP BY clause to summarize data by groups. For example, you might SELECT department, AVG(salary) FROM Employees GROUP BY department; to find the average salary per department.

What is a constraint in SQL? Name a few common constraints.
A constraint is a rule applied to a table’s column(s) that restricts the data values allowed, ensuring data integrity​simplilearn.com. Common SQL constraints include:

  • PRIMARY KEY – Uniquely identifies each row in a table (as discussed, unique + not null).
  • FOREIGN KEY – Ensures referential integrity between two tables by enforcing that a value in one table matches a primary key in another.
  • UNIQUE – Ensures all values in a column (or a set of columns) are distinct (no duplicates), but unlike a primary key, a table can have multiple unique constraints and they can accept a single NULL (depending on the RDBMS).
  • CHECK – Ensures that data in a column meets a specific condition. For example, CHECK (age >= 0) on an age column ensures no negative ages can be inserted.
  • NOT NULL – Ensures a column cannot have NULL values; every row must have a non-null value for that column.

These constraints are defined when creating or altering a table (CREATE TABLE ... (col INT NOT NULL, ... CONSTRAINT fk_example FOREIGN KEY(col) REFERENCES OtherTable(...), ...)). They help maintain accuracy and consistency of the data.

Intermediate SQL Questions

Explain the differences between SQL and NoSQL databases.
SQL databases (relational databases) use structured tables and predefined schemas, making them ideal for structured data and ensuring consistency (ACID compliance)​simplilearn.com. NoSQL databases are non-relational and handle unstructured or semi-structured data more flexibly, often using key-value pairs, documents, wide-column stores, or graphs. In short:

  • SQL databases: relational, structured schema (tables with rows/columns), use SQL; great for complex queries and transactions (e.g., MySQL, PostgreSQL).
  • NoSQL databases: schema-less or flexible schema, can scale out horizontally easily; use various data models (document, key-value, etc.), and often sacrifice some consistency for scalability (e.g., MongoDB, Cassandra).

In an interview, you might say SQL databases emphasize ACID transactions and structured relationships, whereas NoSQL emphasizes scalability and flexibility in data modeling.

What is normalization and denormalization in SQL?
Normalization is the process of organizing a database to reduce data redundancy and improve data integrity​simplilearn.com. This typically involves dividing large tables into smaller, related tables and defining foreign keys between them. For example, instead of one big table with duplicate customer info on every order, normalization would separate Customers and Orders and link them by customer ID. Normalization is done in stages (First Normal Form, Second Normal Form, etc.), each with specific rules to eliminate duplicates and ensure dependencies make sense.

Denormalization, conversely, is the deliberate introduction of some redundancy into a database design to optimize read performance​simplilearn.com. In other words, data that would be normalized into separate tables is kept in one table (or duplicated across tables) to reduce the need for JOINs in queries. While this can speed up reads, it requires extra care to keep duplicated data consistent. Interviews might ask when you would denormalize (answer: when read performance or complex reporting is more important than update efficiency, often in data warehousing scenarios).

Describe the difference between WHERE and HAVING in SQL.
WHERE and HAVING are both filters, but they apply at different stages of query processing​simplilearn.com:

  • WHERE is used to filter rows before any grouping or aggregation occurs. It applies to individual rows from the tables. For example, in SELECT * FROM Orders WHERE amount > 100, the WHERE clause filters out rows with amount ≤ 100 before any grouping or aggregation.
  • HAVING is used to filter groups after an aggregation (GROUP BY) has been applied​simplilearn.com. It filters aggregate results. For example, SELECT customer_id, SUM(amount) as total FROM Orders GROUP BY customer_id HAVING total > 1000; will first group orders by customer, then include only those groups whose summed total exceeds 1000.

In summary: use WHERE to restrict rows going into an aggregation, and use HAVING to restrict groups coming out of an aggregation.

What is the difference between UNION and UNION ALL?
UNION and UNION ALL both combine the results of two SELECT queries, but UNION performs an extra step of removing duplicate rows in the combined result​simplilearn.com. UNION ALL simply appends the results (thereby keeping duplicates if any). Because UNION has to check and eliminate duplicates, it is slightly slower than UNION ALL. If you know results are inherently unique or you don’t mind duplicates (or will handle them in application code), UNION ALL can be more efficient. In short: use UNION when you need a set of distinct records, and use UNION ALL when you need all records including any overlaps. (Note: Both require the SELECTs to have matching columns/types.)

What is a view in SQL?
A view is a virtual table defined by a SQL query​simplilearn.com. It does not store data itself (except in materialized views) but presents data from one or more tables through a stored SELECT statement. Views are used to simplify complex queries (you select from the view as if it were a table), to secure data by exposing only certain columns or rows (e.g., a view that hides sensitive data), and to present a consistent interface even if underlying table structures change. For example, one could create CREATE VIEW ActiveCustomers AS SELECT * FROM Customers WHERE active = TRUE;. Querying ActiveCustomers then shows only customers marked active, without writing that filter each time.

What is a stored procedure?
A stored procedure is a precompiled set of one or more SQL statements that is stored on the database server​simplilearn.com. Procedures can accept parameters, perform complex operations (including conditional logic in some SQL dialects), and return results or output parameters. They are used to encapsulate business logic at the database level, improve performance by reducing repeated SQL parsing, and ensure consistent operations. For example, a stored procedure sp_UpdateInventory might take a product ID and quantity and then execute multiple statements to adjust inventory and log the change. You create one with CREATE PROCEDURE and execute (call) it with an EXEC or database-specific call syntax.

What is a trigger in SQL?
A trigger is a special kind of stored program that automatically executes in response to certain events on a table (INSERT, UPDATE, or DELETE)​simplilearn.com. A trigger “fires” whenever the specified event occurs on the table. For instance, you might have a trigger on the Orders table that, after a new order is inserted, automatically inserts a record into an OrderAudit table or updates an Inventory table. Triggers are commonly used for maintaining data consistency, auditing changes, or enforcing complex constraints. Each trigger is associated with a single table and one or more events. (There are also DDL triggers that fire on schema changes​simplilearn.com, but interview questions usually focus on DML triggers for data changes.)

Bonus: Triggers can be BEFORE or AFTER the event, and can even cancel an operation (for BEFORE triggers) by raising errors. Use triggers judiciously; too many can complicate debugging.

What are indexes in SQL and why are they used?
An index is a data structure that improves the speed of data retrieval on a table at the cost of additional storage and write overhead. Think of it like a sorted index in a book – it helps you find information quickly without scanning every page. In SQL, an index on a table is often created on one or more columns. The database then maintains a sorted (typically B-tree) structure of those column values to enable fast lookups​simplilearn.com.

Indexes are essential for optimizing query performance. For example, if you frequently search an Employees table by last_name, an index on last_name will allow the database to find matching rows much faster (logarithmic time) than scanning the whole table. You create an index with CREATE INDEX idx_name ON Employees(last_name);. However, keep in mind that indexes need to be updated when data changes, so there is a trade-off: too many indexes can slow down inserts/updates. A good interview answer will mention this trade-off and possibly the concept of the query optimizer using indexes.

How do you optimize SQL queries?
SQL query optimization involves improving query performance by reducing resource usage and execution time. Some key strategies include​simplilearn.com:

  • Use indexes effectively: Ensure appropriate indexes exist for columns used in JOINs, WHERE filters, and ORDER BY. Avoid scans of large tables when an index can be used.
  • Optimize query structure: Select only the columns you need (avoid SELECT * if not necessary), and avoid unnecessary subqueries or subselects when a JOIN could do. Ensure your JOIN conditions and WHERE clauses are sargable (search arguments that allow index use).
  • Avoid unnecessary complexity: For example, use set-based operations instead of loops (one SQL set-based UPDATE is usually faster than row-by-row updates via a cursor).
  • Analyze the execution plan: See if there are bottlenecks like table scans or large sort operations, and then address them (perhaps by adding an index or rewriting the query).
  • Limit the data: Use WHERE to filter early, and use LIMIT/TOP or equivalent if you only need a subset of results.

In summary, identify what the query is doing that is costly (I/O, CPU) and adjust the approach. An example might be rewriting a subquery as a JOIN or adding an index so that a filter doesn’t result in a full table scan.

How do you prevent SQL injection?
SQL injection is a common security vulnerability where attackers input malicious SQL code through application inputs to manipulate the database. To prevent SQL injection​simplilearn.com:

  • Use parameterized queries (prepared statements) – Don’t concatenate user input directly into SQL strings. Instead, use placeholders (? or $1, etc.) and bind variables. This ensures user input is treated as data, not code​simplilearn.com.
  • Input validation – Validate and sanitize inputs in the application layer (e.g., allow only expected formats like emails, numbers within range, etc.). This reduces the chance of dangerous input reaching your SQL.
  • Least privilege – The database user account used by the application should have only necessary permissions. Even if injection occurs, a limited user won’t be able to DROP tables or such.
  • Stored procedures – In some cases, using stored procedures can encapsulate SQL logic and limit dynamic SQL construction. (Though if procedures build SQL unsafely, they can also be injected—so still use parameters inside them.)

In an interview, a strong answer will focus on prepared statements/parameter binding as the primary defense, since that directly addresses the root cause by separating code from data​simplilearn.com.

Advanced SQL Questions

What is a correlated subquery?
A correlated subquery is a subquery that depends on a column from the outer query for its value​simplilearn.com. It’s evaluated once for each row processed by the outer query. For example:

SELECT e.name
FROM Employees e
WHERE e.salary > (
   SELECT AVG(salary)
   FROM Employees
   WHERE department = e.department
);

Here the subquery (SELECT AVG(salary) ... WHERE department = e.department) is correlated with each employee row e from the outer query. It calculates the average salary for that employee’s department, allowing the outer query to pick employees whose salary is above their department’s average. Correlated subqueries can be powerful but may be less efficient than joins if they execute for every row. They are often used when you need to compare a row to an aggregate of a group it belongs to (as in this example).

Explain ACID properties in SQL.
ACID is an acronym that describes key properties of reliable database transactions​simplilearn.com:

  • Atomicity – Each transaction is “all or nothing.” If one part fails, the entire transaction is rolled back, so the database state remains as before the transaction (no partial writes).
  • Consistency – A transaction brings the database from one valid state to another, maintaining database invariants (constraints, triggers, etc. ensure that constraints are not violated). Essentially, any data written will adhere to all defined rules (e.g., no violation of primary keys, checks).
  • Isolation – Transactions operate as if they are the only one in the system, until they are committed. The intermediate state of a transaction is invisible to other transactions. This prevents issues like dirty reads. (Different isolation levels can allow some degrees of interaction, but generally each transaction’s changes are isolated until completion.)
  • Durability – Once a transaction is committed, its changes are permanent, even in the event of a system crash. The database will have saved the data (often to disk) in such a way that it can recover the committed state.

Together, ACID properties ensure reliability of transactions in database systems​simplilearn.com. In interviews, you might be asked to define ACID or to explain why each property is important (for example, why isolation levels matter for concurrent transactions).

What is a transaction in SQL?
A transaction is a sequence of one or more SQL operations treated as a single logical unit of work​simplilearn.com. A classic example is a bank transfer: deducting money from one account and adding it to another should happen in a transaction—either both changes happen, or neither does. Transactions begin with a BEGIN (explicitly, or implicitly by default for each statement in some DBs), and end with either a COMMIT (to save changes) or a ROLLBACK (to undo changes). Until committed, changes made in a transaction are typically not visible to other transactions. Transactions ensure the ACID properties described above.

In practice, you use transactions to maintain data integrity when multiple related updates must succeed or fail together. For instance, in an interview, explaining how you’d use a transaction for a multi-step business operation (and mentioning error handling with rollback) would demonstrate understanding.

What is SQL injection?
(SQL injection was addressed earlier in prevention; here we define it explicitly.) SQL injection is a code injection technique where an attacker provides malicious SQL input to manipulate a query. For example, supplying ' OR 1=1-- in a login form’s username field might trick an application into executing SELECT * FROM Users WHERE username = '' OR 1=1--' AND password = '...', which returns all users (the -- begins a comment, ignoring the rest of the query)​simplilearn.com. This could bypass authentication or damage data. Essentially, if user input is not handled safely, attackers can inject their own SQL and the database will execute it. This can lead to unauthorized data access (dumping tables), data corruption or deletion, and other security breaches​simplilearn.com. (See the prevention measures in the earlier question about preventing SQL injection.)

How do you use a Common Table Expression (CTE)?
A CTE (Common Table Expression) is a temporary named result set defined by a WITH clause that you can reference within a larger SQL query. It’s like a one-time view for that query. CTEs make complex queries easier to write and read by breaking them into parts. For example:

WITH SalesByProduct AS (
   SELECT product_id, SUM(amount) AS total_sales
   FROM Orders
   GROUP BY product_id
)
SELECT p.product_name, s.total_sales
FROM SalesByProduct s
JOIN Products p ON p.id = s.product_id
ORDER BY s.total_sales DESC;

Here, SalesByProduct is a CTE that computes total sales per product. The main query then joins that CTE with the Products table. CTEs are especially useful for recursive queries or when you need to use the same subquery result multiple times. They improve query modularity and readability​simplilearn.com. Unlike subqueries, CTEs can reference themselves (recursive CTEs) or be referenced multiple times in the subsequent query.

What is a cursor, and how is it used in SQL?
A cursor is a database object that allows row-by-row processing of the result of a query​simplilearn.com. Instead of operating on a complete set of rows at once (the normal SQL way), a cursor fetches one row at a time from a result set. This can be useful when you need to perform operations that SQL doesn’t support in a set-based way (or if you need to integrate with procedural logic). In a typical cursor usage, you: declare the cursor on a SELECT query, open the cursor, fetch rows one by one (in a loop, perhaps), and then close the cursor.

However, cursors are generally slow compared to set operations, because each fetch is like its own operation. They are often used within stored procedures or triggers when set-based operations aren’t feasible and you must process each row sequentially​simplilearn.com. In interviews, you might be asked when to use a cursor – it’s a last resort for when you cannot solve a problem with regular SQL (or perhaps in certain administrative tasks). You should also mention there are alternatives like window functions or correlated subqueries that can often replace the need for a cursor.

What’s the difference between ROW_NUMBER(), RANK(), and DENSE_RANK()?
These are all ranking window functions that assign a rank number to each row based on an ordering, but they handle ties differently​datalemur.com:

  • ROW_NUMBER(): Assigns a unique sequential number to each row in the result set, in the order specified. No ties are considered equal – even if values are the same, each row gets a distinct number (so the order is arbitrary among duplicates). For example, if two employees have the same salary and you order by salary, one will get row_number 1 and the other 2.
  • RANK(): Assigns the same rank to identical values (ties), leaving gaps in the ranking sequence. If two employees tie for highest salary, they both get rank 1, and the next employee gets rank 3 (rank 2 is skipped)​datalemur.com. Think of rank as “1224” style ranking when there’s a tie.
  • DENSE_RANK(): Similar to RANK(), but with no gaps in the sequence for ties. If two employees tie for highest salary, they both get rank 1, and the next employee gets rank 2 (not 3). It “densely” packs the rankings​datalemur.com.

In summary, ROW_NUMBER always increments by 1, RANK leaves gaps after ties, DENSE_RANK does not leave gaps. An interviewer may give you a scenario and ask which one to use. For instance, “If two people are tied for 1st place, what rank does the next person get?” If the answer is 2nd place, that implies dense_rank; if it’s 3rd place (because two people occupied ranks 1 and 2), that’s rank.

What is the difference between a clustered index and a non-clustered index?
A clustered index determines the physical order of data in a table (the table is ordered by this index), whereas a non-clustered index is a separate structure that references the table data without affecting its physical order​simplilearn.com. Key differences:

  • Each table can have only one clustered index (usually on the primary key), because you can’t order the table in more than one way. The clustered index stores the actual data rows in the index order. This makes retrieval of a range of rows (between some values) very fast.
  • Tables can have multiple non-clustered indexes, each providing an alternate sorting for lookup. A non-clustered index stores a copy of indexed columns and a pointer (like a reference) to the actual data row in the clustered index or heap. It’s like a separate lookup table that points to where the data is.
  • Performance implications: A clustered index is great for range queries and is generally faster to read when you are using that index, but updating a clustered index (inserting a new row, or changing a clustered key) can be a bit slower because it may need to reorder rows on disk. Non-clustered indexes make reads faster for specific queries (e.g., looking up by a non-primary-key column), but each non-clustered index must be updated when data changes. Also, if a query needs columns not covered by the non-clustered index, the database may perform an extra lookup (key lookup) to get the remaining data.

In an interview, you might also mention that on a clustered index, the leaf nodes of the index contain the actual data pages​simplilearn.com, whereas in a non-clustered index, the leaf nodes contain the indexed column value and a row locator (RID or clustered key) to find the actual row. This topic often comes up for roles focusing on database optimization or administration.