Preparing for an SQL interview can feel daunting, but practice makes perfect! Below is a comprehensive list of SQL interview questions from beginner to advanced, along with concise answers. These cover everything from basic definitions to tricky scenarios. Use this as a study guide to test your SQL knowledge and learn how to explain key concepts. (Pro tip: try answering each question in your own words, and run sample queries in Galaxy’s Playground to solidify your understanding.)
What is SQL?
SQL stands for Structured Query Language, a programming language used for managing and querying relational databasesdatacamp.com. In simple terms, SQL lets you insert, retrieve, update, and delete data stored in tables. It’s the standard language for communicating with Relational Database Management Systems (RDBMS).
What are SQL dialects?
SQL dialects are the various implementations or “flavors” of SQL by different database systemsdatacamp.com. All dialects share core SQL syntax with minor differences or extensions. Examples include MySQL, PostgreSQL, SQL Server (T-SQL), Oracle (PL/SQL), and SQLite. Knowing one SQL dialect makes it easier to learn others, since the fundamentals are similar.
What are the main applications of SQL?
SQL is used to create and modify database structures, and to manipulate and query data. Common applications includedatacamp.com:
What is an SQL statement? Give some examples.
An SQL statement (or command) is a text string that the database engine can execute to perform a specific taskdatacamp.com. For example: SELECT
(to retrieve data), CREATE
(to create a new table or database object), UPDATE
(to modify data), DELETE
(to remove data), etc. Each statement belongs to a category of SQL commands (see next question). A valid SQL statement ends with a semicolon (;
) in many systems.
What types of SQL commands do you know (DDL, DML, etc.)?
SQL commands are commonly grouped into subsetsdatacamp.com:
CREATE
, ALTER
, DROP
table).INSERT
, UPDATE
, DELETE
, and the query command SELECT
).GRANT
or REVOKE
permissions).COMMIT
to save a transaction, ROLLBACK
to undo).SELECT
is categorized separately as DQL for data querying.These categories help describe the purpose of each SQL command. For example, SELECT
is a DQL command to read data, while CREATE TABLE
is a DDL command to define a new table.
Give some examples of common SQL commands in each category.
CREATE TABLE
, ALTER TABLE
, DROP TABLE
, TRUNCATE
(remove all rows from a table). These commands define or modify database objects. For instance, CREATE TABLE Employees (...)
makes a new table, and ALTER TABLE Employees ADD COLUMN Email VARCHAR(255);
adds a new column.INSERT
, UPDATE
, DELETE
. For example, INSERT INTO Employees(name, role) VALUES ('Alice','Engineer');
adds a record. UPDATE Employees SET role='Manager' WHERE name='Alice';
modifies records that meet a condition.GRANT
and REVOKE
(e.g., granting a user read-only access to a table).COMMIT
, ROLLBACK
, SAVEPOINT
. For example, after a series of INSERT
statements, COMMIT;
will permanently save the changes.What is a database?
In simple terms, a database is an organized collection of data, typically stored in many related tables. A relational database arranges data into tables (rows and columns) and uses relationships to connect themdatacamp.com. The data is structured so that information can be easily accessed, managed, and updated. Think of it as an electronic filing system that SQL can query to find exactly what you need.
What is a table and a field in SQL?
A table is a structured set of data organized into rows and columns. Each table represents an entity (like Customers
, Orders
, etc.) in the database. A field is a single column within a tablesimplilearn.com, representing one attribute of the entity. For example, in a Customers
table, fields could be customer_id
, name
, and email
. Each row (or record) in the table is a set of values for these fields representing one instance (one customer).
What is a primary key in SQL?
A primary key is a column (or combination of columns) that uniquely identifies each record in a tablesimplilearn.com. Primary keys have two important properties: unique (no two rows can have the same primary key value) and not null (every row must have a value for the key). For example, a customer_id
field often serves as a primary key in a Customers table. Primary keys enforce entity integrity by ensuring each row is distinct and can be referenced from other tables.
What is a foreign key?
A foreign key is a field in one table that references the primary key of another tablesimplilearn.com. Foreign keys establish a relationship between tables, enforcing referential integrity. For instance, an Orders
table might have a customer_id
field that is a foreign key referencing Customers(customer_id)
. This ensures that every order is linked to a valid customer, and you cannot have an order with a customer_id that doesn’t exist in the Customers table.
What is the difference between DELETE and TRUNCATE commands?
Both DELETE
and TRUNCATE
remove data from a table, but they work differentlysimplilearn.com:
DELETE
removes rows one by one and can include a WHERE
clause to specify which rows to remove. Each deletion is logged (which can be slower for large numbers of rows) and can be rolled back if inside a transaction.TRUNCATE
quickly deletes all rows from a table without logging individual row deletions. It cannot be used with a condition (no WHERE
clause). TRUNCATE
is faster and uses fewer system resources, but it’s essentially irreversible (in most systems it’s like dropping and re-creating the table). Use DELETE
for selective removals and TRUNCATE
for fast full-table wipes.What is a JOIN in SQL, and what are its types?
A JOIN combines rows from two or more tables based on a related column between themsimplilearn.com. Joins allow you to query data across multiple tables as if they were one. The common types of SQL joins are:
For example, an inner join between Orders
and Customers
on the customer_id
will yield only those orders that have a corresponding customer. A left join would list all customers and include their orders if present (customers with no orders would still appear, with order fields NULL).
What do you mean by a NULL value in SQL?
NULL represents the absence of a value in a database fieldsimplilearn.com. It is not the same as zero or an empty string; NULL literally means “unknown” or “no value.” Any data type can have NULL instead of an actual value. Important things to remember: NULLs propagate through expressions (e.g., 5 + NULL
yields NULL), and to check for NULL you must use IS NULL
or IS NOT NULL
(because = NULL
doesn’t work as expected). In queries, you often have to explicitly handle NULLs to avoid them causing unexpected results.
What are aggregate functions? Can you name a few?
Aggregate functions perform a calculation on a set of values and return a single resultsimplilearn.com. Common SQL aggregate functions include:
*
) – Counts the number of rows (or use COUNT(column)
to count non-NULL values in a column).Aggregates are often used with the GROUP BY
clause to summarize data by groups. For example, you might SELECT department, AVG(salary) FROM Employees GROUP BY department;
to find the average salary per department.
What is a constraint in SQL? Name a few common constraints.
A constraint is a rule applied to a table’s column(s) that restricts the data values allowed, ensuring data integritysimplilearn.com. Common SQL constraints include:
CHECK (age >= 0)
on an age
column ensures no negative ages can be inserted.These constraints are defined when creating or altering a table (CREATE TABLE ... (col INT NOT NULL, ... CONSTRAINT fk_example FOREIGN KEY(col) REFERENCES OtherTable(...), ...)
). They help maintain accuracy and consistency of the data.
Explain the differences between SQL and NoSQL databases.
SQL databases (relational databases) use structured tables and predefined schemas, making them ideal for structured data and ensuring consistency (ACID compliance)simplilearn.com. NoSQL databases are non-relational and handle unstructured or semi-structured data more flexibly, often using key-value pairs, documents, wide-column stores, or graphs. In short:
In an interview, you might say SQL databases emphasize ACID transactions and structured relationships, whereas NoSQL emphasizes scalability and flexibility in data modeling.
What is normalization and denormalization in SQL?
Normalization is the process of organizing a database to reduce data redundancy and improve data integritysimplilearn.com. This typically involves dividing large tables into smaller, related tables and defining foreign keys between them. For example, instead of one big table with duplicate customer info on every order, normalization would separate Customers
and Orders
and link them by customer ID. Normalization is done in stages (First Normal Form, Second Normal Form, etc.), each with specific rules to eliminate duplicates and ensure dependencies make sense.
Denormalization, conversely, is the deliberate introduction of some redundancy into a database design to optimize read performancesimplilearn.com. In other words, data that would be normalized into separate tables is kept in one table (or duplicated across tables) to reduce the need for JOINs in queries. While this can speed up reads, it requires extra care to keep duplicated data consistent. Interviews might ask when you would denormalize (answer: when read performance or complex reporting is more important than update efficiency, often in data warehousing scenarios).
Describe the difference between WHERE and HAVING in SQL.WHERE
and HAVING
are both filters, but they apply at different stages of query processingsimplilearn.com:
SELECT * FROM Orders WHERE amount > 100
, the WHERE clause filters out rows with amount ≤ 100 before any grouping or aggregation.SELECT customer_id, SUM(amount) as total FROM Orders GROUP BY customer_id HAVING total > 1000;
will first group orders by customer, then include only those groups whose summed total exceeds 1000.In summary: use WHERE to restrict rows going into an aggregation, and use HAVING to restrict groups coming out of an aggregation.
What is the difference between UNION and UNION ALL?UNION
and UNION ALL
both combine the results of two SELECT queries, but UNION
performs an extra step of removing duplicate rows in the combined resultsimplilearn.com. UNION ALL
simply appends the results (thereby keeping duplicates if any). Because UNION
has to check and eliminate duplicates, it is slightly slower than UNION ALL
. If you know results are inherently unique or you don’t mind duplicates (or will handle them in application code), UNION ALL
can be more efficient. In short: use UNION
when you need a set of distinct records, and use UNION ALL
when you need all records including any overlaps. (Note: Both require the SELECTs to have matching columns/types.)
What is a view in SQL?
A view is a virtual table defined by a SQL querysimplilearn.com. It does not store data itself (except in materialized views) but presents data from one or more tables through a stored SELECT statement. Views are used to simplify complex queries (you select from the view as if it were a table), to secure data by exposing only certain columns or rows (e.g., a view that hides sensitive data), and to present a consistent interface even if underlying table structures change. For example, one could create CREATE VIEW ActiveCustomers AS SELECT * FROM Customers WHERE active = TRUE;
. Querying ActiveCustomers
then shows only customers marked active, without writing that filter each time.
What is a stored procedure?
A stored procedure is a precompiled set of one or more SQL statements that is stored on the database serversimplilearn.com. Procedures can accept parameters, perform complex operations (including conditional logic in some SQL dialects), and return results or output parameters. They are used to encapsulate business logic at the database level, improve performance by reducing repeated SQL parsing, and ensure consistent operations. For example, a stored procedure sp_UpdateInventory
might take a product ID and quantity and then execute multiple statements to adjust inventory and log the change. You create one with CREATE PROCEDURE
and execute (call) it with an EXEC
or database-specific call syntax.
What is a trigger in SQL?
A trigger is a special kind of stored program that automatically executes in response to certain events on a table (INSERT, UPDATE, or DELETE)simplilearn.com. A trigger “fires” whenever the specified event occurs on the table. For instance, you might have a trigger on the Orders
table that, after a new order is inserted, automatically inserts a record into an OrderAudit
table or updates an Inventory
table. Triggers are commonly used for maintaining data consistency, auditing changes, or enforcing complex constraints. Each trigger is associated with a single table and one or more events. (There are also DDL triggers that fire on schema changessimplilearn.com, but interview questions usually focus on DML triggers for data changes.)
Bonus: Triggers can be BEFORE or AFTER the event, and can even cancel an operation (for BEFORE triggers) by raising errors. Use triggers judiciously; too many can complicate debugging.
What are indexes in SQL and why are they used?
An index is a data structure that improves the speed of data retrieval on a table at the cost of additional storage and write overhead. Think of it like a sorted index in a book – it helps you find information quickly without scanning every page. In SQL, an index on a table is often created on one or more columns. The database then maintains a sorted (typically B-tree) structure of those column values to enable fast lookupssimplilearn.com.
Indexes are essential for optimizing query performance. For example, if you frequently search an Employees
table by last_name
, an index on last_name
will allow the database to find matching rows much faster (logarithmic time) than scanning the whole table. You create an index with CREATE INDEX idx_name ON Employees(last_name);
. However, keep in mind that indexes need to be updated when data changes, so there is a trade-off: too many indexes can slow down inserts/updates. A good interview answer will mention this trade-off and possibly the concept of the query optimizer using indexes.
How do you optimize SQL queries?
SQL query optimization involves improving query performance by reducing resource usage and execution time. Some key strategies includesimplilearn.com:
SELECT *
if not necessary), and avoid unnecessary subqueries or subselects when a JOIN could do. Ensure your JOIN conditions and WHERE clauses are sargable (search arguments that allow index use).LIMIT
/TOP
or equivalent if you only need a subset of results.In summary, identify what the query is doing that is costly (I/O, CPU) and adjust the approach. An example might be rewriting a subquery as a JOIN or adding an index so that a filter doesn’t result in a full table scan.
How do you prevent SQL injection?
SQL injection is a common security vulnerability where attackers input malicious SQL code through application inputs to manipulate the database. To prevent SQL injectionsimplilearn.com:
?
or $1
, etc.) and bind variables. This ensures user input is treated as data, not codesimplilearn.com.DROP
tables or such.In an interview, a strong answer will focus on prepared statements/parameter binding as the primary defense, since that directly addresses the root cause by separating code from datasimplilearn.com.
What is a correlated subquery?
A correlated subquery is a subquery that depends on a column from the outer query for its valuesimplilearn.com. It’s evaluated once for each row processed by the outer query. For example:
SELECT e.name
FROM Employees e
WHERE e.salary > (
SELECT AVG(salary)
FROM Employees
WHERE department = e.department
);
Here the subquery (SELECT AVG(salary) ... WHERE department = e.department
) is correlated with each employee row e
from the outer query. It calculates the average salary for that employee’s department, allowing the outer query to pick employees whose salary is above their department’s average. Correlated subqueries can be powerful but may be less efficient than joins if they execute for every row. They are often used when you need to compare a row to an aggregate of a group it belongs to (as in this example).
Explain ACID properties in SQL.
ACID is an acronym that describes key properties of reliable database transactionssimplilearn.com:
Together, ACID properties ensure reliability of transactions in database systemssimplilearn.com. In interviews, you might be asked to define ACID or to explain why each property is important (for example, why isolation levels matter for concurrent transactions).
What is a transaction in SQL?
A transaction is a sequence of one or more SQL operations treated as a single logical unit of worksimplilearn.com. A classic example is a bank transfer: deducting money from one account and adding it to another should happen in a transaction—either both changes happen, or neither does. Transactions begin with a BEGIN
(explicitly, or implicitly by default for each statement in some DBs), and end with either a COMMIT
(to save changes) or a ROLLBACK
(to undo changes). Until committed, changes made in a transaction are typically not visible to other transactions. Transactions ensure the ACID properties described above.
In practice, you use transactions to maintain data integrity when multiple related updates must succeed or fail together. For instance, in an interview, explaining how you’d use a transaction for a multi-step business operation (and mentioning error handling with rollback) would demonstrate understanding.
What is SQL injection?
(SQL injection was addressed earlier in prevention; here we define it explicitly.) SQL injection is a code injection technique where an attacker provides malicious SQL input to manipulate a query. For example, supplying ' OR 1=1--
in a login form’s username field might trick an application into executing SELECT * FROM Users WHERE username = '' OR 1=1--' AND password = '...'
, which returns all users (the --
begins a comment, ignoring the rest of the query)simplilearn.com. This could bypass authentication or damage data. Essentially, if user input is not handled safely, attackers can inject their own SQL and the database will execute it. This can lead to unauthorized data access (dumping tables), data corruption or deletion, and other security breachessimplilearn.com. (See the prevention measures in the earlier question about preventing SQL injection.)
How do you use a Common Table Expression (CTE)?
A CTE (Common Table Expression) is a temporary named result set defined by a WITH
clause that you can reference within a larger SQL query. It’s like a one-time view for that query. CTEs make complex queries easier to write and read by breaking them into parts. For example:
WITH SalesByProduct AS (
SELECT product_id, SUM(amount) AS total_sales
FROM Orders
GROUP BY product_id
)
SELECT p.product_name, s.total_sales
FROM SalesByProduct s
JOIN Products p ON p.id = s.product_id
ORDER BY s.total_sales DESC;
Here, SalesByProduct
is a CTE that computes total sales per product. The main query then joins that CTE with the Products table. CTEs are especially useful for recursive queries or when you need to use the same subquery result multiple times. They improve query modularity and readabilitysimplilearn.com. Unlike subqueries, CTEs can reference themselves (recursive CTEs) or be referenced multiple times in the subsequent query.
What is a cursor, and how is it used in SQL?
A cursor is a database object that allows row-by-row processing of the result of a querysimplilearn.com. Instead of operating on a complete set of rows at once (the normal SQL way), a cursor fetches one row at a time from a result set. This can be useful when you need to perform operations that SQL doesn’t support in a set-based way (or if you need to integrate with procedural logic). In a typical cursor usage, you: declare the cursor on a SELECT query, open the cursor, fetch rows one by one (in a loop, perhaps), and then close the cursor.
However, cursors are generally slow compared to set operations, because each fetch is like its own operation. They are often used within stored procedures or triggers when set-based operations aren’t feasible and you must process each row sequentiallysimplilearn.com. In interviews, you might be asked when to use a cursor – it’s a last resort for when you cannot solve a problem with regular SQL (or perhaps in certain administrative tasks). You should also mention there are alternatives like window functions or correlated subqueries that can often replace the need for a cursor.
What’s the difference between ROW_NUMBER(), RANK(), and DENSE_RANK()?
These are all ranking window functions that assign a rank number to each row based on an ordering, but they handle ties differentlydatalemur.com:
In summary, ROW_NUMBER always increments by 1, RANK leaves gaps after ties, DENSE_RANK does not leave gaps. An interviewer may give you a scenario and ask which one to use. For instance, “If two people are tied for 1st place, what rank does the next person get?” If the answer is 2nd place, that implies dense_rank; if it’s 3rd place (because two people occupied ranks 1 and 2), that’s rank.
What is the difference between a clustered index and a non-clustered index?
A clustered index determines the physical order of data in a table (the table is ordered by this index), whereas a non-clustered index is a separate structure that references the table data without affecting its physical ordersimplilearn.com. Key differences:
In an interview, you might also mention that on a clustered index, the leaf nodes of the index contain the actual data pagessimplilearn.com, whereas in a non-clustered index, the leaf nodes contain the indexed column value and a row locator (RID or clustered key) to find the actual row. This topic often comes up for roles focusing on database optimization or administration.