[go: up one dir, main page]

DEV Community

Cover image for Is Select * actually expensive
NightBird07
NightBird07

Posted on

Is Select * actually expensive

The impact of using SELECT * in a database query can be influenced by several factors, including the size of the database, the storage orientation (row or column-oriented), the indexing scheme employed, the network bandwidth, and the properties of the data itself (such as whether it involves large strings or binary objects). Therefore, a definitive answer regarding the performance implications of SELECT * is contingent on the specifics of the situation at hand and should be evaluated on a case-by-case basis.

some assumptions to make

1- the database is row oriented meaning it stores the data in tuples and keep it compacted as much as it can, and we can say Postgresql is really good at this.
2- we don’t use blob or large text other wise make a vertical partitioning where you have to keep these in another place don’t use Select * with blobs
3- there is a filter at least one criteria.

How postgreSQL works with Select Clause

When executing a SELECT statement in PostgreSQL, the system will typically start by scanning the table or tables involved in the query to find the relevant tuples. The system will first check if there are any applicable indexes that can be used to speed up the scan. If an index is available and appropriate for the query, PostgreSQL will use an "index scan" to quickly locate the relevant tuples.

If an index is not available or not appropriate for the query, PostgreSQL will instead perform a "sequential scan" (also known as a "sequential table scan" or "seqscan") of the table. During a sequential scan, PostgreSQL reads each page of the table in order, looking for tuples that match the query's WHERE clause.

Once the relevant tuples have been located, PostgreSQL will apply any additional query conditions (such as sorting or grouping) to the results and return the final result set to the client.

It's worth noting that PostgreSQL's query planner and optimizer will typically try to choose the most efficient query plan based on the available indexes and other factors, in order to minimize the overall query time. But even with the best query plan, the performance of a SELECT statement can still be impacted by factors such as table size, available memory, network bandwidth, and other system resources. notice the cost-based optimizing is compared using the cost we talked about earlier but it is missing the index scan cost LOL.

Why Select * is not that much costly

so assuming that filter criteria has an index and we look for that index inside the table, found boom, then look for another three inside the index heap, but we already pulled the table so it is not going to be that scary.

Never use Select *

It is generally not recommended to use SELECT * with column-oriented data storage systems, as this can result in the retrieval of large amounts of unnecessary data. This is because column-oriented storage systems store each column separately, meaning that a SELECT * query will retrieve all columns, even those that are not needed for the current query.

Retrieving unnecessary data can be expensive in terms of both time and system resources, as larger sets of pages need to be fetched to retrieve the additional data. Pages can be expensive because they can contain a large amount of irrelevant data, and only the relevant data can be retrieved after the entire page is fetched.

For example, consider a column-oriented database that stores customer data in separate columns, such as "customer_id", "name", "email", and "address". If a SELECT * query is used to retrieve customer data, all columns will be retrieved, even if only the customer's name and email are needed. This can result in the retrieval of unnecessary data and slower query performance. as we fetch large chunks of pages..

Top comments (0)