XRAY: Troubleshooting Xray DB disk space issue

AuthorFullName__c
Jayanth Suresh
articleNumber
000005501
ft:sourceType
Salesforce
FirstPublishedDate
2022-12-14T15:30:03Z
lastModifiedDate
2023-01-22T11:07:25Z
VersionNumber
2

This KB article will help you perform basic/initial troubleshooting of Xray DB disk space issues.
The Xray DB is resource intensive, and all data related to the Xray application is stored on the Xray DB server, not on the Xray application server.
While setting up Xray, please refer to our system requirements wiki page and size the resources of the Xray DB server accordingly. The Xray DB size keeps increasing based on the indexing and scanning of artifacts, builds, and release bundles, along with the daily delta DB sync.
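
As a quick way to see how much disk space the Xray database currently occupies, you can run the following check from psql. This is a minimal sketch; 'xraydb' is a placeholder, so replace it with the actual name of your Xray database.
-- Overall size of the Xray database ('xraydb' is a placeholder database name)
SELECT pg_size_pretty(pg_database_size('xraydb')) AS xray_db_size;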

In some situations, the DB disk size keeps increasing rapidly. Below are the basic/initial troubleshooting steps to follow.

  1. First, check which tables are consuming the most disk space. Below is a sample SQL query from the PostgreSQL wiki that returns the tables in a database along with the space they use, ordered from the largest (top 10).
SELECT
   nspname || '.' || relname AS "relation",
   pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size" 
FROM
   pg_class C 
   LEFT JOIN
      pg_namespace N 
      ON (N.oid = C.relnamespace) 
WHERE
   nspname NOT IN 
   (
      'pg_catalog',
      'information_schema'
   )
   AND C.relkind <> 'i' 
   AND nspname !~ '^pg_toast' 
ORDER BY
   pg_total_relation_size(C.oid) DESC LIMIT 10;
2. The above SQL query shows which tables are consuming the most disk space. A few tables, such as public_components*, are expected to be large, but if you find other/many tables consuming a lot of space, we suggest checking the tuples/bloat and purging it. In cases of intensive data updates, both with frequent UPDATE and with INSERT/DELETE operations, PostgreSQL tables and their indexes become bloated. Bloat refers to disk space that was allocated by a table or index and is now available for reuse by the database, but has not been reclaimed. Because of this bloat, the performance of the PostgreSQL DB server degrades, which can lead to high CPU utilization scenarios.
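
To see how the space of one of the large relations reported above is split between the table itself and its indexes, you can use the standard PostgreSQL size functions. This is a minimal sketch; 'public.some_large_table' is a placeholder, so substitute a relation returned by the query in step 1.
-- Break down a relation's footprint: table (heap) only, indexes only, and total (including TOAST)
SELECT pg_size_pretty(pg_relation_size('public.some_large_table'))       AS table_size,
       pg_size_pretty(pg_indexes_size('public.some_large_table'))        AS indexes_size,
       pg_size_pretty(pg_total_relation_size('public.some_large_table')) AS total_size;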

Under normal PostgreSQL operations, tuples that are deleted or stale because of an update aren’t physically removed from the table— they’re stored there until the VACUUM command is issued. VACUUM releases the space occupied by “dead” tuples. Thus, it’s necessary to perform a VACUUM periodically, especially for tables that change often.

To check information about dead tuples, and when vacuum/autovacuum was last run for each table of a specific database on the PostgreSQL DB server, connect to the DB and run the following query:
select
   schemaname,
   relname,
   n_tup_ins,
   n_tup_upd,
   n_tup_del,
   n_live_tup,
   n_dead_tup,
   last_vacuum,
   last_autovacuum,
   last_analyze,
   last_autoanalyze 
from
   pg_stat_all_tables 
where
   schemaname = 'public' 
order by
   n_dead_tup desc;
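
Note that n_dead_tup in pg_stat_all_tables is an estimate maintained by the statistics collector. If you need exact bloat figures for a suspicious table, the pgstattuple contrib extension can report dead tuples and free space, provided it is installed and you have the required privileges. This is a minimal sketch with a placeholder table name.
-- Requires the pgstattuple contrib extension and superuser or pg_stat_scan_tables privileges
CREATE EXTENSION IF NOT EXISTS pgstattuple;
-- Exact per-table figures; this scans the whole table, so run it during a quiet period
SELECT table_len,
       tuple_count,
       dead_tuple_count,
       dead_tuple_percent,
       free_space,
       free_percent
FROM pgstattuple('public.some_large_table');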

From the pg_stat_all_tables query above, get a list of tables that were either never analyzed or vacuumed, that were analyzed a long time ago, or that have had a lot of changes since the last time DB statistics were collected and vacuum was run. Tune the PostgreSQL autovacuum process so that the more frequently a table or its indexes change, the more frequently vacuum and analyze are performed.
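
One common way to do this is with per-table autovacuum storage parameters. The sketch below uses a placeholder table name and illustrative thresholds; please agree on the actual values with your DBA for your workload.
-- Make autovacuum run more often for a frequently-updated table
-- Defaults are 0.2 (vacuum) and 0.1 (analyze); lower values mean more frequent runs
ALTER TABLE public.some_busy_table
  SET (autovacuum_vacuum_scale_factor = 0.05,
       autovacuum_analyze_scale_factor = 0.02);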

To collect DB statistics and to run a regular (not FULL) vacuum on all objects of a specific database in the PostgreSQL DB instance, run the following command:
vacuumdb -h <db_host> -p <db_port> -U <db_user> -j 4 -z -v <db_name>
The above command will run a regular vacuum with analyze in 4 parallel jobs.
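
If you prefer to vacuum only the specific tables identified by the earlier queries instead of the whole database, the same regular vacuum with analyze can also be run from psql. This is a minimal sketch with a placeholder table name.
-- Regular (non-FULL) vacuum with analyze on a single table; it does not take an exclusive lock
VACUUM (VERBOSE, ANALYZE) public.some_busy_table;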

If the above steps did not help, please work with your DBA and run a full vacuum (VACUUM FULL) on the Xray DB server, which may clear the stale space and dead tuples and reclaim disk space. It is suggested to stop the Xray application while running the full vacuum; note that stopping Xray may delay some ongoing indexing jobs and might impact you if you have any pipelines that trigger on-demand scanning during a build. Otherwise, it should not have a direct impact on your Artifactory usage, given that block download is not enabled.
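
For reference, a full vacuum can be run from psql either for a single table or for the whole database. The table name below is a placeholder; note that VACUUM FULL takes an ACCESS EXCLUSIVE lock and temporarily needs roughly the table's size in extra free disk space, so plan the maintenance window with your DBA.
-- VACUUM FULL rewrites the table into a new file and returns the freed space to the operating system
VACUUM (FULL, VERBOSE) public.some_large_table;
-- Or, while connected to the Xray database, for all tables:
VACUUM (FULL, VERBOSE);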

If none of the above suggestions helps, please reach out to JFrog Support with the output of the above SQL queries.