Table of Contents
Introduction
1. Ensuring Integrity.
2. Comments.
3. Naming convention.
4. Split procedure into package.
5. Optimizing individual queries of the code.
6. Putting table/column comments.
7. Planning indexes.
8. Scalability and multi-user access.
9. Avoid redundant storage.
10. Code formatting.
11. Exception handling.
12. Parameterize for future use.
13. Reducing hard disk utilization.
14. Mentioning column names explicitly.
15. Recovery.
16. Single query.
17. Few additional practices listed by Sandeep Jain.
18. Selected small noteworthy miscellaneous points.
Conclusion
Introduction
The question is not ‘What are good PL/SQL Programming Practices’, the question is ‘Why do we need good PL/SQL Programming Practices’, and the answer is ‘Why not?’ You hire a resource, train him/her on PL/SQL, and in a few years he/she masters the art of writing procedures/functions that precisely perform the act instructed to him/her. But when you look at the code penned down by him/her, you get a feeling akin to eating one of your most relished dishes at a restaurant not too well known for its quality – the vegetables are there, the condiments are there, simmered well, display not too bad, and yet you start wondering. The presentation could have had a few more touches, extra spices could have been added, the vegetables could have been cut in a better manner, and overall the lasting impression in your mouth could certainly have been better.
I have written this article based on my personal experiences. Whichever PL/SQL books I went through, they did teach me how to code; what they didn’t teach was how to code effectively, how to make your code look good.
You might have observed that I enclosed the first word of the article name in quotation marks, the reason being that the definition of “good” is relative. The context I am referring to here is that your PL/SQL code should be such that other developers find it easy to comprehend, users are comfortable making changes to it, it is stable and recovers easily in case of a crash, the desired functionalities are performed in the least amount of time possible with minimal usage of system resources, performance remains consistent despite increases in data and users, and finally it runs so smoothly that the end user doesn’t get an iota of doubt that the program could ever pose problems.
There must be a lot of similar articles on PL/SQL on the internet; I am not aware of them, nor did I go through any of them before writing this article. The intent behind this was: look, this is what I know, and I feel I should share it with you. I can copy-paste information from other sources any time, but that is not my forte. I am not concerned about providing more or less information to you, the reader; I am concerned with sharing the knowledge that I believe in.
The target audience for this article includes
- PL/SQL developers – not only for Oracle but for other RDBMS platforms as well, since there are quite a few common elements across them.
- Team leads for increasing code efficiency.
- Project managers for code reviews.
- Students for educational purposes.
Let’s start with the first point.
1. Ensuring Integrity.
Ok okay, this is the most important thing. No matter which best practices you implement, whatever rules of thumb you follow, however well polished and awesomely fabricated your code may be, never ever forget to ensure that your program is giving its desired output – performing the operations it was intended for.
We often hear about a big budget sci-fi flick in which the director got so carried away by the technology that he forgot to invest in the script. Make sure that you do not repeat this mistake.
I repeat, ensure integrity. The program should give the desired output.
Done, let’s move on to the remaining points.
2. Comments.
Now now, this is perhaps the thing most Asian programmers tend to overlook, whereas the ones in Europe/America invest a significant amount of time in it.
I don’t need to tell you what comments in code are; all of us who have done a bit of programming, or even have basic knowledge of it, have an idea of what they are. The only place in your code where you get to write things the way you want, without caring about compilation errors.
The amount of comments in a code is a commonly debated topic, with quite a few purists saying that one-third of your code should be comments. Once I set about testing its validity, and started writing my code with as many comments as I could. The stuff I covered included
- Header specifying author name, date, program purpose, revision history.
- In the variable declaration section, the use of each variable alongside its specification.
- Beginning of the main program body, specifying the path being followed, along with ASCII art where I felt it was necessary.
- The purpose of each step, as and when it happened.
- In case of an if-else ladder, mentioning which branch control would take under which condition, and why.
- With each exception being handled, I mentioned the case where this exception could occur and the reason why I was handling it that way.
- Where I felt the program could suffer from performance implications due to lack of indexes in production, I wrote that down, so that in case the issue does arise in future then the onus wouldn’t be on me. If a table deliberately required Full Table Scan (FTS), I mentioned that as well.
- Various optimization techniques used and their purpose.
- The reasoning behind the joins and the particular column values being used.
When my program was done, I did a calculation and found that 29% of my code was comments. Not exactly 1/3rd, but I did have a sense of satisfaction while doing that.
Once, when assisting a junior hire in my organization, I was told by my manager to review the code he had written. My first observation was that he hadn’t written any comments, and when I pointed this out to him, instead of taking it positively he started arguing with me: if there is already a technical document present for it, why did he need to write comments in his code? I didn’t feel like arguing further, and started wondering about the day, a few years later, when someone else would have to modify his code – that person won’t refer to the technical document every time and would have a torrid time trying to comprehend the entire code himself.
Also, once when going through someone else’s code, I came across several statements of the type
WHERE status IN ('in','oh');
Now what is the purpose of picking these values? The guy who wrote this code was no longer in the organization, and after much hit-and-trial we did make out what it meant, but it certainly wasn’t smooth as butter. With database queries, the meaning of column values is an additional challenge compared to other languages: for most web-based and console-based languages you’d have plenty of technical references specifying what each command does, but with SQL and its derivatives you won’t easily find a reference for the purpose of the columns being used, as no one puts their company’s internal matters on the internet.
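A single comment would have spared us all that hit-and-trial. Something along these lines (the decoded meanings here are purely illustrative, not the actual ones from that system):

-- Status codes: 'in' = invoiced, 'oh' = on hold (full list maintained in the status master table)
WHERE status IN ('in', 'oh');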
One last thing: comments do give the impression that you put genuine effort into writing your code. So if you wrote a program involving innumerable pristine practices sans comments, don’t whine if your manager gives the feedback that this code could easily have been done by someone else.
3. Naming convention.
One interesting difference between India and other countries is that when you go to a kindergarten and ask someone “What is your name?“, a lot of the time the response from Indian kids is “My name is Amarjeet / Rajinder / Kushaldeep, ghar ka naam (name used at home) Bunty / Tony / Babloo“. That is, a significant number of people have an official name different from the one used at home by their fellows.
While Shakespeare’s much quoted “What’s in a name?” might be debatable, when it comes to writing your program, it does matter to an extent.
Sample this, if you will (you should :P) – a PL/SQL program stored its three variables as v1, v2, v3; with v1 being a string, v2 a number, v3 a date field. The number of variables is small, fine. After some time they decided to enhance the program by adding 5 more fields. Now, if they followed the same convention and aptly named them v4, v5, v6, v7, v8, imagine how confusing that would become, for you’d be spending less time writing the variable names in your program and more time remembering which number corresponded to what. A clearly defined naming convention would definitely help, even if it is of use only to the programmer and not to the end user.
After a discussion with your team mates, and depending upon the requirements, you can fix up a set of conventions as per your convenience. Not only does this make your approach more systematic, it improves code readability as well. Once, when coding a set of reports, this was the approach we followed (a sample declaration follows the table).
| Prefix | Meaning |
| --- | --- |
| tab_ | Table |
| prc_ | Procedure |
| fnc_ | Function |
| idx_ | Index |
| lv_ | Local variable |
| gv_ | Global variable |
| cur_ | Cursor |
| wcur_ | Variable opening the cursor |
| op_ | Out parameter |
| ip_ | In parameter |
| iop_ | In Out parameter |
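To illustrate, a declaration following this convention might look like the sketch below. The procedure, table, and column names are made up for demonstration:

CREATE OR REPLACE PROCEDURE prc_load_sales (ip_region IN VARCHAR2, op_status OUT VARCHAR2)
IS
   lv_row_count NUMBER := 0;
   CURSOR cur_sales IS
      SELECT order_id, amount FROM tab_sales WHERE region = ip_region;
BEGIN
   FOR wcur_sales IN cur_sales LOOP
      lv_row_count := lv_row_count + 1; -- process each row here
   END LOOP;
   op_status := 'Processed ' || lv_row_count || ' rows';
END prc_load_sales;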
Don’t forget, in a subjective exam the examiner prefers giving more marks to the one who has written his/her answer in better handwriting than to the one who has written the same content but isn’t as legible as the former.
4. Split procedure into package.
April 2009 – Yaju Arya was applauding himself for coding a 3000 lines procedure, thinking what an achievement it was to write this much amount of stuff in a short time.
November 2009 – Yaju Arya was cursing himself for coding a 3000 lines procedure, thinking what a disaster it was to write this much amount of stuff in a single location.
September 2005 – Yaju Arya was perhaps sleeping in the class when his teacher was explaining that the length of a C++ procedure / function should not be more than 20 lines.
November 2009 – Yaju Arya realized the same applies to PL/SQL as well, just that the lines are substituted by the operation being performed.
Children (why am I saying that?), please note: do not bloat your procedure / function unnecessarily. Split it up into multiple functions / procedures and group all of them into a package (a sketch follows the list below). The reasons for doing that are
i. Code reuse. That is precisely the reason functions were invented, wasn’t it?
ii. Error recovery. If your package stops in between, you won’t have to restart it from scratch; you can directly call the part where it had stopped. With one large procedure you may run out of luck.
iii. More organized. Splitting the logic into multiple procedures/functions and putting them into a package looks more organized, doesn’t it?
iv. More legible code. Refer to the last line of point number 3.
v. Lesser load on system memory. Loading one large procedure vs. a package that loads smaller procedures as and when needed – obviously lesser load in the latter case, right?
vi. Your code seems more High Class when you put it in a package (Page 3 effect, eh? :P).
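A minimal sketch of what the split might look like – the module names below are hypothetical, meant only to show the shape of such a package:

CREATE OR REPLACE PACKAGE pkg_billing AS
   PROCEDURE prc_load_customers;   -- step 1
   PROCEDURE prc_compute_charges;  -- step 2
   PROCEDURE prc_generate_report;  -- step 3
   FUNCTION fnc_get_tax (ip_amount IN NUMBER) RETURN NUMBER; -- reusable helper
END pkg_billing;

With the body holding the implementations, a failed run can resume by calling just the step that stopped.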
5. Optimizing individual queries of the code.
Oh yeah. Oh yeah. I love SQL tuning, am literally having butterflies in my stomach at the prospect of mentioning this point. 🙂
As you put SQL queries in your code, make sure they are thoroughly optimized before they go into production. You need not run all the queries just to check their run time; in most cases the Explain Plan will supply you with the direction you are looking for. As long as you are sure about the output you are getting, that will do.
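In Oracle, checking the plan without running the query is straightforward (table and column names here are placeholders):

EXPLAIN PLAN FOR
SELECT name, age, address FROM employee WHERE emp_id = '100';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);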
It’s a commonly observed scenario that queries were written when the data was small, and as the data grew in size, the only option left was to re-factor the code. The job of a DBA is to maintain the data as it grows in size; don’t assume that the developer’s job is to write the code and the DBA’s job is to resolve all performance lags. The DBA can do that, but in the process don’t be surprised if he/she lends a mouthful of curses to the ones who wrote the original application.
6. Putting table/column comments.
A slightly minor but highly useful thing – not only should you put comments in your PL/SQL code, but also on the tables that you create for storing data. Developers often overlook this aspect; they do put teeny-weeny comments in their SPs (stored procedures) but abstain from the process when writing table scripts.
I realized its importance when going through the tables in a billing system used by my client, where they had put comments on almost all tables and columns. While some were indeed helpful, there were quite a few that were generic instead of being specific to the vendor the system was installed for. This increased confusion at times, but it was definitely better than another application I came across that had zero table and column comments, making things frustrating for us.
So, the moral of the point is: do write table and column comments. Especially if a column takes only a limited set of values, mention those values as well as their purpose.
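The syntax costs you barely a minute; for instance (the names and wording are illustrative):

COMMENT ON TABLE tab_orders IS 'Customer orders, fed daily from the billing front end';
COMMENT ON COLUMN tab_orders.status IS 'in = invoiced, oh = on hold, cl = closed';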
7. Planning indexes.
Remember, as mentioned a few paragraphs ago, it is the responsibility of the application developer to develop the application according to how the data will be accessed, and, if need be, to plan indexes as well. Otherwise the following vicious circle ensues.
Developer writes code without future planning –> Performance issues –> Code given to DBA –> DBA resolves issue by creating indexes / re-factoring so as to use indexes –> DBA curses Developer –> Developer never heeds and continues writing without planning.
Here is an example. You want a few details about Barack Obama, assuming they are stored in your database (don’t ask me how, maybe Ctrl+C and Ctrl+V from Wikipedia). So you write your query as
SELECT age, location, designation FROM tab_mis WHERE name = 'BARACK OBAMA';
If no index is present on the field NAME in table TAB_MIS, you’d encounter an FTS (what is that? Refer point 2). As data in this table grows, you know what melodious songs you are going to hear from the DBA. So either you plan the index from the beginning, or re-factor the application in such a way that usage of some other index gives the same result, or do some Research & Development to come up with a ground-breaking solution that catapults your name into history (!!).
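Planning the index from the beginning is a one-liner (following the idx_ prefix from point 3):

CREATE INDEX idx_tab_mis_name ON tab_mis (name);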
8. Scalability and multi-user access.
The process of measuring the length of your code with the help of a metric scale ______ called Scalability.
Your options are
i. is
ii. is not
iii. Jhingalala hurr hurr
Those who selected the second option, fill in the next blank.
Designing your application in such a way that it remains accessible to a large number of users without a significant drop in performance _____ called Scalability.
Your options are
i. is
ii. is not
iii. Jhingalala hurr hurr
Those who selected the first option, congratulations, you have ____ a million dollars. Before I give the options, I am giving away the answer, it is ‘NOT won’. This is my article, not Who Wants To Be A Millionaire. 😛
Ok, enough of silver screen, back to Computer screen. When designing your application, take into consideration how many people are going to use it, and how.
I had a reporting application that generated report data into a table after truncating the existing contents. The downside of this was that two people accessing it at the same time could have one operation canceling out the other. We made use of Global Temporary Tables (GTT), so that the information would be stored session-wise.
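A minimal sketch of such a table – ON COMMIT PRESERVE ROWS keeps each session’s rows for the duration of that session, so two users never see each other’s report data (the column list is illustrative):

CREATE GLOBAL TEMPORARY TABLE tab_report_data (
   customer_id  NUMBER,
   report_value NUMBER
) ON COMMIT PRESERVE ROWS;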
There was another application that couldn’t have multiple instances running at the same time. So the moment the application started, we inserted a row into a table to indicate that the program was running, and upon completion of the program we updated that value. A check in our program verified that value in the table, ensuring the program wouldn’t run again if the value was already present.
In one more case we made use of reference cursors (ref cursors) to return results to the front end so as to avoid double data insertion into the table.
There was a case where our application caused increased processing at the database end. Upon investigation it turned out that the front end used to execute a query fetching all the records from a table and display them via pagination (page 1, 2, 3, etc.). Naturally, as the table grew, more and more data was being fetched. Upon doing a bit of analysis, we discovered that the user used to view only the first few records, not all of them. So we modified the code to fetch only the records for the requested page (1-10 for page 1, 11-20 for page 2, and so on). While this slightly increased the time for fetching each page, it significantly reduced database processing.
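On Oracle versions of that era, the page-wise fetch can be done with the classic ROWNUM wrapper; a sketch, with the bind variables as placeholders:

SELECT *
FROM (SELECT t.*, ROWNUM rn
      FROM (SELECT name, age, address FROM employee ORDER BY name) t
      WHERE ROWNUM <= :last_row)   -- e.g. 10 for page 1
WHERE rn >= :first_row;            -- e.g. 1 for page 1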
Please note that the definition of Scalability provided above is based on what I learned in college and my personal experience; you can any time refer to the internet for a more technically profound version. 😛
9. Avoid redundant storage.
Confession 1 – I was quite weak at pointers in C++. I just couldn’t make enough sense of why they were actually needed, and even with their plusses I discovered more negatives in their implementation.
Confession 2 – When I started programming in PL/SQL, I was elated at the fact I won’t have to touch pointers. Yipee yipee yay yay.
Confession 3 – When I see pointers being used in databases, I realize they are actually a good thing, when implemented properly.
No pointers in PL/SQL, but pointers in a database, huh? Ok, one hint – foreign key constraints.
When you put a FK (foreign key) constraint on a column, you are telling your database that the values of that column refer to a column in another table. Simply put, instead of duplicating the detailed information everywhere, you store it once and point to it, boss. Hence storage saved.
When you create Bitmap Indexes in Oracle, they store pointers corresponding to each distinct value. While this is good in a read-only database, in a transactional DB with updates happening frequently the performance goes for a toss, boss. But storage is saved where it matters.
In Oracle, when you declare your data type as char(100) (100 is an arbitrary value), the entire 100 bytes of storage space is occupied for every value; store data of 25 bytes (arbitrary value) and the remaining 75 bytes are blank-padded and wasted. But when you use varchar2(100) and store 25 bytes of data, only 25 bytes are used. (Plain varchar, by the way, is currently just a synonym for varchar2 in Oracle; I don’t think anyone uses it now anyway.)
One more benefit of converting your reusable code into functions: it saves space. I need not explain that, IMO (in my opinion).
In Oracle, you can always perform data compression to reduce storage space. Plus you can make use of key-compressed indexes, in which the oft-repeating leading column values of a composite index are stored only once, decreasing execution time and reducing the space used compared to normal composite indexes.
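In Oracle the two options look roughly like this – COMPRESS 1 tells the index to store each distinct value of the leading column only once (object names are illustrative):

ALTER TABLE tab_orders MOVE COMPRESS;
CREATE INDEX idx_orders_status_date ON tab_orders (status, order_date) COMPRESS 1;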
10. Code formatting.
Point 3, last line. It’s all about cleanliness, dude. Oh wait, the word is archaic. Rewind. It’s all about cleanliness, mate (this is what happens when you have Australian friends).
Now put yourself into the shoes of a Code reviewer (don’t bother about the size not suiting your feet) and evaluate which one is better.
i. select name,age,address from employee where emp_id='100';
ii. SELECT name, age, address FROM employee WHERE emp_id = '100';
If you rewarded the second one more handsomely, you have got the point I am trying to convey. And if your preference is the first one, then go get a life mate. 😛
Code formatting doesn’t mean formatting the hard disk containing your code (that is called disk formatting, BTW); it means re-arranging the tit-bits of your code to make it look more presentable and easier to go through. It also makes your code appear more high class. 😛
Anyway, this is a must: whatever code you write, ensure that it is properly formatted. All good IDEs have a code-formatting option; discover what that option is and make use of it. And if you use console-based tools that don’t have this option, then use whatever comes to your mind – tabs, extra spaces, etc. – but do format the code.
11. Exception handling.
While designing your code, you should be very clear about all the areas in which your code can fail. I am illustrating a few cases of how I did that.
i. In a section of my procedure, I was calculating average monthly payment of a customer over the last 6 months. When the customer hadn’t made any payment in the last 6 months, a divide-by-zero error was bound to occur, hence we handled that.
ii. In another section of the same procedure, I was checking the date when the customer had activated a particular service. There could be a case where the customer had never activated that service, hence handled no-data-found.
iii. While inserting records into a table that had a composite primary key consisting of employee ID and date (without time component), had the procedure by chance run twice in the same day, a unique-constraint violation would have occurred. Hence handled.
iv. There was a case where we prevented the occurrence of an exception. Our procedure used to receive information from another source and insert data into our table. Now, the ones who coded the front end didn’t have a check for the length of fields, and we had a field with a maximum length of 200 which, if exceeded, would have raised a value-too-large exception. Our options were either to request the other party to put a check at their end, or to increase the column length of our table; but as our table was being referred to by a few other external sources, we weren’t sure of the impact that could have on them. So before inserting, we truncated the text field using the substring function, limiting it to 200.
v. Null pointer exception. This doesn’t exist in PL/SQL, which makes me feel blessed of having escaped the wrath of C++ pointers. Hoo hoo hee hee haa haa.
A common mistake made by PL/SQL developers is handling all exceptions in the same manner – Oracle developers simply write ‘WHEN OTHERS’ and handle everything there. Please, don’t do that. It’s a lazy way of doing your stuff. If you are logging your exceptions to a table, then after some time you’d see a lot of repeated cases and realize you should have handled those exceptions individually. That would save some storage as well.
Also, when you have a query block where you know an exception could occur, put it in its own begin-end block, as sketched below. That way, if an exception occurs in one part of the code, it won’t affect the rest of the parts in the same fragment.
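A sketch combining both ideas – a specific handler instead of a blanket WHEN OTHERS, and a nested begin-end block so a failure in one step doesn’t kill the rest (table and variable names are made up):

DECLARE
   lv_cust_id  NUMBER := 100;
   lv_act_date DATE;
BEGIN
   BEGIN
      SELECT activation_date INTO lv_act_date
      FROM tab_services
      WHERE customer_id = lv_cust_id;
   EXCEPTION
      WHEN NO_DATA_FOUND THEN
         lv_act_date := NULL; -- customer never activated the service
   END;
   -- processing continues here even if the lookup above found nothing
END;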
12. Parameterize for future use.
There are quite a few real-world experiences which, when we encounter them, leave us shocked that they could actually exist; and worse, when we try narrating them to others, neither are we able to aptly explain the reason for our state of shock, nor do others believe that such a thing could exist in the real world.
A similar real-world experience happened to me when I went through code snippets developed by my peers and was in a state of disbelief: how could they write such code, directly hard-coding some of the values?
Look at this where clause condition for picking records on which action was performed in the last one week.
WHERE date_of_action > SYSDATE - 7;
Now when our client demanded a change in the program – to pick up records from the last two weeks – we were left with no option but to check out the code, modify the hard-coded value, compile, check in, and deploy.
What I did was create a table USER_PARAMETERS having fields PARAM_NAME and PARAM_VALUE. I inserted a row with (PARAM_NAME, PARAM_VALUE) as (‘DATE_OF_AC’, ’14’) and modified my code as
WHERE date_of_action > SYSDATE - (SELECT TO_NUMBER(param_value) FROM user_parameters WHERE PARAM_NAME = 'DATE_OF_AC');
After that, I granted select and update rights on this table to my user, so that if the value has to be modified in future, he/she can do it himself/herself without raising a change request.
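The supporting objects were nothing fancy; roughly along these lines (the grantee name is illustrative):

CREATE TABLE user_parameters (
   param_name  VARCHAR2(30) PRIMARY KEY,
   param_value VARCHAR2(100)
);

INSERT INTO user_parameters (param_name, param_value) VALUES ('DATE_OF_AC', '14');

GRANT SELECT, UPDATE ON user_parameters TO report_user;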
On similar lines, I saw a piece of code in which all queries were stored in a table instead of the procedure. What happened in the procedure was that those queries were picked from the table and executed using EXECUTE IMMEDIATE.
13. Reducing hard disk utilization.
I remember the year when we were first taught about computers. One of the first lessons emphasized that there are two types of memory – Primary Memory and Secondary Memory. Everyone with knowledge of computers should know that. And if he/she doesn’t, then come on, learn it mate, what’s the harm in doing that? 😛
Without going into details, we know it is faster to access data from Primary Memory than from Secondary Memory. Also, we hear more about hard disk crashes than RAM crashes (at least this is what I have observed). Once, when providing assistance to a team whose program was suffering from performance implications, the DBA’s analysis stated that CPU utilization was hardly 3% but HD utilization often touched 100%, because of which several other processes accessing the same data had to wait for the system to grant them enough resources.
Now, how to reduce HD utilization?
i. Avoid FTS. This operation scans the entire contents of your table, unnecessarily increasing load on the resources. If usage of an index or a parallel process satisfies the requirement, go for it. Note that I used the word “if”, as an index can at times increase load when more than about 20% of the rows are being fetched: unless you are using something like an index-organized or clustered table, your first row would be fetched from one location, the second from some other, the third from somewhere else, and so on.
ii. Wherever possible, reduce the number of accesses to the table. One advantage of cursors is that they fetch information into primary memory, and for each subsequent reference they don’t go back to the table. That’s the reason using cursors turns out to be faster in quite a few cases.
There are a lot of other ways as well, of which I am not yet aware; god willing, I would learn them with the passage of time.
14. Mentioning column names explicitly.
INSERT INTO tab_users SELECT * FROM tab_sysinfo;
This puts the entire data from the table tab_sysinfo into tab_users. Now suppose somewhere down the line I decide to add a column to tab_sysinfo. What would happen? The procedure containing this query would become invalid.
Don’t believe me? Try it yourself.
The query should be written as
INSERT INTO tab_users (sname, sage, saddress) SELECT name, age, address FROM tab_sysinfo;
This won’t lead to compilation errors upon addition of a column.
15. Recovery.
Your application is running fine. Tic tic tic tic.
Crash. Whaaaaaaaaa.
Run the application again. Now what?
Would the application run from the point it crashed or do the entire things all over again? What if the data was generated but it didn’t get saved? What if the application crashes again?
Discontinuing the chain of endless would-bes and what-ifs: by now you must have realized what happens if your application stops midway for one reason or another.
The process I normally follow in such cases is to check the data in a table to determine which operations were already performed (that data being inserted into the table by my program only). Depending on that, I go to the program area that has to execute the remaining operations and continue the process as normal, as sketched below. This is another use of splitting up your procedure into a package containing multiple smaller procedures, or a procedure into multiple begin-end blocks.
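A minimal sketch of that restart check, assuming a run-log table with one row per completed step (all names here are hypothetical):

DECLARE
   lv_done NUMBER;
BEGIN
   SELECT COUNT(*) INTO lv_done
   FROM tab_run_log
   WHERE run_date = TRUNC(SYSDATE) AND step_name = 'LOAD_CUSTOMERS';

   IF lv_done = 0 THEN
      pkg_billing.prc_load_customers; -- step not done yet, run it
      INSERT INTO tab_run_log (run_date, step_name)
      VALUES (TRUNC(SYSDATE), 'LOAD_CUSTOMERS');
      COMMIT;
   END IF;
   -- repeat the same pattern for the remaining steps
END;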
You can make use of autonomous transactions as well, depending upon the functionality, so that your progress log commits independently of the main transaction.
16. Single query.
Now this is a slightly debatable topic. It is considered good practice to club multiple queries into a single query as far as possible. I have tried that; while it makes things more systematic and most of the time reduces the computation time, when it comes to large complex queries involving multiple joins it can lead your application to crash, throwing a Rollback Segment (RBS) too small error.
While the RBS error can be rectified by asking your DBA to increase the size of the RBS, after a while he/she will stop doing that and ask you to change the program logic. This is fair if you look at the situation from the DBA’s perspective.
In this case, you can try splitting the single query back into multiple smaller ones. This may or may not work; the query may compute in less time, or might take more. Always keep your options open and play the right card at the right time.
17. Few additional practices listed by Sandeep Jain
My manager, Mr. Sandeep Jain, listed a further set of PL/SQL programming practices after I showed him my article. In order to highlight his contribution, I decided to add a separate section for them.
The points listed were
i. Module name in queries
Ok, you read quite at length about comments; here comes another use for them. The DBA has the option of viewing which queries are running in your database at a given time and which ones are taking more time than others. What becomes difficult to know is the source of these queries.
That is, you’d face a hard time knowing which program ran which query, especially when you have multiple programs running on the same source.
An excellent practice in this case is to mention the module name in your query; this’d make it easier for the DBA team to identify the source.
e.g.
SELECT /* User CUSER Type REPORT Name prc_custrep */
a.customer, b.id, b.class, c.entdate
FROM
--rest of the query here
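As a side note, Oracle also has a built-in way of tagging a session that shows up in V$SESSION, which can complement the comment approach:

BEGIN
   DBMS_APPLICATION_INFO.SET_MODULE(module_name => 'prc_custrep',
                                    action_name => 'fetch customers');
END;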
ii. Explain Plan as artifact
We talked about the importance of checking the explain plan of your queries before implementing them in production; now make it a practice to store the explain plan as an artifact as well.
The purpose of this is: implementing your code in production after checking its performance in a test environment is like going on a blind date with someone you’ve interacted with quite a few times before on the internet/phone – you are unsure whether your partner will turn out the way he/she was projected over your medium of communication. Similarly, when your code goes into production, it may not perform the way it was intended to, or after some time the DBA team may modify some configurations on the database – like changing system parameters or dropping indexes – leading to adverse effects. In that case, if they come back at you, you’d have proof that you had given thoughtful consideration to your code before implementing it.
iii. Check-in check-out
One common practice (or shall I say malpractice) I have observed among database developers is that they don’t maintain their code using a Revision Control Software such as IBM Rational ClearCase, Microsoft Visual SourceSafe, etc., unlike their counterparts who develop applications for the web as well as the desktop. They feel more comfortable launching their IDEs such as Quest Toad / SQL Developer, opening up the relevant procedure/function to be modified, making changes, compiling, deploying. Ok, fine.
And when two people open the same code around the same time, whoever saves his/her changes later (assuming the statement CREATE OR REPLACE is there in the procedure) has his/her version as the final one. This can lead to conflicts. Also, if one wishes to go back to a previous version of the code for some reason, then unless you have manually saved it elsewhere, the clock won’t move in the anti-clockwise direction for you.
It may appear a tedious process – first check out your code, then make changes, then check it back in – but it helps prevent a lot of potential risks.
Note: if in your work arena there is a lack of source control practices and it isn’t possible to implement them for reasons like budget and licensing, then you must remain ever prepared for the case that your code may get lost. It once happened to me in a similar situation, when we were deploying code directly on the database, and then one day we discovered that the system was no longer available – the team responsible for maintaining servers had shut the server down without informing us. After a lot of fight we did get our code back, but we realized the importance of taking timely backups. Following this practice also saved us from a feeling of déjà vu whenever data was refreshed from another source (in our case, data was refreshed from production).
iv. Separate schema for each developer
This is something that might appear debatable but should be followed if given a choice – each developer should get a separate schema for himself/herself to work on the code.
While working with shared schemas, I have often encountered the following issues: someone changed the password without informing the entire team; the account got locked; fear that someone could erase our hard work (especially when no Revision Control Software is implemented); not being able to log in because the maximum-sessions limit was exceeded; not being able to save data for lack of space because someone else had loaded a humongous amount of data into the schema.
The major advantage of having an individual user is peace of mind; perhaps that is reason enough to implement a separate schema for each developer.
v. Non-production data masking
There was a time, from 1998 to 2005, when we used to hear of a lot of cases involving data theft pertaining to credit card numbers, bank accounts, passwords, etc. While these cases still happen, their occurrence has decreased now, one prime reason being the implementation of non-production data masking across organizations.
The term implies putting a sort of mask on the data available in non-production (e.g. test) environments so that the developer doesn’t get to see the actual data stored in the tables. Various tools such as IBM Optim and dataguise apply masking techniques – putting dummy values, populating info using some logic, shuffling data values – so that the fellows working on code referring to that data won’t get to see the actual figures.
I once read a whitepaper on the dataguise web site indicating the benefits of non-production data masking; I won’t quote the exact figures, but will say that this has definitely helped reduce data theft.
vi. Logging time of processes
When you have programs such as MIS reports and bulk-processing procedures, their execution time can be quite large (at times running into a couple of hours), the most common reason being the incremental growth of data. However, there could be an additional reason as well, one which may not appear straightaway unless some concrete proof is produced.
Once, when providing assistance to a team having a performance issue in one of their reports, my first analysis was that the issue had more to do with configurations at the database level than at the application level. How else would you explain that a report which used to take half an hour to an hour for execution suddenly started taking more than four hours?
The team having the issue in their application didn’t give us the times, nor was archived data provided by the DBA team. We did that analysis on our own. How? Let me tell you.
Their report maintained a table that can be classified as a process log, containing info on when a particular step in the procedure was executed, with one field for the process name and another for the time when that happened. The reporting procedure had 5 steps – start, populate customer data, perform computations, update some set of existing values, end – and at each of these steps data was inserted into the log table. We took the start and end times for that process and provided them to a manager who is like a demigod in Microsoft Excel (I haven’t seen anyone as good at the software as he is). Within minutes he made some amazing graphs showing the steep increase in the procedure’s execution time in recent times, along with the deviation. One interesting thing we observed was that the process time varied a lot; within the same week it took 4:15 hours, 6:10 hours, 2:35 hours, 7:15 hours, 3:45 hours. Similar patterns showed that the time wasn’t consistent, even though the amount of data remained pretty much the same. Providing this roller-coaster analysis to the DBA team further cemented what we had been trying to convey, and ultimately the issue was resolved by implementing modifications at the database level.
During the process, we tried more than 15 permutations and combinations for the same program, but the efforts went in vain. Before doing these we had informed the other teams of our initial analysis (issue at DB level), but they didn’t believe what we were saying and insisted on going for program modifications. In the end it was our detailed analysis, using data stored in a table, that got us what we had intended.
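The log table itself was as simple as it sounds; something along these lines (the structure is my reconstruction, so treat it as illustrative):

CREATE TABLE tab_process_log (
   process_name VARCHAR2(50),
   step_name    VARCHAR2(50),
   log_time     DATE DEFAULT SYSDATE
);

-- inside the reporting procedure, at each of the 5 steps:
INSERT INTO tab_process_log (process_name, step_name) VALUES ('CUST_REPORT', 'START');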
vii. Usage of standard pre-defined templates
For maintaining consistency among the different segments of your code – procedures, triggers, functions, etc. – you can define a template citing the path to be followed in terms of items such as variable declaration, exceptions, and comments (revision history, code purpose, etc.). The advantage of this is that once you have a set of best practices in place, a lot of your subsequent work is akin to filling in the blanks of your desired functionality. It also prevents your code reviewer from pinpointing faults in commonly missed areas such as improper exception handling and comments.
18. Selected small noteworthy miscellaneous points
i. Back up plan
Wherever possible, prepare a small technical document having a back-up plan of the things to do in your absence. A lot of people don’t prepare one, fearing this would reduce their indispensability by sharing information that only they know. But this also increases the load on them at times, and poses issues when they are unable to attend office for various reasons.
ii. Hard disk failure
Ensure that a hard disk failure on your computer doesn’t affect the development process; try maintaining multiple backups of the code as far as possible.
iii. Database migration
Be prepared to recode your program in case your database gets migrated from one vendor to another, like SQL Server to Oracle, MySQL to Informix, etc. Even though vendors provide you all the support and assistance in migration, citing that almost all features are compatible, if you don’t have a detailed understanding of the system you are going to work on, you won’t know how exactly your data is going to be processed and what issues it may pose.
iv. Database upgrade
I have observed performance issues with applications when the database was migrated from a lower version to a higher one, which indeed isn’t a cherished scenario, and in many cases you may end up recoding stuff to make use of the latest in-built functions and features. In case that isn’t possible, you may do what a friend of mine did with an application facing performance issues post migration from Oracle 8i to Oracle 10g – at the beginning of the procedure he changed the session setting for the optimizer to use 8i behaviour (using ALTER SESSION), executed the remaining part, then changed the session setting back to 10g behaviour.
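If memory serves, the relevant parameter is optimizer_features_enable; inside a procedure the trick would presumably look something like this (the exact version strings depend on your installation):

EXECUTE IMMEDIATE 'ALTER SESSION SET optimizer_features_enable = ''8.1.7''';
-- ... performance-sensitive part of the procedure ...
EXECUTE IMMEDIATE 'ALTER SESSION SET optimizer_features_enable = ''10.2.0.1''';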
v. Database inventory
Maintain a separate document keeping account of which programs access which database, at what time (for scheduled jobs), what the major SQL queries involved are, etc. Once, when assisting a team with a performance issue in one of their reports, I observed that it used to take more time at night (when it was scheduled) than in the evening, the difference being nearly two-fold. The reason for this was that during night time a plethora of other applications accessing the same database used to run as well, taking away quite a lot of resources from the report in question and making it spend more time waiting for resources than performing actual computation. We requested the other application owners to provide details of their programs running on the same database so as to gain a more fine-grained analysis, but except for one, no one cooperated in the process, as they themselves didn’t have a log of their programs running on the database along with the scheduled times.
vi. Bad data input
Bad input is a major cause of security issues and application crashes; if you aren’t careful about the data being inserted into your system by the user, then he/she can misuse it to quite an extent. Be very cautious on this point. You can put checks on the type of characters being inserted, their length, the occurrence of reserved words that may create potential issues, etc.
vii. Software lifecycle model
Quite a few projects turn into vaporware because the developers didn’t pay enough attention to the software lifecycle model being implemented for their application. What was deemed Iterative Waterfall should have been Prototype, and the list goes on. This can be covered under feasibility study and risk analysis.
viii. Direct code changes in production
We faced this issue quite often: there was a gap between the code we developed and what was in production a few months after our work. The reason was that the client used to contact the Operations team for all sorts of issues, and they used to make fixes in a jiffy, without maintaining an inventory of the changes or informing the development team about them. With PL/SQL code and database-related changes, one can do things on the fly without resolute source-control mechanisms in place. These changes can vary from one modified word to a completely new stored procedure.
In this case it is advisable to record all changes made in a shared location so that everyone is on the same page.
ix. Too many similar procedures
A colleague of mine primarily works with J2EE. While working on the architecture of an application with an Oracle PL/SQL developer, he suggested a solution more organized and space-optimized than the one suggested by the person more proficient in the field. Though I was aware of such an approach, the fact that he could think of such a methodology, despite not being directly involved with databases, made me want to mention his approach here.
For performing the various functionalities invoked via a web-based front end, the PL/SQL guy had written a lot of separate procedures, one for each activity. One noticeable pattern was that almost all the procedures were similar, differentiated only by a single query that formed the prime area of execution. Looking at this, my J2EE friend suggested creating a parameterized procedure that would take a number as input. The procedure would use that number to fetch a query stored in a text field of a table and execute it using EXECUTE IMMEDIATE. What took 50 different stored procedures took only one after the implementation of this solution. It also avoided having to change all 50 procedures for cross-cutting needs, e.g. formatting code or handling exceptions in a different manner.
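A sketch of that consolidation – the table and procedure names below are my own reconstruction of the idea:

CREATE TABLE tab_query_store (
   query_id  NUMBER PRIMARY KEY,
   query_txt VARCHAR2(4000)
);

CREATE OR REPLACE PROCEDURE prc_run_query (ip_query_id IN NUMBER)
IS
   lv_sql VARCHAR2(4000);
BEGIN
   SELECT query_txt INTO lv_sql
   FROM tab_query_store
   WHERE query_id = ip_query_id;

   EXECUTE IMMEDIATE lv_sql;
END prc_run_query;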
Conclusion
Ok guys and girls, I shared everything that came to my mind that I felt I should share. I hope my purpose has been fulfilled.
But the major purpose of following the conventions stated by me is that you should enjoy what you are doing. If you are following these just because your manager/lead told you to, then there is perhaps no use in doing so. Every few minutes, somewhere in this world, a human dies due to lack of food; that is not the case with you, so why not feel blessed about being alive?
-Yaju Arya.
(The author, a Bachelor of Engineering in Computer Science from Manipal University, is currently working as an Application Developer @ IBM India Pvt. Ltd., India)

