Calculating Percentage Coverage

I wanted to discuss some confusion of percentage Test Coverage. I have noticed that different organizations calculate test coverage very differently.  This can be very confusing when using contractors, off-shoring and simply comparing best practices. Lets say you have a simple service that returns Name, Phone Number and Address and you asked to create test cases for 100% Test Coverage. What exactly does that mean?

Would a simple unit test entering the following be considered 100% Coverage

  • Bob Smith
  • 555.555.5555
  • 55 street
  • City
  • QC
  • M5M 5L5

Or would you need to break the service down into each of its functions. Name, Phone Number Street, City, Province, Postal Code. Testing each of these independently?

But how many test cases do you need to perform, for you to consider it 100% coverage. Lets take Postal Codes. Would a single Postal code be considered 100% coverage? Or would you need one from each of the 18 starting letters ( Y, X, V, T, S, R etc)? Perhaps you require some random number of say 10 or 100 postal codes? Or do you need to enter every defined Canadian Postal code.  Lets consider testing name function, how long a name does the app need to support, how many names can a person have, what if we include title, what if the persons name has a suffix.

What about negative scenarios? Do you need to test postal code that does not exist, or one in the wrong format before one can consider the test coverage to be 100%? With space after first 3 letters, without space, or with a hyphen. What about letter number letter or what if all letters, number or some other possible combination? How many of these negative scenarios does one need to run to say you covered 100%?

What about testing these functions as they relate to each-other or as this service relates to other services?  Do you need to test that a Postal Code starting with letter V, is not used for a city that resides in Quebec? Do you need to confirm that this address service when used in one chained request, responds the same was as when used in another? So often I hear of companies unit testing services as they developed, but never running a final systems and integration end to end test. What if one service requires that postal code to have a hyphen and the other a space?

Understand if your organization is manually testing a service, entering even 18 postal codes will take significant time directly impacting costs. Entering all positive, negative scenarios including chained services is just not feasible. Does increasing the number of test cases actually effect the percentage coverage, or is a single test case enough? All the possible boundaries for a simple service like postal codes could result in a large number of tests. Does testing  each service once, without considering all the boundaries and negative scenario’s constitute 100% coverage? More importantly perhaps, is when QA testers give a percentage coverage, does it really mean the same thing to the everyone?

I would like to invite everyone to weigh in and share their thoughts on the subject. Please select and option and comment if you will below. So far the majority selected test every function once. So I broke this into boundaries and positive and negatives to see if we can get further clarification.

***Please note The form is submitted privately and is not automatically published. If you wish your response published, use the comment link at the end of any post***

Warning: strpos() expects parameter 1 to be string, array given in /home/content/13/11164213/html/ST3PP/wp-includes/shortcodes.php on line 193
[contact-form-7 404 "Not Found"]

7. SOAPSonar – Baseline and Regression Testing

We all have had experiences were “someone” decided to make a slight “tweak” to some code then promptly forgot to mention it to anyone else or at least the right someone else. That slight tweak causing some change, be it expected or not, that causes other parties to spend hours trying to trace the cause.

One of the key benefits of automation is the ability to identify any changes to XML by doing a XML diff. Comparing one version (Baseline) to that of another, (Regression Test). With web services API, we interested in the Request and Responses to ensure that they are not different, or rather that only expected differences are there. We need the flexibility to ensure that a some of the parameters be ignored, when they expected to be different each time.  Take for instance a file reference number or a service that returns the time. We may want to check to make sure the fields for Hour, Minute, Day, Month, Year etc remain unchanged, but perhaps we wish to accept changes to values in the fields, or limit these to certain parameters. Establishing what to check against the baseline and what not to, is an important part of regression testing.

Here is a Very Simple Baseline and regression test, using the SOAP Calculate Service.

1. Run SOAPSonar (with Admin Rights). Paste

http://www.html2xml.nl/Services/Calculator/Version1/Calculator.asmx?wsdl

into the Capture WSDL bar. Select Capture WSDL.

2. Lets use the Add Service. Select Add_1 and enter a=3 and b=3. Commit, Send. Hopefully your response was 6. If not, perhaps I could suggest some services? Rename it baseline.

2 Project View

3. Now lets select Run View, Drag Baseline into the DefaultGroup. Now select the Icon Generate New Regression Baseline Response Set.

3 Generate

4. Select XML Diff of Entire Baseline Response Document. This option matches both nodes and values.. Select OK, (Commit and Send if you need to)

4 XML diff

5. After the test is run, you will see the Test Suite Regression Baseline Editor. This is were you can select what you wish to watch or ignore. Automatically a base rule is generated. If you select Index 1, you should have 1 rule XPath Match. If you select XPath Match, you should see all the nodes graphically laid out for you. At the bottom you have 3 tabs. Baseline Criteria, Baseline Request (Captured Request), Baseline Response (Captured Response). For now, lets not change anything and just select OK.
5 Baseline editor

6. Lets go back to Project View, and change our b= to 9. The response should now be 12. Commit and send to Check. Then select Run View and change the Success Criteria to Regression Baseline Rules (See cursor). Commit and Send. This time, did your Success Criteria Evaluation Fail? It should, as it was expecting 6 as a response and not 12. Analyse Results in Report View.

6. Run baseline

7. If you now select that failed test case and then select the tab Success Criteria Evaluation, you see that Regression Baseline XML Node and Value Match failed, and it was the AddResult value.

7. Failed

8. Select Generate Report, then [HTML] Baseline Regression XML Diff Report and generate the report. Then View the report. Select Response Diff for Index 1. 1 Change found and you can clearly see it marked in Red.

8 report

9. Now lets ignore the response value, but maintain regression for the rest of the test case. Select Run View, then Edit Current Baseline Settings.

10 ignore

10. You should be back in the Test Suite Regression Baseline Editor. Select Index 1, then your rule and right click on AddResult in the visual tree. Select Exclude Fragment Array. It should show now in Red as excluded. Ok, Commit and Send. Your Regression Test should now pass, as everything but that value is still the same.

9 change

Conclusion

Automation of Regression Testing is far more than running an XML Diff. It involves selecting what aspects are expected to change and what aspects are not. By eliminating expected changes, any failures in future regression tests can receive the focus they deserve. Once automated, this can be run, hourly, daily, weekly or as needed, consuming little to know human interaction. Many of our customers, maintain a baseline and consistent regression test on 3rd party code. Any service their systems rely on, which they are not personally aware of development cycle. Continually testing through automated process to ensure they aware of any changes to the code.

Questions, Thoughts?

6. SOAPSonar – Report View

An important reason for Automation, can be the time saved over generating manual reports. Comparing expected results with actual on a case by case basis and sorting through this date to combine it in a meaningful way. Making sense from pages for XML, in an attempt to filter the few key issues. One of the top 5 issues shared with me is false positives, or QA reporting an issue, that was either incorrectly diagnosed or not considered a issue. Much of this can be mistake in what was entered, but just as often, a mistake in understanding the expected behaviour.

SOAPSonar Test Cycle

So lets take a look at SOAPSonar’s report view, carrying on from the previous Tutorial #5 defining success criteria.

1. Lets start this time with JSON Google maps. In QA Mode, Project View. Please confirm you have the service and the [ADS] and that service works. Switch to Run-View and clear any tests under the DefaultGroup and drag over only the Google Maps test case we did in Tutorial 5.

1 Starting out

2. look at the area to the right, In QA Mode there are 2 tabs. Suite Settings, and Group Settings. If you switch to Performance mode, there are 3 different tabs. In QA mode, lets change the Result File name to Tut6_1.xml and leave the location at C:\Program Files\Crosscheck Networks\SOAPSonar Enterprise 6\log\QA. We have not captured a baseline for regression testing, so select Test Case Success Criteria. Result Logging as Log All results and Verbose and Use Optimized XML logging. Warning, Logging all and verbose logs in large test environments can greatly effect performance and seriously load any workstation. We usually recommend log errors fails and errors. HP Quality Centre and other options, leave unselected.

2. Verbose

3. On the Group Settings tab, lets leave things default. Here you can define a Data Source table to run through multiple tests. Commit and Run Suite.

4. Realtime Run Monitor shows that I ran 6 test cases of which 4 failed and 2 passed. Select Analyse Results in Report View at the top of the page.

4. Realtime

5. In Report View, we can now see on the far left, under Today, I have Tut6_1xml Log file at the location we entered in step 2. Right Clicking on the file allows you to export Request and Response Data as Files and Export Results to Normalized XML. Select Export results in Normalized XML and you have details of each test case run.

5 Log files

6. In the main section you can see the first 2 tests are green and the rest red. Selecting the first test, and the tab Test Iteration Summary and the Request Response Tabs populate with the what was sent and received. Do you notice the exact time for and size of each message is also reported along with teh response code?

6 First Test

7. Select Success Criteria Evaluation, and you can see the Response time and exact Match rule we created, both were a success.

7 Success Criteria

8. If select the 3rd line (first to fail) we see that it used the ADS Index  3rd row) the independent Response time and the response code was 200 (normally a pass). By selecting Success Criteria Evaluation tab, we can see that the exact match success criteria failed. That is, the csv value we were expecting was different from what we received.

8. first failed

9. We can generate a number of PDF reports via the drop down menu at the top of the page. These PDF reports can also be exported in a variety of formats. Take a look at a few.9 reports

10. Running the same test in performance mode and not QA mode, generates an alternate set of reports. The same is true if its is against a baseline.

These results can also be published to HP Quality Center.

Conclusion

Between management level reports and detailed level request and responses, reporting can consume a lot of time. Testers tend to focus on how long it takes to run a manual test vs automate the test cycle, often forgetting about the time taken to generate reports in a manual testing environment and how long it takes to capture and supply the required information with any issues. The ability to supply the test case and log, is key to troubleshooting issues, and the first thing we ask our customers for when they have any technical support questions. That is because automation enables the exact repeat of the same test case, and therefore any issue should be simple to replicate.

Comments?

 

 

 

 

 

 

 

Performance and Load Testing

A second theme of interest that came up repeatedly at STAR Conference last week was Performance and Load testing. Many of those raising the question, had mobile applications or some form of mash-up or worked in Agile environments were performance and functionality were important.

In the SOA or API world, when I refer to Performance, I am referring to a single functional service request to response time taken. The performance of the service as part of the API or web service itself. In the below diagram, it would the time it leaves the client to the time a response is received. The additional API and Identity requests that happen behind the API 1, included. These I refer to as enablers. API 2 has a DB and its identity system, and API 3 is on a Enterprise Services Bus, and has multiple enablers on the bus. Each API may have a number of services associated with it, and each of these may require different enablers, or complete different functions, and so will have different performance characteristics. Granular performance information is therefore important for troubleshooting.

Load Testing, is the performance group of services at a given load. Modelled, using expected behaviour. If function 1 in API 1 is expected to be accessed 5 times that of function 1 of API 2, then the model needs to load Function 1 in API 1 5 x that of function 1 API 2. Load testing can either be throttled to evaluate performance times at a planned TPS or simply increased till errors start occurring, to understand maximum TPS possible.

User experience performance is the perceived performance via a given client. Here we add the performance of a given client to that of the network, API and enabler. User experience does not embrace device / client diversity. Caching, partial screen refreshes, and a variety client tweaks, may hide some perceived performance issues. That said, unless the API performance is know, a poorly performing client can be difficult to identify.

Performance

The most common performance issues that tend to come up, are problems not with the API themselves, but with the enablers. Some back-end database, identity system or ESB that may have some other process running on it at a given time (e.g. backup), has a network issue or requires tuning. Often these issues are due to changes in the environment or only at a given time. A single load or performance test run, a few days before final acceptance, often fails to identify these issues, or the issues occur in production at some later date.

I previously wrote a long multi-part series about performance troubleshooting in mobile API and I have no intent to repeat that. The constant surprise however, when I show a shared test case being used for functional and performance testing, is why  wanted to add some clarification. Usually I get a blank stare during a demo for a few minutes before a sudden understanding.  So many QA testers have being trained to think of different tools and teams for functional and load testing, that the concept of a integrated tool can be difficult to grasp at first, requiring some adjustment in thinking.

After the adjustment occurs, I consistently get the same 2 questions

  1. “Does that mean you can define performance as a function of success criteria?” Yes, each test case for each service in each API, can have a minimum or maximum response time configured in success criteria. Say you set that value as 1 second along with any other criteria for success. If at any time later on that test is run, including load testing, the test case will fail. There is no need  to create new test scripts, data sources, variables etc for load testing in a separate tool. If its a new team, just give them the test case to run.
  2. Does that mean you can do continual testing or regression testing, on production system and identify any changes in functionality AND performance at the same time?  Yes. If the value is set at 1 second response in success criteria, and  you configure a automated regression or functional test every hour/day/week/ whatever. If at any point, performance or functionality changes, the test case would now fail as the response would be different than expected or previously. There is no need to run 2 separate applications to continually test service for changes in functionality and performance.

At this point I usually point out the benefit of physically distributed load agents vs. just virtual users.  The ability to trigger a central test from multiple locations in your network and compare response times, allows not only the simulation of Server, but also the Network. Larger companies often break out network performance turning into another team, and don’t consider it an “application issue”. I believe any performance issues is functionally important. Smaller companies, and senior executives, are however quick to the benefits or consolidating this into a single tool and report.

Conclusion

Regardless of if your performance/load team is a separate group or part of your role, sharing a test case, and actually building performance in to the success criteria in the same tool can offer huge benefits in time savings and in identifying performance issues earlier in development cycle and during maintenance. Why not try it yourself? Here is a two tutorials on Load Testing and Geographically Distributed Load Testing.