Best Practices for Returning Large Data Sets and How to Focus Your REST API Queries if Returned Data Too Large

Aternity provides support for REST API queries that contain very large data sets. You can use Aternity REST API with many BI and other tools, such as Tableau.

There are some APIs that have a limit on the maximum number of rows they can return and some APIs that can return an unlimited number of rows (See here what APIs have a maximum number of rows limit). Returning large data sets causes long response times.

The REST API server processes quickly up to 25,000 entries if all the fields in the API are queried. For APIs that support unlimited number of rows processing more than 25,000 entries becomes slow and takes some time to complete, and for APIs that have a maximum number of rows limit it is impossible to retrieve more than 25,000 rows when querying all the fields in the API.

When querying a subset of API fields by using $select, the REST API server can quickly process up to 1,000,000 rows. For APIs that support unlimited number of rows, processing more than 1,000,000 entries when querying a subset of the API fields becomes slow and takes some time to complete. For APIs that have a maximum number of rows limit it is impossible to retrieve more than 1,000,000 rows when querying a subset of the API fields.

You can always narrow the focus of your query by using $filter to return fewer results and retrieve them faster. You can also use $top to retrieve just a sample of the results and retrieve them faster.
Important

$force_Query_Result function is deprecated. The server will ignore it if it is used in a query.

The REST API automatically splits large results into pages. Browsers return only 100 entries per page by default. Excel or PowerBI return up to 10,000 entries per page by default. Each page includes a link to the next page. Programmers can use link near the top of each page: <a:link rel="next" href="next_url"/>.

See the below table for recommendations and best practices.

To... Do this...

To return the first N entries

Add $top=N to the URL to filter the returned data and to return only the first N entries.

($top is a new parameter that you should use instead of a deprecated $force_Query_Result.)

To return up to 25,000 rows quickly

REST API server returns data QUICKLY in the following cases:
  • Database includes maximum of 25,000 rows

  • Database includes maximum of 1 million rows and your query contains $select=

To return up to 25,000 rows quickly, you are NOT required to add $select= to the URL.

Queries for REST APIs that have a maximum number of rows limit can return up to 25,000 rows without adding $select= to the URL. When the amount of data in the database is larger than 25,000 rows, you will receive an error message.

If you are receiving an error like Returned data is too large: Use $select= to increase the record limit or use $top=N to return a sample of the first N rows.

You can also optionally narrow the timeframe of the query, or return only rows where a column has a specific value.

Queries without $select= for REST APIs that can return unlimited number of rows return up to 25,000 rows very fast. Larger data sets have longer response time. There is no limit to amount of data that those APIs can return: the bigger query, the longer response time.

To increase the maximum number of returned rows UP TO 1 million for APIs that have a maximum row limit

Add $select= to the URL to return the data for UP TO 1 million rows.

Use $top=N to return a sample of the first N rows.

If the query already includes $select=, but you are still receiving an error like Returned data is too large: Remove some columns from the $select= function, or add another $filter= function to narrow your query and to avoid receiving an error like Returned data is too large.

You can also optionally narrow the timeframe of the query, or return only rows where a column has a specific value.

For APIs that do not support aggregation, $select= will return the selected columns without aggregation.

For APIs that support aggregation, the returned rows might be aggregated by default (this option is the default for most APIs that support aggregation). If the API does not aggregate by default, then $select will not aggregate, unless you explicitly set $aggregate.

Note

All REST APIs return quite fast UP TO 1 million rows if queries contain $select=.

To return an unlimited number of rows SLOWLY (not applicable for APIs that have a maximum number of rows limit)

To return an unlimited data, you are NOT required to force results by adding $select= to the URL.

Some REST APIs can return an unlimited number of rows (see the table): the bigger query, the longer response time. Processing more than 25,000 rows becomes slow and takes time to complete.

REST API server returns data SLOW in the following cases:
  • Database contains more than 25,000 rows and your query does NOT contain $select=

REST API server returns data QUICKLY in the following cases:
  • Database includes maximum of 25,000 rows and the query does not contain $select

  • Database includes maximum 1 million rows and the query contains $select

For the large data sets, do NOT use $orderby= or $top= functions. If the query contains any of them, you will receive an error message like Returned data is too large. Remove the functions that limit the amount of returned results and bring all available data.

When adding $select to the API query that supports aggregation and the number of results exceeds 1,000,000 rows, the internal mechanism performs several database queries to retrieve the complete data set, and therefore, aggregation is performed per a query. As a result, the data set might contain more than one entry for each combination of aggregated dimensions. To normalize such data set, a further aggregation has to be performed on the client side. To avoid multiple results, use $top with a very large limit (for example, $top=10,000,000). You will get the normalized aggregated result if the data set has less than 1,000,000 rows or you will get an error like Returned data is too large if the data set has more than 1,000,000 rows.

You can also reduce the number of fields with $select and change the aggregation that may result in less rows; use $filter to narrow the focus of your query, or use $top to return a sample of the first N records.

To view only specific columns

Add $select= to the URL to make your query more efficient, like ...API_NAME?$select=COL1,COL2,COL3. This has the additional benefit of summarizing (aggregating) the data of your selected columns.

Note that the amount of returned rows is data-sensitive, and depends on the selected columns. Different data is being aggregated differently, so having different columns in a query will cause different aggregations. Thus, depending on your selected columns, the actual number of returned rows for the same API might be different.

To change the default page size

Insert $page_size into your query. For example, to increase page size in browsers from 100 (default) to 500 entries per page, enter $page_size=500.

Similarly, you can reduce the page size in Excel from 10,000 entries (default) to just 500 entries per page with the same $page_size=500.

You can theoretically go up to 10,000 entries per page, but we recommend to use modest page sizes in browsers, as larger pages can cause browsers to crash as they attempt to format the very large XML response.

Note

For REST APIs that support unlimited number of rows, use $page_size function in a query without specifying $stream_id. If a query contains both $page_size and $stream_id, the $page_size function has no effect on a query and the server defines the page size and ignores the $page_size function.

To display the number of entries (<m:count>)

Add $count=true to the URL (which slightly differs from the standard OData implementation).

To view only specific rows when a condition is true

Use $filter to insert conditions that narrow down the data, to return only entries where those conditions are true.

Create conditions with operators: and, or, eq (equals) gt (greater than), ge (greater than or equal), lt (less than), le (less than or equal), ge (greater than or equal to), ne (not equal to), le (less than or equal to), not and contains. Use operators with parentheses to group conditions logically: .../API_NAME?$filter=(COLUMN1 eq 'value1' or COL2 neq 'val2') and (COL3 gt number) and not (COL4 eq 'val4' or contains(COL5,'val5'))

To limit the timeframe of your query

Add $filter=relative_time() like, .../API_NAME?$filter=relative_time(last_24_hours). Possible values are:

  • last_x_hours (like last_6_hours or last_24_hours): Displays data starting from exactly this time several hours ago, displaying it summarized (aggregated) according to the timeframe view you selected.

  • today: Displays data starting from 00:00AM today according to the time zone of the account.

  • yesterday: Displays data starting from 00:00AM yesterday according to the time zone of the account.

  • last_x_days (like last_7_days): Displays data starting from 00:00AM several days ago in the time zone of the Aternity Management Server, displaying it summarized (aggregated) according to the timeframe view you selected.

  • this_week: Displays data for this week, beginning from 00:00 AM on Sunday morning according to the time zone of the Aternity Management Server, until the current time.

  • last_week: Displays data from 00:00AM on Sunday morning to 23:59 on the most recent Saturday night, displaying it summarized (aggregated) according to the timeframe view you selected.. Use this view to compare consistent weekly results.

Alternatively, you can limit the scope of a query to the period between two static times, by creating a filter of a timeframe greater than or equal to (ge) the start time and less than or equal to (le) the end time.

For example: $filter=((TIMEFRAME gt 2018-06-09T16:00:00+01:00 and TIMEFRAME lt 2018-06-11T18:00:00+01:00) or (TIMEFRAME gt 2018-06-13T16:00:00+01:00 and TIMEFRAME lt 2018-06-15T18:00:00+01:00))

To build URL

Build your URL incrementally, adding elements one at a time, then testing it, to locate and fix any syntax errors or typos.

Test your URL

Test your URLs in a browser for faster results. Then, use your OData application to view the data in an easier format.

Handle special characters

Special characters must be encoded in a percent-encoding format, meaning as a character triplet, consisting of the percent character "%" followed by the two hexadecimal digits. For example, %20 is the percent-encoding for the space character, and %27 for the single-apostrophe character. That is the basic URL encoding (https://tools.ietf.org/html/rfc3986#section-2.1) that applies to REST API in general. If you use a browser, Excel or PowerBI, they will automatically encode any special character for you. But if you copy examples from the articles, make sure to manually add the encoding for spaces and apostrophes. Otherwise, your query might fail.

Learn more about the OData parameters mentioned in this article by visiting Microsoft's OData documentation.