Best Practices for Returning Large Data Sets and How to Focus Your REST API Queries if Returned Data Too Large

Aternity provides support for REST API queries that contain very large data sets.

You can use Aternity REST API with many BI and other tools, that do not work well with streaming mechanism, such as Tableau.

Aternity REST APIs are of two types:
  • REST APIs, where a request is made and a response is given (learn more)

  • Streaming-like REST APIs where the server sends information to a client when an update is ready (learn more)

Aternity treats these two types of REST API differently if returned data is too large. Streaming-like REST APIs can automatically return an unlimited number of rows; whereas, other REST APIs can return up to one million rows of data. (See here what APIs support streaming). Returning large data sets causes long response times.

The REST API server processes quickly up to 25,000 entries. Processing more than 25,000 entries becomes slow and takes some time to complete. You can always narrow the focus of your query with $select, $top, or $filter to return results faster.

Important

$force_Query_Result function is deprecated. The server will ignore it if it is used in a query.

The REST API automatically splits large results into pages. Browsers return only 100 entries per page by default. Excel or PowerBI return up to 10,000 entries per page by default. Each page includes a link to the next page. Programmers can use link near the top of each page: <a:link rel="next" href="next_url"/>.

See the below table for recommendations and best practices.

To... Do this...

To return the first N entries

Add $top=N to the URL to filter the returned data and to return only the first N entries.

($top is a new parameter that you should use instead of a deprecated $force_Query_Result.)

To view only specific columns

Add $select= to the URL to make your query more efficient, like ...API_NAME?$select=COL1,COL2,COL3. This has the additional benefit of summarizing (aggregating) the data of your selected columns.

Note that the amount of returned rows is data-sensitive, and depends on the selected columns. Different data is being aggregated differently, so having different columns in a query will cause different aggregations. Thus, depending on your selected columns, the actual number of returned rows for the same API might be different.

(For regular REST APIs that do NOT support streaming)

To change the default page size

Insert $page_size into your query. For example, to increase page size in browsers from 100 (default) to 500 entries per page, enter $page_size=500.

Similarly, you can reduce the page size in Excel from 10,000 entries (default) to just 500 entries per page with the same $page_size=500.

You can theoretically go up to 10,000 entries per page, but we recommend to use modest page sizes in browsers, as larger pages can cause browsers to crash as they attempt to format the very large XML response.

Note

$page_size function has no effect on streaming-like REST APIs. For streaming REST APIs, the server defines the page size and ignores this function.

To return up to 25,000 rows

To return up to 25,000 rows, you are NOT required to add $select= to the URL.

The regular REST API queries without $select= can return up to 25,000 rows. When the amount of data in the database is larger than 25,000 rows, you will receive an error message.

If you are receiving an error like Returned data is too large: Use $select= to increase the record limit or use $top=N to return the exact number of the first N rows.

Note

All REST APIs return quite fast up to 25,000 rows.

The streaming-like REST API queries without $select= return up to 25,000 rows very fast. Larger data sets have longer response time. There is no limit to amount of data that streaming-like REST APIs can return: the bigger query, the longer response time.

(For regular REST APIs that do NOT support streaming)

To increase the number of allowed entries UP TO 1 million

Add $select= to the URL to return the data for UP TO 1 million rows.

You can also optionally narrow the timeframe of the query, or return only rows where a column has a specific value.

If the query already includes $select=, but you are still receiving an error like Returned data is too large: Remove some columns from the $select= function, or add another $filter= function to narrow your query and to avoid receiving an error like Returned data is too large. Use $top=N to return the exact number of the first N rows.

For APIs that do not support aggregation, $select= will return the selected columns without aggregation.

For APIs that support aggregation, the returned rows might be aggregated by default (this option is the default for most APIs that support aggregation). If the API does not aggregate by default, then $select will not aggregate, unless you explicitly set $aggregate.

Note

All REST APIs return quite fast UP TO 1 million rows if queries contain $select=.

(For streaming-like REST API)

To return data QUICKLY

REST API server returns data QUICKLY in the following cases:
  • Database includes maximum of 25,000 rows

  • Database includes maximum of 1 million rows and your query contains $select=

(For streaming-like REST API)

To return an unlimited data SLOW

To return an unlimited data, you are NOT required to force results by adding $select= to the URL.

There is no limit to amount of data that streaming-like REST APIs can return: the bigger query, the longer response time. Processing more than 25,000 rows becomes slow and takes time to complete.

REST API server returns data SLOW in the following cases:
  • Database includes more than 25,000 rows and your query does NOT contain $select=

For the large data sets, do NOT use $orderby= or $top= functions. If the query contains any of them, you will receive an error message like Returned data is too large. Remove the functions that limit the amount of returned results and bring all available data.

The internal mechanism splits results into several database queries, what means that aggregations will be performed for several separate queries, so that in some cases the result might bring multiple redundant rows of the same data. To get the correct aggregation of a large data set, use $top= and add a very big limit (for example, $top=10,000,000).

You can always narrow the focus of your query with $select, $top, or $filter to return results faster.

To display the number of entries (<m:count>)

Add $count=true to the URL (which slightly differs from the standard OData implementation).

To view only specific rows when a condition is true

Use $filter to insert conditions that narrow down the data, to return only entries where those conditions are true.

Create conditions with operators: and, or, eq (equals) gt (greater than), ge (greater than or equal), lt (less than), le (less than or equal), ge (greater than or equal to), ne (not equal to), le (less than or equal to), not and contains. Use operators with parentheses to group conditions logically: .../API_NAME?$filter=(COLUMN1 eq 'value1' or COL2 neq 'val2') and (COL3 gt number) and not (COL4 eq 'val4' or contains(COL5,'val5'))

To limit the timeframe of your query

Add $filter=relative_time() like, .../API_NAME?$filter=relative_time(last_24_hours). Possible values are:

  • last_x_hours (like last_6_hours or last_24_hours): Displays data starting from exactly this time several hours ago, displaying it summarized (aggregated) according to the timeframe view you selected.

  • today: Displays data starting from 00:00AM today according to the time zone of the account.

  • yesterday: Displays data starting from 00:00AM yesterday according to the time zone of the account.

  • last_x_days (like last_7_days): Displays data starting from 00:00AM several days ago in the time zone of the Aternity Management Server, displaying it summarized (aggregated) according to the timeframe view you selected.

  • this_week: Displays data for this week, beginning from 00:00 AM on Sunday morning according to the time zone of the Aternity Management Server, until the current time.

  • last_week: Displays data from 00:00AM on Sunday morning to 23:59 on the most recent Saturday night, displaying it summarized (aggregated) according to the timeframe view you selected.. Use this view to compare consistent weekly results.

Alternatively, you can limit the scope of a query to the period between two static times, by creating a filter of a timeframe greater than or equal to (ge) the start time and less than or equal to (le) the end time.

For example: $filter=((TIMEFRAME gt 2018-06-09T16:00:00+01:00 and TIMEFRAME lt 2018-06-11T18:00:00+01:00) or (TIMEFRAME gt 2018-06-13T16:00:00+01:00 and TIMEFRAME lt 2018-06-15T18:00:00+01:00))

To accumulate detailed data (ETL), like gathering raw measurements for longer than offered by Aternity

Perform an ETL by running a detailed API (like RAW or HOURLY) automatically at regular intervals, like every two hours. Transfer and integrate the new data cumulatively to your target repository.

For VIEWING, use <base_url>/latest/API_NAME; for INTEGRATIONS, use <base_url>/<version number>/API_NAME (for example, <base_url>/v1/API_NAME, or <base_url>/v1.0/API_NAME, or <base_url>/v2/API_NAME, or <base_url>/v2.0/API_NAME).

For example, perform a query like:

<base_url>/v2.0/<API_name>?$filter=((TIMEFRAME gt <last_entry_transferred> and TIMEFRAME lt <two_hours_ago>) where:

  • <last_entry_transferred> is MAX(timeframe) in the target repository.

  • <two_hours_ago> is the current timeframe minus two hours.

Do NOT use relative_time. Ensure the query follows all next page links to retrieve the complete result set for that time period.

To build URL

Build your URL incrementally, adding elements one at a time, then testing it, to locate and fix any syntax errors or typos.

Test your URL

Test your URLs in a browser for faster results. Then, use your OData application to view the data in an easier format.

Handle special characters

Special characters must be encoded in a percent-encoding format, meaning as a character triplet, consisting of the percent character "%" followed by the two hexadecimal digits. For example, %20 is the percent-encoding for the space character, and %27 for the single-apostrophe character. That is the basic URL encoding (https://tools.ietf.org/html/rfc3986#section-2.1) that applies to REST API in general. If you use a browser, Excel or PowerBI, they will automatically encode any special character for you. But if you copy examples from the articles, make sure to manually add the encoding for spaces and apostrophes. Otherwise, your query might fail.

Learn more about the OData parameters mentioned in this article by visiting Microsoft's OData documentation.